ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE
VOLUME 1
EDITORIAL BOARD
Saul Amarel, Rutgers, The State University of New Jersey
Nicholas Findler, Arizona State University
John McDermott, Carnegie-Mellon University
Jack Minker, University of Maryland
Donald E. Walker, Bell Communications Research
David L. Waltz, Thinking Machines Corporation
Barbara Chernow, Developmental Editor
About the Editor
Editor-in-Chief Stuart C. Shapiro began his teaching career at Indiana University in 1972 after earning a BS at MIT in 1966 and a PhD in computer science at the University of Wisconsin in 1971. He moved to SUNY at Buffalo in 1978, where he is currently full professor and chairman of the Department of Computer Science. He is a member of the Association for Computing Machinery, the Association for Computational Linguistics, the Institute of Electrical and Electronics Engineers, the Society for the Study of Artificial Intelligence, and the Society for the Interdisciplinary Study of Mind. His research interests include artificial intelligence, knowledge representation, inference, and natural-language understanding.
ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE
VOLUME 1
Stuart C. Shapiro, Editor-in-Chief
David Eckroth, Managing Editor
George A. Vallasi, Chernow Editorial Services, Developmental Editor
A Wiley-Interscience Publication

John Wiley & Sons
New York / Chichester / Brisbane / Toronto / Singapore
Copyright © 1987 by John Wiley & Sons, Inc.

All rights reserved. Published simultaneously in Canada.

Reproduction or translation of any part of this work beyond that permitted by Sections 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc.

Library of Congress Cataloging in Publication Data:
Encyclopedia of artificial intelligence.
"A Wiley-Interscience publication."
1. Artificial intelligence-Dictionaries. I. Shapiro, Stuart Charles. II. Eckroth, David.
Q335.E53 1987  006.3'03'21  86-26739
ISBN 0-471-80748-6 (set)
ISBN 0-471-62974-X (Vol. 1)

Printed in the United States of America
10 9 8 7 6 5 4 3
EDITORIAL STAFF
Editor-in-Chief: Stuart C. Shapiro
Managing Editor: David Eckroth
Editorial Manager: Carole Schwager
Editorial Supervisor: Robert Golden
Production Manager: Jenet McIver
Production Supervisor: Rose Ann Campise
Production Aide: Jean Spranger
Indexer: Diana Witt
CONTRIBUTORS

Marc Abrams, University of Maryland, College Park, MD, COMPUTER SYSTEMS
Phillip L. Ackerman, University of Minnesota, Minneapolis, MN, INTELLIGENCE
Sanjaya Addanki, IBM Corporation, Yorktown Heights, NY, CONNECTIONISM
Gul Agha, Massachusetts Institute of Technology, Cambridge, MA, ACTOR FORMALISMS
Ashok K. Agrawala, University of Maryland, College Park, MD, COMPUTER SYSTEMS
Philip E. Agre, Massachusetts Institute of Technology, Cambridge, MA, CONTROL STRUCTURES
Narendra Ahuja, University of Illinois, Urbana, IL, DOT-PATTERN ANALYSIS; TEXTURE ANALYSIS
Janice S. Aikins, AION Corporation, Palo Alto, CA, AGENDA-BASED SYSTEMS
Selim G. Akl, SRI International, Menlo Park, CA, CHECKERS-PLAYING PROGRAMS
James F. Allen, University of Rochester, Rochester, NY, SPEECH ACTS
Jonathan Allen, Massachusetts Institute of Technology, Cambridge, MA, SPEECH RECOGNITION; SPEECH SYNTHESIS
Peter K. Allen, University of Pennsylvania, Philadelphia, PA, MULTISENSOR INTEGRATION
Sergio J. Alvarado, University of California, Los Angeles, CA, SCRIPTS
Saul Amarel, Rutgers University, New Brunswick, NJ, PROBLEM SOLVING
Charles Ames, 49-B Yale Avenue, Eggertsville, NY, MUSIC, AI IN
Robert A. Amsler, Bell Communications Research, Morristown, NJ, LITERATURE OF AI
Dana Angluin, Yale University, New Haven, CT, INDUCTIVE INFERENCE
Kulbir S. Arora, State University of New York, Buffalo, NY, BELLE; BORIS; CADUCEUS; EPISTLE; EURISKO; FOL; FRL; MERLIN; NOAH; PANDEMONIUM; PARRY; PHRAN AND PHRED; ROSIE; SMALAPROP; SNIFFER
Ruzena Bajcsy, University of Pennsylvania, Philadelphia, PA, MULTISENSOR INTEGRATION
Bruce W. Ballard, AT&T Bell Laboratories, Murray Hill, NJ, COMPUTATIONAL LINGUISTICS
Ranan B. Banerji, St. Joseph's University, Philadelphia, PA, GAME PLAYING; MINIMAX PROCEDURE
Stephen T. Barnard, SRI International, Menlo Park, CA, STEREO VISION
Harry G. Barrow, Schlumberger Palo Alto Research, Palo Alto, CA, MATCHING
David R. Barstow, Schlumberger-Doll, Ridgefield, CT, PROGRAMMING ASSISTANTS
Madeleine Bates, Bolt, Beranek & Newman, Cambridge, MA, NATURAL-LANGUAGE INTERFACES
Antal K. Bejczy, Jet Propulsion Laboratory, Pasadena, CA, TELEOPERATORS
Robert C. Berwick, Massachusetts Institute of Technology, Cambridge, MA, GRAMMAR, TRANSFORMATIONAL
Alan W. Biermann, Duke University, Durham, NC, AUTOMATIC PROGRAMMING
Thomas O. Binford, Stanford University, Stanford, CA, GENERALIZED CYLINDER REPRESENTATION
Roberto Bisiani, Carnegie-Mellon University, Pittsburgh, PA, BEAM SEARCH
Piero P. Bonissone, General Electric, Schenectady, NY, REASONING, PLAUSIBLE
E. J. Briscoe, University of Lancaster, Cambridge, UK, SPEECH UNDERSTANDING
Christopher M. Brown, University of Rochester, Rochester, NY, HOUGH TRANSFORM
Bertram Bruce, Bolt Beranek & Newman, Cambridge, MA, DISCOURSE UNDERSTANDING; GRAMMAR, CASE
Maurice Bruynooghe, Katholieke Universiteit Leuven, Heverlee, Belgium, BACKTRACKING; COROUTINES
Richard R. Burton, Xerox Palo Alto Research Center, Palo Alto, CA, GRAMMAR, SEMANTIC
Jaime G. Carbonell, Carnegie-Mellon University, Pittsburgh, PA, LEARNING, MACHINE; NATURAL-LANGUAGE UNDERSTANDING
John Case, State University of New York, Buffalo, NY, RECURSION; TURING MACHINE
Raymond Cipra, Purdue University, West Lafayette, IN, MANIPULATORS
Helder Coelho, Laboratorio Nacional de Engenharia Civil, Lisboa, Portugal, GRAMMAR, DEFINITE-CLAUSE
Harold Cohen, University of California, La Jolla, CA, ARTS, AI IN
Daniel D. Corkill, University of Massachusetts, Amherst, MA, DISTRIBUTED PROBLEM SOLVING
Mace Creeger, LISP Machines Company, Los Angeles, CA, LISP MACHINES
James L. Crowley, Carnegie-Mellon University, Pittsburgh, PA, PATH PLANNING AND OBSTACLE AVOIDANCE
Richard E. Cullingford, Georgia Institute of Technology, Atlanta, GA, SCRIPTS
George Curet, LISP Machines Company, Cambridge, MA, LISP MACHINES
G. R. Dattatreya, University of Maryland, College Park, MD, PATTERN RECOGNITION
Ernest Davis, New York University, New York, NY, REASONING, COMMONSENSE
Larry S. Davis, University of Maryland, College Park, MD, FEATURE EXTRACTION
Laura Davis, Naval Research Laboratory, Washington, DC, MILITARY, APPLICATIONS IN
Martin Davis, New York University, New York, NY, CHURCH'S THESIS
Johan de Kleer, Xerox Palo Alto Research Center, Palo Alto, CA, BACKTRACKING, DEPENDENCY-DIRECTED; QUALITATIVE PHYSICS
Jim Dray, Office of Technology Assessment, United States Congress, Washington, DC, SOCIAL ISSUES OF AI
Gavan Duffy, University of Texas, Austin, TX, HERMENEUTICS
Michael G. Dyer, University of California, Los Angeles, CA, SCRIPTS
George W. Ernst, Case-Western Reserve University, Cleveland, OH, MEANS-ENDS ANALYSIS
Carl R. Feynman, Thinking Machines Corporation, Cambridge, MA, CONNECTION MACHINE
Martin A. Fischler, SRI International, Menlo Park, CA, STEREO VISION
Jude Franklin, Planning Research Corporation, McLean, VA, MILITARY, APPLICATIONS IN
Peter W. Frey, Northwestern University, Evanston, IL, HORIZON EFFECT
Richard P. Gabriel, Lucid, Inc., Menlo Park, CA, LISP
Anne v.d.L. Gardner, 286 Selby Lane, Atherton, CA, LAW APPLICATIONS
Scott R. Garrigan, Lehigh University, Bethlehem, PA, ROBOTS, ANTHROPOMORPHIC
Gerald Gazdar, University of Sussex, Brighton, UK, GRAMMAR, GENERALIZED PHRASE STRUCTURE
James Geller, State University of New York, Buffalo, NY, ADVICE TAKER; ELIZA; EPAM; HACKER; INTELLECT; LOGO; MICROPLANNER; OPS-5; SCHOLAR; SHRDLU; SIMULA; SMALLTALK; SNOBOL-4; SNePS; STUDENT
Maria L. Gini, University of Minnesota, Minneapolis, MN, PATTERN MATCHING; PROBLEM REDUCTION
Richard D. Greenblatt, LISP Machines Company, Cambridge, MA, LISP MACHINES
David D. Grossman, IBM Corporation, Yorktown Heights, NY, AUTOMATION, INDUSTRIAL
Harrison Hall, University of Delaware, Newark, DE, PHENOMENOLOGY
Shoshana L. Hardt, State University of New York, Buffalo, NY, CONCEPTUAL DEPENDENCY; PHYSICS, NAIVE
Scott Y. Harmon, Naval Ocean Systems Center, San Diego, CA, AUTONOMOUS VEHICLES; ROBOTS, MOBILE
John R. Hayes, Carnegie-Mellon University, Pittsburgh, PA, CREATIVITY
Philip J. Hayes, Carnegie-Mellon University, Pittsburgh, PA, NATURAL-LANGUAGE UNDERSTANDING
Barbara Hayes-Roth, Stanford University, Palo Alto, CA, BLACKBOARD SYSTEMS
Frederick Hayes-Roth, Teknowledge, Inc., Palo Alto, CA, EXPERT SYSTEMS; RULE-BASED SYSTEMS
Austin Henderson, Xerox Palo Alto Research Center, Palo Alto, CA, OFFICE AUTOMATION
Lawrence J. Henschen, Northwestern University, Evanston, IL, INFERENCE; REASONING; THEOREM PROVING
Carl Hewitt, Massachusetts Institute of Technology, Cambridge, MA, ACTOR FORMALISMS
Ellen C. Hildreth, Massachusetts Institute of Technology, Cambridge, MA, EDGE DETECTION; OPTICAL FLOW
Jane C. Hill, Smith College, Northampton, MA, LANGUAGE ACQUISITION
Donald Hindle, AT&T Bell Laboratories, Murray Hill, NJ, DEEP STRUCTURE
Geoffrey Hinton, Carnegie-Mellon University, Pittsburgh, PA, BOLTZMANN MACHINE
Graeme Hirst, University of Toronto, Toronto, Ontario, SEMANTICS
C. J. Hogger, University of London, London, UK, LOGIC PROGRAMMING
Bruce A. Hohne, Rohm and Haas Company, Spring House, PA, CHEMISTRY, AI IN
James Hollenberg, New England Medical Center, Boston, MA, DECISION THEORY
Keith J. Holyoak, University of California, Los Angeles, CA, COGNITIVE PSYCHOLOGY
Thomas S. Huang, University of Illinois, Urbana, IL, MOTION ANALYSIS
Jonathan J. Hull, State University of New York, Buffalo, NY, CHARACTER RECOGNITION
Roger Hurwitz, Massachusetts Institute of Technology, Cambridge, MA, HERMENEUTICS
Robert J. K. Jacob, Naval Research Laboratory, Washington, DC, HUMAN-COMPUTER INTERACTION
Mark A. Jones, AT&T Bell Laboratories, Murray Hill, NJ, COMPUTATIONAL LINGUISTICS
Aravind K. Joshi, University of Pennsylvania, Philadelphia, PA, GRAMMAR, PHRASE-STRUCTURE
Takeo Kanade, Carnegie-Mellon University, Pittsburgh, PA, COLOR VISION
Laveen N. Kanal, University of Maryland, College Park, MD, PATTERN RECOGNITION
G. P. Kearsley, Park Row Software, La Jolla, CA, COMPUTER-AIDED INSTRUCTION, INTELLIGENT
Robert P. Keough, Rochester Institute of Technology, Rochester, NY, REPRESENTATION, FRAME
Samuel J. Keyser, Massachusetts Institute of Technology, Cambridge, MA, PHONEMES
David E. Kieras, University of Michigan, Ann Arbor, MI, COGNITIVE MODELING
Daniel E. Koditschek, Yale University, New Haven, CT, ROBOT-CONTROL SYSTEMS
Richard E. Korf, University of California, Los Angeles, CA, SEARCH; HEURISTICS
Kimmo Koskenniemi, University of Helsinki, Helsinki, Finland, MORPHOLOGY
Robert A. Kowalski, University of London, London, UK, LOGIC PROGRAMMING
Bryan M. Kramer, University of Toronto, Toronto, Ontario, REPRESENTATION, KNOWLEDGE
Benjamin Kuipers, University of Texas, Austin, TX, REASONING, CAUSAL
Casimir A. Kulikowski, Rutgers University, New Brunswick, NJ, DOMAIN KNOWLEDGE
Vipin Kumar, University of Texas, Austin, TX, SEARCH, BRANCH-AND-BOUND; SEARCH, DEPTH-FIRST
Richard Laing, 290 East 37th Ave., Eugene, OR, SELF-REPLICATION
Pat Langley, University of California, Irvine, CA, LEARNING, MACHINE
Michael Lebowitz, Columbia University, New York, NY, MEMORY ORGANIZATION PACKETS
Wendy G. Lehnert, University of Massachusetts, Amherst, MA, EMOTION MODELING; STORY ANALYSIS
Larry J. Leifer, Stanford University, Stanford, CA, PROSTHESES
Alan M. Lesgold, University of Pittsburgh, Pittsburgh, PA, EDUCATION APPLICATIONS
Victor R. Lesser, University of Massachusetts, Amherst, MA, DISTRIBUTED PROBLEM SOLVING
Henry Lieberman, Massachusetts Institute of Technology, Cambridge, MA, LANGUAGES, OBJECT-ORIENTED
G. Jack Lipovski, University of Texas, Austin, TX, ASSOCIATIVE MEMORY
Donald W. Loveland, Duke University, Durham, NC, COMPLETENESS
Alan K. Mackworth, University of British Columbia, Vancouver, British Columbia, CONSTRAINT SATISFACTION
Anthony S. Maida, The Pennsylvania State University, University Park, PA, FRAME THEORY
John C. Mallery, Massachusetts Institute of Technology, Cambridge, MA, HERMENEUTICS
Tony A. Marsland, University of Alberta, Edmonton, Alberta, COMPUTER CHESS METHODS
Joao P. Martins, Instituto Superior Tecnico, Lisboa, Portugal, BELIEF REVISION
James L. McClelland, Carnegie-Mellon University, Pittsburgh, PA, DEMONS
Drew V. McDermott, Yale University, New Haven, CT, REASONING, SPATIAL; REASONING, TEMPORAL
David D. McDonald, University of Massachusetts, Amherst, MA, NATURAL-LANGUAGE GENERATION
Michel A. Melkanoff, University of California, Los Angeles, CA, COMPUTER-AIDED DESIGN
M. Eugene Merchant, Metcut Research Associates, Cincinnati, OH, COMPUTER-INTEGRATED MANUFACTURING
Ryszard S. Michalski, University of Illinois, Urbana, IL, CLUSTERING; CONCEPT LEARNING
James H. Moor, Dartmouth College, Hanover, NH, TURING TEST
Paul Morawski, Mitre Corp., McLean, VA, MILITARY, APPLICATIONS IN
Ernesto Morgado, State University of New York, Buffalo, NY, META-KNOWLEDGE, -RULES, AND -REASONING
Margaret C. Moser, Bolt, Beranek & Newman, Cambridge, MA, GRAMMAR, CASE
John Mylopoulos, University of Toronto, Toronto, Ontario, REPRESENTATION, KNOWLEDGE
Roger N. Nagel, Lehigh University, Bethlehem, PA, ROBOTS, ANTHROPOMORPHIC
Frederick J. Newmeyer, University of Washington, Seattle, WA, LINGUISTIC COMPETENCE AND PERFORMANCE
David Nitzan, SRI International, Menlo Park, CA, ROBOTICS
Jane T. Nutter, Virginia Tech, Blacksburg, VA, EPISTEMOLOGY; REASONING, DEFAULT
Kenneth J. Overton, General Electric Company, Schenectady, NY, SENSORS
Seymour Papert, Massachusetts Institute of Technology, Cambridge, MA, COMPUTERS IN EDUCATION, CONCEPTUAL ISSUES; PERCEPTRON
Rohit Parikh, City University of New York, New York, NY, MODAL LOGIC
Stephen G. Pauker, New England Medical Center, Boston, MA, DECISION THEORY
Judea Pearl, University of California, Los Angeles, CA, AND/OR GRAPHS; BAYESIAN DECISION METHODS; BRANCHING FACTOR; GAME TREES
Donald Perlis, University of Maryland, College Park, MD, CIRCUMSCRIPTION; REASONING, NONMONOTONIC
Stanley R. Petrick, IBM Corporation, Yorktown Heights, NY, PARSING
Thomas H. Pierce, Rohm and Haas Company, Spring House, PA, CHEMISTRY, AI IN
Ira Pohl, University of California, Santa Cruz, CA, SEARCH, BIDIRECTIONAL
Livia Polanyi, Bolt Beranek & Newman, Cambridge, MA, DISCOURSE UNDERSTANDING
Keith E. Price, University of Southern California, Los Angeles, CA, REGION-BASED SEGMENTATION
Zenon W. Pylyshyn, The University of Western Ontario, London, Ontario, COGNITIVE SCIENCE
William J. Rapaport, State University of New York, Buffalo, NY, BELIEF SYSTEMS; LOGIC, PREDICATE; LOGIC, PROPOSITIONAL
Bertram Raphael, Hewlett-Packard, Palo Alto, CA, A* ALGORITHM
Glenn J. Rennels, Stanford University, Stanford, CA, MEDICAL ADVICE SYSTEMS
Elaine A. Rich, Microelectronics and Computer Technology Corporation (MCC), Austin, TX, ARTIFICIAL INTELLIGENCE
Christopher K. Riesbeck, Yale University, New Haven, CT, PARSING, EXPECTATION-DRIVEN
Jay Rosenberg, State University of New York, Buffalo, NY, BASEBALL; CHESS 4.5; KAISSA; LOOPS; MACHACK-6; POP-2; REASONING, FOCUS-OF-ATTENTION; REF-ARF; SHAKEY
Paul S. Rosenbloom, Stanford University, Palo Alto, CA, SEARCH, BEST-FIRST
Remko J. H. Scha, Bolt Beranek & Newman, Cambridge, MA, DISCOURSE UNDERSTANDING
Lenhart K. Schubert, University of Alberta, Edmonton, Alberta, MEMORY, SEMANTIC
Jacob T. Schwartz, New York University, New York, NY, LIMITS OF ARTIFICIAL INTELLIGENCE
Steven A. Shafer, Carnegie-Mellon University, Pittsburgh, PA, COLOR VISION
Stuart C. Shapiro, State University of New York, Buffalo, NY, PROCESSING, BOTTOM-UP AND TOP-DOWN
David E. Shaw, Columbia University, New York, NY, NON-VON
Beau A. Sheil, Xerox Artificial Intelligence Systems, Palo Alto, CA, PROGRAMMING ENVIRONMENTS
Yoshiaki Shirai, Electrotechnical Laboratory, Ibaraki, Japan, PROXIMITY SENSING
Yoav Shoham, Yale University, New Haven, CT, REASONING, TEMPORAL
Edward H. Shortliffe, Stanford University, Stanford, CA, MEDICAL ADVICE SYSTEMS
Randall Shumaker, Naval Research Laboratory, Washington, DC, MILITARY, APPLICATIONS IN
James R. Slagle, University of Minnesota, Minneapolis, MN, ALPHA-BETA PRUNING; PATTERN MATCHING; PROBLEM REDUCTION
Steven L. Small, University of Rochester, Rochester, NY, PARSING, WORD-EXPERT
Brian C. Smith, Xerox Palo Alto Research Center, Palo Alto, CA, SELF-REFERENCE
Carl Smith, University of Maryland, College Park, MD, INDUCTIVE INFERENCE
John F. Sowa, IBM Corporation, Thornwood, NY, SEMANTIC NETWORKS
Karen Sparck Jones, University of Cambridge, Cambridge, UK, INFORMATION RETRIEVAL
Sargur N. Srihari, State University of New York, Amherst, NY, VITERBI ALGORITHM
Robert Stepp, University of Illinois, Urbana, IL, CLUSTERING
Salvatore J. Stolfo, Columbia University, New York, NY, DADO
William R. Swartout, University of Southern California, Marina del Rey, CA, EXPLANATION
Ming Ruey Taie, State University of New York, Buffalo, NY, AM; DENDRAL; ELI; EMYCIN; GUIDON; INTERNIST; MACSYMA; MYCIN; PAM; POLITICS; PROLOG; PROSPECTOR; SAM; SOPHIE; X-CON
Jay M. Tenenbaum, Schlumberger Palo Alto Research, Palo Alto, CA, MATCHING
Harry Tennant, Texas Instruments, Inc., Dallas, TX, ELLIPSIS; MENU-BASED NATURAL LANGUAGE
Demetri Terzopoulos, Schlumberger Palo Alto Research Center, Palo Alto, CA, VISUAL DEPTH MAP
David S. Touretzky, Carnegie-Mellon University, Pittsburgh, PA, INHERITANCE HIERARCHY
John K. Tsotsos, University of Toronto, Toronto, Ontario, IMAGE UNDERSTANDING
Mihran Tuceryan, University of Illinois, Urbana, IL, DOT-PATTERN ANALYSIS
Robert Van Gulick, Syracuse University, Syracuse, NY, PHILOSOPHICAL QUESTIONS
Raf Venken, Belgian Institute of Management, Everberg, Belgium, BACKTRACKING; COROUTINES
Steven A. Vere, Jet Propulsion Laboratory, Pasadena, CA, PLANNING
R. Veroff, University of New Mexico, Albuquerque, NM, RESOLUTION, BINARY
Heinz von Foerster, 1 Eden West Road, Pescadero, CA, CYBERNETICS
Deborah Walters, State University of New York, Buffalo, NY, REPRESENTATION, ANALOGUE
David L. Waltz, Thinking Machines Corporation, Cambridge, MA, WALTZ FILTERING
Mitchell Wand, Northeastern University, Boston, MA, LAMBDA CALCULUS
Michael J. Watkins, Rice University, Houston, TX, EPISODIC MEMORY
Bonnie L. Webber, University of Pennsylvania, Philadelphia, PA, QUESTION ANSWERING
Yorick Wilks, New Mexico State University, Las Cruces, NM, MACHINE TRANSLATION; PRIMITIVES
Andrew Witkin, Schlumberger Palo Alto Research Center, Palo Alto, CA, SCALE SPACE METHODS
Robert J. Woodham, University of British Columbia, Vancouver, British Columbia, SHAPE ANALYSIS
William A. Woods, Applied Expert Systems, Inc. and Harvard University, Cambridge, MA, GRAMMAR, AUGMENTED TRANSITION NETWORK; SEMANTICS, PROCEDURAL
Lawrence Wos, Argonne National Laboratory, Argonne, IL, RESOLUTION, BINARY
A. Hanyong Yuhan, State University of New York, Buffalo, NY, CONNIVER; FRUMP; GPS; HARPY; HEARSAY-II; KL-ONE; KRL; LIFER; LUNAR; PLANES; PLANNER; SAINT; SIR; SLIP; STRIPS
Steven W. Zucker, McGill University, Montreal, Quebec, VISION, EARLY
REVIEWERS

J. K. Aggarwal, University of Texas, Austin, TX
James F. Albus, National Bureau of Standards, Washington, DC
James Allen, University of Rochester, Rochester, NY
Jonathan Allen, Massachusetts Institute of Technology, Cambridge, MA
Saul Amarel, Rutgers University, New Brunswick, NJ
D. E. Appelt, SRI International, Menlo Park, CA
Michael Arbib, University of California, San Diego, CA
Norman Badler, University of Pennsylvania, Philadelphia, PA
Ruzena Bajcsy, University of Pennsylvania, Philadelphia, PA
Robert Balzer, University of Southern California, Marina del Rey, CA
Amit Bandyopadhyay, University of Rochester, Rochester, NY
Ranan B. Banerji, St. Joseph's University, Philadelphia, PA
Madeleine Bates, Bolt, Beranek and Newman Laboratories, Cambridge, MA
Gerardo Beni, University of California, Santa Barbara, CA
Jared Bernstein, SRI International, Menlo Park, CA
Donald Berwick, Harvard Community Health Plan, Cambridge, MA
Robert Berwick, Massachusetts Institute of Technology, Cambridge, MA
Alan Biermann, Duke University, Durham, NC
Woody Bledsoe, Microelectronics and Computer Technology Corporation (MCC), Austin, TX
Ned Block, Massachusetts Institute of Technology, Cambridge, MA
Daniel Bobrow, Xerox Palo Alto Research Center, Palo Alto, CA
Margaret A. Boden, University of Sussex, Brighton, UK
Lois Bogges, Mississippi State University, Mississippi State, MS
Michael Brady, Massachusetts Institute of Technology, Cambridge, MA
Rodney Brooks, Massachusetts Institute of Technology, Cambridge, MA
Chris Brown, University of Rochester, Rochester, NY
John S. Brown, Xerox Palo Alto Research Center, Palo Alto, CA
Bertram Bruce, Bolt, Beranek & Newman, Cambridge, MA
Maurice Bruynooghe, Katholieke Universiteit Leuven, Heverlee, Belgium
Bruce Buchanan, Stanford University, Stanford, CA
Arthur Burks, University of Michigan, Ann Arbor, MI
David Burr, Bell Laboratories, Holmdel, NJ
Jaime Carbonell, Carnegie-Mellon University, Pittsburgh, PA
Eugene Charniak, Brown University, Providence, RI
Kenneth W. Church, AT&T Bell Laboratories, Murray Hill, NJ
K. L. Clark, Queen Mary College, London, UK
J. C. Colson, IBM Corporation, Austin, TX
Lawrence Davis, University of Maryland, College Park, MD
Martin Davis, New York University, New York, NY
Johan de Kleer, Xerox Palo Alto Research Center, Palo Alto, CA
Daniel Dennett, Tufts University, Medford, MA
J. Detriche, Commissariat a L'Energie Atomique, Gif sur Yvette, France
John Doyle, Carnegie-Mellon University, Pittsburgh, PA
Hubert Dreyfus, University of California, Berkeley, CA
Richard Duda, Syntelligence, Menlo Park, CA
Michael Dyer, University of California, Los Angeles, CA
Alberto Elses, Carnegie-Mellon University, Pittsburgh, PA
E. Allen Emerson, University of Texas, Austin, TX
George W. Ernst, Case Western Reserve University, Cleveland, OH
Richard Fateman, University of California, Berkeley, CA
Jerry Feldman, University of Rochester, Rochester, NY
Nicholas Findler, Arizona State University, Tempe, AZ
Harvey Fineberg, Harvard School of Public Health, Boston, MA
Fernando Flores, Logonet, Berkeley, CA
Mark Fox, Carnegie-Mellon University, Pittsburgh, PA
Eugene C. Freuder, University of New Hampshire, Durham, NH
Peter W. Frey, Northwestern University, Evanston, IL
Joyce Friedman, 221 Mt. Auburn St., Cambridge, MA
Gerald Gazdar, University of Sussex, Brighton, UK
Michael Georgeff, SRI International, Menlo Park, CA
Adele Goldberg, Xerox Palo Alto Research Center, Palo Alto, CA
Evon C. Greanias, IBM Corporation, Gaithersburg, MD
Richard Greenblatt, Lisp Machine Inc., Cambridge, MA
W. E. L. Grimson, Massachusetts Institute of Technology, Cambridge, MA
David Grossman, IBM Corporation, Yorktown Heights, NY
Chris Halvorsen, Xerox Palo Alto Research Center, Palo Alto, CA
A. R. Hanson, University of Massachusetts, Amherst, MA
Robert Haralick, Machine Vision International, Ann Arbor, MI
Scott Harmon, Naval Ocean Research Center, San Diego, CA
Robert M. Harnish, University of Arizona, Tucson, AZ
Peter Hart, Syntelligence, Menlo Park, CA
John Haugeland, University of Pittsburgh, Pittsburgh, PA
Barbara Hayes-Roth, Stanford University, Stanford, CA
Frederick Hayes-Roth, Teknowledge Inc., Palo Alto, CA
Chris Haynes, Maharishi International University, Fairfield, IA
Gary Hendrix, Symantec, Cupertino, CA
Carl Hewitt, Massachusetts Institute of Technology, Cambridge, MA
Ellen C. Hildreth, Massachusetts Institute of Technology, Cambridge, MA
W. Daniel Hillis, Thinking Machines Corporation, Cambridge, MA
Geoffrey Hinton, Carnegie-Mellon University, Pittsburgh, PA
Graeme Hirst, University of Toronto, Toronto, Ontario
J. R. Hobbs, Ablex Publishing, Norwood, NJ
Keith Holyoak, University of California, Los Angeles, CA
Berthold Horn, Massachusetts Institute of Technology, Cambridge, MA
Robert A. Hummel, New York University, New York, NY
David Israel, SRI International, Menlo Park, CA
Ray Jackendoff, Brandeis University, Waltham, MA
Aravind Joshi, University of Pennsylvania, Philadelphia, PA
Takeo Kanade, Carnegie-Mellon University, Pittsburgh, PA
Laveen Kanal, University of Maryland, College Park, MD
Robert Kling, University of California, Irvine, CA
Janet Kolodner, Georgia Institute of Technology, Atlanta, GA
William Kornfeld, Quintas Corporation, Palo Alto, CA
Kimmo Koskenniemi, University of Helsinki, Helsinki, Finland
Robert Kowalski, University of London, London, UK
Benjamin Kuipers, University of Texas, Austin, TX
Vipin Kumar, University of Texas, Austin, TX
Michael Lebowitz, Columbia University, New York, NY
Wendy Lehnert, University of Massachusetts, Amherst, MA
Douglas B. Lenat, Microelectronics and Computer Technology Corporation (MCC), Austin, TX
B. Lesigne, Commissariat a L'Energie Atomique, Gif sur Yvette, France
Victor Lesser, University of Massachusetts, Amherst, MA
Diane Litman, AT&T Bell Laboratories, Murray Hill, NJ
Ray Liuzzi, Griffiths Air Force Base, Rome, NY
Donald Loveland, Duke University, Durham, NC
Tomas Lozano-Perez, Massachusetts Institute of Technology, Cambridge, MA
John Luh, Clemson University, Clemson, SC
Alan Mackworth, University of British Columbia, Vancouver, British Columbia
Anthony S. Maida, The Pennsylvania State University, University Park, PA
John Mallery, Massachusetts Institute of Technology, Cambridge, MA
David McAllister, Massachusetts Institute of Technology, Cambridge, MA
James D. McCawley, University of Chicago, Chicago, IL
James L. McClelland, Carnegie-Mellon University, Pittsburgh, PA
Drew McDermott, Yale University, New Haven, CT
Eugene Merchant, Metcut Research Associates, Cincinnati, OH
Ryszard S. Michalski, University of Illinois, Urbana, IL
Jack Minker, University of Maryland, College Park, MD
Marvin Minsky, Massachusetts Institute of Technology, Cambridge, MA
Hans Moravec, Carnegie-Mellon University, Pittsburgh, PA
Roger Nagel, Lehigh University, Lehigh, PA
Dana Nau, University of Maryland, College Park, MD
Frederick J. Newmeyer, University of Washington, Seattle, WA
Donald Norman, University of California, La Jolla, CA
Jane T. Nutter, Tulane University, New Orleans, LA
Greg Oden, University of Wisconsin, Madison, WI
A. L. Pai, Arizona State University, Tempe, AZ
Steven Pauker, New England Medical Center, Boston, MA
Judea Pearl, University of California, Los Angeles, CA
L. M. Pereira, Universidade Nova de Lisboa, Lisbon, Portugal
Donald Perlis, University of Maryland, College Park, MD
Ray Perrault, SRI International, Menlo Park, CA
Stanley Petrick, IBM Corporation, Yorktown Heights, NY
Gerry Pocock, University of Massachusetts, Amherst, MA
Tomasso Poggio, Massachusetts Institute of Technology, Cambridge, MA
Ira Pohl, University of California, Santa Cruz, CA
Zenon Pylyshyn, University of Western Ontario, London, Ontario
William J. Rapaport, State University of New York, Buffalo, NY
Bertram Raphael, Hewlett-Packard, Palo Alto, CA
Charles Reiger, Vidar Systems Corporation, Herndon, VA
Ray Reiter, University of British Columbia, Vancouver, British Columbia
Elaine Rich, Microelectronics and Computer Technology Corporation (MCC), Austin, TX
Charles J. Rieger, 1002 Broadmoor Circle, Silver Spring, MD
Christopher K. Riesbeck, Yale University, New Haven, CT
Curtis Roads, Massachusetts Institute of Technology Press, Cambridge, MA
Paul Rosenbloom, Stanford University, Stanford, CA
Alexander I. Rudnicky, Carnegie-Mellon University, Pittsburgh, PA
Earl Sacerdoti, Teknowledge, Inc., Palo Alto, CA
Naomi Sager, New York University, New York, NY
G. Salton, Cornell University, Ithaca, NY
Eric J. Sandewall, Linkoeping University, Linkoeping, Sweden
L. K. Schubert, University of Alberta, Edmonton, Alberta
Steven Shafer, Carnegie-Mellon University, Pittsburgh, PA
Yoshiaki Shirai, Electrotechnical Laboratories, Ibaraki, Japan
Ben Shneiderman, University of Maryland, College Park, MD
Robert Simmons, University of Texas, Austin, TX
Herbert Simon, Carnegie-Mellon University, Pittsburgh, PA
James R. Slagle, University of Minnesota, Minneapolis, MN
Steve Small, University of Rochester, Rochester, NY
Brian Smith, Xerox Palo Alto Research Center, Palo Alto, CA
Carl Smith, University of Maryland, College Park, MD
Douglas R. Smith, Kestrel Institute, Palo Alto, CA
James Solberg, Purdue University, West Lafayette, IN
Thomas M. Sommer, Wang Laboratories, Inc., Lowell, MA
Norman Sondheimer, University of Southern California, Marina del Rey, CA
Frank Sonnenberg, New England Medical Center, Boston, MA
John F. Sowa, IBM Corporation, New York, NY
Guy Steele, Thinking Machines Corporation, Cambridge, MA
Marc Stefik, Xerox Palo Alto Research Center, Palo Alto, CA
Salvatore Stolfo, Columbia University, New York, NY
Marty Tenenbaum, Schlumberger Computer Aided Systems Facility, Palo Alto, CA
Harry Tennant, Texas Instruments, Inc., Austin, TX
Demetri Terzopoulos, Schlumberger Palo Alto Research Center, Palo Alto, CA
Henry Thompson, University of Edinburgh, Edinburgh, UK
David Touretzky, Carnegie-Mellon University, Pittsburgh, PA
John K. Tsotsos, University of Toronto, Toronto, Ontario
Endel Tulving, University of Toronto, Toronto, Ontario
James Vass, University of Pittsburgh, Pittsburgh, PA
Steven Vere, Jet Propulsion Laboratory, Pasadena, CA
Deborah Walters, State University of New York, Buffalo, NY
David Waltz, Thinking Machines Corporation, Cambridge, MA
Mitchell Wand, Northeastern University, Boston, MA
David Warren, Quintas Corporation, Palo Alto, CA
Donald Waterman, Rand Corporation, Santa Monica, CA
Bonnie Webber, University of Pennsylvania, Philadelphia, PA
Shalom Weiss, Rutgers University, New Brunswick, NJ
Craig Wilcox, University of Texas, Austin, TX
Yorick Wilks, New Mexico State University, Las Cruces, NM
Peter Will, Schlumberger-Doll, Ridgefield, CT
Andrew Witkin, Schlumberger Palo Alto Research Center, Palo Alto, CA
William Woods, Applied Expert Systems, Cambridge, MA
Larry Wos, Argonne National Laboratories, Argonne, IL
Steven Zucker, McGill University, Montreal, Quebec
GUEST FOREWORD

Artificial Intelligence (AI) is a domain of research, application, and instruction concerned with programming computers to perform in ways that, if observed in human beings, would be regarded as intelligent. Thus intelligence is attributed to human beings when they play chess or solve the Tower of Hanoi puzzle. A computer that can perform one of these tasks even moderately well is regarded as an example of artificial intelligence.

Research in AI began in the mid-1950s, shortly after the first digital computers emerged from their wartime security wraps. The computer was designed primarily to carry out numerical computations in an efficient way. But it was soon observed (the English logician A. M. Turing was perhaps the first to make this observation) that computers were not limited to numbers but were capable of quite general processing of all kinds of symbols or patterns, literal and diagrammatic as well as numerical. AI programs exploit these capabilities.

A digital computer is an example of a physical symbol system, a system that is capable of inputting symbols (reading); outputting (writing); organizing (associating); storing, copying, and comparing symbols; and of branching, following different courses of action depending on whether a comparison of symbols led to judging them to be the same or different. The fundamental hypothesis of AI is that these capabilities are just the capabilities it requires to exhibit "intelligence." Two corollaries follow from the hypothesis. First, since computers demonstrably have these capabilities, they are capable of being programmed to behave intelligently. Second, since people are capable of behaving intelligently, their brains are (at least) physical symbol systems.

The fundamental hypothesis of AI and its corollaries are empirical hypotheses, whose truth or falsity are to be determined by experiment and empirical test. Research aimed at testing them leads to the two main branches of AI:

1. AI in the narrow sense is a part of computer science, aimed at exploring the range of tasks over which computers can be programmed to behave intelligently. It makes no claims that computer intelligence imitates human intelligence in its processes, only that it produces intelligent responses to the task demands. AI programs in this category may, for example, use rapid arithmetic processes at a rate that people are incapable of. Thus, an AI chess program may explore a million branches of the game tree before choosing a move, while a human grandmaster seldom explores more than a hundred.

2. The second branch of AI, part of the new field of cognitive science, is aimed at programs that simulate the actual processes that human beings use in their intelligent behavior. These simulation programs are intended as theories (systems of difference equations) describing and explaining human performances. They are tested by comparing the computer output, second by second when possible, with human behavior to determine whether both the result and also the actual behavior paths of computer and person are closely similar.

Early research in AI was directed mainly at studying well-structured puzzle-like tasks, where human behavior in the laboratory could be compared with the traces of the computer programs. This work produced a basic understanding of problem solving as (nonrandom) search guided by heuristics or rules of thumb. It confirmed Duncker's* early emphasis upon means-ends analysis as a central tool for solving problems.

As research expanded into domains like chess playing and medical diagnosis, two tasks that have been prominent in the literature, evidence grew that successful task performance depends on rapid access to large bodies of knowledge by a process of cue recognition (often called "intuition"). Experiments showed that the human expert in such domains is capable of recognizing 50,000 or more familiar chunks (patterns), using recognition to gain access to information stored in long-term memory relevant to the patterns. Thus, the physician recognizes patterns corresponding to diseases and symptoms, and thereby gains access to his knowledge about the diseases, their treatment, and further diagnostic tests.

Research in the cognitive science branch of AI up to the present (1986) has placed particular emphasis on problem solving, on the organization of long-term memory (semantic memory), and on learning processes.

From the beginning, research in both branches of AI was facilitated by the invention of programming languages especially adapted to their needs. The so-called list-processing languages, first developed in 1956, allowed for flexible, associative organization of memory and convenient representation of such psychological concepts as directed associations and schemas. Around 1970, production-system languages were developed, whose basic instruction format represents a sophisticated elaboration of the connection between stimuli and
responses, and provides a direct representation of the recognition process mentioned above. (The condition part of each production, when it matches the information held in short-term memory, causes an associated action to be performed. Upon matching the conditions of a production in an act of recognition, the action may simply be to retrieve associated information from memory, or it may be an actual motor response.) Production-system languages have proved to be convenient for research on learning, because programs can be written in a format that, in appropriate circumstances, simply creates new productions that are thereby annexed to the program and are executable. For example, programs have been written that learn to solve equations in algebra by examining worked-out examples of solutions and then manufacturing new productions based on the processes observed in the examples.
AI has been most successful, up to now, in dealing with so-called higher mental processes, including language. Progress has been slower in imitating the sophisticated sensory and pattern-extraction processes of the human eye and ear and in linking these with motor processes (robotics). Research progress continues, however, on all fronts, with some degree of specialization of groups concerned with problem solving and memory, with sensory pattern recognition, and with robotics, respectively. AI research is to be found primarily in computer science departments and psychology departments, but also to some extent in linguistics and in an increasing number of departments where AI techniques are being applied to disciplinary problems (e.g., architectural design, discovery of reaction paths for chemical synthesis, aids to expository writing, drawing, musical composition). The introduction of AI methods and techniques was a principal factor in bringing about the so-called cognitive revolution in psychology in the 1960s and 1970s, and the new methodologies of computer simulation and analysis of verbal protocols are now vital tools of research in experimental psychology.
H. A. Simon
Carnegie-Mellon University
EDITOR'S FOREWORD
The Encyclopedia of Artificial Intelligence defines the discipline of Artificial Intelligence (AI) by bringing together the core of knowledge from all its fields and related disciplines. The articles are written primarily for the professional from another discipline who is seeking an understanding of AI, and secondarily for the lay reader who wants an overview of the entire field or information on one specific aspect. The Encyclopedia clarifies and corrects misperceptions as well as provides a proper understanding of AI.
The object of research in AI is to discover how to program a computer to perform the remarkable functions that make up human intelligence. This work leads not only to increasingly useful computers, but also to an enhanced understanding of human cognitive processes, of what it is that we mean by "intelligence" and what the mechanisms are that are required to produce it. AI is surely one of the most exciting scientific and commercial enterprises of our century. Its limits are yet to be discovered.
The Encyclopedia has significant contributions to the AI literature, not only because it brings many disciplines into one comprehensive reference, but also because it contains many landmark articles, such as: Blackboard Systems; Computer Chess Methods; Cognitive Psychology; Grammar (Augmented Transition Network; Case; Definite-Clause; Generalized Phrase-Structure; Phrase-Structure; Semantic; and Transformational); Limits of AI; Lisp; Natural-Language (Generation; Interfaces; and Understanding); Path Planning and Obstacle Avoidance; Reasoning (Causal; Commonsense; Default; Nonmonotonic; Plausible; Resource-Limited; Spatial; and Temporal); Robotics; Search (Best-First; Bidirectional; Branch-and-Bound; and Depth-First); and Social Issues of AI. All of the material is specifically written for the Encyclopedia. In addition, the Encyclopedia has separate articles on various game-playing programs, vision, speech understanding, image understanding, matching, multisensor integration, and parsing, as well as many short articles.
The articles and the authors invited to write them were chosen with the cooperation of an editorial advisory board of distinguished authorities. The author of each article is a recognized research expert on the topic. Each article has a bibliography and extensive cross-references to other articles. The reader may start with almost any article and be led by cross-references to almost every other article in the Encyclopedia. There are more than 450 tables and figures. Stressing readability, accuracy, and completeness of facts as well as overall usefulness of material, this great work brings you the result of years of labor and experience.
Stuart C. Shapiro
SUNY at Buffalo
PREFACE
I became involved in the project to develop this Encyclopedia of Artificial Intelligence in the spring of 1983, when I was approached by Barbara Chernow, who had already had preliminary discussions with Martin Grayson of John Wiley & Sons and several prominent AI researchers and educators. Although I was warned by several people that this would involve much more work than I could imagine (and they were right), the opportunity to help create a definitive and comprehensive view of the field, authored by a wide variety of experts, each writing on his or her own area of expertise, and the promise of significant help from Wiley's Encyclopedia Department (this promise was more than fulfilled) was more than I could resist. Barbara and I put together the editorial board, and the board and I drew up the list of entries and the people we felt could best write the articles. David Eckroth joined the project as the managing editor and has done a massive amount of work to see it through to publication.
AI is a relatively young field, and is still rife with controversy about what it is and about what constitutes good and valuable research. Some researchers felt that an encyclopedia was premature. There was controversy about the selection of articles, some mild, some quite heated. Nevertheless, I was extremely gratified with the number of people who were willing to take time from their already busy schedules to write and to review articles. Those involved constitute a significant percentage of all active AI researchers, from all the different "camps" and the major research institutes and universities. Now (summer 1986), as AI celebrates its thirtieth birthday, we offer this snapshot and prospectus of our field.
I am grateful to many people whose efforts have gone into making this Encyclopedia: Barbara Chernow and Martin Grayson, who started it; the members of the editorial board, who defined it; David Eckroth, who managed it all; the authors and reviewers, who created it; Elizabeth Harrison, Karen Thomsen, Beryl Matshiqi, Sally Elder, and Lynda Spahr, David's and my secretaries, who kept us all organized; and Caren, my wife, whose support and encouragement got me through.
Stuart C. Shapiro
SUNY at Buffalo
ABBREVIATIONS AND ACRONYMS
AA AAAI AAR AC ACH Ack ACL ACM ACT ADJ AFCET AFIPS AGE AGV AI AIM AI/PL AIRPLAN AISB AJCL AKO ALCS ALLC ALPAC ALU AM AML AMRF AMS APIC APL APSG AR ARC ARMA
ACT assisters American Association for Artificial Intelligence Association for Automated Reasoning applicability conditions Association for Computers and the Humanities acknowledge Association for Computational Linguistics Association for Computing Machinery accumulation time; actions or abstract nouns; Adaptive Control of Thought adjective Association Francaise pour la Cybernetique Economique et Technique American Federation of Information Processing Societies attempt to generalize automatic guided vehicle artificial intelligence artificial intelligence in medicine AI Programming Language planning military air-traffic movement Society for the Study of Artificial Intelligence and Simulation of Behavior American Journal of Computational Linguistics a kind of Analogue Concept Learning System Association for Literary and Linguistic Computing Automatic Language Processing Advisory Committee arithmetic and logic unit Automated Mathematician a manufacturing language Automated Manufacturing Research Facility American Mathematical Society Automatic Programming Information Center a programming language augmented phrase-structure grammar autoregressive Association pour la Recherche Cognitive autoregressive/moving average
ARPA ARPANET ASCII ASEE ATE ATC AT/I ATN AU AUX B&B BC BCD BHFFA BIM BIP BIT BNF bpa bps BRDF BSC BTN C ca
CA CACM CADAM CAD/CAM CAE CAI CAP CAR CASNET CASREP CAT
Advanced Research Projects Agency, now called DARPA ARPA's telecommunication network American Standard Code for Information Interchange American Societyfor Engineering Education automatic test equipment Air Traffic Control Advice Taker/Inquirer augmented transition network argument unit auxiliary branch-and-bound behaviorally correct binary coded decimal bidirectional heuristic front-to-front algorithm Belgian Institute of Management Basic Instructional Progtam built-in test Backus Normal (Naur) Form basic probability assignment bits per second bidirectional reflectancedistribution function Binary Synchronous Communication Basic Transition Network CONTACT; a popular programming language circa Concept Analyze\ Chemical Abstracts Communications of the Association for Com' puting Machinery computer-augmented design and manufacturing computer-aided design/computer-aided manufacturing computer-assistedengineering computer-assisted instruction control agreement principle contents of the addresspart of register number Causal Association Network Casualty Report Computer Aided Tomography; category
CATV CC CCD CCITT CCTA CD CDR CD.ROM CF CFG CFL CF.PSG CG CHI C3I CIE CIM CIRP CK CKY CL CLS CM CMU CNC CNET CNF Coax COLING COMPCON CPS CPVR CPU CRC CRIB CRT CSCSI CSG CSL CSMA CSP CSS CTM CWA CWR DAG DARPA DBMS DCE DCG DCL
Community Antenna Television system conceptual cohesiveness charge-coupled device Consultive Committee International for Telephony and Telegraphy Central Computer and Telecommunications Agency conceptual dependency; collision detection contents of the decrement part of register number compact disk read-only memory certainty factor; context-free context-free grammar context-free language context-free phrase-structure grammar causal graph computer-human interfaces command, control, communications, and intelligence International Commission on Illumination computer-integrated manufacturing College Internationale de Recherches pour la Production control knowledge Cocke, Kasami, and Younger computational linguistics Concept Learning System Connection Machine Carnegie-Mellon University Computer Numerical Controls Centre National d'Etudes des Telecommunications conjunctive normal form coaxial cable International Conference on Computational Linguistics Computer Society International Conference constraint-satisfaction problem Computer Vision and Pattern Recognition central processing unit cyclical redundancy check computer retrieval incidence bank cathode-ray tube Canadian Society for Computational Studies of Intelligence context-sensitive grammar; constructive solid geometry concept-learning program carrier sense-multiple access Communicating Synthetic Processes Cognitive Science Society computational theory of mind closed-world assumption contents of the word in register number directed acyclic graph Defense Advanced Research Projects Agency (DOD) database-management systems data circuit-terminating equipment; data communication equipment definite-clause grammar Department of Computational Logic
DCS dcu DDL DDM DDP DET DFA DFID DH DI/DO DL DLC DLPA DNF DO DOD DOF DOG DP DPS DRA DRS D-S DSS DT DTC DTE DU DVA DWIM E.... EBCDIC ECC EDC EDM EDP EEG e.g. EGI EIU EKG EL ELI E.MOP EMYCIN EPAM ER ES EST EX FAA FA/C FALOSY FCR FDM FEP FEM
Department of ComPuter Science discourse constituent unit data definition language dynamic discourse model distributed data Processing determiner deterministic finite state automaton depth-first iterative-deePening direct header digital input/outPut default logic digital logic circuit; data link control decoupling, Iinearization, and poles assignment disjunctive normal form derivation origin US Department of Defense degree of freedom difference of Gaussians data processing;dynamic programming Distributed Planning SYstems D ata-Representation Advisor Discourse Representation Structure Dempster-Shafer decision support sYstem decision tree Derivational Theory of Complexity data terminal equiPment DiscourseUnit dictionary Viterbi algorithm do what I mean episode extended binary-coded decimal interchange code error-correcting code error-detecting code electron-densitYmaP electronic data Processing electroencephalogram exempli gratia, for examPle extended Gaussian image Economist Intelligence Unit Electrocardiogram electronics laboratorY English-language interPreter episodic memory-org antzat'ionpacket Empty MYCIN Elementary Perceiver and Memorizet entity-relationshiP expert sYstem Extended standard Theory; Expert system Technolory explanatory Federal Aviation Administration functionally accurate, cooperative fault localization sYstem feature cooccurrencerestriction frequency division multiPlexing front-end processor;Finite Element Program Finite Element Method
FES FFP FFT FIS FJCC FLPL FMS fopc forcel FRL FRUMP FSA FSD FUG g G GB GBT GC GDN GIMADS GKS GPF GPRS GPS GPSG GPSS GT HAM HASP HDLC HFC HG HT HV HWIM IATG IC ICAI ICI ICMC ICOT ICPR ICU ID IDAX i.e. IEEE iff IFIP
Functional Electrical Stimulation foot-feature principle fast Fourier transform fault-isolation system FalI Joint Computer Conference Fortran list-processinglanguage flexible manufacturing system first-order predicate calculus force element frame representation language Fast Reading, Understanding, and Memory Program finite state automaton functional sequencediagram; feature specification default Functional Unification Grammar gram gTammar; general general background Government-Binding Theory generalized cone; generalized cylinder goal-dependencynetwork Generic Integrated Maintenance Diagnostics Graphics Kernel System generalized potential field generalized production system general problem solver; global positioning satellite generalized phrase-structure grammar general-purposesimulation system group technology Human Associative Memory high-altitude sounding projectile; Heuristic Adaptive Surveillance Project high-level data link control head-feature convention Head Grammar Hough transform hidden variable Hear What I Mean Intelligent Automatic Test Generation instantaneous configuration; integrated circuit intelligent computer-assistedinstruction intelligent communications interface International Computer Music Conference Institute for New Generation Computing Technolosy International Conferenceon Pattern Recognitian intensive care unit immediate dominance iterative-deepening Ax id est, that is Institute of Electrical and Electronics Engineers if and only if International Federation for Information Processing
IGES IH IJCAI
ruS ry
Intermediate Graphics-ExchangeStandard instrumental header International Joint Conferenceon Artificial Intelligence International Joint Conferenceon Pattern Recognition Integrated Knowledge-BasedModeling System Integrated Knowledge-BasedSystem integrated maintenance-information system information-management system inflection inertial navigation systems input/output Integrated Partial Parser Intelligence Quotient infrared information retrieval; industrial robot instrumented remote center of compliance Institute of Radio Engineers (later IEEE) Institut de Recherched'Informatique et d'Automatique indirect speechact Information SciencesInstitute Information Specification image understanding image-understanding system intransitive verb
JIRA JPL
JapaneseRobotic Industries Association Jet Propulsion Laboratory
KB KE kHz KNOBS KR KRL KS KSL KSAR KWIC KWOC
knowledge base knowledge engineering kilohertz Knowledge-basedSystem Knowledge Representation Knowledge-representationLanguage knowledge source Knowledge ScienceLaboratory knowledge-sourceactivation record keyword in-context keyword out-of-context
IJCPR IKBM IKBS IMIS IMS INFL INS
Ito IPP IQ ir IR IRCC IRE IRIA ISA NI ISPEC
ru
L LAS LCC LDS LED LF LFG L(G) LGN LH LHASA LHS LIFO LIL LIPS LISP LP LPC
language Language-Acquisition System location-centered, cooperative (mode) legal (product liability) decisions light-emitting diode logical form Lexical-Functional Grammar string language lateral geniculate nucleus locale header Logic and Heuristics Applied to Synthetic Analysis left-hand side (or system) last-in, first-out Lexical Interaction Language Logical Inferences Per Second List-Processing Language linear precedence linear predictive coding
LPE LR LSI LT LTM
Iarge processingelement long range Iarge-scale integration Logic Theory long-term memory
m M.... MA MAP MASES MATADOR MB MBR MD MFP MG MGCI MGU MIFASS
meter Maincon moving average Manufacturing Automation Protocol Microcomputer Advice and Selection Material Advice Organizer measure of belief multiple belief reasoner measure of disbelief morph-fitting program metamorphosis grammar most general common instance most general unifier Marine Integrated Fire and Air Support System multiple interaction, multiple data million instructions Per second man-machine environment man-machine interaction memory-management unit Memory-organization Packet consistent modal oPerator modus ponens message-processing Program material requirements Planning mass spectr(al,um); millisecond (10-3 s) modifier-structure grammar multiple single instruction, multiple data machine translation model theoretic semantics microsecond(L0-Gs)
MIMD MIPS MME MMI MMU MOP Mp MP MPP MRP ms MSG MSIMD MT MTS ts N na NACCC NAFIPS Nak Nand NASA NBS NC NFA NL NLI NLMenu NLP NLU nm NML nmr Nor NP
noun not available North American Computer Chess Championship North American Fuzzy Information Processing Society negative acknowledgement not and National Aeronautics and Space Administration National Bureau of Standards numerically controlled nondeterministic finite state automaton natural language natural-language interface menu-based natural-language understanding natural-language processing natural-language understanding nanometer (10-9 m) nonmonotonic logic nuclear magnetic resonance not or noun phrase; class of functions that are nondeterministically computable in a polynomial amount of time
NPR ns NSF NT NTPM NTSC
any NP-hard problem that is also in NP if a quick polynomial-time program exists, then everything in NP is computable in a polynomially bounded amount of time proper noun nanosecond (10-9 s) National Science Foundation narrower term normalized-texture-property map National Television Film Council
OCR ONR OpEd OPM OS OSI OT OV
optical character recognition Office of Naval Research opinions to/from the editor operations per minute origin set Open Systems Interconnection origin tag open variable
P PA PABX PAS PC PD PDE PDN PE PF PG PH PIM PIP Pixel PI PLA PLNLP
preposition; pressure Programmer's Apprentice; PP assisters private automatic branch exchange phase array system personal computer; printed circuit problem data domain partial differential equations public data network; public display network processing element phonetic form puzzle grammar precondition header Parallel Inference Machine Present Illness Program picture element plausibility programming logic array Programming Language for Natural-Language Processing
NP-complete NP-hard
PMPM PNF POM POP PP PREP PRIP PRO PRR PS PSG PSI PSL PSM PSN Pt T''TIM PUFP PUGG PVS Q/A QCPE
Qo qv
question answering Quantum Chemistry Program Exchange Quasi-optimizer quod, uid'e,which see (o cross reference)
RAM RAND RBS RCC R&D REL rf r-f RHS RIA RISC ROM RPS RPV RS RT s S SAINT SAM SAT SC SCA SCARA SCI SCS SD SDL SECS SEG SIAP SIGART SIGMOD SIGPLAN SIMD SIMULA SIPE SIR SL SLS SME S-MOP SNA SNePS SNF SPD SPE SPIE S-R SR SRI SRL STRIPS
random-accessmemory research and development rule-based system remote center compliance device research and development Rapidly Extensible Language radio frequency (noun) radio-frequency (adj.; right-hand side (or system) Robot Institute of America; Robotic Indus' tries Association Reduced-Instruction-SetComputer read-only memory robot-programming system remotely piloted vehicle restriction set related term second sentence;specific Symbolic Automatic Integration Script-Applier Mechanism symmetric axis transform situation calculus; schemaof problem conditions sensor-controlled automation selective compliance-assemblyrobot arm Strategic Computing Initiative Societyfor Computer Simulation structural description Sense-discrimination Langu age Simulation and Evaluation of Chemical Synthesis Sequenceof Events Generator Surveillance Integration Automation Project Special Interest Group (of the ACM) on AI Special Interest Group (of the ACM) on Management of Data Special Interest Group (of the ACM) oru Programming Languages single-instruction multiple data Simulation Language System for Interactive Planning and Execution Semantic Information Retrieval support list; Linear resolution with Selection function smoothed local symmetries Society of Manufacturing Engineers Simple Memory Organization Packet Systems Network Architecture Semantic Network ProcessingSystem Skolem Normal Form spectral power distribution small processingelement Society of Photo-Optical Instrumentation Engineers stimulus-response short range Stanford Research Institute Semantic Representation Language System for Theorem Proving in Problem Solving
SUS SWM SYNCHEM SYNTHEX
speech-understanding system Shapiro, Wand, and Martins Synthetic Chemistry System Systern Synthesis Expert
t T TAG TATR TAU TCP/IP
tree temperature tree-adjoining grammar Tactical Air Targeting Thematic Abstraction Unit transmission control protocol/Internet protocol top-down induction of decision trees a telecommunications network tree grammar; transformation grammar Workshop on Theoretical Issuesin Natura'lLanguage Processing Teachable Language Comprehender threshold logic unit truth-maintenance system test program set transformational question answering temporal reasoning transitive verb
TDIDT TELENET TG TINLAP TLC TLU TMS TPS TQA TR TV
UPI UR USE USPS uv
user-interface management system ultra-large-scale integration (seebottom of page) United Press International unit resulting preferred term United States Postal Service ultraviolet
V VC VDU VLR VLSI VP V/R VT
verb; volume Virtual Copy visual display unit very long range very large-scale integration verb phrase Valve Restriction verb transitive
WEP wff wfp WM WORM WTA WYSIWYG
word expert pars(ing, er) well-formed formula well-formed proposition working memory write once, read many times Winner Take All What You See Is What You Get
X XG
noun, verb, or prepositional phrase extraposition grammar
UIMS ULSI
Fifth Generation Computers (see Computer systems; Logic programming): the computer technology of the next decade.
First: vacuum-tube-based
Second: transistor-based
Third: IC-based
Fourth: microprocessors (LSI and VLSI), up to two million transistors per chip
ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE
VOLUME 1
A* ALGORITHM
Problem-solving (qv) approaches usually are either purely formal (e.g., dynamic programming) and therefore neglect available data that does not fit the chosen mathematical framework, or purely heuristic (e.g., GPS) (see Heuristics) and therefore cannot be proven to be generally valid. People who use automated problem-solving techniques often have to modify results derived by formal methods, thereby losing precision, in order to take advantage of additional "informal" sources of knowledge. The A* Algorithm introduced in 1968 (1) provides an innovative way to embed heuristic knowledge directly into a formal mathematical search process. A* is a procedure for analyzing graphs, a type of formal model. However, in addition to processing information in the graph itself, A* prescribes how to use additional knowledge about the problem situation from which the graph was derived. As a result, A* often uses far less computational effort than traditional algorithms that achieve the same results.
The Class of Problems Addressed
A* is an algorithm for finding a path in a graph (a network of nodes connected by arcs). Each node in the graph may have any number of successor nodes, indicated by directed arcs drawn from the node to its successors. Each arc has a number associated with it that represents the cost of traversing that arc. A path is a sequence of connected nodes. A solution path is any path whose first node is a designated Start node and whose last node is one of a designated set of Goal nodes. The cost of a path is the sum of the costs of the arcs in the path. A preferred path is a path with the lowest possible cost of getting from its first to its last node. Figure 1 shows a graph with many solution paths; for example, (Start, n2, n4, G1) is a solution path whose cost is 24. For this graph the preferred solution path is (Start, n1, n3, G1), whose cost is 9.
This kind of formal model may be used in a variety of situations. For example, the nodes of a graph may represent cities and the arcs railways; or the nodes may be positions in a game (see Game trees) and the arcs the legal moves; and so on. Many problems can be posed in the following general form: Given a graph, find a preferred solution path, and do so with a minimum amount of computational effort.
Introducing Additional Knowledge
In addition to the nodes, arcs, and costs that comprise a conventional graph, A* uses one more kind of data: a number h(n) associated with each node that is an estimate of a lower bound on the cost of getting from that node to a goal node. If the nodes represent cities and the arc costs are railroad miles, h(n) might be airline distance from city n to the goal city; if the nodes are puzzle positions, h(n) might be the minimum number of moves before the puzzle can possibly be solved; for example, h(Start) for a tic-tac-toe game is 3. These estimates are usually based on logical or physical knowledge that is not otherwise represented in the graph.
The A* Algorithm
The following is a simplified statement of the A* Algorithm. (See Ref. 1 for a more precise statement.)
1. Let g(n) represent the cost of a preferred path from Start to node n, and set g(Start) = 0. Let OPEN be a list of nodes that initially contains only the Start node. Calculate the estimate h(Start).
2. Select the node N on the list OPEN for which the quantity [g(N) + h(N)] is smallest. If N is a goal node, the path to N is a preferred solution path, and its cost is g(N). If there are no OPEN nodes, there is no solution path in the graph.
3. Remove N from OPEN. Find all the successors of N, and add them to OPEN. For each successor S, let g(S) = g(N) + (cost on arc from N to S). Calculate h(S).
4. Go to step 2.
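The four steps above translate almost line for line into code. The following is an illustrative sketch, not the authors' implementation: it keeps OPEN as a priority queue ordered by g(N) + h(N) and uses a stale-entry check in place of deleting superseded OPEN entries, and the example graph and heuristic values are invented so that the Figure 1 path costs quoted in the text hold:

```python
import heapq

def a_star(successors, h, start, goals):
    """Simplified A*, following the four steps in the text.

    successors: node -> list of (successor node, arc cost)
    h: node -> estimated lower bound on the cost of reaching a goal
    Returns (preferred solution path, its cost), or None if none exists.
    """
    g = {start: 0}                      # step 1: g(Start) = 0
    parent = {start: None}
    open_list = [(h[start], start)]     # OPEN, keyed by g(N) + h(N)
    while open_list:
        f, n = heapq.heappop(open_list)  # step 2: smallest g(N) + h(N)
        if f > g[n] + h[n]:
            continue                    # stale entry: a cheaper path to n exists
        if n in goals:                  # goal selected: reconstruct preferred path
            cost, path = g[n], []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], cost
        for s, arc in successors.get(n, []):   # step 3: expand N
            gs = g[n] + arc             # g(S) = g(N) + (cost on arc from N to S)
            if gs < g.get(s, float("inf")):
                g[s] = gs
                parent[s] = n
                heapq.heappush(open_list, (gs + h[s], s))
    return None                         # step 2: no OPEN nodes, no solution

# The Figure 1 shape with invented arc costs and heuristic values,
# chosen only so the quoted path costs (24 and 9) hold.
successors = {
    "Start": [("n1", 2), ("n2", 10)],
    "n1": [("n3", 3)],
    "n2": [("n4", 8)],
    "n3": [("G1", 4)],
    "n4": [("G1", 6)],
}
h = {"Start": 8, "n1": 6, "n2": 9, "n3": 4, "n4": 5, "G1": 0}
print(a_star(successors, h, "Start", {"G1"}))
# (['Start', 'n1', 'n3', 'G1'], 9)
```

Note how the heuristic steers the search: the node with the smallest g(N) + h(N) is expanded first, so the expensive branch through n2 is never pursued to the goal.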
Properties of A*
It can be shown that A* has the following properties:
1. Let h*(n) be the actual cost of a preferred path from n to a goal node. If h(n) <= h*(n) for all n, then A* is guaranteed to find a preferred solution path if one exists.
2. A* is the best possible algorithm in the sense that no other algorithm with access to the same amount of "additional knowledge" can do any less work than A* and still be sure of finding a preferred solution.
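Property 1 (the admissibility condition) can be checked numerically on a small graph by computing the actual costs h*(n) exhaustively and comparing them with the estimates. A sketch, again using invented arc costs and heuristic values for the Figure 1 shape (none of these numbers come from the article):

```python
# Invented example graph (node -> list of (successor, arc cost)) and
# heuristic h; both are assumptions for illustration only.
successors = {
    "Start": [("n1", 2), ("n2", 10)],
    "n1": [("n3", 3)],
    "n2": [("n4", 8)],
    "n3": [("G1", 4)],
    "n4": [("G1", 6)],
}
h = {"Start": 8, "n1": 6, "n2": 9, "n3": 4, "n4": 5, "G1": 0}

def h_star(node, goals):
    """Actual cost h*(n) of a preferred path from node to a goal,
    computed by brute-force search (the graph here is acyclic)."""
    if node in goals:
        return 0
    costs = [arc + h_star(s, goals) for s, arc in successors.get(node, [])]
    return min(costs) if costs else float("inf")

# h(n) <= h*(n) for every node, so by property 1 this heuristic
# guarantees that A* finds a preferred solution path on this graph.
for n in h:
    assert h[n] <= h_star(n, {"G1"})
```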
Final Comments
Figure 1. Graph example.
Reference 2 has more precise (and complex) explanations of the properties of A* as well as recent extensions and refinements of this approach. An application of A* to critical-path scheduling is discussed in Ref. 3.
BIBLIOGRAPHY
1. P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Trans. Syst. Sci. Cybern. SSC-4(2), 100-107 (1968).
2. J. Pearl, Heuristics, Addison-Wesley, Reading, MA, 1984.
3. R. Marcus, "An application of artificial intelligence to operations research," Commun. ACM 27(10), 1044-1052 (1984).
B. Raphael
Hewlett-Packard
ACOUSTIC ANALYSIS. See Speech understanding.
ACTOR FORMALISMS
Actors provide a conceptual basis for the development and research of open systems, i.e., open-ended, continuously evolving systems (1,2). Dealing with issues surrounding open systems is important to progress in the fields of AI and database theory, which have so far been based on the closed-world assumption.
Actors: Definition
The actor model is a paradigm of concurrent computation in open systems. Actors unify functional programming and object-oriented programming. An actor carries out its computation only in response to accepting a communication, which can cause it to take the following primitive actions:
Send communications to other actors.
Create new actors.
Specify a replacement actor to process the next communication.
All actions specified by an actor's behavior are carried out concurrently. In particular, computation is speeded up to use available resources by pipelining the replacement process: the new actor may accept the next communication even as the actor it replaces is sending communications or creating other actors.
The Nature of Actor Communication. Actors use buffered, asynchronous communication. Each actor has a mail address that may be freely communicated to other actors, resulting in a dynamic interconnection network topology on actors. To send an actor a communication, its address must be specified as the target of the communication. The mail system guarantees delivery of pending communications after a finite, but arbitrary, delay. Control structures (qv) are viewed as patterns of message passing (3) using the dynamic creation of actors called customers (see Control structures). Customers provide a parallel analog to continuations.
Transactions. In transactional terms communications can be either requests or responses. Each request eventually results in a unique response, and the pair is considered a transaction. A request may activate several requests, which has the result that transactions are often nested. Transactions provide a high-level view of events since one can view the transactions at successively finer levels of granularity. Important debugging tools for large actor systems have been based on an analysis of the transactional structure.
Theoretical Issues
The actor model incorporates the laws of parallel processing (4) and, in contrast to models such as Petri nets and data flow, accounts for the causal effects of a computation on the dynamic structure of the system (5,6). The actor model also addresses problems of distributed computing (see Distributed problem solving), such as mutual exclusion, divergence, and deadlock (7).
Guarantee of Mail Delivery. The guarantee of mail delivery provides a form of fairness. An important consequence of the guarantee is that potentially infinite processes can nevertheless be made to halt. This can be useful for halting processes (for maintenance and upgrading) that may otherwise function for an arbitrarily long period of time. Examples of such processes include operating systems and servers (7).
Abstraction and Compositionality. An actor system is analyzed in terms of transitions between configurations from some viewpoint. In order to build large systems, one must be able to program and compose independent modules. Message passing is used to achieve the parallel composition of independent actor systems. Information hiding (abstraction) in independent modules is essential so that, upon composition, all internal actions in the systems composed need not be considered. Such abstraction is achieved by defining a set of receptionist actors (which can accept communications from outside the configuration). When actors in one system receive the mail addresses of the receptionists in another, the two systems are in effect composed (7).
Implementations and Applications
Actors are best suited to programming intelligent systems.
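The three primitive actions of the actor model (send a communication, create an actor, specify a replacement) can be illustrated with a deliberately simplified, sequential toy. This sketch is not Act3 or the Apiary; it has no real concurrency or pipelining, and all names in it are invented for illustration:

```python
import queue

class ActorSystem:
    """Toy single-machine sketch of the three primitive actor actions."""

    def __init__(self):
        self.mailboxes = {}   # mail address -> buffered queue of communications
        self.behaviors = {}   # mail address -> current behavior (a function)
        self.next_addr = 0

    def create(self, behavior):
        """Create a new actor; its mail address may be freely passed around."""
        addr = self.next_addr
        self.next_addr += 1
        self.mailboxes[addr] = queue.Queue()
        self.behaviors[addr] = behavior
        return addr

    def send(self, addr, message):
        """Buffered, asynchronous send: just enqueue for later delivery."""
        self.mailboxes[addr].put(message)

    def run(self):
        """Deliver pending communications until every mailbox is empty."""
        delivered = True
        while delivered:
            delivered = False
            for addr, box in list(self.mailboxes.items()):
                if not box.empty():
                    # A behavior returns the replacement actor that will
                    # process the next communication sent to this address.
                    self.behaviors[addr] = self.behaviors[addr](self, box.get())
                    delivered = True

def counter(count):
    """A counter actor: each message yields a replacement behavior."""
    def behavior(system, msg):
        if msg[0] == "inc":
            return counter(count + 1)              # replacement carries new state
        if msg[0] == "read":
            system.send(msg[1], ("value", count))  # reply to a customer actor
            return counter(count)
    return behavior

replies = []
def sink(system, msg):           # a trivial "customer" that records replies
    replies.append(msg)
    return sink

system = ActorSystem()
c = system.create(counter(0))
s = system.create(sink)
system.send(c, ("inc",))
system.send(c, ("inc",))
system.send(c, ("read", s))
system.run()
print(replies)  # [('value', 2)]
```

The replacement mechanism is what substitutes for assignment here: the counter never mutates a variable, it simply names the actor that will handle the next communication.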
The Apiary (8,9) has been developed by the Message-Passing Semantics Group at the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. The Apiary network architecture supports dynamic resource management using such techniques as load balancing and real-time garbage collection (10). Several high-level actor languages (11) have also been developed to facilitate further research in human interaction with actor communities. Act3 (12) is the latest of these, embodying basic control structures, resource management tools, and a description system (13,14).

Probably all attempted axiomatizations of large real systems necessarily contain conflicting information and contradictory beliefs. It follows that in the context of real-world systems, logical proof is an inadequate tool for reasoning (qv); instead, due-process reasoning involving different sides of beliefs, goals, and hypotheses needs to be used. Actors serve as an ideal tool for modeling real systems since they do not impose a priori consistency requirements and can therefore accommodate distinct viewpoints (15).
BIBLIOGRAPHY

1. C. Hewitt and P. de Jong, Analyzing the Roles of Descriptions and Actions in Open Systems, Proceedings of the Third National Conference on Artificial Intelligence, AAAI, Washington, DC, August 1983, pp. 162-167.
2. C. Hewitt and H. Lieberman, Design Issues in Parallel Architectures for Artificial Intelligence, A.I. Memo 750, MIT Artificial Intelligence Laboratory, 1983.
3. C. E. Hewitt, "Viewing control structures as patterns of passing messages," Artif. Intell. 8(3), 323-364 (June 1977).
4. C. Hewitt and H. Baker, Laws for Communicating Parallel Processes, 1977 IFIP Congress Proceedings, IFIP, August 1977, pp. 987-992.
5. W. D. Clinger, Foundations of Actor Semantics, AI-TR-633, MIT Artificial Intelligence Laboratory, May 1981.
6. G. Agha, Actors: A Model of Concurrent Computation in Distributed Systems, MIT Press, Cambridge, MA, 1986.
7. G. Agha, Actors: A Model of Concurrent Computation in Distributed Systems, MIT Press, Cambridge, MA, 1986.
8. H. Lieberman, An Object-Oriented Simulator for the Apiary, Proceedings of the Third AAAI Conference, AAAI, Washington, DC, August 1983, pp. 241-246.
9. C. E. Hewitt, The Apiary Network Architecture for Knowledgeable Systems, Conference Record of the 1980 Lisp Conference, Stanford University, Stanford, CA, August 1980, pp. 107-118.
10. H. Lieberman and C. Hewitt, "A real time garbage collector based on the lifetimes of objects," CACM 26(6), 419-429 (June 1983).
11. D. Theriault, Issues in the Design and Implementation of Act 2, Technical Report 728, MIT Artificial Intelligence Laboratory, June 1983.
12. C. Hewitt, T. Reinhardt, G. Agha, and G. Attardi, Linguistic Support of Receptionists for Shared Resources, Proceedings of the NSF/SERC Seminar on Concurrency, LNCS, Springer-Verlag, New York, pp. 330-359, 1984.
13. G. R. Barber, Office Semantics, Ph.D. Thesis, Massachusetts Institute of Technology, 1982.
14. C. Hewitt, G. Attardi, and M.
Simi, Knowledge Embedding with a Description System, Proceedings of the First Annual National Conference on Artificial Intelligence, American Association for Artificial Intelligence, Stanford, CA, pp. 157-164, 1980.
15. C. Hewitt, "The challenge of open systems," Byte 10(4), 223-242 (April 1985).

C. Hewitt
G. Agha
Massachusetts Institute of Technology
ADVICE TAKER

Program proposed by J. McCarthy intended to show commonsense and improvable behavior by using declarative and imperative sentences as the representation and immediate deduction as the reasoning mechanism [see J. McCarthy, "Programs with Common Sense," in Mechanization of Thought Processes, Her Majesty's Stationery Office, London, pp. 75-91, 1959; also reprinted in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 403-410, 1968]. The sentence representation suggested for the Advice Taker is the forerunner of the situational calculus [see J. McCarthy and P. J. Hayes, "Some Philosophical Problems from the Standpoint of Artificial Intelligence," Mach. Intell. 4, 463-502 (1969)].

J. Geller
SUNY at Buffalo

ADVISORY SYSTEMS. See Expert systems.
AGENDA-BASED SYSTEMS

An agenda, or job list, has become one of the most popular methods used in AI systems to express control of the inference (qv) process because it does so in explicit, modular steps. The agenda itself is a data structure whose entries are commonly called "tasks." Each task is some piece of work to be accomplished during the problem-solving (qv) process. The principal advantage of listing tasks explicitly on an agenda is that it allows the inference process to reason about the best sequence for pending tasks in order more intelligently to choose the next task to be attempted. An agenda task entry may also include a source for the task (i.e., where did it come from?), a reason for placing the task on the agenda (i.e., why should the system execute this task?), and a priority for executing the task (i.e., how important is it to execute this task now?). Tasks may be placed on an agenda as a side effect of executing other tasks or may be placed there directly by some other means; for example, the action of a production rule (see Rule-based systems) might place new tasks on the agenda. Tasks are removed from agendas either according to some algorithm based on the task reasons or priorities, or the agenda can function simply as a stack, with the task that has been most recently placed on the agenda being the first one removed.

AI Agenda Systems
One of the earliest AI systems to use an agenda was one of the DENDRAL (1) programs used to elucidate molecular structures. DENDRAL's agenda is used by the "predictor" to keep track of information about fragment ions waiting to be processed by a set of rules. Initially, the agenda contains only chemical representations of unfragmented molecular ions. Rules are applied to simulate fragmentation of these ions, and representations of the resulting fragmented ions are then added to the agenda. The use of an agenda allows a breadth-first search behavior in which the primary ion fragmentations are analyzed first. A history of where each ion originated (how it came to be placed on the agenda) is saved and printed in a summary, but no interactive explanation is available (see Chemistry, AI in).

In AM (2), a system for generating mathematical discoveries, an agenda was used principally to manage a huge task selection problem. The agenda permitted each of many plausible tasks to be evaluated prior to selection and execution of that task having the most significant potential. Tasks were selected on the basis of a computed priority. The reasons associated with tasks in AM's agenda were useful in computing scores in order to determine the top-priority task. Figure 1 shows a typical entry from AM's agenda.
Task: Fill in generalizations of equality of lists.
Priority: 850
Reasons:
400: No known nontrivial generalizations of equality of lists.
600: Equality of lists rarely returns true on random examples.
200: Focus of attention: AM recently worked on the concept of equality of lists.

Figure 1. Sample AM agenda entry.

Recommendations

Control in systems in which explanation, look-ahead, or any of the other features just described are not needed may be just as well represented implicitly within the inference engine. Using an agenda requires more memory (to represent the necessary data structures) and more processing time (to evaluate and choose tasks and to explain agenda activity and task choices to users). The choice of whether to use an agenda depends on the nature of the application and the requirements for explaining control processes to the user. For example, agendas are recommended for applications that have a large task selection problem, as in AM, or for those that perform a breadth-first exploration of the solution space, as in DENDRAL. They are also recommended for systems like CENTAUR, which attempt to explain control steps to the user.

The agenda used in the CENTAUR system (3) for prototype-directed reasoning was designed to allow an easily accessible and explicit representation of control steps. Each control step in CENTAUR is placed on the agenda as a task so that the
system can reason about all tasks remaining to be executed. Tasks are executed in a LIFO order and are removed only as a result of executing them. The sources and reasons of a task are defined for purposes of understanding system performance, and an interactive explanation (qv) facility is provided that prints a description of the task being executed and the reason for choosing that task upon user request. Figure 2 shows a sample task from CENTAUR's agenda.

The agendas in GUS (4) and KRL (5) are used as part of the central control process but not to explain reasoning, as is done in CENTAUR. In GUS the agenda is used to decide what should be done next. The system puts potential processes on the agenda and then operates in a cycle in which it examines this agenda, chooses the next job to be done, and then does it. In KRL the agenda is a priority-ordered list of queues, with all processes on a higher priority queue run before any process on a lower priority queue.

BIBLIOGRAPHY

1. R. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, DENDRAL, McGraw-Hill, New York, 1980.
2. D. B. Lenat, AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search, STAN-CS-76-570 (AIM-286), Stanford University, July 1976.
3. J. S. Aikins, Prototypes and Production Rules: A Knowledge Representation for Computer Consultations, STAN-CS-80-814 (HPP-80-17), Stanford University, August 1980.
4. D. G. Bobrow, R. Kaplan, M. Kay, D. Norman, H. Thompson, and T. Winograd, "GUS, a frame-driven dialog system," Artif. Intell. 8(2), 155-173 (1977).
5. D. G. Bobrow and T. Winograd, "An overview of KRL, a knowledge representation language," Cog. Sci. 1(1), 3-46 (1977).

J. S. Aikins
Aion Corporation
Benefits of Agenda Scheme

As is illustrated in the systems just discussed, agendas may be used for many different reasons. One motivation for placing tasks on an agenda is that it allows a system to "look ahead" and see what tasks are remaining to be executed and to reason about those tasks. This feature is used in CENTAUR, for example, to help decide which production rules can be usefully applied to solve pending control tasks. Thus, the state of the agenda at any time shows exactly which tasks remain to be executed. A second advantage of having an agenda is that it provides a means for printing an ongoing record of the system's tasks and the reasons for considering them, which is done, for example, in both DENDRAL and CENTAUR. A third advantage of agendas is that they force key steps in the system's execution to be defined as single tasks, resulting in a highly modular, cleanly structured system. Isolating key steps in the execution of a system in turn makes it easier to explain what the system is doing at any time. This explanation feature is used extensively in CENTAUR, which must interact with medically expert users to solve actual human pathology problems, in which the consequences of mistakes are important.
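The task entries (source, reasons, priority) and the priority-based removal policy described in this article can be sketched as follows. This is a minimal sketch; the class and field names are illustrative and come from none of the systems cited.

```python
# A sketch of an agenda whose entries carry a task, a priority, a source,
# and reasons, and whose removal policy takes the highest-priority entry,
# breaking ties in favor of the most recently added task (stack-like).
import heapq
import itertools

class Agenda:
    def __init__(self):
        self._heap = []                    # (neg priority, neg order, entry)
        self._order = itertools.count()    # later additions win ties

    def add(self, task, priority, source=None, reasons=()):
        entry = {"task": task, "priority": priority,
                 "source": source, "reasons": list(reasons)}
        heapq.heappush(self._heap, (-priority, -next(self._order), entry))

    def next_task(self):
        """Remove and return the entry that should be executed next."""
        return heapq.heappop(self._heap)[2]

agenda = Agenda()
agenda.add("order hypothesis list", 600,
           source="task adding new prototypes",
           reasons=["new prototypes were added to the hypothesis list"])
agenda.add("fill in generalizations of list equality", 850,
           reasons=["no known nontrivial generalizations",
                    "equality rarely true on random examples"])
top = agenda.next_task()   # the 850-priority task is chosen first
```

Keeping the source and reasons inside each entry is what makes the interactive explanation facilities discussed above possible: the system can report why any pending task is on the agenda.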
Task: Order the hypothesis list.
Source: Task adding new prototypes to the hypothesis list.
Reason: Because new prototypes have been added to the hypothesis list, it should be checked to see which prototype best fits the facts.

Figure 2. Sample CENTAUR agenda entry.

ALPHA-BETA PRUNING

The alpha-beta procedure is equivalent to the depth-first minimax procedure (qv) in the sense that each chooses the same move as the other when given the same top position, termination criteria, and evaluation function. The alpha-beta procedure prunes subtrees off the search tree by knowing that they cannot lead to a good solution (see Game trees). An algorithm for the alpha-beta procedure is given in Figure 1.

To understand the alpha-beta procedure, it is necessary to understand the depth-first minimax procedure, which is a search procedure that combines an evaluation function, a depth-first generation procedure (see Search, depth-first), and the minimax backing-up procedure to search for a good move in a two-person adversary game like checkers or chess (see Game playing). Suppose that the game tree of Figure 2 had been given implicitly. A binary tree has been given for simplicity, but a tree of any branching factor applies here. The depth-first minimax procedure starts with position P and generates positions P1, P11, and P111 (see Fig. 3a). The depth-first minimax procedure then uses its evaluation function to evaluate position P111, which has a value of 4. It then generates and evaluates position P112 to see that it gets the value 1. The better value of P111 and P112, namely 4, is backed up to P11. The procedure generates P12. It generates and evaluates P121 and likewise P122 (see Fig. 3b). To P12, it backs up the better value of P121 and P122, namely 8. It then backs up to P1 the better of P11 and P12, which is also 4. It generates P2 and P21. It generates and evaluates P211 and P212 (see Fig. 3c). It continues until it obtains the result shown in Figure 3d. It chooses to move to position P1, which has a higher value than that of position P2.

The alpha-beta procedure almost always chooses its move after it has generated only a small fraction of the tree that would be generated by the equivalent depth-first minimax procedure when choosing the identical move. Thus, the alpha-beta procedure can save a great deal of time in the search. The two very simple examples given below show why the alpha-beta procedure chooses the same move as the equivalent depth-first minimax procedure although generating fewer moves. The alpha-beta procedure is also used in the tree at levels deeper than 2 or 3; the reader is urged to think about such extensions.

In Figure 4 the alpha-beta procedure has already obtained the (possibly backed-up) value v1 = 4. The alpha-beta procedure sets at P2 a parameter alpha = 4. Alpha is the least that Max (square, the maximizing player) can be held to; beta is the most that Min (circle, the minimizing player) can be held to. When the procedure finds that the (possibly backed-up) value of a max position (Max's turn to move) does not exceed alpha, it makes an alpha cutoff; that is, the procedure does not bother to generate more successors of the predecessor of the max position but generates the next successor of the predecessor of the predecessor of the max position. Thus, after obtaining v21 = 2, the procedure finds an alpha cutoff; that is, the procedure does not bother to generate positions P22, P23, P24, . . . (and their successors) but generates next P3. Since the minimax backing-up procedure is being used and v21 = 2, the depth-first minimax procedure would obtain a backed-up value of v2 that did not exceed 2, which is worse than v1 = 4. Without generating any more positions below P2, Max will not choose to move to P2. This shows that the further generation below P2 performed by the depth-first minimax procedure is a waste of time.

Similarly, in Figure 5, after obtaining v11 = 4, the alpha-beta procedure sets the parameter beta = 4 at P12. In general, when the procedure finds that the (possibly backed-up) value of a min position is greater than or equal to beta, it makes a beta cutoff; it does not bother to generate more successors of the predecessor of the min position but generates the next successor of the predecessor of the predecessor of the min position. Thus, after obtaining v121 = 8, the procedure makes a beta cutoff; that is, the procedure does not bother to generate P122, P123, P124, . . . but generates P13 next.

A uniform game tree of depth d and branching factor (qv) b contains b^d leaf nodes, all of which would be examined by the ordinary minimax procedure. The number of nodes that must be generated and evaluated in the best-case performance of the alpha-beta procedure (when the tree is perfectly ordered) is 2b^(d/2) - 1 for d even and b^((d+1)/2) + b^((d-1)/2) - 1 for d odd. In the worst-case performance, alpha-beta examines all the leaf nodes. The expected number of terminal nodes examined in a random totally dependent uniform game tree grows approximately as (b/H_b)^d, where H_b = 1 + 1/2 + . . . + 1/b is the bth harmonic number; the exact expression is derived in Ref. 3.

For information on the historical development of the alpha-beta procedure and an excellent analysis, see Ref. 3. For a branching factor analysis, see Ref. 4. For an evaluation of the overall effectiveness of the procedure, see Ref. 1.

function Alphabeta(p: position; alpha: integer; beta: integer): integer;
var m, i, t, d: integer;
begin
  determine the successor positions p1, . . . , pd of position p;
  if d = 0 then Alphabeta := f(p)
  else begin
    m := alpha;
    for i := 1 to d do begin
      t := -Alphabeta(pi, -beta, -m);
      if t > m then m := t;
      if m >= beta then goto done
    end;
    done: Alphabeta := m
  end
end;

Figure 1. Algorithm for the alpha-beta procedure.

Figure 2. Game tree with minimax backed-up values.
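The negamax procedure of Figure 1 can be rendered as a short executable sketch. The nested-list tree representation, the successor function, and the evaluation function below are illustrative assumptions, not part of the article.

```python
# A sketch of the negamax alpha-beta procedure of Figure 1.  A position is
# either a nested list (its successors) or a number (a leaf value, scored
# from the viewpoint of the player to move at that leaf).

def alphabeta(p, alpha, beta, successors, f):
    """Return the negamax value of position p within the window (alpha, beta)."""
    ps = successors(p)
    if not ps:                     # d = 0: evaluate the leaf with f
        return f(p)
    m = alpha
    for pi in ps:
        t = -alphabeta(pi, -beta, -m, successors, f)
        if t > m:
            m = t
        if m >= beta:              # cutoff: remaining successors are pruned
            break
    return m

def successors(p):                 # illustrative tree representation
    return p if isinstance(p, list) else []

def evaluate(p):                   # leaf score, side-to-move viewpoint
    return p

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
best = alphabeta(tree, float("-inf"), float("inf"), successors, evaluate)
# best == 3: the max over the min of each subtree, reached with cutoffs
```

Note how the sign flips and the swapped, negated window in the recursive call implement the single-procedure formulation of Figure 1, so the same code serves both Max and Min levels.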
Figure 3. Depth-first minimax procedure.
Figure 4. Alpha-beta procedure finds an alpha cutoff.
Figure 5. Alpha-beta procedure finds a beta cutoff.
BIBLIOGRAPHY

1. J. Slagle and J. Dixon, "Experiments with some programs that search game trees," JACM 16(2), 189-207 (April 1969).
2. J. Slagle and J. Dixon, "Experiments with the M & N tree-searching program," CACM 13(3), 147-154 (March 1970).
3. D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artif. Intell. 6, 293-326 (1975).
4. J. Pearl, "The solution for the branching factor of the alpha-beta pruning algorithm and its optimality," CACM 25(8), 559-564 (August 1982).

J. Slagle
University of Minnesota

AM

A knowledge-based system that conjectures interesting concepts in elementary mathematics, written in 1976 by D. Lenat at the Stanford Heuristic Programming Project. AM demonstrates that some aspects of creative research can be effectively modeled as heuristic search [see D. B. Lenat, "AM: Discovery in Mathematics as Heuristic Search," in R. Davis and D. B. Lenat (eds.), Knowledge-Based Systems in Artificial Intelligence, McGraw-Hill, New York, 1980, pp. 3-228].

M. R. Tam
SUNY at Buffalo

ANALOGIES. See Learning.

AND/OR GRAPHS

Definition

An AND/OR graph is an explicit representation of the relationship between all situations and options that may be encountered in the solution of decomposable problems (1), that is, those made up of independently soluble constituents (see Fig. 1). The nodes in an AND/OR graph represent subproblems to be solved or subgoals to be achieved, with the top node (n0) representing the specification of the overall problem. The nodes are connected by two types of directed links: OR links that represent alternative options of handling the problem node from which they emanate and AND links that connect a parent problem node to the individual subproblems of which it is composed. All these subproblems must be solved before the parent problem is considered solved; thus, the AND links pointing toward these subproblems are normally shown connected by arcs, as in Figure 1. A terminal node (Fig. 1a, n1, n2, and n3) (having no successors) in an AND/OR graph represents either a primitive problem, whose solution is readily available, or a subproblem that cannot be decomposed any further. The former is labeled solved, the latter unsolvable.

A complete solution is represented by an AND/OR subgraph, called a solution graph (see Fig. 1b,c), having the following properties: it contains the top node (n0); all its terminal nodes are solved; and if it contains an AND link L, it must also contain the entire group of AND links that are siblings of L (e.g., see links L and L' in Fig. 1b).

Applications

AND/OR graphs are typically used to represent the record of search (qv) in a problem-solving system that attempts to find a solution by a problem reduction (qv) method (2). Generally, AND/OR graphs are suited to problems in which the final solution is conveniently represented as a tree or a graph rather than an ordered sequence of actions (3). Strategy-seeking tasks are typical examples of this class of problems, where the AND links represent changes in the problem situation caused by external, uncontrolled conditions, and the OR links represent alternative ways of reacting to such changes. In planning, the uncontrolled conditions could be the possible outcomes of an uncertain event or the results of a given test. In games those conditions are created by the legal moves available to the adversary. In program synthesis they consist of the results of applying certain computations to unspecified data.

Another important class of problems suitable for AND/OR graph representations includes cases in which the solution required is an unordered set of actions. In symbolic integration (4), for example, certain legal transformations (e.g., integration by parts, long division of polynomials, etc.) split the integrand into sums of expressions to be integrated separately in any order. The set of applicable transformations will be represented as OR links emanating from the node representing the integrand, and the AND links represent individual summands within the integrand, all of which must eventually be integrated.

The tasks of logical reasoning (see Reasoning) and theorem proving (qv) also give rise to AND/OR structures (5). One begins with a set of axioms and a set of inference rules that allows, in each step, deduction of a new statement from a subset of axioms and previously deduced statements. The new statement is added to the database, and the process continues until the desired conclusion (e.g., the theorem) is derived. The solution object pursued by the search is a plan specifying in each step which of the inference rules is to be applied to which subset of statements in the database and what the deduced statement is. This plan, again, is best structured as an unordered tree because when a certain conclusion is derived from a given subset of statements, the internal order in which these statements were themselves derived is of no consequence as long as they reside in the database at the appropriate time. Thus, the solution structure is a tree, and hence, the appropriate search space would be an AND/OR graph (6). Backward reasoning from a theorem to a set of axioms involves a search space of identical structure.

AND/OR graphs are also suitable for representing problems in which the solution sought is an ordered sequence of actions, as long as the search for some subsequences that make up the solution can be conducted in any order. A classical example is the Tower-of-Hanoi puzzle (3), where the three main subgoals, that is, clearing the largest disk, moving that disk to a given peg, and placing the other disks on top of the largest one, must be executed in a certain order but the search for their solutions can be conducted in any order.

Searching AND/OR Graphs

AND/OR graphs lend themselves to systematic search methods such as backtracking, depth-first, breadth-first, and various forms of heuristic best-first algorithms (1,3). The basic rationale behind heuristic search methods is that the exami-
Figure 1. (a) AND/OR graph and (b,c) two of its solution graphs. Solved terminal nodes are marked as black dots.
nation of alternative solution candidates (OR links) should start with the candidate most likely to succeed, although the examination of subgoals within each candidate (AND links) should begin with the one most likely to fail. Heuristic estimates of these likelihoods are often used to guide the search so that a solution graph can be found after exploring only a small portion of the AND/OR graph that underlies a given problem. The algorithm AO*, for example, estimates the costs of the solution graphs rooted at the various candidate nodes and is guaranteed to find a cheapest solution if all cost estimates are optimistic (7). Game-playing (qv) strategies, on the other hand, are guided by estimating the strengths of positions a few moves ahead and normally employ backtracking (qv) search with irrevocable pruning (see Alpha-beta pruning) (3).
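The labeling rule behind solution graphs, a terminal node is labeled solved or unsolvable, an OR node is solved if any alternative is solved, and an AND group is solved only if all of its subproblems are solved, can be sketched as follows. The node representation is an assumption made for illustration.

```python
# A sketch of the SOLVED-labeling rule for an AND/OR tree: OR nodes need
# ANY solved alternative; AND groups need ALL subproblems solved.

def solved(node):
    kind = node["kind"]
    if kind == "terminal":
        return node["solved"]          # primitive problem or unsolvable
    children = node["children"]
    if kind == "or":                   # alternative ways to solve
        return any(solved(c) for c in children)
    return all(solved(c) for c in children)   # "and": every subproblem

# The top node has two alternatives; the second decomposes into two
# subproblems, both of which are primitive and solved.
n0 = {"kind": "or", "children": [
    {"kind": "terminal", "solved": False},
    {"kind": "and", "children": [
        {"kind": "terminal", "solved": True},
        {"kind": "terminal", "solved": True}]}]}
# solved(n0) holds via the AND alternative
```

Algorithms such as AO* interleave this labeling with node expansion and cost estimation; the sketch isolates only the solved-propagation step.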
ART, AI IN

AARON is in reality a family of programs that now dates back more than 10 years. One of the earlier versions of the program is described in the literature by its inventor (1). AARON is distinguished by its autonomous creative behavior: it makes original drawings but has neither drawings nor parts of drawings stored in it, and it neither requires nor accepts input to account for the differences between its drawings (see Fig. 1). Not a single image was made by hand in developing the program, and there is no prototypical image from which others are derived by permutation. When a drawing machine is in use, the user tells the program the paper size, then simply tells it to start: all subsequent decisions are under the program's control, up to and including the decision to stop.
BIBLIOGRAPHY

1. N. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
2. N. Nilsson, Problem Solving Methods in Artificial Intelligence, McGraw-Hill, New York, 1971.
3. J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, MA, 1984.
4. J. Moses, "Symbolic integration: The stormy decade," CACM 14(8), 548-560 (1971).
5. G. J. VanderBrug and J. Minker, "State space, problem-reduction, and theorem-proving: some relationships," CACM 18(2), 107-115 (1975).
6. C. L. Chang and J. R. Slagle, "An admissible and optimal algorithm for searching AND/OR graphs," Artif. Intell. 2, 117-128 (1971).
7. A. Bagchi and A. Mahanti, "Admissible heuristic search in AND/OR graphs," Theoretical Comp. Sci. 24(2), 207-219 (1983).

J. Pearl
UCLA
Figure 1
AARON is knowledge based, its knowledge falling principally into two categories. The first concerns the things it is drawing. The 1985 version of AARON knows, in declarative terms, about the human body: the sizes and connectivity of its parts and the range of relative movement of which the parts are capable. (If the subject's arm is raised above the head, for example, which side of the hand does the viewer see?) AARON also knows the principles by which the parts of the body move in accord with each other: when standing on one foot, for example, an arm is used to balance an extended leg. This knowledge is in procedural form embodying several hundred rules. These two sources of knowledge alone would only make possible the construction of simple stick figures; but the artistic sophistication of AARON's drawings requires knowledge not only of what it is drawing but also of how to draw. This knowledge is not object specific. AARON is capable of drawing anything that can be given an object-specific representation like that of the figure. The object-specific knowledge is used by AARON to construct, in "imagination," a core figure that it then "fleshes out" into the drawing as the viewer sees it.
BIBLIOGRAPHY

1. H. Cohen, What Is an Image?, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 1028-1057, 1979.

H. Cohen
University of California, San Diego

ARTIFICIAL INTELLIGENCE

AI is the study of ways in which computers can be made to perform cognitive tasks, at which, at present, people are better. Examples of problems that fall under the aegis of AI include commonsense tasks, such as understanding English, recognizing scenes, finding a way to reach an object that is far overhead, heavy, and fragile, and making sense of the plot of a mystery novel. In addition, AI includes expert tasks, such as diagnosing diseases, designing computer systems, locating mineral deposits, and planning scientific experiments. The techniques that AI applies to solving these problems are representation and inference methods for handling the relevant knowledge and search-based problem-solving methods for exploiting that knowledge. Although the tasks with which AI is concerned may seem to form a very heterogeneous set, they are, in fact, related through their common reliance on techniques for manipulating knowledge and conducting search.

Both the scientific and the engineering disciplines of AI are based on the physical symbol system hypothesis, which states that the processes that are required to produce intelligent action can be simulated with a collection of physical symbols and a set of mechanisms that produce a series, over time, of structures built from those symbols. The digital computer serves as the tool with which the symbol structures are built and manipulated. In an AI program symbol structures are used to represent both general knowledge about a problem domain (such as understanding newspaper stories or performing medical diagnosis) and specific knowledge about the solution to the current problem.

History

For centuries people have been fascinated by intelligence and with attempts to mechanize it. Many of these attempts predate even the idea of the digital computer. It is interesting to note that Ada Augusta, Countess Lovelace (the 27-year-old daughter of Lord Byron), in her notes in 1842 on Charles Babbage's earlier lectures, pointed out that Babbage's machine, if it could be built, would be capable of processing not just numbers but anything that could be reduced to a set of symbols (1). The modern science of AI began seriously in the 1950s. In the summer of 1956, at a summer conference at Dartmouth, 10 of the people who would lead the field in its early years assembled to consider the new subject. During the 1950s some of the first AI programs, which played chess and proved theorems, were written. By that time Alan Turing had already proposed the Turing test (qv) (2) as a method for deciding whether machines can think. Essentially, a machine will pass the Turing test if it can fool a human observer into believing he or she is communicating with another person. (The test is to be conducted over a teletype line so appearance, voice, etc., cannot influence the outcome.) The Turing test is an interesting idea, but unfortunately, it is too simplistic to be a good measure of the technical progress of AI. Finding better measures continues to be a difficult problem facing AI research.

Starting in the early sixties, several efforts emerged to build programs to do a variety of tasks, primarily ones we now call commonsense tasks. Among these efforts were GPS [the General Problem Solver, which was built by Allen Newell, J. C. Shaw, and Herbert Simon at Carnegie-Mellon University (3)], STRIPS [a general problem solver for the SHAKEY robot at the Stanford Research Institute (4)], chess and checkers programs, and machine translation (qv) systems designed to translate text from one natural language into another. None of these efforts was particularly successful if success is meant to indicate the construction of programs that solved significant problems. The machine translation effort was explicitly terminated because it appeared to be attacking an impossible problem. But if the notion of successful includes contributions to the understanding of the processes underlying intelligent problem solving, more credit must be given to these early efforts because out of them came two important ideas. The first is an understanding of the role of search in problem solving; the second is an appreciation of the key role played by knowledge in controlling the search. This second idea arose directly out of these early systems' failure to perform well. The reason that they could not solve any very difficult problems is now understood to be their lack of knowledge. In other words, each of these programs exploited a general problem-solving mechanism, but that mechanism had access only to a very small amount of knowledge about the domain in which it worked. For example, the early machine translation systems had only the basic syntactic rules of the languages and word-for-word translation dictionaries. But they had no knowledge of the content of the text that was to be translated.

Game-playing (qv) programs, particularly for chess (5) and checkers (6), received a great deal of interest during this early period. There were two reasons (besides simply the interest many people have in games) that games were one of the first domains in which AI was applied. One is that games are unlike many other domains, in which it is difficult to measure accurately the success of a problem-solving program or to compare the relative performance of two programs. In the games
10
INTELLIGENCE ARTIFICIAL
research laboratories, where people are attempting to find domain all that is necessary is to play the programs either unsolved probleffis, particularly against human opponents or against each other and see who long-term answers to many knowledge and large, unre*ittr. The secondreason gamesappearedan attractive domain ot.i that involve commonsense same time many developthe At domains. problem is that, becausethe rules for games are quite simple, it seemed stricted an attempt to apply the in underway now proare a efforts mental that the knowledge problem could be avoided. Perhaps to the problems discovered been already have game that could the techniques of rules gram whose only knowledge was the can provide adequate near-term Ielect good moves simply by search through a tree of possible for which those techniques mc)vesequencesto find one that would cause the program to solutions. From the inception of AI in the fifties two different goals win. Although in principle this may be true, in practice, it fails the peoplewho built the programs and the theories. sequences move motivated of to work becauseof the enormous number that desire to understand better the way the human of the was estimate One one For example, that must be considered. mind. of course,goodhuman chessplayers mind works by building programs that modeled that number for chessis 3b100. could that programs building in interest an do not consider anywhere near that number of possibilities The other was peoby done be to had time, the at that, tasks set perform useful the becausethey exploit additional knowledge to constrain Over work. AI motivate to continue interests are these programs pf.. of Both chess current of moves they examine. Although ift. 
years the two have proved not so much to competeagainst better players than were their early predecessors(primarily to complement each other as the search becausemachines are now faster and so more possibilities can each other but rather lead to effective mechanical problem can that hubest for techniques be considered),they neverthelesscannot yet beat the the in continues. even So solving knowledge. man players with their additional caseof g"*es, knowledge must be added to search to produce successfulprograms. ProblemSolvingand Search By the mid-L9?0s the need for proglams to possessknowlAI programs are designedto solveproblems.But beforea probedge about their domains was becoming recognized. Terry be solved, ii must be precisely defined. A powerful, wlnograd,s SHRDLU (D accepted English commands and lem can general-purposeframework in which problem definitions can used them to manipulate and answer questions about a set of (seeSearch). constructed is the state spacesearchparadigm blocks drawn on a screen.Knowledge about the configuration be parts: the state A problem definition in this paradigm has fi.ve of the blocks was used to disambiguate the English inputs rules the (or goal) state(s), ,pur., the initial state(s),the final when necessary. But the blocks-world domain of SHRDLU to state one from (or transitions possible) define the legal was too trivial to place substantial semantic demands on any that the about advice provide that rules and ?acts the and program. MYCIN (8), however, was designed and imple- the next, most useful way to move from one state to another. For exam*.rrt"d by E. H. Shortliffe to do a more ambitious task, diagple, Figure 1 shows a sketch of a state spacesearch definition npsing and recommending treatment of infectious blood dis(seecomof the informal problem specification, "Play chess" eases.Since most people cannot do this task becausethey lack is instatement original the Although puter chess method). 
the appropriate knowledge, the need for knowledge in this win), to try to ought one that mention (e.g., to fails it complete .*p.rt domain was more obvious than it was in the commoninput the as pro- the lormal definition is complete enough to serve ,urr. domains of most earlier progIaffis, and the MYCIN deto a problem-solving program. when a problem has been base knowledge a gram consisted of both a code portion and the through searching by solved be can ii fined in this way, that was used by the code.The knowledge basecontained a set is state spaceuntii a path from an initial state to a goal state knowldiagnostic the described that rules of condition-action found. edgethat a human physician might possess.Although MYCIN Each of the rules that describetransitions from one state to it demonstrated lab, \ryasnever used outside of the research in another has two parts, a left side that describesthe states that tasks that had previously been done only by highly describes that side right a and apptied be mav rule which the trained human experts might be accomptishedby machines. the new state that will b" grtterated if the rule is applied. expert of technology current the toward step first Tlhis was the are The two largest parts of a state spaceproblem definition resystem (qv) development, in which programs that solve the of statement th9 figure, the in usually the last two shown siricted problems in a wide array of scientific, engineering, legal moves through the space and the additional heuristic albuilt being are arenas problem commercial, and military the information that describesthe most useful moves through most dailY. knowledge the of most space.These two componentscontain By the end of the seventiesAI was a thriving researcharea the problem-solving program has available to it. sometirat laboratories. 
industrial in academia and in a small number of is clear, in the times, th; distinction between these two components Ijut it had had tittle impact outside of those labs. Then, the legal in which playing, chess of case the in example, for as, early eighties, the first expert systemsthat performed complex heuristic knowlthe moves are a small, well,defined set, and the Probably appear. to began fashion cost-effective a in tasks of playing exlifetimes several devel- edge could in principle reflect most important of these was R1 (g) (now called xcoN), less clear, as, much is distinction the thoogh, perience. oft*rr, 'ped by iohn McDermott and his colleaguesat Carnegie-Melthe two which in diagnosis, medical of .ur" the in example, for lon university and Digital Equipment corporation. Rl solved set single a into combined (Digital Equipment knowledge componentsare usuaily the proble* of confi.guring orders for vAX initial an from moving of ways plausible require- of rules that define Corporation) computers to suit individual customer that account for produc- set of observations to new sets of hypotheses put into it was systems, ments. unlike .urli., expert a set of legal provide growth of those observations.When it is possibleto tion use. Since then there has been an explosive that guides knowledge moves separate from the heuristic interest in AI from many directions, both in private industry deknowledge heuristic the of importance in pure those rules, the and in the government. Now AI is stiu being pursued
State space: {chess board configurations}
Initial state: [diagram: a chess board in the standard starting position, Black at top, White at bottom]
Goal state: {positions in which opponent's king has been captured}
Rules: {legal moves of chess}
Heuristic knowledge:
• Functions that evaluate board positions for likelihood they will lead to a win.
• Standard opening moves.
• Classic endgames.
• Other tactical and strategic knowledge.

Figure 1. A state-space problem description.

State transition rule: ∀x, y, z [(x < y) ∧ (y < z)] → (x < z)
[diagram: a search graph of successive knowledge states, whose initial state contains the facts 1 < 2, 2 < 3, 3 < 4]

Figure 2. A forward search graph.
depends on the size of the state space being searched. Heuristic knowledge is not necessary for trivial problems (such as playing tic-tac-toe) because these problems can be solved by exhaustively enumerating all the legal paths from an initial state and checking for one such path that reaches a goal state. But in the more complex domains with which AI is primarily concerned (such as design, diagnosis, or chess), heuristic knowledge plays a crucial role in making computerized problem solving feasible. In fact, it is not really accurate to say that AI programs rely heavily on search. Instead, the core of these programs is heuristic search (see Heuristics).

State space search generates a graph, whose nodes represent states and whose arcs represent moves (actions) that cause the transitions from one state to the next. This graph is generated by a procedure that operates in cycles. At each cycle one node in the graph is selected. The rules are examined and those that apply to the selected node are identified. One or more of them is then applied, and new nodes are generated. If any of these new nodes represents a goal, the search process may terminate. Otherwise, another cycle begins (see A* algorithm).

Sometimes the moves in a state space search represent physical actions, and the states represent physical states. This is the case in a chess-playing program in which the states represent configurations of a chessboard. For other programs the moves represent inferential or deductive procedures, and the states represent states of knowledge. This is the case in many reasoning (qv) programs, such as the simple arithmetic reasoner, for which a partial search graph is shown in Figure 2. The initial state in this example contains only the knowledge that each integer is less than its immediate successor. The only state transition rule that is being used represents the fact that the less-than relation is transitive. The asterisks in the state descriptions are a shorthand way of saying "and all the knowledge from the previous state." Some shorthand like this is important, by the way, not just for this figure, but also in programs since, for interesting problems, state descriptions may get very large and usually each one differs from the next in only a small number of ways. The problem of figuring out what those ways are and representing them is called the frame problem. Sometimes the states in a search program do not actually represent real-world states at all but instead they contain descriptions of partial problem solutions. In these programs the moves represent problem-solving actions that convert partial, sketchy solutions to complete ones. Some search procedures generate only trees (simplified graphs in which each node has only one parent node), but most searches do not satisfy this constraint because it is often possible to generate a particular state in more than one way, as shown in the arithmetic example.

A state space search can proceed in either of two directions (or in a combination of the two). Forward reasoning begins with the initial state and generates new states until a goal state is reached. Left sides of rules are matched against nodes in the search graph. When a rule's left side is matched, its right side may be used to generate new nodes to add to the graph. Backward reasoning begins with a goal state and applies rules in reverse until an initial state is reached. The search graph of Figure 2 was generated by forward reasoning, which is particularly useful when the goal state is not explicitly known. Forward reasoning is often used to generate consequences of observed facts on the assumption that those consequences may be useful later.

When a specific goal is explicitly known, however, backward reasoning is often more efficient, particularly when, as in the arithmetic example, the number of paths out of each node is very large but most of those paths do not lead to a goal. Backward reasoning is often called goal-directed reasoning. It works by finding rules whose right sides generate the goal state. Then the left side of each of these rules is used to establish a set of subgoals. This set represents conditions that, if satisfied, would permit a rule that would generate the goal state to be applied. Each of these subgoals is in turn treated as a goal to be established. This process continues until a set of subgoals that are satisfied in the initial state is found. Figure 3 shows a backward search graph generated for the specific goal 2 < 5. Arcs connected by circles connect nodes that must simultaneously be satisfied in order for the predecessor state to be reachable. Conjoined goals such as these appear whenever a rule with multiple conditions in its left side is applied backward. For example, if a rule such as "if a and b then c" is applied backward from the goal c, then two subgoals a and b are established, and they must both be satisfied in order for this rule to provide a path to c. In the example some nodes are duplicated to make the graph easy to read. They would not normally be duplicated in a program (see AND/OR graphs; Processing, bottom up and top down).

The framework that heuristic state space search provides has served as the core of essentially all AI programs. The framework is general enough that it applies in a wide variety of problem-solving domains. But this generality prevents it from having enough power in and of itself to solve difficult problems. This power must come from the domain-specific knowledge that can be embedded in the rules that the search procedure uses and in the additional knowledge bases that may be referenced by those rules.
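The cyclic, forward regime described above can be made concrete with a small sketch. The code below is purely illustrative (nothing like it appears in the early systems discussed here, and all names are invented): it casts the arithmetic less-than domain of Figure 2 as a search in which each state is a set of facts and the single transition rule is transitivity.

```python
# Illustrative forward state-space search over states of knowledge.
# Each state is a set of facts (x, y) meaning x < y.

def transitivity(state):
    """Left side: x < y and y < z are both known; right side: add x < z."""
    return {(x, z)
            for (x, y1) in state
            for (y2, z) in state
            if y1 == y2 and (x, z) not in state}

def forward_search(initial_state, goal_test, rules, max_cycles=100):
    state = set(initial_state)
    for _ in range(max_cycles):      # each iteration is one search cycle
        if goal_test(state):
            return state
        new = set()
        for rule in rules:           # find the rules that apply...
            new |= rule(state)
        if not new:                  # no rule applies: the search fails
            return None
        state = state | new          # ...and generate the successor state
    return None

initial = {(1, 2), (2, 3), (3, 4)}   # 1 < 2, 2 < 3, 3 < 4
result = forward_search(initial, lambda s: (1, 4) in s, [transitivity])
print((1, 4) in result)              # the fact 1 < 4 has been derived
```

Because each successor state contains all the facts of its predecessor, this toy search is monotonic; the heuristic component of a real problem definition (Figure 1) would additionally rank which rule applications to pursue first.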
Thus, specific techniques for representing knowledge are crucial for the success of heuristic, search-based AI programs.

Knowledge Representation and Inference

Knowledge about the problem domain that is being considered is necessary to solve problems. If computers are to solve problems, that knowledge must be encoded into data structures that can be created and used by programs. The discovery of structures that are well suited to this has constituted a very large part of research in AI.
Figure 3. A backward search graph.
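A backward search of the kind Figure 3 depicts can be sketched in a few lines. This is an assumed illustration, not a program from the literature: the goal x < z is reduced to the conjoined subgoals x < y and y < z, and a subgoal succeeds when it is satisfied in the initial state.

```python
# Illustrative backward (goal-directed) reasoning for goals of the form
# x < z over integers, using only transitivity and a base set of facts.

INITIAL_FACTS = {(2, 3), (3, 4), (4, 5)}   # 2 < 3, 3 < 4, 4 < 5

def prove(goal):
    if goal in INITIAL_FACTS:        # subgoal satisfied in the initial state
        return True
    x, z = goal
    # Applying transitivity backward from x < z establishes the conjoined
    # subgoals x < y and y < z; both must succeed for some intermediate y.
    return any(prove((x, y)) and prove((y, z)) for y in range(x + 1, z))

print(prove((2, 5)))   # → True
print(prove((2, 9)))   # → False: facts such as 5 < 6 are not known
```

In a real system the choice of the intermediate value y is exactly where heuristic knowledge would enter; here every candidate is tried, which is exhaustive search.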
If representational details are considered, there are hundreds of techniques for representing knowledge in artificial intelligence systems. But if details are ignored, three main techniques emerge: slot-and-filler structures, logical formulas, and production rules. Many programs have been based on each of these methods, and many programs combine two or more of them. In evaluating a particular method for a specific task, two important factors must be considered: the power of the method itself to represent all of the knowledge that is required and the power of the reasoning procedures associated with the method to draw the necessary conclusions from the represented knowledge. Thus, in discussing knowledge representation techniques, it is important to consider both the form that knowledge takes within the technique as well as the reasoning procedures that are supported by that form.

Figure 4 shows an example of the use of each of the three main representation techniques. Each technique is being used to represent some knowledge about automobiles. Part a shows a slot-and-filler structure describing a collection of objects and an associated set of attributes and values for those objects. The objects are connected to each other by ISA ("is a") relations, which define instances and subclasses of more general classes. When slot-and-filler structures are conceived of in this form, they are called semantic networks (qv). The reasoning engine associated with this kind of structure performs two important functions: matching (qv), in which partial descriptions are used to retrieve objects and attributes from the knowledge base, and property inheritance, in which attributes of specialized classes and of individual objects are inferred from default values associated with more general classes that can be reached by a chain of ISA links (see Inheritance hierarchy; Reasoning, default). In this example everything outside the dashed box represents common knowledge about cars. The part of the network inside the dashed box corresponds to the sentence "John has a red Mustang." Using the knowledge in this system, it is possible to answer the question "Does John have an American car?" by matching the fragment "car/owner: John" and then chaining up ISA links until a node with the nationality attribute is found.

Part b illustrates the use of logical formulas to represent more complex relationships among objects and attributes. In this example the first formula corresponds to the fact that every wheel has a tire on it. The second formula says that for this meaning of on (namely encircle), an object can be on only one other object. Using the second formula, one can derive the third, namely that no tire is on more than one wheel. When knowledge is represented as formulas such as this, all of the formal power of a logical theorem prover can be applied to the knowledge to derive new knowledge (see Logic; Theorem proving). Unfortunately, the combinatorial explosion that may result when this is done makes this an impractical approach for many problems and is one reason that other representations are often used.

Part c is an example of the use of production rules to represent operational knowledge, in this case, of how to determine what is wrong with a car that does not work (see Rule-based systems). Knowledge represented like this can be used in a search procedure to find a path from some current, undesirable state (such as a nonworking car) to some other, more desirable state (such as a functioning car).

Using any of these systems successfully requires a careful
(a) [diagram: a semantic network connected by ISA links, with nodes and attributes including Transportation (purpose), Wheels, Motorized vehicle, Powered by: internal combustion, Fueled by: gasoline, Nationality, Mustang, Color, and Owner; a dashed box encloses the instance describing John's red Mustang]

(b)
∀x wheel(x) → [∃y tire(y) ∧ encircle(y, x)]
∀x, y, z [encircle(x, y) ∧ encircle(x, z)] → y = z
∀x, y, z [wheel(y) ∧ wheel(z) ∧ tire(x) ∧ encircle(x, y) ∧ encircle(x, z)] → y = z

(c)
if engine won't start, and
   starter doesn't turn, and
   battery is in good condition, and
   result of starter-motor test is starter buzzes or turns engine very slowly, and
   result of bench test of starter is improper functioning
then replace starter

Figure 4. Representing knowledge: (a) slot-and-filler; (b) logical formulas; (c) production rule.
analysis of the specific knowledge that is needed as well as the way in which that knowledge must be used. Consider the following as one example of the kind of question that needs to be answered: "Does knowledge ever cease to be true and need to be eliminated from the representational system?" The simple case is when the answer to this question is no. Then knowledge can just be added to the system as it arises. Systems like this are called monotonic. Unfortunately, sometimes the situation is not so simple. For example, whenever default values are inferred, it is possible that later specific information will appear and force the removal of the default fact together with any inferences supported by that fact. In the example of Figure 4a one could use default reasoning to conclude that John's car is fueled by gasoline. Then one might go on to decide where to go to buy fuel. If one is later told that in fact John has a diesel car, one would need to record this new fact. And one would also need to remove both the original default assumption that the car used gasoline as well as the derived decision about where to look for fuel (since one's choice may very well not sell diesel fuel). Systems in which the insertion of a new fact may necessitate the removal of some old facts are called nonmonotonic and are complex because of the bookkeeping and computation required to guarantee that changes are always made consistently.

Applications

The range of "intelligent" activities performed by people is very broad, so the set of possible application areas for AI is correspondingly diverse. Areas that require relatively small, homogeneous knowledge bases are good targets for current AI technology. It is in these areas (e.g., some medical specialties, some engineering tasks, some concrete financial applications) that expert systems have been and are being built that rival the performance of practiced human experts. Areas that require larger, more heterogeneous knowledge bases are more difficult; these areas (e.g., understanding the newspaper, planning a legal defense, serving as a private secretary) will take longer, although some small pieces, even of these, have been solved. Still other areas, those that require interaction with perceptual and motor devices, depend on technological advances in perception (both vision and speech) and motor control in addition to their reliance on AI to provide cognitive capabilities. Thus, success in these areas (such as general household robots) is very difficult to predict. Two important application areas are discussed briefly below (see also Chemistry, AI in; Computer-aided design; Computers in education; Computer-integrated manufacturing; Image understanding; Intelligent computer-aided instruction; Law applications; Manipulators; Medical advice systems; Military, applications in; Music, AI in; Office automation; Programming assistants; Prostheses; Speech understanding; Vision).

Expert system (qv) is a single term, but it can be applied to any one of a large collection of programs that rival human performance at specialized, constrained problem-solving tasks. Although the term makes no commitment to the structure of the programs to which it applies, the majority of expert systems that now exist are structured as a collection of if-then production rules together with some reasoning engine that applies those rules, either in the forward or backward direction (or perhaps a combination of the two), to specific problems. Figure 5 shows examples of such production rules taken from two different expert systems. The CF (certainty factor) in the MYCIN rule is an estimate of how strongly the evidence in the if part of the rule supports the conclusion in the then part. Many expert systems use such estimates, although others, such as R1 (XCON), do not. Often additional knowledge bases, frequently of slot-and-filler structures, are also used in conjunction with the production rules, but the primary program architecture is the production system. Some current efforts are directed toward expanding this framework to allow deeper, causal reasoning about how real-world objects function since existing systems are often limited by the superficiality of their knowledge about the objects with which they deal.
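The bookkeeping that nonmonotonic systems must do (recall the diesel-car example in the knowledge-representation discussion above) can be sketched by recording, for every derived fact, the facts that justify it; retracting a fact then retracts everything that depended on it. This is an illustrative toy with invented names, not a description of any particular truth-maintenance system.

```python
# Toy justification bookkeeping for a nonmonotonic knowledge base.
support = {}                      # fact -> set of facts it was derived from

def add(fact, because=()):
    support[fact] = set(because)

def retract(fact):
    """Remove a fact and, recursively, every inference it supported."""
    for f in [g for g, deps in support.items() if fact in deps]:
        retract(f)
    support.pop(fact, None)

add("John's car is a Mustang")
add("John's car is fueled by gasoline",            # a default inference
    because=["John's car is a Mustang"])
add("buy fuel at the gasoline station",
    because=["John's car is fueled by gasoline"])

retract("John's car is fueled by gasoline")        # told: it is in fact diesel
add("John's car is fueled by diesel")
print(sorted(support))   # the default and its consequences are both gone
```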
Natural-language processing, which means both understanding and generating sentences in languages such as English or Chinese, is an important forum for the exploration of AI techniques for three reasons. The first is that if programs could process language, there would be an easy way to provide programs with the knowledge they need to perform other tasks. They could just read it. The second is that people need some way to communicate with the problem-solving programs they use, and in many domains natural languages are well suited to this. In addition, people already know them. The third reason is that studying language has proven to be an

If the stain of the organism is gram-positive, and
   the morphology of the organism is coccus, and
   the growth conformation of the organism is clumps
then (0.7) the identity of the organism is staphylococcus

A rule from MYCIN

If the most current active context is distributing mass-bus devices, and
   there is a single-port disk drive that has not been assigned to a mass-bus, and
   there are no unassigned dual-port disk drives, and
   the number of devices that each mass-bus should support is known, and
   there is a mass-bus that has been assigned at least one disk drive and that should support additional disk drives, and
   the type of cable needed to connect the disk drive to the previous device on the mass-bus is known
then assign the disk drive to the mass-bus

A rule from R1

Figure 5. Production rules from expert systems.
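How a reasoning engine might use the certainty factor on a rule like the MYCIN one above can be suggested in a few lines. The combination scheme shown (the minimum of the condition certainties, attenuated by the rule's own CF) is a common textbook simplification assumed here for illustration, not necessarily MYCIN's exact algorithm, and the observed certainties are invented.

```python
# Sketch: applying one if-then rule whose conclusion carries a certainty
# factor, in the style of the 0.7 in the MYCIN rule quoted above.

def apply_rule(condition_cfs, rule_cf):
    """Conjoined conditions are only as certain as their weakest member;
    the conclusion inherits that certainty scaled by the rule's own CF."""
    return min(condition_cfs) * rule_cf

# Assumed certainties for the rule's three observed conditions.
observed = [1.0, 0.9, 0.8]
print(round(apply_rule(observed, 0.7), 2))   # → 0.56
```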
Mary brought Sue a book for her birthday because she1 knew she2 liked to read:
    her = Sue    she1 = Mary    she2 = Sue

Mary brought Sue a rose from her garden because she1 said she2 liked flowers:
    her = Mary    she1 = Sue    she2 = Sue

Mary brought Sue a photo from her1 portfolio because she1 thought she2 could impress her2 with her3 work:
    her1 = Mary    her2 = Sue    she1 = Mary    she2 = Mary    her3 = Mary

Figure 6. Using knowledge to resolve pronoun references.
excellent laboratory for studying many important issues involving the knowledge and the reasoning that languages are used to talk about. Much of what is now known about representing and using knowledge has come from work in natural-language processing. This is not surprising, in retrospect, given the broad range of things that language is used to talk about and considering the intimate connection between language and the ideas it is used to express.

Although some work has been done on natural-language generation (qv), more has been done on understanding (see Natural-language understanding). Current systems exploit a combination of linguistic (lexical, syntactic, semantic, and pragmatic) knowledge as well as nonlinguistic knowledge about specific topics of discourse. This nonlinguistic knowledge is crucial because when people write (or speak) they leave out much information that is necessary for understanding. They instead expect their hearers to supply that information themselves. One example of this is the use of pronouns. Figure 6 shows three sentences whose structures are very similar but whose pronouns' referents are different. Readers must use substantial world knowledge about what makes sense to determine those referents. There exist natural-language systems that function well within limited contexts. There exist no systems that successfully understand (and, e.g., answer questions about) the entire New York Times. To build such a system will require both more sophisticated linguistic techniques as well as more extensive knowledge bases that cover all the topics that may arise.

Learning

If faithfully executing human-written programs, of whatever complexity, does not appear to constitute intelligent behavior, it is because intelligence is often considered to require more initiative. It requires learning (qv), in which new procedures are constructed to meet new demands and solve new problems. Without learning and the adaptability it provides, it is difficult for a problem solver to be successful in a complex and rapidly changing environment. Thus the question "Can machines be made to learn?" has often been put forward as the key question in AI.

There are two ways one could approach answering that question. One way is to answer it by definition. One could have a definition of learning and then reason from that definition and some model of the class of machine with which one is concerned to find a constructive (provable) answer to the question. But finding a workable definition of learning is difficult. Much of what people do when they learn, namely to acquire new facts, can be done trivially by computers using large storage devices. In addition, it is easy to store the results of problem-solving programs for later use and thus to make programs that "learn" from their experience. But these capabilities are not enough to make programs genuinely adaptive in the way that people are. So more is needed, but what more and how much more are not understood.

The second way to try to answer the question of machine learning is empirically. Specific learning tasks can be examined, and programs designed to perform those tasks can be built. If these efforts are successful, one is able to answer the machine learning question affirmatively. And, along the way, one may acquire a better understanding of what learning itself is. Several efforts of this sort have been and are being conducted, and they provide partial answers to these questions.

Learning is a problem-solving task and, as such, can be cast in the state space search paradigm. In other words, a learning program must solve the problem of constructing a program (or a body of knowledge to be used by an existing program) that is adequate for solving some other specified problem. The primary difference between most learning problems and most other problems is the size of the search space that must be considered. A learning program must construct another program, and the number of possible programs is huge. A successful learning program, like any other successful problem-solving program, is one that can exploit enough heuristic power so that the actual space that is considered is substantially smaller than the total space that might have been examined. In other words, the learning program should only consider other programs that are "reasonable" candidates for performing the desired task.
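One concrete (and deliberately simplified) instance of such a focused search is generalizing a concept description from positive examples. The scheme below is an assumption invented for illustration, not a reconstruction of any particular historical learning program: the hypothesis starts as the first example and is generalized only as far as later positive examples force.

```python
# Sketch: learning a concept description from positive examples by
# generalizing each attribute slot on which the examples disagree.

def learn(positives):
    concept = dict(positives[0])          # start with the most specific guess
    for example in positives[1:]:
        for attr, value in example.items():
            if concept.get(attr) != value:
                concept[attr] = "any"     # generalize the conflicting slot
    return concept

def matches(concept, obj):
    return all(v == "any" or obj.get(a) == v for a, v in concept.items())

chairs = [{"legs": 4, "has_back": True, "material": "wood"},
          {"legs": 4, "has_back": True, "material": "metal"}]
chair_concept = learn(chairs)
print(chair_concept)                      # material generalized to "any"
print(matches(chair_concept, {"legs": 3, "has_back": True, "material": "wood"}))
```

A near miss (an object that narrowly fails to belong to the class) would be used in the same spirit to decide which slots must never be generalized.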
It must perform a very effective heuristic search. Many early learning efforts were devoted to games, and techniques were developedfor learning the relative weights that should be applied to each of the factors that influences a program's evaluation. Another area that has receivedsubstantial attention is conceptlearning (qv), in which a description of a prototype of a class of objects (such as chair or house) is constructed (Iearned) from a series of descriptions of objects that belong to the class and a series of objects that do not belong but are very similar to those that do (near misses). Similar techniques have also been used to learn production rules. In particular, if the right sides of a set of rules are known but the left sides are not (equivalently, things to do are known but when to do them is not), each left side can be viewed as a concept that describesthe times when the corresponding right side doesrepresent an appropriate action. Concept-learning techniques can then be applied to construct those left sides. One final learning technique that needs to be mentioned is analogy formation. Programs are being built that infer relevant information about new situations by forming analogies between the current context and one or more stored contexts. Based on evidence from programs such as these, there appears to be no qualitative reason why programs cannot learn. But all existing learning programs are limited, and their performance decreases as they move farther away from the knowledge with which they were initially provided. Current efforts are attempting to improve this situation by finding
15
more powerful ways to focus the search of learning programs in the most productive directions. and Machines Languages As AI has evolved, it has producedspecificprograms that solve specific kinds of problems. But it has also produced a set of tools that help one to construct other problem-solving programs. By the end of the 1950s it was recognized that the standard programming languages of the time, which had been designed to support primarily numeric processing,were not very effective in supporting the nonnumeric, symbolic computing that AI programs do. LISP (qv), a language designed by John McCarthy for symbolic computing, particularly list processing,had appearedby 1960,and it still (although in several greatly expanded forms) serves as the basis for much AI program development. LISP is primarily an interpreted language, although compilers for it do exist. But when it is run interpretively, it supports significant interactive, symbolic debugging. AI programmers were among the first to exploit such debugging tools in their program development process. In recent years three other kinds of AI tools have emerged. One is a set of higher level tools that build on top of LISP (conceptually, although they may actually be implemented some other way for efficiency) to provide specific capabilities for specific kinds of tasks. For example, tools for building expert reasoning systems and for building English-understanding systems now exist. The secondis a new kind of language whose main control structure is not the sequential execution of a set of statements but rather the application of a specific inference mechanism to the program, which is actually a set of logical assertions. The most important language of this sort is PROLOG (see Logic programming). The other recent tool development is the emergence of special-purposehardware to support LISP processing.These machines are designedto optimrze symbolic computation and dynamic storage allocation. 
They also provide rich program development and debugging environments oriented toward LISP programming.

Conclusion

AI is both a scientific discipline and an engineering activity. One of its nicknames is applied epistemology (qv), which reflects both the theoretical groundwork of the field as well as its orientation toward producing programs that actually solve specific problems. This nickname is also indicative of the major source of power in AI programs: knowledge. Harnessing knowledge, both commonsense knowledge and specialized expertise, is the major challenge that AI faces. At the same time work on the structures that contain that knowledge and that use it in problem solving must also continue. As these efforts progress, increasingly powerful programs that can substantially assist people in many cognitive tasks will gradually be developed.

Further Reading

For an introduction to the field of AI as a whole, see Refs. 10, 11, and 12. The three-volume Handbook of Artificial Intelligence (13) is a comprehensive reference work. Artificial Intelligence is the main archival journal of the field. The AI Magazine is more ephemeral and contains articles that are less technical and less theoretical. The major AI conference is the
International Joint Conference on Artificial Intelligence (IJCAI), which is held in odd-numbered years. Three out of every four years (when IJCAI is not in North America) there is also a conference sponsored by the American Association for Artificial Intelligence (AAAI). In addition, there is a more applications-oriented conference held each year and sponsored by the IEEE. The proceedings of these conferences, as well as others on more specialized topics, are among the best sources of information on current work in AI (see Literature of AI). For more information on the history of AI see Ref. 14. For detailed discussions of heuristic search techniques see Refs. 15 and 16. See Ref. 17 for a collection of articles on the main issues in knowledge representation. For more specialized discussions on the use of logic see Refs. 18 and 19 and on slot-and-filler structures see Refs. 20 and 21. On rule-based representations, see Ref. 22. For an introduction to expert systems see Refs. 23 and 24. On natural language understanding see Refs. 25-29. For a discussion of issues in machine learning, see Refs. 30 and 31. For discussions of machine vision see Refs. 32 and 33. For an introduction to LISP see Refs. 34-36. For more detail on the emerging common dialect of LISP (Common Lisp) see Ref. 37. To find out more about Prolog, see Refs. 38 and 39.
BIBLIOGRAPHY
1. A. Lovelace, Notes upon L. F. Menabrea's Sketch of the Analytical Engine Invented by Charles Babbage, in P. Morrison and E. Morrison (eds.), Charles Babbage and His Calculating Engines, Dover, New York, 1961, pp. 245-295.
2. A. Turing, Computing Machinery and Intelligence, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 11-35.
3. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, 1969.
4. R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artif. Intell. 2, 189-208 (1971).
5. D. Levy, Computer Gamesmanship, Simon & Schuster, New York, 1983.
6. A. L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 71-105.
7. T. Winograd, A Procedural Model of Language Understanding, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, 1973, pp. 152-186.
8. B. G. Buchanan and E. H. Shortliffe, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA, 1984.
9. J. McDermott, R1 Revisited: Four Years in the Trenches, AI Mag. 5(3), 21-32 (Fall 1984).
10. E. Charniak and D. McDermott, Introduction to Artificial Intelligence, Addison-Wesley, Reading, MA, 1985.
11. E. A. Rich, Artificial Intelligence, McGraw-Hill, New York, 1983.
12. P. Winston, Artificial Intelligence, Addison-Wesley, Reading, MA, 1984.
13. A. Barr, E. A. Feigenbaum, and P. R. Cohen, The Handbook of Artificial Intelligence, 3 vols., Kaufmann, Los Altos, CA, 1981.
14. P. McCorduck, Machines Who Think, W. H. Freeman, San Francisco, 1979.
15. N. J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
16. J. Pearl, Heuristics, Addison-Wesley, Reading, MA, 1984.
17. R. J. Brachman and H. J. Levesque, Readings in Knowledge Representation, Kaufmann, Los Altos, CA, 1985.
18. L. Wos, R. Overbeek, W. Lusk, and J. Boyle, Automated Reasoning: Introduction and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1984.
19. A. Bundy, The Computer Modelling of Mathematical Reasoning, Academic Press, New York, 1983.
20. N. V. Findler (ed.), Associative Networks: Representation and Use of Knowledge by Computer, Academic Press, New York, 1979.
21. J. F. Sowa, Conceptual Structures, Addison-Wesley, Reading, MA, 1984.
22. D. A. Waterman and F. Hayes-Roth, Pattern-Directed Inference Systems, Academic, New York, 1978.
23. F. Hayes-Roth, D. A. Waterman, and D. B. Lenat, Building Expert Systems, Addison-Wesley, Reading, MA, 1983.
24. S. M. Weiss and C. A. Kulikowski, Designing Expert Systems, Rowman & Allanheld, Totowa, NJ, 1984.
25. N. Sager, Natural Language Information Processing, Addison-Wesley, Reading, MA, 1981.
26. R. F. Simmons, Computations from the English, Prentice-Hall, Englewood Cliffs, NJ, 1984.
27. T. Winograd, Language as a Cognitive Process: Syntax, Addison-Wesley, Reading, MA, 1983.
28. M. King, Parsing Natural Language, Academic, New York, 1983.
29. D. R. Dowty, L. Karttunen, and A. M. Zwicky, Natural Language Parsing: Psychological, Computational and Theoretical Perspectives, Cambridge University Press, New York, 1985.
30. R. Michalski, J. G. Carbonell, Jr., and T. M. Mitchell, Machine Learning, Tioga, Palo Alto, CA, 1983.
31. R. Michalski, J. G. Carbonell, Jr., and T. M. Mitchell, Machine Learning II, Morgan Kaufmann, Los Altos, CA, 1986.
32. D. Marr, Vision, W. H. Freeman, San Francisco, 1982.
33. D. Ballard and C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
34. D. S. Touretzky, LISP: A Gentle Introduction to Symbolic Computation, Harper & Row, New York, 1984.
35. R. Wilensky, LISPcraft, Norton, New York, 1984.
36. P. H. Winston and B. Horn, LISP, Addison-Wesley, Reading, MA, 1981.
37. G. L. Steele, Common LISP, Digital Press, Burlington, MA, 1984.
38. R. A. Kowalski, Logic for Problem Solving, North Holland, Amsterdam, 1979.
39. W. F. Clocksin and C. S. Mellish, Programming in Prolog, 2nd ed., Springer-Verlag, New York, 1984.

E. Rich
MCC

ASSOCIATIVE MEMORY

Memory architectures can be classified as random, sequential, and associative (1). First introduced by Bush in "As we may think" (2), associative memories have found considerable use in hardware and have been discussed in innumerable papers. This section overviews the use of associative memories from controllers to database memories and concludes with their importance to AI systems.

An associative memory is composed of a memory that is a two-dimensional array of bits M of i rows and j columns and a search mechanism that can search this array of bits and extract information from it. The array
can be considered a set of equal-length words M[n; 1, . . . , j] and a mask H[1, . . . , j]. A comparand C[1, . . . , j] can be used to search M and set bits in a result register R[1, . . . , i]; see Figure 1. For each n = 1, . . . , i, if, for m = 1, . . . , j, H[m] = 1 or C[m] = M[n; m], one can say word M[n; 1, . . . , j] matches C under H, and R[n] is set to 1; otherwise R[n] is cleared. Matched words can be output to a bus B[1, . . . , j]; for m = 1, . . . , j, B[m] is the OR of M[n; m] wherever R[n] is 1.

Figure 1. Basic associative memory.

Consider this simple example of a telephone directory searched in an associative memory. Each word is (person's name, telephone number, address). Suppose one wants to know Mr. Smith's address or telephone number. The search and output operation would use C = (SMITH, xxx, xxx) and H = (00000, 111, 111). In the search operation the row having SMITH in the leftmost part will match, and the result bit for that row is set. In the output part that row is put on the bus B, where the unknown parts can be obtained.

Memory Management and Hardware Control

The search and output operation is often combined in a single step in an associative memory, which is the MMU used in virtual memories (1). In such a system a processor (CPU) reads and writes data in RAM. The processor sends an address to RAM in order to read or write a word in it. This address is put into the MMU, as a comparand, like the name SMITH in the example above. The output on the bus, rather like the telephone number of the example, is the actual address that is sent to RAM. An MMU lets the address maintained and sent by the CPU (called the virtual address) be different from the real address used in RAM.

Another common application, where the associative memory is a read-only memory, is the PLA. Used in the control logic of a processor, microprogrammed commands are put in as comparands, and the outputs of the bus are sent to control registers, adders, and other parts of the hardware. The PLA lets the microprogram use commands that are efficiently encoded, which control a vast number of hardware units.

STARAN

The associative memory may be extended with a rewrite function so that each word appears to be a processor. Matched words may be partially rewritten using the comparand C and mask H: for each n = 1, . . . , i, if R[n] = 1, then for each m = 1, . . . , j, if H[m] = 0, C[m] is put into M[n, m]. Searching and rewriting permit arithmetic and logical operations on all words simultaneously in a SIMD parallel computer such as STARAN (3). A large number of programs have been written for STARAN, and highly parallel SIMD algorithms, such as those found in radar-signal processing, have run considerably faster in STARAN than in a conventional machine. But large database searches are not suited to STARAN because the time to load M dominates the time to search it; even if the search time were zero, STARAN is slowed to the speed of conventional machines by the time to load its memory.

Set Search

A plethora of papers (4,5) have been written on the use of associative memories for searching databases. If the words are considered ordered (whereas they are unordered in the examples above), sequential addressing (e.g., the notion of next lower word) can be combined with associative addressing. After a search that sets the result bit in R, a "next search" can be made, where only words below a word where the R bit was set are searched to set the R bit at the end of the search. A string of characters can be searched, one character after another, forming the basis for text-oriented information retrieval. Partitioning the rows into contiguous blocks and using a delimiter to mark the beginning of each block, a "set search" can be made, where only words in a block where the R bit was set in some word in the block can be searched to set the R bit at the end of the search. This is the basis of most relational, hierarchical, and network database machines.

Associative Disks

Although it may become feasible to build large associative memories using integrated circuits for the memory, associatively addressed disk memories have been proposed for large databases (6). As the data on a disk track pass over a read head, searching may be accomplished essentially as described above. Using additional techniques (e.g., storing the data on a track in RAM called a disk cache), data can be modified and later written back on the track. If each disk head has these capabilities, a large amount of data can be associatively searched where it is stored rather than transported to and from a large mainframe computer.

Outlook

Finally, the associative memory continues to find new applications as new problems are studied. Associative memory integrated circuits have been sold for over 10 years now, but they have not been widely used because they were considerably slower and smaller than RAMs. However, custom VLSI is becoming commercially accepted, and an associative memory can be integrated with other processor logic and memory in these custom chips with less relative cost than in current systems. Also, considerable work is in progress in the design of
database processors. In these systems the disk is made associative, as discussed above. A group of researchers in Japan and in the Microelectronics and Computer Corporation in the United States are competing to make these database machines commercially useful. These advances should contribute to better AI systems. For instance, in PROLOG (see Logic programming) the database can be searched, the Horn clauses can be matched, variables can be instantiated, and control can be effected using associative memories from associative disks to associative VLSI memories and associative controllers (scoreboards and PLAs).
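The word-match-under-mask operation and the STARAN-style masked rewrite described in this article can be sketched as a small software simulation. This is only an illustrative model (the helper names match and masked_write are ours, not the article's, and real associative hardware examines all rows in a single step rather than looping):

```python
# Software model of an associative memory: rows of equal-length words,
# a comparand C, and a mask H. H[m] = 1 means column m is ignored.

def match(M, C, H):
    """Return the result register R: R[n] is true when word M[n]
    matches comparand C under mask H."""
    return [all(H[m] == 1 or C[m] == row[m] for m in range(len(C)))
            for row in M]

def masked_write(M, R, C, H):
    """STARAN-style rewrite: wherever R[n] is set, copy C[m] into
    M[n][m] for every unmasked column (H[m] = 0)."""
    for n, hit in enumerate(R):
        if hit:
            M[n] = [C[m] if H[m] == 0 else M[n][m] for m in range(len(C))]

# Telephone-directory example: each word is (name, number, address).
M = [["JONES", "5551", "ELM ST"],
     ["SMITH", "5552", "OAK AVE"]]
C = ["SMITH", "x", "x"]        # comparand; the x fields are masked off
H = [0, 1, 1]                  # search on the name field only
R = match(M, C, H)
print(R)                       # [False, True] -- only SMITH's row matches
print(M[R.index(True)])        # the matched word supplies number and address
```

Each field here stands for a whole group of bit columns, which is why the mask has one entry per field rather than per bit.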
BIBLIOGRAPHY

1. K. Hwang and F. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, New York, 1984, pp. 57-80.
2. V. Bush, "As we may think," Atl. Mo. 176(1), 101 (1945).
3. K. E. Batcher, "STARAN parallel processor system hardware," Proc. AFIPS-NCC, 43, 405-410 (1974).
4. J. Minker, "An overview of associative memory or content-addressable memory systems and a KWIC index to the literature," Comput. Rev., 453-504 (October 1971).
5. F. Maryanski, "Backend database systems," ACM Comput. Surv., 12(1), 3-27 (March 1980).
6. O. H. Bray and H. A. Freeman, Data Base Computers, Lexington Books, Lexington, MA, 1979.
G. J. Lipovski
University of Texas
ASSOCIATIVE NETWORKS. See Associative memory; Semantic networks.

AUGMENTED TRANSITION NETWORKS. See Grammar, augmented transition network.
AUTOMATIC PROGRAMMING

Computer programming is the process of constructing executable code from fragmentary information. This information may come in many forms, including vague ideas of how the output should look, the nature of the expected input, the type of algorithm to be used, and, possibly, examples of the target behavior. The result of the programming is a section of code that is capable of receiving inputs from the target domain and processing them to yield appropriate outputs. When computer programming is done by a machine, the process is called automatic programming.

AI researchers are interested in studying automatic programming for two reasons: First, it would be highly useful to have a powerful automatic programming system that could receive casual and imprecise specifications for a desired target program and then correctly generate that program (Fig. 1a); second, automatic programming is widely believed to be a necessary component of any intelligent system and is therefore a topic for fundamental research in its own right. Thus, a system might discover a successful way to achieve a given result and program itself to achieve that result. For example, a legged being might
Figure 1. Examples of automatic programming. (a) Generating the user's program: the computer automatically generates code from casual specifications. (b) Learning new behaviors: the intelligent being internally assembles code to enable itself to function in the world.
assemble code to enable itself to walk after a series of walking experiences (Fig. 1b).

A number of approaches to automatic programming have been developed over the years, and the most important ones are described here. The following sections, as illustrated in Figure 2, describe methodologies for synthesis from formal input-output specifications, from examples of the desired program behavior, from natural-language dialogue, and from cooperative interaction between a human programmer and a mechanical programmer's assistant.

Figure 2. Four research areas in automatic programming: (a) synthesis from formal specifications; (b) synthesis from examples; (c) synthesis from natural-language dialogue; (d) synthesis through cooperative interaction between human and mechanical assistant.

The methodologies based on synthesis from formal specifications utilize predicate calculus (see Logic; Logic, predicate) notation and derive the target program in a sequence of logical steps. Because the resulting program is mathematically derived from its specification, its correctness with respect to the specification is assured. Thus, the methodologies are very attractive, and their development has implications for the foundations of computer science as well as AI.

The synthesis-from-examples methodologies involve generalization and learning (qv) behaviors. Since the examples do not completely specify the target program, the initially generated program may not achieve all of the desired behaviors. But the addition of appropriate examples can force the synthesizer to efficiently converge to a satisfactory program. The attractiveness of this approach comes partly from the ease with which a user can provide examples. These techniques are also important for AI researchers to understand because they seem to be fundamental to certain kinds of intelligent behavior (see Inductive inference).

The third approach, program generation from natural-language dialogue, involves translating informal descriptions into formal specifications that can be programmed using formal methods. This approach uses those technologies mentioned above as well as natural-language processing (see Natural-language understanding), knowledge representation (qv), and AI systems design.

The final approach, program synthesis using an automated programmer's assistant, assumes that a human will be the primary programmer and that the proper role for a machine is to supplement his or her efforts. The human's role is to develop and refine a set of formal specifications and make some implementation decisions, with the machine assisting by checking consistency, retrieving information from libraries, and so forth. At some point in the specification process, the machine can take the primary role in selecting data structures and building the code and documentation for the software product. The following sections describe these four approaches to automatic programming.
Synthesis from Input-Output Specifications

Stating the Problem. The first approach to the study of automatic programming assumes that a specification of the input-output behavior is given and that the automatic system is to
find a program to implement the specification. Very often the specification is written as

∀a(P(a) ⇒ ∃z R(a, z))
Here P(a) is an input predicate that is true if and only if a is an acceptable input for the target program. R(a, z) is an input-output relation that is true if and only if z is the desired output when the target program reads a as its input. The specification states that for all a such that the input requirement P is met, there is a z such that the input-output relation R(a, z) holds. The program synthesis will proceed by proving this theorem. It turns out that in order to complete the proof, the theorem prover must constructively find the z that is asserted to exist, and the method for finding z is the target program (see also Theorem proving).

As an illustration, suppose it is desired to automatically generate a program to add up a list of numbers a to obtain their sum z. Then one should define the input predicate P(a) to be true if and only if a is a list of numbers of length zero or more. Thus P((4 2 9)) is true and P((A (B))) is false. The input-output relation R(a, z) should be defined to be true if and only if z is the sum of the numbers in a. For example, R((4 2 9), 15) is true and R((4 2 9), 16) is false. (z is zero if a has length zero.) Then the proof of the specifying theorem

∀a(P(a) ⇒ ∃z R(a, z))
requires the system to find a way of constructing z for every acceptable a. This proof will thus yield the target program. Of course, it is not possible to carry out the proof unless many facts are known about P and R. For example, one needs the fact that if a is the list of length zero, that is, a = NIL, then R(a, 0) is true:

R(a, 0) = true if a = NIL
Also, let car(a) be defined to be the first element of list a and let cdr(a) be defined to be list a with its first element removed. The synthesis will use the fact that the sum of a is simply obtained by adding car(a) to the sum of cdr(a) if a is not NIL:

R(a, car(a) + z') = true if R(cdr(a), z') and not (a = NIL)

The program synthesis methodology uses these kinds of facts to prove the above theorem and generate the following program:

f(a) = if a = NIL, then 0
       else car(a) + f(cdr(a))
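The synthesized program can be transcribed directly into a runnable form. The following is a minimal Python sketch of our own (not from the article), using [] for NIL, a[0] for car(a), and a[1:] for cdr(a):

```python
# Transcription of the synthesized program
#   f(a) = if a = NIL, then 0 else car(a) + f(cdr(a))
# onto Python lists.

def f(a):
    if a == []:                 # a = NIL
        return 0
    return a[0] + f(a[1:])      # car(a) + f(cdr(a))

print(f([4, 2, 9]))  # 15, matching the fact that R((4 2 9), 15) is true
```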
The proof of the specifying theorem will involve manipulation of formulas of the form

if ∀x A1(x) and ∀x A2(x) and . . . and ∀x An(x)
then ∃x G1(x) or ∃x G2(x) or . . . or ∃x Gm(x)
Following the methodology of Manna and Waldinger (1), such formulas can be written into a tableau of columns labeled assertions, goals, and outputs as follows:
Assertions      Goals      Outputs
A1(x)
A2(x)
 . . .
An(x)
                G1(x)      t1(x)
                G2(x)      t2(x)
                 . . .
                Gm(x)      tm(x)

This notation enables one to write such formulas omitting quantifiers and connectives. If an entry ti(x) appears in the output column in the same row as some goal Gi(x), that output ti(x) expresses the desired program output when Gi(x) is true. Using this tabular convention, the specifying theorem can be written as follows:

Assertions      Goals      Outputs
P(a)            R(a, z)    z

The Manna and Waldinger (1) program synthesis procedure involves adding rows to this table such that the correctness of the above meaning formula is maintained. If a goal can be deduced that is always true and such that its corresponding output entry is in terms of the input and primitive functions, that output entry will be the target program:

Assertions      Goals      Outputs
                True       Target Program

The Deductive Mechanism. Once the problem is stated and appropriately entered into the above table, methods are needed for deducing new entries in the table so that progress toward the target program can be made. Following the methodology of Manna and Waldinger (1), two kinds of rules are introduced here: transformations, which convert portions of an assertion or goal to a new form, and resolution rules, which allow one to combine assertions and/or goals to obtain new assertions or goals. Transformations have the form

r ⇒ s if Q

which means that r may be converted to s if Q is true. In order to illustrate usage, suppose there is a goal G that contains a subexpression r. It can be written as G[r]. Then the transformation r ⇒ s yields G[s], and this substitution can be made if Q is true. This gives a way to generate a new goal in the deductive table if G[r] is an existing goal. The new goal is G[s] and Q. It means that if G[r] is a valid goal, then G[s] is one also, provided that Q is true. In terms of the deductive table, the transformation r ⇒ s if Q enables one to begin with

Assertions      Goals      Outputs
                G[r]

and deduce the new entry

Assertions      Goals           Outputs
                G[s] and Q

An example of a transformation is

U(x, -x) ⇒ true if x < 0

and its usage can be shown on the following example goal and output:

Assertions      Goals      Outputs
                U(b, y)    y

The transformation can convert U(x, -x) to true, but this expression does not occur in the given goal. However, one can make substitutions into the transformation and the goal and output so that the transformation is applicable. (That is, unification is being used as described elsewhere in this volume.) Specifically, one can substitute x = b into the transformation to obtain

U(b, -b) ⇒ true if b < 0

and one can substitute y = -b into the goal and output to obtain

Assertions      Goals      Outputs
                U(b, -b)   -b

Now the transformation can be applied using the above rule to obtain a new row in the tableau:

Assertions      Goals             Outputs
                True and b < 0    -b

Another way to construct new entries in a deductive table is to resolve (see Resolution) two goals to yield a new one. Suppose one has two goals F with associated output p1 and G with associated output p2. Further suppose that F and G both have the same predicate subexpression e, so they are written F[e] and G[e]:

Assertions      Goals      Outputs
                F[e]       p1
                G[e]       p2

In the following, the notation F[e ← true] stands for the goal F with subexpression e replaced by true. G[e ← false] is similarly defined. Then the two goals F[e] and G[e] can be combined to obtain a new goal F[e ← true] and G[e ← false] with associated output if e, then p1, else p2. The new row in the tableau is

Assertions      Goals                             Outputs
                F[e ← true] and G[e ← false]      if e, then p1, else p2

An example of such a resolution occurs if one has these two goals:

Assertions      Goals           Outputs
                a < 0           -a
                not (a < 0)     a

Then resolution of these two goals, where e is a < 0, yields the following new entry:

Assertions      Goals                     Outputs
                True and not (false)      if a < 0, then -a, else a

Two kinds of deductive rules have been described here, transformations and goal-goal resolutions. Manna and Waldinger (1) have given numerous other deductive rules, but these examples illustrate the nature of the technique. Once the problem is properly represented and rules are available for deduction of new table entries, one can proceed to synthesize programs as shown below.

Synthesizing Programs. The program synthesis procedure follows the problem-solving (qv) paradigm so well known in the AI community. An initial state is given, and transitions are available for moving from one state of the domain to another. The problem-solving system attempts to find a sequence of applicable transitions that will transform the world from the initial state to an acceptable final state. In the program synthesis domain the initial state is the specification of the program input-output characteristics. The applicable transitions are the transformations, resolution schemes, and other available rules for deducing new forms from the original specification. An acceptable final state is one that gives a program in a machine-executable language that meets the original specification. In terms of the deductive table a final state has a goal that is true and an associated output in terms of the input and primitive machine operations.

One can illustrate the whole process by synthesizing a program to compute the absolute-value function h:

h(x) = x if not (x < 0); -x otherwise

Then the input specification must require x to be a real number: The relation V(x) is true if and only if x is real. The input-output relation U(x, z) will be true whenever z is the absolute value of x. This information must be available to the system in the form of transformations:

T1: U(x, -x) ⇒ true if x < 0
T2: U(x, x) ⇒ true if not (x < 0)

The synthesis proceeds by applying the available transformations and deductive rules until a program is synthesized. Following the methodology described above, one begins with the original specification:

Assertions      Goals      Outputs
V(b)            U(b, y)    y

Applying T1 and T2 to the goal U(b, y) obtains the following goals:

Assertions      Goals            Outputs
                b < 0            -b
                not (b < 0)      b

Resolving these two goals results in the final program:

If b < 0, then -b, else b

A more interesting example appears in Figure 3, where a program is generated to add a list of numbers. In this case P(a) is true if and only if a is a list of integers of length zero or more. Here R(a, z) is true if and only if z is the sum of the numbers in a. If a has length zero, R(a, 0) is true. Two transformations carry the critical information:

T3: R(a, 0) ⇒ true if a = NIL
T4: R(a, car(a) + z') ⇒ true if R(cdr(a), z') and not (a = NIL)
step   assertions                   goals                               outputs                    remarks
(1)    P(a)                                                                                        input spec.
(2)                                 R(a, z)                             z                          i-o relation
(3)                                 a = NIL                             0                          T3 on (2)
(4)                                 R(cdr(a), z') and not (a = NIL)     car(a) + z'                T4 on (2)
(5)    if u < a, then R(u, f(u))                                                                   inductive hypothesis
(6)                                 not (a = NIL) and cdr(a) < a        car(a) + f(cdr(a))         resolving (4) and (5)
(7)                                 not (a = NIL)                       car(a) + f(cdr(a))         simplifying (6)
(8)                                 true                                if a = NIL then 0          resolving (3) and (7)
                                                                        else car(a) + f(cdr(a))

Figure 3. Synthesizing the program to add a list of numbers.
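Both syntheses close with the same goal-goal resolution step. That one rule can be mimicked on string representations of goal rows; the following is a minimal sketch of our own (the tuple encoding and the helper name resolve are ours, not Manna and Waldinger's notation), shown on the absolute-value goals:

```python
# Rows are (goal, output) pairs. Goal-goal resolution on a shared
# subexpression e replaces e by true in one goal and by false in the
# other, and builds the conditional output "if e then p1 else p2".

def resolve(row1, row2, e):
    g1 = row1[0].replace(e, "true")     # F[e <- true]
    g2 = row2[0].replace(e, "false")    # G[e <- false]
    goal = f"{g1} and {g2}"
    output = f"if {e} then {row1[1]} else {row2[1]}"
    return goal, output

# The two goals produced by T1 and T2 in the absolute-value example.
rows = [("b < 0", "-b"),
        ("not (b < 0)", "b")]

goal, program = resolve(rows[0], rows[1], "b < 0")
print(goal)      # true and not (false) -- the goal is satisfied
print(program)   # if b < 0 then -b else b
```

The printed output entry is exactly the final row of the absolute-value tableau; a real implementation would of course use structured expressions and unification rather than textual substitution.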
Step 5 is the inductive hypothesis for an inductive proof that introduces a looping behavior into the synthesis. It states that the synthesizedprogram f works properly [i.e., if P(u), then R(u, f (u))l for all lists shorter than the input o (i.e., u 1o). The proof that f then also works on o completesthe inductive argument and enables the introduction of recursion in the generated program. Step 6 is a goal-assertion resolution that functions similarly to the goal-goal resolution above. The final synthesized program is
f(x) = if x = NIL then 0, else car(x) + f(cdr(x))

Historical Remarks. It was discovered in the late 1960s [Green (2) and Waldinger and Lee (3)] that the proof of a theorem with existential quantifiers will implicitly contain the sequence of operators required to find the objects asserted to exist. This sequence of operators can be considered to be a program for finding those objects, and this discovery became the basis for much research in automatic programming. The task of theorists then became one of finding methods to extract the operators from the proofs and developing theorem-proving strategies that would properly introduce the desired looping, branching, and subroutine constructions into the code. The methodology of Manna and Waldinger (1) systematized and generalized the techniques that had been developed earlier.

Another related approach depends on the use of transformations that sequentially modify an input-output specification from a nonprocedural statement into an executable program. This approach is described by Broy (4), Burstall and Darlington (5), and Manna and Waldinger (6). More recently Bibel and Hörnig (7) have constructed a large logic-based automatic-programming system that employs a variety of strategies. This system not only provides a deductive system for synthesizing programs as presented above but also includes many methodologies for effectively guiding the search to obtain a final program. One of its more novel mechanisms generates examples in the problem domain and generalizes from them to produce hypothesized theorems that can be proved and used in the synthesis.

Program Synthesis from Examples

This section gives an algorithm for synthesizing flowcharts from traces and an example of its usage, the generation of a program to solve the Tower-of-Hanoi problem. Then it shows how this leads to a program synthesis methodology for LISP code. Also, methods are given for creating LISP programs from recurrence relations and PROLOG programs from example behaviors. The section concludes with some general observations regarding program synthesis from examples.

Constructing Flowcharts from Example Traces. Suppose a computer program has executed the following computation trace in completing a particular calculation:

Time  Condition  Instruction
1                I1
2     a          I2
3     b          I2
4     c          I2
5     b          I2
6     b          I3

That is, at the first instant of time, I1 was executed. Then condition a was tested and found to be true, and I2 was executed. This proceeds until the final statement at time 6, when I3 was executed. The instructions Ii may be any instructions such as READ(x) or x ← x + 1, and the conditions a, b, and c may be any predicates such as n > 3 or x < y. It is desired to find an algorithm capable of building a program that can do this trace.

One can, in fact, begin building the desired program by starting at the beginning of the trace and moving downward, building code to account for each step. From the first instruction (time = 1), the beginning of the target program can be created (Fig. 4a). Next condition a is observed and instruction I2 executed. I2 is added to the flowchart (Fig. 4b). The trace indicates the next condition is b out of I2, and this leads to another execution of I2. Since an example of I2 already exists, this transition is sent to it (Fig. 4c). Examining condition c from this instruction to I2 at time 4, again this transition is sent to the existing version of I2 (Fig. 4d). At time 5 the trace indicates a b transition to I2, and the existing program already has such a b transition to I2. Whenever the current trace condition matches the condition on an outgoing transition of the active node of the program, one has what is called a forced move. If the current version of the program is correct, as it is here, that transition will properly predict the next instruction in the trace. At time 6, however, a contradiction occurs: The move is forced, and the program uses the b transition to predict the next instruction to be I2. However, the trace indicates the next instruction should be I3. Apparently an error has occurred in the synthesis.

Figure 4. Constructing a flowchart from a trace.
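The incremental construction just described can be sketched in a few lines of Python. This is an illustrative model only, not code from the systems discussed; it omits the node limit L and the backtracking step, and simply reports where the first forced-move contradiction occurs.

```python
def build_flowchart(trace):
    """Incrementally build a flowchart from a (condition, instruction) trace.

    A transition to an instruction that already exists is routed to the
    existing copy; a forced move whose prediction disagrees with the
    trace stops construction and reports the time of the contradiction.
    """
    nodes = []        # node index -> instruction label
    edges = {}        # (node index, condition) -> node index
    current = None
    for time, (cond, instr) in enumerate(trace, start=1):
        if current is not None and (current, cond) in edges:
            nxt = edges[(current, cond)]      # forced move
            if nodes[nxt] != instr:
                return nodes, edges, time     # contradiction detected
            current = nxt
            continue
        if instr in nodes:                    # reuse the existing copy
            target = nodes.index(instr)
        else:
            nodes.append(instr)
            target = len(nodes) - 1
        if current is not None:
            edges[(current, cond)] = target
        current = target
    return nodes, edges, None

# the trace of the example above
trace = [(None, 'I1'), ('a', 'I2'), ('b', 'I2'),
         ('c', 'I2'), ('b', 'I2'), ('b', 'I3')]
nodes, edges, conflict = build_flowchart(trace)
```

Run on the example trace, this builds the two-node flowchart of Figure 4d and stops with a contradiction at time 6, exactly the point at which the synthesis must back up and revise an earlier decision.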
AUTOMATIC PROGRAMMING
At this point the procedure backs up (see Backtracking) to the last unforced move and changes its decision. Returning to time 4, the c transition will not be directed backward but instead to a new copy of I2 (Fig. 4e). Moving forward again, a second contradiction will be found, another back up, and then the final flowchart of Figure 4f will be built.

A general algorithm for creating flowcharts from traces appears in Figure 5. This algorithm requires the user to specify a limit L on the number of nodes to appear in the target program and will always find a program of that size or less that can execute the trace. If none exists, backup will occur to the first instruction and above, resulting in a termination with failure. The algorithm can then be restarted with a larger L. The following section shows some applications of the method.

Learning to Solve the Tower of Hanoi. The usage of the flowchart synthesis algorithm of Figure 5 can be illustrated by examining how it might enable a robot to learn a procedure for solving the Tower-of-Hanoi problem. This discussion assumes the robot has the flowchart synthesis algorithm built into its control system. The machine is presented with the initial state of the five-disk version of the Tower-of-Hanoi. Five disks of decreasing size are stacked on the leftmost of three pegs, as shown in Figure 6a. The disks are to be moved to the rightmost peg by a sequence of steps that allow moving one disk at a time from peg to peg without ever putting a larger disk on a smaller one.

The following notation is used: The disks on the tops of the three stacks are given the names S for the smallest disk, N for the next smallest disk that appears at the top of a stack, and B for the largest disk on the top of a stack or for an empty peg. Then the locations of S, N, and B on the three pegs can be indicated by three-letter sequences such as NSB, with the ith letter referring to the ith stack. The start position shown in Figure 6a is thus SBB, and after the first move this becomes NBS. The three pegs are labeled 0, 1, and 2 so that the first move, indicated as 0 → 2, is to move the smallest disk to the rightmost peg.

The training procedure for the robot is simple. The trainer grasps its hand and forces it to do the first move, 0 → 2. Then it is forced through the second move, 0 → 1, and so forth. However, as it goes through each step, it remembers each three-letter condition observed and the resulting instruction, i → j. Simultaneously, it continuously runs the flowchart synthesis algorithm and begins constructing a program to solve the problem. Interestingly enough, after the robot has been guided through the first seven steps of the solution, it will announce that it thinks it knows what to do next. The synthesis algorithm will have found a forced move and is prepared to predict the next step of the synthesis. Thus at Figure 6b one can allow it to take two steps by itself, and they are correct. If they had
Figure 5. Flowchart synthesis algorithm. (The flowchart steps, recoverable from the figure: get the next instruction I from the trace; if a copy of I already exists, build a transition to it, otherwise add I to the flowchart; back up to the last unforced move if the number of instructions exceeds L or a forced move leads to a contradiction, halting with failure if no unforced move exists; halt with success when the trace is exhausted.)

Figure 6. Learning to solve the Tower-of-Hanoi problem. (The panels show the sequence of moves, each labeled with a move i → j and a three-letter state such as SBB or NBS.)

Figure 7. Synthesized program for solving the Tower-of-Hanoi problem with an odd number of disks.
been wrong, the trainer would have taken control, forced correct action, and sent the algorithm into its backup and code revision mode. The robot will need help again for four steps beginning at Figure 6c, but from there on it will complete the solution with only two additional helping steps.

The program constructed during the process is shown in Figure 7. If the robot were now given the Tower-of-Hanoi problem with 25 disks, it would solve it without intervention. In fact, it will solve the problem for any odd number of disks. With one additional trace using, say, six disks, the synthesized program will be able to do any size problem.

The learning of the general solution to the Tower-of-Hanoi problem is, of course, dependent on the coding of the information given to the algorithm. Some coding schemes lead to much faster learning than is shown in Figure 6, and some lead to slower learning. If the coding scheme is poor, the flowcharts created will only be able to solve the particular examples given in the training sessions, and no generalization will occur.

A number of other applications of this methodology appear in the literature. As an illustration, a trainable Turing machine was constructed (8) that allows the user to force the head up and down the tape and read and print as desired (see Turing machine). The machine can then "learn" the appropriate finite-state control, and it can acquire the ability to do any calculation. For example, it was able to construct a universal Turing machine on the basis of one example calculation. As another illustration, a trainable desk calculator has been developed by Biermann and Krishnaswamy (9) that enables a user to do hand calculations at a machine display with a light pen and that creates programs from the resulting traces. Using this system, many programs have been generated, including various matrix-manipulation routines, sorting programs, a finite-state machine minimizer, and a compiler for a small ALGOL-like language. An application for this approach was also developed by Waterman et al. (10) in the construction of a programmer's helper called EP. This system observes the user in typical daily applications and builds programs to automatically mimic the user behaviors. Then when repetitive tasks occur, the user can release control to the machine, which has built code to finish them automatically.

Revisions to Flowchart Construction Method. The algorithm of Figure 5 requires that each instruction be followed by a test on the data structures to determine the next step. However, in many cases no tests are needed after instructions, or the tests may be constructed automatically during the synthesis. Such a situation occurs in the following trace, where a user is adding a column of numbers:

I ← 1
SUM ← 0
SUM ← SUM + A(I)
I ← I + 1
SUM ← SUM + A(I)
I ← I + 1
⋮
SUM ← SUM + A(I)
I ← I + 1
PRINT SUM
This trace includes only instructions and no conditions as required in the previous section. However, a small modification to the flowchart synthesizer makes the code creation straightforward. First, every instruction is assumed to be followed by the null condition, which is always true. Then, when the synthesis is about to fail at the last instruction, a predicate synthesizer looks at the data and tries to build a test that is true at the last instruction and false previously. It might create a poor test, such as SUM > 177, in which case the resulting program would be incorrect. So one or two more examples might be necessary before the correct test, I > N, where N is the size of A, would be selected. The synthesized program is thus as shown in Figure 8 and is based on knowing only the instructions of the trace.

This example can be carried further by proposing that the user gives only the following trace:

SUM ← 0
SUM ← SUM + A(1)
SUM ← SUM + A(2)
⋮
SUM ← SUM + A(20)
PRINT SUM

Here the synthesizer must additionally discover the need for an index and the appropriate additional indexing instructions. A modification of the flowchart algorithm that synthesizes all needed indexing functions for a broad class of index types is shown in Ref. 11. Thus, two or three hand calculations of the type shown in Figure 9 are enough for the automatic generation of a program to merge two sorted lists.

Figure 9. Merging two lists by hand. The synthesis procedure can insert the indexing instructions and complete the construction of a merge sort.

Going even another step, one could propose that the machine is given only the information that array A contains four numbers, 21, 35, 17, and 15, and that the output is 88. Then the machine could enumerate all possible sequences of operations until the trace of the previous paragraph is found. At this point it could resort to flowchart construction to generate the program of Figure 8. This strategy thus provides a technique for creating some classes of programs knowing nothing more than their input-output behaviors. Detailed algorithms and examples are given in Ref. 11.

Synthesis of LISP Code. A popular area for research in recent years (12) has centered around the creation of LISP (qv) programs from examples of their behaviors. Thus, one might be given the fact that the input x = ((A . B) . C) is to yield the output z = (C . (B . A)). The goal is to construct a LISP program that is capable of executing this and all "similar" examples. The synthesis of the code begins with the discovery of the LISP operations required to yield the output from the input:

z = cons(cdr(x), cons(cdr(car(x)), car(car(x))))

[Here cons(u, v) is defined to be the dotted pair (u . v), and car(x) and cdr(x) are defined to be the left and right sides of dotted pairs. Thus, if x = (A . B), then car(x) = A, cdr(x) = B, and cons(x, x) = ((A . B) . (A . B)).] The breakdown of output z in terms of primitive functions is unique and easy to find. Furthermore, it corresponds to the trace of instructions employed in the previous examples. In fact, the concatenated operators can be broken apart into primitives in preparation for the program synthesis:
f1(x) = cons(f2(x), f3(x))
f2(x) = f4(cdr(x))
f3(x) = f5(car(x))
f4(x) = x
f5(x) = cons(f6(x), f7(x))
f6(x) = f8(cdr(x))
f7(x) = f9(car(x))
f8(x) = x
f9(x) = x

Here the flowchart construction methodology can again be applied; instead of merging separate instructions from the trace, however, different values of fi are merged. Also, conditional tests need to be created during the synthesis procedure.

Figure 8. Synthesized program to add a column of numbers.
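Since each fi is just a composition of car, cdr, and cons, the decomposition can be checked mechanically. The sketch below models dotted pairs as Python 2-tuples and atoms as strings; this is an illustrative convention for checking the algebra, not code from any of the systems described.

```python
# dotted pairs as 2-tuples, atoms as strings
def cons(u, v): return (u, v)
def car(p): return p[0]
def cdr(p): return p[1]

# the nine primitive functions of the decomposition
def f9(x): return x
def f8(x): return x
def f7(x): return f9(car(x))
def f6(x): return f8(cdr(x))
def f5(x): return cons(f6(x), f7(x))
def f4(x): return x
def f3(x): return f5(car(x))
def f2(x): return f4(cdr(x))
def f1(x): return cons(f2(x), f3(x))

x = cons(cons('A', 'B'), 'C')        # ((A . B) . C)
z = f1(x)                            # ('C', ('B', 'A')), i.e., (C . (B . A))
```

Evaluating f1 on the example input reproduces the desired output z, confirming that the decomposition is equivalent to the concatenated form of z given above.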
The process of merger is made possible by the existence of the cond operator in LISP, which is written as follows:

cond((p1 g1) (p2 g2) (p3 g3) . . . (pn gn))

The predicates p1, p2, . . . , pn are evaluated sequentially, and when the first pi is found that evaluates to true, cond returns the corresponding gi as its value. LISP has some built-in predicates such as atom(x), which is defined to be true if and only if x is a LISP atom. Suppose x is a dotted pair of the atoms A and B [i.e., x = (A . B)]; then

f(x) = cond((atom(x) car(x))
        (atom(car(x)) cdr(x))
        (T cons(x, x)))

will evaluate to B. That is, atom(x) is false, and atom(car(x)) is true, so cdr(x) is returned as the result.

One can thus see how the cond operator can be used to merge functions from the above trace. Suppose, for example, that it is desired to merge f1(x) and f4(x) and that a predicate generator has discovered that function f4 should be selected if x is an atom. Then f1 and f4 can be merged to produce f:

f(x) = cond((atom(x) x) (T cons(f2(x), f3(x))))

In fact, the flowchart synthesis procedure with automatic predicate generation can merge f1, f4, f5, f8, and f9 to produce f as shown. It will also merge f6 into f2 and f7 into f3, leaving them unchanged:

f2(x) = f(cdr(x))
f3(x) = f(car(x))

The combination of the three functions f, f2, and f3 comprises a program that will achieve the target behavior. In fact, it will reverse any LISP S-expression of any level of complexity. Thus, a complete program for reversing LISP S-expressions has been synthesized from one example.

This function-merging technique is capable of generating any member in the class of regular LISP programs (13). These programs include most LISP functions that have only one parameter, use no auxiliary variables, and use only the atom predicate. The process reliably generates programs from randomly selected examples and always converges to a correct regular program if one exists and if enough examples are given. Its main disadvantage is that it is a searching procedure that becomes very expensive to execute if the target program is large.

LISP Synthesis Using Recurrence Relations. Summers (14) has developed a LISP synthesis methodology based on the discovery of recurrence relations in a sequence of examples. This methodology has the advantage that it creates programs more quickly than the above method, but it also requires more carefully constructed training examples. Suppose it is desired to create a program that will delete the negative numbers in a list. One might present the system with these examples:

Example  Input              Output
1        NIL                NIL
2        (2)                (2)
3        (-1, 2)            (2)
4        (0, -1, 2)         (0, 2)
5        (1, 0, -1, 2)      (1, 0, 2)
6        (-2, 1, 0, -1, 2)  (1, 0, 2)

The Summers method involves discovering relationships between the sequential examples and the construction of single-loop recursive programs to implement them. One can begin by writing each output in terms of its corresponding input using LISP primitives:

Example  Input              Output
1        NIL                f1(x) = NIL
2        (2)                f2(x) = cons(car(x), NIL)
3        (-1, 2)            f3(x) = cons(car(cdr(x)), NIL)
4        (0, -1, 2)         f4(x) = cons(car(x), cons(car(cdr(cdr(x))), NIL))
5        (1, 0, -1, 2)      f5(x) = cons(car(x), cons(car(cdr(x)), cons(car(cdr(cdr(cdr(x)))), NIL)))
6        (-2, 1, 0, -1, 2)  f6(x) = cons(car(cdr(x)), cons(car(cdr(cdr(x))), cons(car(cdr(cdr(cdr(cdr(x))))), NIL)))

Although this step is quite straightforward, the next one requires a key discovery. Specifically, if one studies each fi in the above sequence, it can be seen that each can be rewritten in terms of a previous fi in a very systematic way. In fact, the following pattern arises:

Example  Input              Output
1        NIL                f1(x) = NIL
2        (2)                f2(x) = cons(car(x), f1(cdr(x)))
3        (-1, 2)            f3(x) = f2(cdr(x))
4        (0, -1, 2)         f4(x) = cons(car(x), f3(cdr(x)))
5        (1, 0, -1, 2)      f5(x) = cons(car(x), f4(cdr(x)))
6        (-2, 1, 0, -1, 2)  f6(x) = f5(cdr(x))

Examining this sequence, one can see two recurrence relations arising consistently,

fi(x) = cons(car(x), fi-1(cdr(x)))

and

fi(x) = fi-1(cdr(x))

One can then run a test generation procedure on the inputs to determine when each recurrence relation is appropriate. In fact, it is easy to discover that if x is an atom, fi(x) is NIL; if the first entry of x is negative, fi(x) takes the second recurrence form given above; otherwise, fi(x) takes the first form given. The synthesized program is thus

f(x) = cond((atom(x) NIL)
        (neg(car(x)) f(cdr(x)))
        (T cons(car(x), f(cdr(x)))))

Summers has given a basic synthesis theorem that specifies the nature of the required recurrence relations and a recursive program schema that will implement the observed recurrences. He also shows a fascinating strategy for introducing new auxiliary variables into the synthesized program if they are needed.
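The synthesized cond program can be transcribed into Python for checking, with Python lists standing in for LISP lists, so the atom test on the end of a list becomes a test for the empty list. This is an illustrative transcription, not Summers's code.

```python
def f(x):
    """Delete the negative numbers in a list (the synthesized cond program)."""
    if not x:                    # (atom(x) NIL)
        return []
    if x[0] < 0:                 # (neg(car(x)) f(cdr(x)))
        return f(x[1:])
    return [x[0]] + f(x[1:])     # (T cons(car(x), f(cdr(x))))
```

Running f on the training inputs reproduces the example outputs; for instance, example 6's input (-2, 1, 0, -1, 2) yields (1, 0, 2).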
Synthesizing PROLOG Programs. Shapiro (15) has developed a methodology for creating PROLOG logic programs (see Logic programming) from examples. The operation of his system will be illustrated here by showing how it creates the member function in PROLOG. Here member(X, Y) will be defined to be a predicate that is true if and only if X is a member of the list Y. Following PROLOG notation, lists will be written with square brackets. Thus [a, b, c] is the list containing entries a, b, and c, and so member(a, [a, b, c]) and member(b, [a, b, c]) will be true whereas member(d, [a, b, c]) will be false.

PROLOG programs will be written here as sets of clauses of the form p1 ← p2, p3, . . . , pn, where the pi are predicates. The meaning of such a clause is that p1 is true if p2, p3, . . . , pn-1, and pn are true. A PROLOG program is executed by asserting such a p1 and having the processor prove p2, p3, . . . , pn. Typically these latter proofs involve calls to other clauses in the program, with very deep nestings possible. The example to be studied here is the following program, which has two clauses. The notation [X | Y] stands for a list whose first element is X and whose other elements are contained in Y:

{member(X, [X | Z]) ← true,
 member(X, [Y | Z]) ← member(X, Z)}
The operation of this program can be understood by observing its action on some of the above example behaviors. Thus member(a, [a, b, c]) can be proved using the first clause with X = a and Z = [b, c]. The case member(b, [a, b, c]) can be proved by invoking the second clause, which asserts that member(b, [a, b, c]) if member(b, [b, c]), and then using the first clause to prove member(b, [b, c]). The concern here is to show how such a program can be generated automatically. The synthesis methodology is shown in Figure 10. The user
introduces facts in the form of ground instances of the predicates. Each such predicate must be accompanied by an indication of whether it is true or false. Thus, the user might enter the facts "member(a, [a, b]) is true" and "member(c, [a, b]) is false." The system proposes various clauses that might be parts of the target program and stores them into the data structure called the PROLOG program. Then the PROLOG interpreter executes the currently proposed program on the available facts and determines whether it produces the desired result in each case. That is, given the above two facts, the current program should evaluate to true for member(a, [a, b]) and false for member(c, [a, b]). If the current program is not able to prove a desired result, as in member(a, [a, b]), it needs an additional clause, and it calls for one. If the current program proves something that the facts indicate is wrong, the system finds the offending clause in the program and removes it.

The clause generator at the top of Figure 10 is basically enumerative in nature. However, Shapiro has designed it carefully to avoid any unnecessary enumeration. First the user must declare the predicate symbols to be used and then specify what predicates can appear on the right sides of clauses. The clause generator also omits the creation of many "refinements" of clauses that have been shown to be unsatisfactory. In the example of this section, the clause generator produces the following series of proposed clauses:

member(X, Y) ← true
member(X, [X | Z]) ← true
member(X, [Y | Z]) ← member(X, Z)
member(X, Y) ← member(Y, X)
etc.
Figure 10. Shapiro synthesis algorithm. (The system cycles among a generator for all possible clauses, a monitor holding the user-supplied facts, and the PROLOG interpreter: when the program is incomplete, another clause is requested; when an incorrect answer is produced, the offending clause is dropped; nontermination triggers debugging.)
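The generate-test-debug cycle of Figure 10 can be mimicked in miniature. The sketch below is a toy model under simplifying assumptions, not Shapiro's system: the three member clause schemas from the generator's enumeration are hard-coded as Python checkers, and the monitor loop adds a clause when a true fact is not covered and drops the offending clause when a false fact is proved.

```python
def prove(x, lst, clauses, depth=10):
    """Try to prove member(x, lst) from the given clauses (depth-bounded)."""
    if depth == 0:
        return False
    for c in clauses:
        if c == 'always':                        # member(X, Y) <- true
            return True
        if c == 'head' and lst and lst[0] == x:  # member(X, [X|Z]) <- true
            return True
        if c == 'tail' and lst and prove(x, lst[1:], clauses, depth - 1):
            return True                          # member(X, [Y|Z]) <- member(X, Z)
    return False

facts = [(('a', ['a']), True),        # member(a, [a]) is true
         (('a', ['b']), False),       # member(a, [b]) is false
         (('a', ['b', 'a']), True)]   # member(a, [b, a]) is true

generator = ['always', 'head', 'tail']  # enumeration order of proposed clauses
program = []
while True:
    wrong = [(args, t) for args, t in facts if prove(*args, program) != t]
    if not wrong:
        break
    (x, lst), truth = wrong[0]
    if truth:                                    # incomplete: get another clause
        program.append(generator.pop(0))
    else:                                        # incorrect: drop offending clause
        program = [c for c in program if not prove(x, lst, [c])]
```

The run reproduces the dialogue shown below: the overly general clause member(X, Y) ← true is proposed first, refuted by the false fact, and eventually replaced by the base clause and the recursive clause.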
A dialogue for doing this synthesis on the Shapiro system is as follows:

SYSTEM. Declare the routine to be synthesized.
USER. member(_, _).
SYSTEM. What routine does "member" call?
USER. member.
SYSTEM. The guessed program is { }, the empty program.
USER. member(a, [a]) is true.
SYSTEM. The guessed program is incomplete. Member(a, [a]) is not covered. The revised program is {member(X, Y) ← true}. (A call has been made to the clause generator.)
USER. member(a, [b]) is false.
SYSTEM. The guessed program is incorrect. Member(a, [b]) is covered and it should not be. The revised program is { }. The guessed program is incomplete. Member(a, [a]) is not covered. The revised program is {member(X, [X | Z]) ← true}. (A second call to the clause generator has been made.)
USER. member(a, [b, a]) is true.
SYSTEM. The guessed program is incomplete. Member(a, [b, a]) is not covered. The revised program is {member(X, [X | Z]) ← true, member(X, [Y | Z]) ← member(X, Z)}. (A third call to the clause generator has been made.)

This example is somewhat contrived in order to show the mechanisms of the system in a short space. However, on more complicated examples the system very efficiently uses the available facts to sort through the generated clauses until a solution is found. The overall strategy is thus one of building approximations to the target program and then debugging them using facts provided by the user. Three kinds of errors may occur:

1. The program may compute a result that is undesired, an incorrect answer, as in the above dialogue when the guessed program was able to prove member(a, [b]).
2. The program may be unable to compute a desired answer, as was the case above when member(a, [a]) could not be proved.
3. The program may not terminate.

In the first type of error the debugging mechanisms in the system simulate the incorrect computation and query the user and/or the database of facts to check the correctness of each step in the computation.
When a clause is found that computes an incorrect result from correct premises, that clause is discarded from the program. Thus, in the above dialogue, when the incorrect result member(a, [b]) was proven, the clause member(X, Y) ← true was shown to be incorrect and was discarded. In the second type of error a simulation of the failed computation is performed to find what predicate p1 needed to be proven that was not proven. Then a call is made to the clause generator to find a new clause that will yield p1 in the given computation. In the third type of error, where nontermination occurs, the interpreter halts after a prespecified limit on the computation size has been exceeded. The processor then looks for an unending loop where the same computation state is reentered repeatedly, and it may also query the user concerning violations to a well-founded ordering needed to ensure termination. This debugging procedure leads to the discovery and removal of a clause in the program.

Shapiro tested this system in a variety of problem domains and compared it with various other program generation systems in the literature. For example, consider the problem solved by the LISP synthesizer described in Ref. 13: Construct a program to find the first elements of the lists in a list of atoms and lists. Thus, the target program should be able to read input [a, [b], c, [d], [e], f] and compute the result [b, d, e]. Shapiro's system needed 25 facts to solve this problem and generated the following code after 38 seconds of computation time:

{heads([ ], [ ]) ← true,
 heads([[X | Y] | Z], [X | W]) ← heads(Z, W),
 heads([X | Y], Z) ← atom(X), heads(Y, Z)}

Biermann's system used only the single example given above and produced a correct regular LISP program after one-half hour of computation.

Theoretical Issues in Synthesis from Examples. A program synthesis system is called sound if, whenever a program is generated from a set of examples, it can properly do all of those examples. The system is called complete for a class C of programs if it can generate all of the programs in the class. The properties of soundness and completeness (qv) are desirable for a program synthesis algorithm because they guarantee at least a minimal degree of behavioral acceptability for that algorithm. An example of a synthesis method that is both sound and complete is the algorithm that simply enumerates all the members of a class C until a program is found that properly executes the given example behaviors. Two restrictions on the class C are needed before the algorithm will work: C must be enumerable, call its members P1, P2, P3, . . . , and it must be decidable for each behavior B and each program Pj in C whether Pj achieves B. The algorithm can be stated more precisely as shown below:

Algorithm
Input: A finite set S of behaviors for the target program.
Output: A program Pj from class C with the property that Pj can execute each B in S.
1. j ← 1.
2. While there is a B in S such that Pj cannot execute B, increment j.
3. Return with result Pj.

This algorithm is sound by its very construction. One can show it is complete on C by considering its behavior in attempting to synthesize an arbitrary program in C. Suppose PT
is the first program in the enumeration P1, P2, P3, . . . that is capable of executing all the behaviors of the target program. Then one can give the algorithm randomly selected behaviors of PT and observe which Pj is generated. If Pj is not PT, the user will detect the problem either by testing Pj or by studying its code. Then more examples can be given until the enumeration is forced to find PT. There will always be examples to achieve this because PT is, by definition, the first program capable of all of the target behaviors. So the algorithm is complete. An interesting pragmatic discovery that has come out of this research is that very few examples are needed to achieve synthesis of most programs, even some very large ones.

Another important characteristic of the enumerative algorithm, as shown by Gold (16), is that it is input optimal in the following sense: If another algorithm is proposed for generating programs in class C on the basis of behaviors, it will not be true that all programs in C will be generated from fewer behaviors than with the enumerative algorithm. These results have practical significance because the flowchart synthesis algorithm of Figure 5 is functionally equivalent to the enumerative strategy if it is executed repeatedly for L = 1, 2, 3, . . . until a program is synthesized. This means that the flowchart synthesis method is sound, complete, and input optimal on the class of all flowcharts. Furthermore, many of its variations have similar properties. For example, the function-merging technique of LISP program synthesis is sound, complete, and input optimal on the class of regular LISP programs. Thus, these methodologies are not heuristic in the sense that their abilities to converge to a solution are in any way unpredictable. The Summers synthesis method is sound, and a variation of it has been proved to be complete over a class of programs defined by Smith (17).
The Shapiro methodology is sound and complete over the class of programs that can be constructed with rules from the rule generation routines.
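The enumerative algorithm can be written down directly. In the sketch below, the class C is a toy family of linear functions f(x) = ax + b and a behavior is an input-output pair; these choices, and the names enumerate_programs, executes, and synthesize, are hypothetical illustrations, not part of the original formulation.

```python
def enumerate_programs():
    """Enumeration P1, P2, P3, ... of a toy class C: f(x) = a*x + b."""
    for a in range(10):
        for b in range(10):
            yield (a, b)

def executes(p, behavior):
    """Decidable test of whether program p achieves behavior B."""
    a, b = p
    x, y = behavior
    return a * x + b == y

def synthesize(behaviors):
    # step through P1, P2, ... until a program executes every B in S
    for p in enumerate_programs():
        if all(executes(p, beh) for beh in behaviors):
            return p

S = [(0, 1), (2, 5)]     # behaviors of the target f(x) = 2x + 1
```

With both behaviors, synthesize(S) returns the target (a, b) = (2, 1); with only the single behavior (0, 1), an earlier program in the enumeration, the constant function (0, 1), is returned instead, which illustrates why more examples may be needed to force the enumeration to reach the target, as in the completeness argument above.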
Historical Remarks. One of the earliest papers on synthesis from examples was done by Amarel (18). Later Solomonoff (19) and Gold (16) proposed the grammatical inference problem, which resulted in a series of studies on the construction of grammars from their generated strings [Biermann and Feldman (20), Angluin (21), Blum and Blum (22), and Feldman, Gips, Horning, and Reder (23)]. In the early 1970s Biermann and associates (8,9,11) developed strategies for program synthesis from traces while a number of researchers were beginning to study synthesis procedures for LISP code [Biggerstaff (24), Hardy (25), Kodratoff and Jouannaud (26), Shaw, Swartout, and Green (27), and Summers (14)]. Biermann and Smith (28) developed a strategy for hierarchically decomposing examples and generating LISP code using production rules.

Program Synthesis through Natural Language Dialogue

Although the techniques given above provide fundamental mechanisms for program synthesis, they need to be embedded in a larger system that can acquire the information for synthesis, provide the needed domain and programming knowledge, coordinate the various synthesis processes, and generate an acceptable output. Several of these large systems were constructed during the 1970s with very ambitious goals. The systems were to interview the user in natural language, acquire a model of the computational process to be undertaken, verify its correctness through further dialogue, select data structures for efficient execution, and code the output in a traditional programming language. The goals of the research were twofold: to learn the nature of the problems associated with assembling a wide variety of technologies into a single automatic programming system and to provide an environment within which these technologies could be further studied.

System Design. An example of this type of system is the PSI automatic programmer (29), which is organized as shown in Figure 11. Here the first set of modules handles the acquisition phase of the synthesis, when the user is being interviewed and a high-level version of the program is being assembled. The lower portion of the figure shows the coding phase, where efficiency decisions are made and code is generated.

The acquisition phase begins with a parser-interpreter (see Parsing), which receives natural-language input from the user and constructs a semantic net (see Semantic network) representation of what the user has said. The discourse module monitors the input, attempts to discover the user's intentions, and coordinates the various system functions to achieve the desired result. The explainer generates user-friendly questions posed by the system or outputs a description of the program model. The domain expert builds fragmentary pieces of high-level code for solving parts of the problem and passes them on to the model builder, which assembles fragments into a high-level version of the target program. The trace expert can interpret illustrative inputs from the user and usefully supplement other information sources. The coding phase of the processing can involve considerable revision to the program model received from the acquisition phase. The coder and efficiency expert work together to evaluate various alternative data structures for the target program, make choices on representation, and create the final code.

In the following sections the flavor of each of these stages is illustrated by an example. However, the descriptions are necessarily approximate and oversimplified, and the original sources should be consulted for more accurate and detailed information.

Figure 11. Functional diagram for PSI system.
Example Dialogue. The PSI system is designed to handle dialogues of the following kind, where we are assuming the user wants a program to add a column of numbers:
USER. Please write a program to add up a set of numbers.
SYSTEM. What would you like to call the program?
USER. Sum.
SYSTEM. What numbers are to be added?
USER. The program should read nonnegative integers from the user, sum them, and then print the answer.
SYSTEM. When should the program stop reading?
USER. When a negative number is read.
SYSTEM. Is there anything else?
USER. The program should work as follows:

Output: READ:
Input: 7
Output: READ:
Input: 12
Output: READ:
Input: -1
Output: THE SUM IS 19.

SYSTEM. Okay, how is this?

Figure 12. Representation of dialogue-meaning structure.
PROGRAM SUM (INPUT, OUTPUT);
VAR X, ANS: INTEGER;
BEGIN
  ANS := 0;
  WRITELN ('READ:');
  READ (X);
  WHILE X >= 0 DO
  BEGIN
    ANS := ANS + X;
    WRITELN ('READ:');
    READ (X);
  END;
  WRITELN ('THE SUM IS ', ANS)
END.

The Functional Modules: Acquisition. The first module to process user inputs is the parser-interpreter (30), which receives incoming sentences, parses them to identify their constituents, and then builds a meaning representation. The many issues concerning the analysis of complex sentences, construction of a meaning structure, resolution of pronominal references, quantifiers, and elliptical structures, and handling of other problems are handled at this level. The output of this stage of processing is a semantic net that assembles as well as possible the individual parts of the dialogue into a cohesive unit. Figure 12 shows the nature of the meaning representation that would be constructed from the example dialogue. The individual sentences each contribute a few nodes and connections to the total network. As the information arrives, it is properly attached to the existing structure, which gradually evolves toward a complete representation of the target program. It may not be possible for the parser-interpreter to complete the attachment of all parts of the description because domain or programming knowledge may be needed in places, but most of the primary connections can be made.

The discourse module (31) maintains a sense of progress in the dialogue by attempting to build a representation of the user's desires and initiating actions to satisfy them. It has communications with all acquisition modules and attempts to coordinate activities to achieve a cohesive interaction. The user may introduce concepts that need to be clarified and fit into the total theme. The system domain expert and model builder may recognize where information is needed to complete portions of the program description, and the discourse module must formulate queries to the user through the explainer (32) to obtain this information. Thus, in the dialogue above, the system recognizes upon the mention of a program in the first sentence that such a program should have a name, a body of code, and other related information. This causes it, for example, to formulate and return the next query. If the system finds all its internally generated questions answered, it may then pass control back to the user, as is illustrated above in the last system query: "Is there anything else?" This is called a mixed-initiative dialogue where one party introduces an issue that is resolved in subsequent interaction, then the other party mentions a point that requires discussion, and so forth.

The domain expert (33) has the task of converting the semantic-net class of information (Fig. 12) to code fragments written in a very high level language. Thus, a generic input routine of the following form might be retrieved to handle the read function called for in Figure 12:

S ← ∅
input (x)
while x is valid data
  S ← S ∪ {x}
  input (x)
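The flavor of such a semantic-net representation can be suggested with a small sketch. The node and relation names below are our own invention for illustration; PSI's actual network vocabulary is not reproduced here.

```python
# A hypothetical sketch of a semantic-net fragment for the Sum dialogue.
# Node and relation names are illustrative only, not PSI's notation.

net = {"nodes": set(), "edges": []}

def add_fact(subject, relation, value):
    """Attach one node-relation-node connection to the growing net."""
    net["nodes"].update([subject, value])
    net["edges"].append((subject, relation, value))

# Each sentence of the dialogue contributes a few nodes and connections:
add_fact("program", "name", "Sum")
add_fact("program", "input", "nonnegative integers")
add_fact("program", "operation", "sum")
add_fact("program", "output", "answer")
add_fact("input loop", "stop condition", "negative number read")

# A later module can query the structure as it evolves:
def query(subject, relation):
    return [v for s, r, v in net["edges"] if s == subject and r == relation]

print(query("program", "name"))   # ['Sum']
```

The point of the sketch is only that the representation grows incrementally and can be interrogated by the other acquisition modules.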
And a generic collection routine might be instantiated to do the required summing operation:

ans ← 0
while more data
  ans ← ans + y

Finally, high-level code would be produced for the print routine:

output (z)

and all of these code fragments would be transferred to the model builder for assembly of the high-level program. Thus, the domain expert instantiates somewhat vague information from the earlier stages in concrete though rather high-level code. It fills in some information where domain knowledge is needed, but certain connections between the fragments are still not made.

The trace expert (33) is designed to receive example input-output pairs for the target program, traces or snapshots of the program's behavior, or high-level traces expressed in natural language. This expert generates a sequence of state-characterizing schemata that are then used in the creation of code fragments for the model builder. In the example dialogue two effects result from the given trace. First, it is noted that the user wants the target program to output a read prompt before each input, so this code must be merged into the given code segment:

S ← ∅
output ('READ:')
input (x)
while x is valid data
  S ← S ∪ {x}
  output ('READ:')
  input (x)

Second, the trace gives the system a way of checking its generated program for acceptability.

The model builder (34) receives fragments from the other processes and attempts to assemble a high-level version of the target program. This assembly may involve very complex processing, including the compilation of all the information units and control structures needed and their proper coordination. The model builder, besides receiving information, can return information to the earlier stages regarding the portion of the code currently being discussed and thus provide possible referents for unresolved incoming noun phrases. In the current example the processor must collect the three code segments

read*
  S ← ∅
  output ('READ:')
  input (x)
  while x ≥ 0
    S ← S ∪ {x}
    output ('READ:')
    input (x)

sum*
  ans ← 0
  while not empty S
    y ← remove from (S)
    ans ← ans + y

print*
  output (ans)

and coordinate them by noting that the set being read in the first segment is identical to that being added in the second. Also, the correct data object must be printed at the end. The program model is then sent to the coding phase for the creation of efficient machine code.

The Functional Modules: Coding. The coding module (35) contains a large amount of detailed programming knowledge in the form of production rules (see Rule-based systems). These rules are capable of proposing wide variations in the form of the data structures and the code. At selected points in the generation process possible alternatives are created and handed to the efficiency expert (36) for evaluation. The efficiency expert has tools similar to those of a human being for making choices: analysis-of-algorithms techniques, general knowledge, and simulation. The system uses probabilistic information about data and the costs of machine operations to compute space-time cost functions for various alternatives and passes results back to the coder. Thus, through a process of generating alternatives, evaluation, and movement down the search tree, the production-rule system refines the program model into machine-executable form.

The general organization of the coding module follows the tradition of expert-system technology. It receives the program model from the acquisition phase and has several hundred production rules for modifying this model and converting it one step at a time to concrete code:

Program model
  ↓
Partially refined description
  ↓
  ⋮
  ↓
Target program

Many possible rules may be applicable at a given stage in the development. For example, if a set of objects is represented
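The refinement loop just described can be caricatured in a few lines. The rules, the agenda policy, and the cost function below are stand-ins of our own devising for the several hundred production rules and the efficiency expert the text describes, not PSI's actual machinery.

```python
# Hypothetical sketch of agenda-driven refinement by production rules.
# Each rule is a (test, rewrite) pair over a program-model dictionary;
# the "efficiency expert" is reduced to a simple cost function.

def refine(model, rules, cost):
    agenda = [model]                      # partially refined descriptions
    while agenda:
        agenda.sort(key=cost)             # most attractive path first
        current = agenda.pop(0)
        applicable = [rw for test, rw in rules if test(current)]
        if not applicable:
            return current                # no rule applies: concrete code
        for rewrite in applicable:
            agenda.append(rewrite(current))

# Toy model: a representation for an abstract set must still be chosen.
rules = [
    (lambda m: m.get("set") == "abstract",
     lambda m: {**m, "set": "array"}),
    (lambda m: m.get("set") == "abstract",
     lambda m: {**m, "set": "linked list"}),
]
result = refine({"set": "abstract"}, rules, cost=lambda m: len(str(m)))
print(result["set"])  # the cheaper concrete representation is selected
```

Here both rules fire on the abstract model, the agenda ranks the two resulting descriptions by cost, and the cheaper one is expanded first until no rule applies.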
in the program model, the coder must make a decision concerning how the set is to be represented in the target programming language. Production rules will be available to select specific representations such as arrays, linked lists, bit maps, and so forth. An agenda (see Agenda-based systems) orders the tasks to be addressed in the coding process and guides the selection of the rule to be tried next. Evaluation of each new partially refined program description is done using heuristic methods (see Heuristics) and calls to the efficiency expert. The more attractive paths in the sequential search for an acceptable program are moved toward the top of the agenda for continued expansion and refinement.

The steps that the coding phase might follow in completing the example of this section will be described next. The production rules given here are not actually taken from the system but give a feeling for how it works. Following the style of the original author, the rules will be given here in English rather than in a detailed notational form. The system might have a production rule for combining loops that scan the same set: If two separate loops increment through the same set and their code segments have independent effects, then they can be combined into a single loop. The result of this rule applied to the two loops in the program model for reading and summing would be the following:

S ← ∅
output ('READ:')
input (x)
ans ← 0
while x ≥ 0
  S ← S ∪ {x}
  y ← remove from (S)
  ans ← ans + y
  output ('READ:')
  input (x)
print (ans)

At this point the system could notice the redundancy of the data structure S and employ the following rule to delete it: If a single data structure is loaded and then emptied without any intermediate references, it can be removed. The result is

output ('READ:')
input (x)
ans ← 0
while x ≥ 0
  ans ← ans + x
  output ('READ:')
  input (x)
print (ans)

Finally, a long series of rules is needed to actually create the executable code. The initialization lines, declarations, and all other special syntax must be properly assembled to achieve the target code:

PROGRAM SUM (INPUT, OUTPUT);
VAR X, ANS: INTEGER;
BEGIN
  ANS := 0;
  WRITELN ('READ:');
  READ (X);
  WHILE X >= 0 DO
  BEGIN
    ANS := ANS + X;
    WRITELN ('READ:');
    READ (X);
  END;
  WRITELN ('THE SUM IS ', ANS)
END.

The Implementation. The PSI system was completed in the mid-1970s and is capable of constructing programs of several types, including some concept-formation programs and some numerical programs. A number of example dialogues have been published (29), including interactions of up to about 50 sentences that result in several dozen lines of LISP code.

Historical Remarks. Heidorn (37) built the first and one of the most impressive natural-language automatic programming systems, NLPQ, which was aimed at the solution of operations-research queuing problems. The system translated incoming sentences into a semantic-network problem representation that then could be translated back to the user in paraphrase for verification. Then the network was compiled into the GPSS simulation language and run on conventional software. Simultaneously with the Green project (29), a synthesizer called SAFE was built by Balzer et al. (38,39) that emphasized the automatic acquisition of domain knowledge and the creation of software from informal specifications. Also, Martin et al. (40) built a system that placed heavy emphasis on high-quality processing of natural language. Biermann and Ballard (41) constructed an interpreter for natural-language programs that was robust enough to be used by college students in solving programming problems (42,43).
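The behavior of the refined program can be checked against the user's trace with a direct transliteration. The transliteration below is ours, not part of the PSI system; the `read` and `write` parameters are injected so the trace can be replayed programmatically.

```python
# A transliteration (ours) of the refined program model: prompt before
# each read, sum the numbers, and stop when a negative number appears.

def sum_until_negative(read, write):
    ans = 0
    write("READ:")
    x = read()
    while x >= 0:
        ans = ans + x
        write("READ:")
        x = read()
    write(f"THE SUM IS {ans}")
    return ans

# Replaying the trace from the example dialogue:
inputs = iter([7, 12, -1])
total = sum_until_negative(lambda: next(inputs), print)
# total == 19, matching "THE SUM IS 19." in the user's trace
```

This is exactly the acceptability check the trace expert performs: the generated program, run on the trace inputs, must reproduce the trace outputs.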
Program Construction Using a Mechanized Assistant

More recently researchers have been examining the role that AI can play in industrial programming environments where large software systems are specified, coded, evaluated, and maintained. Here the whole life cycle of the software system is under consideration: The client and the professional systems analyst discuss informally a proposed software product. More formal specifications are then derived, performance estimates are made, and a model of the system evolves. Many times specifications are modified or redefined as analysis proceeds. The next phase is the actual construction, documentation, and testing of the product. After release into the user environment the system may be debugged and changed or improved on a regular basis over a period of years. A developing idea in some current automatic programming projects (44,45) envisions a mechanized programmer's assistant that would intelligently support all of the above activities.
It would provide a programming environment for the user capable of receiving many kinds of information from programmers, including formal and informal specifications, possibly natural-language assertions regarding goals, motivations, and justifications, and code segments. It would assist the programmer in debugging these inputs and properly fitting them into the context of the programming project. It would be knowledge based and thus capable of fully understanding all of the above inputs. It would provide library facilities for presenting the programmer with standardized program modules or with information concerning the current project. It would be able to generate code from specifications, program segments, and other information available from the programmer and other sources. It would be able to understand program documentation within the code and to generate documentation where necessary. Finally, it would maintain historical notes related to what was done, by whom, when, and, most important, why. All of these functions are envisioned as operating strictly in a supportive role for human programmers, who are expected to carry on most high-level tasks. Thus, the concept of the automatic programmer's assistant places the human programmer in the primary position of specifying the program and guiding progress toward successful implementation and maintenance. The task of the assistant is to maximally utilize available technologies to automate as many lower level functions as possible.
The Programming Paradigm. This view emphasizes the decomposition of the programming task into two stages, as illustrated in Figure 13: systems analysis and programming. The first stage involves the development of formal specifications and deals primarily with what performance is required; the latter includes the decomposition of the task into an appropriate hierarchy of subparts, the selection of data structures, and the coding and documentation of the product. The former is assumed to be the appropriate domain for considerable human involvement, whereas the latter is expected to be more amenable to automation.

Figure 13. Stages in program construction: an informal specification is converted by human-labor-intensive systems analysis into a formal specification, from which heavily automated programming produces the program product.

In order to begin implementing such an assistant, it is necessary to have appropriate languages to handle the many kinds of information that appear in this application. One approach is to introduce the concept of a wide-spectrum language that can be used at all levels of implementation from the specification of requirements to high-level coding of the actual target program. An example of such a language is V (46), which has as primitives sets, mappings, relations, predicates, enumerations, state-transformation sequences, and other constructions. The V language is being implemented within the CHI project (47,48), which emphasizes the idea of self-description. That is, the CHI system is a programmer's assistant that provides an environment for using the V language in program development. The CHI system is also being written in the V language; hence, it is "self-describing." V has been designed to include capabilities for expressing program synthesis rules as well as its many other facilities.

Another approach (44,49) is based primarily on the concept of plans for programs that contain the essential data and control flow but exclude programming-language details. An example of a plan appears in Figure 14, where the computation of absolute value is represented. The advantages of such plans are that, because they locally contain essential information, they can be glued together arbitrarily without global repercussions. This facilitates the use of a library of small standard plans [or "cliches" (49)], which can provide the building blocks for the assembly of large plans.

Figure 14. Example plan for computing absolute value showing both flow of data and flow of control.

This approach uses code and plans as parallel representations for the program and allows the user to deal easily with either one, as illustrated in Figure 15. If the user chooses to work in the plan domain, each action in creating or modifying a given plan results in appropriate updates in the code domain. The coder module translates the current version of the plan into code. If the user wishes to work with the code, the analyzer appropriately revises the associated plan.

Figure 15. Architecture giving the user parallel access to code and associated plans.

The use of the system could begin with the assembly and manipulation of various plans from the library to result in a large plan. Then it could be translated automatically to code. Another usage might begin with an existing code segment that needs to be modified. Its corresponding plan could be automatically created and then manipulated in appropriate ways, including possibly the addition of some library routines. Then translation back to code would yield the desired code with its revision.

The automatic programmer's assistant concept assumes that most coding functions below the formal specification stage will be automated. Once the specifications are derived, the machine will be able to select data structures for the development of efficient code, generate the code, and produce appropriate documentation. This level of automation has many implications in that programmers might then wish to automatically generate several versions of the target system while varying specifications or other implementation parameters. Thus, a higher degree of optimization would be possible because more experimentation could be done on different design strategies.

A second benefit made possible by this approach would be that program maintenance and improvement would be done in a new way. Instead of modifying a system by working at the programming language level, changes would be made by working at the specification or planning level. After the completion and validation of the new specification or plan, the automatic program generator would then be released to assemble the product, again repeating, where appropriate, previous design decisions but modifying decisions both at local and global levels where earlier choices are no longer acceptable.

The automatic programmer's assistant will thus be aimed at revolutionizing software development processes. With the success of this research, human programmer activities will be moved more into the software specification cycle, leaving code generation to the assistant. More efficient programs may be possible through more extensive experimentation with design alternatives. Fewer programming personnel will be needed for actual coding and documentation, and fewer errors should occur at these levels. Program maintenance and upgrading will be done by working with plans and specifications rather than with the code itself.

Conclusion

Automatic programming is the process of mechanically assembling fragmentary information about target behaviors into machine-executable code for achieving those behaviors. This section has described the four main approaches to the field followed by researchers in recent years. The field is still very much in its infancy, but already many useful discoveries have been made.
Because of its tremendous importance, it is clear that automatic programming will be a research area central to AI in the years to come. Additional readings on the subject are found in Refs. 50-54.

BIBLIOGRAPHY

1. Z. Manna and R. Waldinger, "A deductive approach to program synthesis," ACM Trans. Progr. Lang. Syst. 2(1), 90-121 (1980).
2. C. C. Green, "Application of theorem proving to problem solving," Proc. of the First Int. Joint Conf. Artif. Intell., Washington, DC, May 1969, pp. 219-239.
3. R. J. Waldinger and R. C. T. Lee, "PROW: A step toward automatic program writing," Proc. of the First Int. Joint Conf. Artif. Intell., Washington, DC, May 1969, pp. 241-252.
4. M. Broy, "Program Construction by Transformations: A Family Tree of Sorting Programs," in A. W. Biermann and G. Guiho (eds.), Computer Program Synthesis Methodologies, D. Reidel, pp. 1-50, 1983.
5. R. M. Burstall and J. Darlington, "A transformation system for developing recursive programs," JACM 24, 44-67 (1977).
6. Z. Manna and R. Waldinger, "Synthesis: dreams → programs," IEEE Trans. Software Eng. SE-5, 294-328 (1979).
7. W. Bibel and K. M. Hornig, "LOPS: A System Based on a Strategical Approach to Program Synthesis," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 69-90, 1984.
8. A. W. Biermann, "On the inference of Turing machines from sample computations," Artif. Intell. 3, 181-198 (1972).
9. A. W. Biermann and R. Krishnaswamy, "Constructing programs from example computations," IEEE Trans. Software Eng. SE-2, 141-153 (1976).
10. D. A. Waterman, W. S. Faught, P. Klahr, S. J. Rosenschein, and R. Wesson, "Design Issues for Exemplary Programming," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 433-461, 1984.
11. A. W. Biermann, "Automatic insertion of indexing instructions in program synthesis," Int. J. Comput. Inf. Sci. 7, 65-90 (1978).
12. D. R. Smith, "The Synthesis of LISP Programs from Examples: A Survey," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 307-324, 1984.
13. A. W. Biermann, "The inference of regular LISP programs from examples," IEEE Trans. Syst. Man Cybern. SMC-8, 585-600 (1978).
14. P. D. Summers, "A methodology for LISP program construction from examples," JACM 24, 161-175 (1977).
15. E. Y. Shapiro, Algorithmic Program Debugging, MIT Press, Cambridge, MA, 1982.
16. M. Gold, "Language identification in the limit," Inf. Contr. 10, 447-474 (1967).
17. D. R. Smith, A Class of Synthesizeable LISP Programs, A.M. Thesis, Duke University, 1977.
18. S. Amarel, "On the Automatic Formation of a Computer Program which Represents a Theory," in M. Yovits, G. T. Jacobi, and G. D. Goldstein (eds.), Self-Organizing Systems-1962, Spartan Books, pp. 107-175, 1962.
19. R. Solomonoff, "A formal theory of inductive inference," Inf. Contr. 7(2), 224-254 (1964).
20. A. W. Biermann and J. A. Feldman, "A Survey of Results in Grammatical Inference," in Y. H. Pao and G. W. Ernst (eds.), Context-Directed Pattern Recognition and Machine Intelligence Technologies for Information Processing, IEEE Computer Society Press, 1982, pp. 113-136.
21. D. Angluin, "On the complexity of minimum inference of regular sets," Inf. Contr. 39, 337-350 (1978).
22. L. Blum and M. Blum, "Toward a mathematical theory of inductive inference," Inf. Contr. 28, 125-155 (1975).
23. J. A. Feldman, J. Gips, J. J. Horning, and S. Reder, Grammatical Complexity and Inference, Technical Report CS-125, Computer Science Department, Stanford University, 1969.
24. T. J. Biggerstaff, C2: A Super Compiler Model of Automatic Programming, Ph.D. Dissertation, University of Washington, Seattle, 1976.
25. S. Hardy, "Synthesis of LISP functions from examples," Proc. of the Fourth Int. Joint Conf. Artif. Intell., pp. 240-245 (1975).
26. Y. Kodratoff and J.-P. Jouannaud, "Synthesizing LISP Programs Working on the List Level of Embedding," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 325-374, 1984.
27. D. Shaw, W. Swartout, and C. Green, "Inferring LISP programs from examples," Int. Joint Conf. Artif. Intell. 4, 260-267 (1975).
28. A. W. Biermann and D. R. Smith, "A production rule mechanism for generating LISP code," IEEE Trans. Syst. Man Cybern. SMC-9, 260-276 (1979).
29. C. Green, "The Design of the PSI Program Synthesis System," Proceedings of the Second International Conference on Software Engineering, San Francisco, pp. 4-18, 1976.
30. J. M. Ginsparg, Natural Language Processing in an Automatic Programming Domain, Report No. STAN-CS-78-671, Computer Science Department, Stanford University, 1978.
31. L. Steinberg, A Dialogue Moderator for Program Specification Dialogues in the PSI System, Ph.D. Thesis, Stanford University, 1980.
32. R. Gabriel, An Organization for Programs in Fluid Dynamics, Report No. STAN-CS-81-856, Computer Science Department, Stanford University, 1981.
33. J. V. Phillips, "Program inference from traces using multiple knowledge sources," Int. Joint Conf. Artif. Intell. 5, 812 (1977).
34. B. P. McCune, "The PSI program model builder: synthesis of very high-level programs," SIGART Newsletter 64, 180-199 (1977).
35. D. R. Barstow, Knowledge-Based Program Construction, Elsevier North-Holland, Amsterdam, 1979.
36. E. Kant, "The selection of efficient implementations for a high level language," SIGART Newsletter 64, 140-146 (1977).
37. G. E. Heidorn, "English as a very high level language for simulation programming," SIGPLAN Notices 9, 91-100 (1974).
38. R. M. Balzer, N. Goldman, and D. Wile, "On the Transformational Implementation Approach to Programming," Proceedings of the Second International Conference on Software Engineering, pp. 337-344 (1976).
39. R. M. Balzer, N. Goldman, and D. Wile, "Informality in program specifications," IEEE Trans. Software Eng. SE-4, 94-103 (1978).
40. W. A. Martin, M. J. Ginzberg, R. Krumland, B. Mark, M. Morgenstern, B. Niamir, and A. Sunguroff, Internal Memos, Automatic Programming Group, Massachusetts Institute of Technology, Cambridge, MA, 1974.
41. A. W. Biermann and B. W. Ballard, "Towards natural language programming," Am. J. Comput. Linguist. 6, 71-86 (1980).
42. A. W. Biermann, B. W. Ballard, and A. H. Sigmon, "An experimental study of natural language programming," Int. J. Man-Mach. Stud. 18, 71-87 (1983).
43. R. Geist, D. Kraines, and P. Fink, "Natural Language Computing in a Linear Algebra Course," Proceedings of the National Educational Computing Conference, 1982, pp. 203-208.
44. C. Rich and H. E. Shrobe, "Initial report on a LISP programmer's apprentice," IEEE Trans. Software Eng. SE-4, 456-467 (1978).
45. R. Balzer, T. E. Cheatham, Jr., and C. Green, "Software technology in the 1990's: using a new paradigm," Computer 16, 39-45 (November 1983).
46. C. Green, J. Phillips, S. Westfold, T. Pressburger, B. Kedzierski, S. Angebranndt, B. Mont-Reynaud, and S. Tappel, Research on Knowledge-Based Programming and Algorithm Design-1981, Technical Report KES.U.81.2, Kestrel Institute, Palo Alto, 1981.
47. C. Green and S. Westfold, "Knowledge-based programming self-applied," Mach. Intell. 10 (1981).
48. D. R. Smith, G. B. Kotik, and S. J. Westfold, "Research on knowledge-based software environments at Kestrel Institute," IEEE Trans. Software Eng. SE-11(11), 1278-1291 (1985).
49. R. C. Waters, "The programmer's apprentice: Knowledge based program editing," IEEE Trans. Software Eng. SE-8(1), 1-12 (1982).
50. A. Barr and E. A. Feigenbaum, The Handbook of Artificial Intelligence, Vol. 2, Kaufmann, Los Altos, CA, 1982.
51. A. W. Biermann, "Approaches to Automatic Programming," in M. Rubinoff and M. C. Yovits (eds.), Advances in Computers, Vol. 15, Academic Press, New York, pp. 1-63, 1976.
52. A. W. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, 1984.
53. G. E. Heidorn, "Automatic programming through natural language dialogue: a survey," IBM J. Res. Develop. 302-313 (1976).
54. A. W. Biermann, "Formal methodologies in automatic programming: A tutorial," J. Symbol. Comput. 1, 119-142 (1985).

A. Biermann
Duke University

AUTOMATION, INDUSTRIAL

The term automation, as a combination of automatic and operation, was coined by Ford executive D. S. Harder in 1947. It connotes the use of machinery to augment or replace human endeavor. Although AI plays a very minor role in industrial automation today, within a decade it can be expected to become one of the drivers of industrial automation.

The development of industrial automation dates back several thousand years, but it accelerated during the Industrial Revolution. The steam engine provided a new technique for powering manufacturing tools, interchangeable parts gave a new methodology for designing products, and assembly lines presented a new approach to logistical control.

Up until about 1950 nearly all industrial automation systems involved fixed automation. Due to its inflexibility and high cost, such equipment could be justified only for high-volume products with unchanging designs. Since 1950 computers have facilitated the new technology of programmable automation. Even today, however, over 80% of all automation is fixed rather than programmable.

From a historical perspective, manufacturing has been labor intensive, it is now capital intensive, and it is becoming data intensive.

Objectives

Industrial automation addresses the processes by which products are designed, developed, and manufactured. The objectives are to improve efficiency, increase quality, and reduce the time to effect changes (see Computer-aided design; Computer-integrated manufacturing).

As a result of evolution, human hands are well adapted for holding branches, but they are poorly adapted for most factory tasks. A major focus of automation is, therefore, the reduction of direct labor in manufacturing.

Human minds are adept at learning new skills, but they are poor at remembering large amounts of data. In factories, therefore, such data is normally written down on paper. The volume of paper results in inefficiency, poor quality, and slow response. A secondary focus of industrial automation is therefore the elimination of paperwork. More generally, it is a restructuring of the indirect operations that support the manufacturing floor, including design, drafting, planning (qv), and control.
Social Impact of Automation

Although industrial automation has enormously increased the world's average standard of living, the social impact of automation is a controversial subject. When people are displaced by automation, it is no consolation to realize that they are a small dislocation in a globally good picture.

A historical example is the farming industry, which until the Middle Ages employed over 90% of the population. Today the figure is much smaller, even when supporting industries like farm machinery, pesticides, transportation, marketing, and so on are included. The social impact was limited by the fact that the transition occurred over a considerable number of years.

One potentially unique aspect of today's situation is the existence of world markets, which may be reaching the limits of growth. Another is the availability of powerful inexpensive computers, leading to speculation that the new automation may raise rather than lower required skill levels.

Over the next hundred years or more it is possible that industrial automation will cause the number of direct and indirect manufacturing jobs to decrease ultimately to a number near zero. Whether this outcome actually happens, whether it is desirable, and whether appropriate social policy can be formulated will remain controversial.

Taxonomy of Industrial Systems

Industrial systems range from continuous process to discrete parts, but many usually involve a blend of both extremes. Within all industrial systems there are three major activities: the engineering design of the products and the manufacturing processes, the logistics operations to ensure that manufacturing operations proceed smoothly, and the manufacturing operations themselves.

Manufacturing Operations. Manufacturing operations can be recursively classified as make, test, move, or store.

Make. In make operations tools are used to fabricate parts that are then assembled into products or subproducts. Assembly is defined to be orienting and placing parts in proximity for subsequent fastening operations. Assembly tools range from simple mechanisms to multiaxis robots (see Robotics). In discrete manufacturing, make operations are dependent on the location, orientation, and shape of the workpiece. A common procedure, therefore, is to provide fixturing that assures workpiece placement. Alternatively, sensors can provide feedback to allow adaptive make operations.

Test. Tools are also used for test operations. Usually, testing is used to cull bad products. Increasingly, it is being used as a means of providing feedback to control or correct processes. In some cases test tools are components within make tools, providing sensory feedback to allow manufacturing processes to be controlled more effectively. Test tools are also used extensively for quality assurance to determine when in-process quality is outside of acceptable limits to the extent that intervention or correction is needed. Data collected from tests may be written down or automatically collected in a database. If tests uncover the existence of a defect, statistical analysis may determine the probable cause of the defect.

Move. Within plants small vehicles are frequently used to move parts. They may be human operated or they may automatically follow a desired path. In order to facilitate movement, objects are often placed on pallets. Each pallet may contain a single part, an array of parts, ordered parts in magazines, or disordered parts in tote boxes (see also Autonomous vehicles). When higher throughput is required, conveyor systems are used. When parts and materials are being moved, sensors on the conveyor can be used to detect and count the passage of objects, or coded patterns can be used to keep track of what is actually in transit and where it goes.

Store. Store operations are a means to smooth the flow of parts and material. Objects that are stored constitute either work in process or final inventory. Storage may take the form of a magazine of parts or a small buffer associated with an individual tool. At the other end of the scale, it may be an enormous stacker crane warehouse, covering more than 10,000 ft² (929 m²) to a height of 50 ft (15.2 m) and containing millions (10⁶) of items.

Engineering Design. In recent years there has been a rapid growth in the use of computer-aided design (qv) systems to acquire, manipulate, and maintain design data. Engineering design of a product plays an overwhelmingly important role in determining how that product is manufactured. Design information for a typical discrete object includes data on form, hierarchical composition, and process.

Form. Designers use graphics tools to specify the geometric shape and finish of products, generally in terms of front, side, and top views, with dimensions and tolerances. Objects may then be visualized before they are built, by means of drawings. Views may show wire frames, with or without hidden lines, or solid renderings with optional color and shading.

Computer-aided design systems have automated the creation of drawings. In general, however, any set of drawings is likely to have inconsistencies that require human interpretation for their resolution. To reduce the current ambiguities, computers will need to store object models that contain geometrically complete information on form, not just drawings. As the technology of object modeling progresses, computers can be used to model function, cost, and ease of manufacture.

Dimensionality makes electronic modeling generally much easier than mechanical modeling. The most elementary mechanical property, that two things cannot occupy the same space at the same time, is nonlinear and difficult to model. In the mechanical design domain structural deflection is simulated by subdividing an object into a mesh of small elements for which an appropriate differential equation can be solved iteratively. In the electronic design domain complex digital logic circuitry can be simulated from the model of the logic elements and interconnections.

When computers are used to automate electronic design, they incorporate design rules to assure that the object is buildable. Often, they can generate data automatically to control manufacturing tools that build the completed object. Generation of manufacturing instructions is referred to as CAD/CAM, meaning computer-aided design/computer-aided manufacturing. In computer-aided mechanical design, however, it is beyond the state of the art to build in many well-known design rules.
AUTOMATION, INDUSTRIAL
For example, for ease of assembly, parts should be symmetric or markedly asymmetric; shafts and holes should be chamfered; and parts should not interlock. Because CAD systems do not incorporate these rules, it is possible to design objects that are unnecessarily difficult or even impossible to manufacture.

The proliferation of low-cost plant floor computers will lead to an explosion of fully automated control systems that provide execution-time-adaptive behavior. Significant research is needed to determine how to exploit these execution-time capabilities from computer-aided design systems.

Hierarchical Composition. Hierarchical composition is given by the bill of materials, a description of the "part-of" relationship. This information is often specified implicitly as annotation in drawings, but a more precise approach is to provide an explicit textual specification. Design for ease of manufacture generally favors objects whose bills of materials have as few parts and as few part types as possible.

Process. Routings describe process steps to be performed and associated costs at each node of the bill of materials. Routings are used for operational control, but they could also allow the simulation and analysis of logistical properties.

High-technology products differ from ordinary consumer products because they depend on the design of new manufacturing processes. Even when existing processes are sufficient, the selection and sequencing of processes is an issue.

In discrete-part manufacturing there is some possibility of deriving routings automatically from object models. A heuristic method called group technology attempts to classify part shapes so that similar shapes can be used to imply similar routings.

In continuous-process manufacturing the choice between alternative processes can sometimes be formulated as a linear programming problem, for which the precise mathematically optimal solution can be found.
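The linear-programming formulation mentioned above can be illustrated with a toy two-process example. The profit figures and resource constraints below are invented, and the solver simply enumerates the vertices of the feasible region, which suffices for a two-variable problem.

```python
from itertools import combinations

def solve_lp_2d(c, constraints):
    """Maximize c.x subject to a.x <= b for each (a, b) in constraints and
    x >= 0, by enumerating vertices of the feasible polygon (2 variables)."""
    # Treat non-negativity as two additional half-planes.
    halfplanes = constraints + [((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0)]
    best, best_x = None, None
    for (a1, b1), (a2, b2) in combinations(halfplanes, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue  # parallel constraint boundaries: no vertex
        x = (b1 * a2[1] - b2 * a1[1]) / det
        y = (a1[0] * b2 - a2[0] * b1) / det
        # Keep the vertex only if it satisfies every constraint.
        if all(a[0] * x + a[1] * y <= b + 1e-9 for a, b in halfplanes):
            val = c[0] * x + c[1] * y
            if best is None or val > best:
                best, best_x = val, (x, y)
    return best, best_x

# Hypothetical problem: two processes with profits 3 and 2 per unit,
# competing for limited machine hours and feedstock.
profit, mix = solve_lp_2d(
    c=(3.0, 2.0),
    constraints=[((1.0, 1.0), 4.0),   # machine hours available
                 ((2.0, 1.0), 6.0)],  # feedstock available
)
```

For these invented numbers the optimum runs both processes at two units each.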
Logistics. Manufacturing logistics relates to the acquisition, storage, allocation, and transportation of manufacturing resources, including materials, parts, machines, and personnel. Logistics is important when manufacturing facilities are initially designed, as well as later, when existing facilities are operated.

Logistical models allow the designer to evaluate trade-offs. For some variables, like quality and flexibility, the analysis may be subjective because there are no easy means to quantify the costs and benefits.

After manufacturing facilities are built, logistics is concerned with planning, tracking, and controlling the ongoing operation as well as providing methods of improving the actual performance. It encompasses administrative operations like order entry, purchasing, receiving, inventory management, shipping, and billing as well as planning operations that relate to long-range resources, final shipment, material requirements, and load balancing. For efficient control of complex manufacturing operations, logistics is an essential function. It is possible, in principle, to automate both the logistics planning and execution, even for manufacturing operations that are otherwise unautomated. Timeliness and accuracy of data, however, are best assured when data distribution and collection are automatic.

Material requirements planning (MRP) is an algorithm that determines schedules for completing constituent parts of
a final product. For many products, the manufacturing process may need to begin years before the final product is to be shipped.

MRP is sensitive to routing times. These times, unfortunately, may be very inaccurate whenever machine setup times are long because there is no easy way to infer when setup is required and when it is not. The preferred solution is to design tools to minimize setup time.

Sometimes management attempts to protect against unforeseen contingencies by providing unnecessarily conservative timing information in the routings. In turn, MRP then computes unnecessarily early starts to the manufacturing activities, and work-in-process inventory abounds. Such logistical operations are referred to as push systems. The alternative to a push system is a pull system, in which the start of any operation triggers the start of antecedent steps in the bill of materials. If these antecedent steps have appreciable time delays, the lack of work-in-process inventory results in immediate work stoppages.

Thus, accurate routing time and production plan data are needed for smooth logistical operations, regardless of whether they are push or pull. Once the system is based on accurate planning, the distinction between push and pull becomes moot. In a well-run system each step is completed just in time to be used by the next step, and work-in-process inventory is minimized. Other names for such systems are just-in-time manufacturing and continuous-flow manufacturing.

MRP by itself fails to consider the utilization of machines and personnel. As a result, even if production plans are reasonable and routings contain correct timing data, MRP may yield unworkable solutions. To complement MRP, computer programs can compute how to balance loads at the level of machines, lines, and plants. By alternating line-balancing computations with MRP computations, reasonably good overall solutions can be found.
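The MRP explosion described above can be sketched in a few lines: starting from a due date, the latest start time of each constituent part is its parent's start minus the part's routing (lead) time. The bill of materials and all times below are invented for illustration.

```python
# Minimal sketch of material requirements planning (MRP): given a bill of
# materials and per-part routing (lead) times, compute the latest start
# time for each constituent part, working backward from the due date.
# Part names and times are illustrative, not from the article.

bom = {                      # part -> immediate constituents
    "product": ["frame", "motor"],
    "frame":   ["casting"],
    "motor":   ["rotor", "stator"],
}
lead_time = {"product": 2, "frame": 3, "motor": 4,
             "casting": 5, "rotor": 1, "stator": 2}

def mrp_start_times(part, due, schedule):
    """Latest start = due date minus this part's routing time; recurse so
    each constituent is finished by the time its parent must start."""
    start = due - lead_time[part]
    # A part used in several places must start early enough for all of them.
    schedule[part] = min(start, schedule.get(part, start))
    for child in bom.get(part, []):
        mrp_start_times(child, start, schedule)
    return schedule

schedule = mrp_start_times("product", due=20, schedule={})
```

With conservative (inflated) lead times, every start date moves earlier, which is exactly how the push-system inventory buildup described above arises.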
Plant floor monitoring and control systems allow the collection of data from manufacturing tools, conveyors, and personnel to detect stoppages and provide means for analyzing performance. But even with instantaneous data on machine availability, existing programs do not generally provide the rapid response time needed to manage logistics efficiently in an environment of uncertainty and rapid change.

Integration

Industrial systems that link the engineering design and manufacturing functions are referred to as vertically integrated. Those that couple logistics and manufacturing are called horizontally integrated. The acronym CIM, for computer-integrated manufacturing (qv), refers to idealized industrial systems in which all three functions cooperate smoothly.

One example would be flexible machining systems, which have automatically guided vehicles delivering parts between numerically controlled machine tools in darkened unmanned factories. As new parts are designed, machining instructions, bills of materials, and routings are transferred to logistical software that controls the plant floor. Software systems manage plant floor communications and databases for design and logistics. The design database allows several designers to work concurrently, and it provides a formal process by which completed designs get released to manufacturing.

Another example would be fast turnaround lines for interconnecting logic elements on gate array semiconductors. After designers specify the interconnection patterns, silicon wafers are moved automatically through lengthy sequences of lithographic and chemical operations, with each wafer taking a unique route.

By definition, all manufacturing is integrated, but in only a small fraction of industrial systems is this integration highly efficient.

Drivers of Automation

Industrial automation is currently undergoing rapid growth and change throughout the world, stimulated by international competition, which motivates companies of every nation to increase their efficiency, quality, and flexibility.

A major driver is the proliferation of low-cost computing hardware. A decade ago the cost of a computer needed to control a manufacturing tool might have been more than the tool itself. Today, nearly every new tool costing more than $10,000 probably contains a computer.

Also, industrial automation is being advanced by software technology. Most of this technology has been in the mainstream of computer science: algorithms, languages, operating systems, databases, and data communications. The latest addition to this repertory is AI and, more specifically, expert systems (qv). Although AI has not yet had a major impact in industrial automation, it will probably become a driver within the next decade.
Role of Artificial Intelligence

Although industrial automation offers several unique and fertile areas for AI research, advances in AI are likely to be motivated more by the requirements of highly unstructured environments, such as the military, the office, the home, and the laboratory.

Within industrial automation adaptive tools in general and industrial robotics in particular have been and will continue to be a major stimulation to AI research. Among the additional current problems within industrial automation that have AI potential are real-time logistics, quality analysis, process planning, design for ease of manufacture, and determining the financial value of flexibility and quality.

Most of these problems are characterized by people muddling through somehow, without understanding or a good algorithm to guide them. Since computers can communicate much more rapidly and precisely than people, AI should make it possible for computers to muddle through at least as well as people do. Expert systems, in particular, seem to thrive in situations for which there is no alternative prospect of developing the analog of Newton's laws.

Real-Time Logistics. From a logistical viewpoint, the plant floor can be modeled as a graph whose arcs represent the flow of data and material and whose nodes represent decision and manufacturing processes. At each node there is a set of somewhat ambiguous objectives and a menu of possible actions with associated probabilities of achieving each objective. At each node involuntary changes of state are created by incoming data and parts, by chance, and by the passage of time itself. The incoming data may be purely informative, or it may contain action requests that need to be prioritized.

In manufacturing, the objectives and actions are sufficiently constrained that there is a reasonable prospect for AI to be practical. It appears that the problem could be mapped into one large expert system or an array of many small expert systems. Such an AI system might offer a heuristic approximation to MRP and load balancing, but with a much faster turnaround time. It might be able to cope with incomplete, inaccurate, and volatile data, making fast decisions to act, to delegate, or to deny requests. Additionally, it might automatically derive subordinate objectives from higher level ones.

Quality Analysis. Testing and customer feedback provide the basic inputs for quality analysis, which looks for meaningful patterns in voluminous data that are frequently irrelevant and obsolete. The defects being sought may be masked by purely random events, they may be intermittent, or they may depend in a nonlinear fashion on a coincidental combination of many independent systematic factors. The difficulties are compounded by bad testers and inaccurate field reports. The similarity of this problem to that of diagnosing illness in humans (see Medical advice systems) suggests that an AI expert system might be able to outperform the quality experts.

Process Planning. Procedures used to construct object models may be different from the processes used to construct the objects themselves. As a result, there may be features in the constructed model that are not identified but are nevertheless essential to process planning. For example, an object that is almost cubical with a groove machined away may have been represented by the union of three cuboids.

If an AI expert system were built to do process planning, the hardest problem would be recognizing features in an object model. The types of features would include flats, grooves, holes, pockets, rounded edges, similar subparts, and so on. Such recognition would require procedures for the approximation of shape, recognition of the similarity or identity of two designs, and the recognition of symmetry in a design. The problem is difficult because object models can be very large, a given feature's pattern generally does not occupy contiguous storage, and no algebra of features or canonical decomposition has been invented (see Image understanding).

Additionally, the AI system would have to represent methods of inferring routings from manufacturable features and rules for choosing among alternative routings based on available processes.

Design for Ease of Manufacture. Design for ease of manufacture has all the object model feature recognition problems of process planning plus the harder problem of representing design intent and of hypothesizing alternative designs to meet this intent. Additionally, there would be expert design rules like "chamfer all holes and shafts," but this expert system portion is a trivial piece of the overall problem.

Financial Value of Flexibility and Quality. When automation systems are proposed, costs and benefits affect the design trade-offs and the subsequent financial analyses that determine whether the systems are justified. Perhaps AI expert systems can provide a means of estimating the value of flexibility and quality, which frequently overshadows the objective costs and benefits.

For example, typical inflexible electronic assembly lines cause work in process to spend less than 1% of the time in "value add" make and test operations. Having more part feeders on each tool would reduce the number of times a card would need to pass through, reduce the need to move and store cards, reduce the frequency with which tools must undergo setup, and vastly improve the overall line throughput.

Similarly, lack of quality can result in tangible costs in terms of scrap and rework within the plant and field returns from distribution centers and customers. More insidious intangible costs are the consequential damages that customers may suffer or the loss of company reputation that can adversely affect sales for years to come.

Adaptive Tools. One area of research since the earliest days of the field of AI has been hand-eye robotics. The motivation was to create highly adaptive robotic systems that emulate the dexterity of animate motion and sensing systems (see Autonomous vehicles; Robotics). Although it is not necessary to have intelligent, dextrous, humanoid robots because factories are sufficiently constrained, there is some benefit to be gained by providing modest levels of adaptive behavior in a broad range of make and test tools. Software can substitute for hardware precision, and it can make decisions that reduce the need for operator intervention.

The mainstream of current industrial robot research is aimed at making robots that are faster, more precise, cheaper, and easier to program. Of these topics, only ease of programming appears to be appropriate for AI. Two promising approaches are teaching by showing and object modeling, both of which are relatively simple for nonadaptive tools. Conversely, in adaptive teaching by showing, the system must infer an adaptive strategy from one example of the desired behavior.
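One way a system might infer an adaptive strategy from a single demonstration, as adaptive teaching by showing requires, is to record the taught motion relative to the sensed part position so the same strategy can be replayed when the part is sensed elsewhere. This is only a sketch; all coordinates and function names are invented.

```python
# Sketch of adaptive teaching by showing: one taught example is recorded
# as offsets from the part seen during teaching, so the strategy
# generalizes when the part is later sensed at a different position.

def teach(taught_path, sensed_part):
    """Store the demonstration as offsets from the part position."""
    return [(x - sensed_part[0], y - sensed_part[1]) for x, y in taught_path]

def replay(strategy, sensed_part):
    """Adapt the taught strategy to the part position sensed at run time."""
    return [(dx + sensed_part[0], dy + sensed_part[1]) for dx, dy in strategy]

# One demonstration: approach and grasp a part seen at (5, 5).
strategy = teach([(5, 8), (5, 6), (5, 5)], sensed_part=(5, 5))
# At run time the part is sensed at (9, 2); the path adapts accordingly.
path = replay(strategy, sensed_part=(9, 2))
```

A real system would also have to infer which aspects of the demonstration are incidental, which is what makes the adaptive case hard.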
The use of object modeling to simulate adaptive tool programs is fairly easy if the tool reads its sensors less often than about once a second because the user can be asked to provide simulated sensory input (see Sensors). If the feedback actually occurs at a much higher rate, the model must provide an autonomous means of simulating the sensors. It is reasonable to expect comprehensive solutions by the end of this decade.

An entirely different application of object modeling is the generation of adaptive robot programs automatically from higher level task descriptions. This problem has been a major focus of AI hand-eye research over the course of the past 20 years, but the limited scope of success has mainly served to clarify the intrinsic technical difficulties.

AI researchers have also worked on robotic sensing. The emphasis has been on emulating human sensory capabilities, especially taction and vision (qv). Contact-sensing microswitches in a gripper's fingers allow a robot to do a centering grasp. Strain gauges permit a raw egg to be grasped. Contact image sensing allows part identification. Current approaches to contact image sensing include miniature contact-sensing arrays on silicon and artificial skin made from conductive polymers (see Multisensor integration).

Vision includes one-dimensional sensors that detect when a light beam is interrupted, two-dimensional imaging sensors, and three-dimensional ranging devices. With a one-dimensional light sensor between a robot's fingers, the robot can calibrate itself to fiducial posts in the workplace. Imaging and ranging can be used to inspect, determine shape, measure, determine location and orientation, and identify workpieces.
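Calibration to a fiducial post with a one-dimensional light sensor can be sketched as a scan that finds the interval where the beam is broken and takes its midpoint as the post center. The post geometry and the simulated sensor below are invented for illustration.

```python
# Sketch of calibrating a gripper to a fiducial post with a 1-D beam
# sensor: scan across the post, record where the beam is broken, and
# take the midpoint of that interval as the post's center.

POST_CENTER, POST_HALF_WIDTH = 12.35, 0.4      # hypothetical workplace layout

def beam_broken(x):
    """Simulated sensor: True while the finger-mounted beam hits the post."""
    return abs(x - POST_CENTER) <= POST_HALF_WIDTH

def locate_post(start, stop, step=0.01):
    n = int(round((stop - start) / step))
    hits = [start + i * step for i in range(n)
            if beam_broken(start + i * step)]
    return (hits[0] + hits[-1]) / 2.0          # midpoint of the broken region

center = locate_post(10.0, 15.0)
```

The accuracy of the estimate is limited by the scan step, which is why such scans are usually refined around the detected edges.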
Actually, researchers who restrict their attention to a subset of the five human senses are anthropomorphic chauvinists. In a factory every test tool, instrument, and transducer is a sensor. Factory sensors measure temperature, current, color, chemical composition, vibration, and hundreds of other quantities that are outside the range of direct human sensation. Similarly, there is much more to adaptive tools than just robotics.

General References

M. P. Groover, Automation, Production Systems, and Computer-Aided Manufacturing, Prentice-Hall, Englewood Cliffs, NJ, 1980.

Computerized Manufacturing Automation: Employment, Education, and the Workplace, Report OTA-CIT-235, U.S. Congress, Office of Technology Assessment, Washington, DC, April 1984.

D. F. Noble, Forces of Production: A Social History of Industrial Automation, Knopf, New York, 1984.

D. Grossman
IBM Corporation
AUTONOMOUS VEHICLES

Simply defined, an autonomous vehicle must travel from one specified location to another with no external assistance. This definition encompasses all vehicles from unmanned vehicles without data links to remotely piloted vehicles with high bandwidth data links for real-time control. So broadly defined, autonomous vehicles for simple or well-structured environments are commonplace in military applications [e.g., some missiles and torpedoes, advanced remotely piloted vehicles (RPVs)], in industry [e.g., automatic guided vehicles (AGVs)], and in space exploration (e.g., Voyager, Viking). Automatic control technology alone is sufficient to meaningfully coordinate sensor and actuator resources for nearly all of these vehicles. However, automatic control becomes inadequate for uncertain, unknown, complex, and dynamic environments, where the most interesting applications for autonomous vehicles exist.

Many autonomous vehicles have been developed for simple environments. Only a few efforts approach relatively complex environments, and only a notable subset of those is discussed here. More information about past autonomous vehicle efforts is provided in other sources (1,2) (see also Manipulators; Multisensor integration; Robotics; Robots, mobile).

SHAKY was developed in the late 1960s as a research tool for problem solving and learning research (3). SHAKY could accept incomplete task statements, represent and plan paths through space occupied by known and unknown obstacles, and collect information through visual and touch sensors. JASON was among the first mobile robots to use acoustic and infrared (IR) proximity sensors for path planning and obstacle avoidance as well as having a considerable proportion of its computation done onboard (4). The Jet Propulsion Laboratory (JPL) Rover was intended as the prototype for a mobile planetary exploration robot and was designed to deal with an unknown environment and uneven terrain populated by obstacles (5).
HILARE was the first mobile robot to actually build a map of unknown space using acoustic and visual sensors, represent map information as a graph partitioned into a hierarchy of places, construct approximate three-dimensional representations with information from two-dimensional optical vision and a laser range finder, and integrate information from a variety of sensors to make vehicle position estimates (6). The Stanford University (SU) Cart was developed to explore stereo vision navigation and guidance for a mobile robot. It could travel over completely unknown flat territory while avoiding obstacles and has been tried outdoors with man-made obstacles with limited success (7).

Of all these vehicles only HILARE remains an active research effort, although the SU Cart experiments are used in other vehicles at Carnegie-Mellon University (CMU) (8). Nevertheless, participation at a recent autonomous ground vehicles workshop has indicated a rapidly growing interest in the field (9).

In spite of the diversity of possible configurations, all autonomous vehicles must perform certain common functions to be capable of autonomous mobility. For simple vehicles only vehicle control and position location functions are required. An autonomous vehicle must control its transport mechanism and internal environment to reach the goal, and it must know its location in some absolute reference frame, at least, to determine when it has reached the goal. All past implementations have employed this minimal functional set.
If the traversed environment is insufficiently known, an autonomous vehicle must perceive the environment through sensors (qv) for various purposes: if the environment contains localized obstacles, the vehicle must perceive and avoid them; if potential vehicle paths to the goal location are constrained by known or perceivable large-scale features and the time that the vehicle has to reach the goal is finite, the vehicle must plan its route using information provided by an existing map and/or by the perception system; and if the environment is unknown and the vehicle must store environmental characteristics during its transit for later use (i.e., make a map), the system must learn from its sensor perceptions. Perception, vehicle control, position location, obstacle avoidance (qv), route planning, and learning (qv) are the generic functions necessary for any level of autonomous mobility.
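The generic functions listed above can be sketched as stubs in a single control loop. The one-dimensional world, the obstacle handling, and the trivial "learning" step below are invented placeholders standing in for the real subsystems, not a real architecture.

```python
# Skeletal sketch of the generic functions in one control loop, over an
# invented 1-D corridor world with a single obstacle.

def perceive(world, pos):           # perception: sense the cell ahead
    return world.get(pos + 1, "free")

def plan_route(pos, goal):          # route planning: direction toward goal
    return 1 if goal > pos else -1

def avoid(obstacle_ahead, step):    # obstacle avoidance: wait if blocked
    return 0 if obstacle_ahead == "obstacle" else step

def drive(pos, step):               # vehicle control: execute the motion
    return pos + step

world = {3: "obstacle"}             # map refined during transit
pos, goal, trace = 0, 5, []
for _ in range(12):
    if pos == goal:                 # position location: goal test in an
        break                       # absolute reference frame
    step = avoid(perceive(world, pos), plan_route(pos, goal))
    if step == 0:
        world.pop(pos + 1, None)    # learning stub: update the map after
        step = plan_route(pos, goal)  # the obstacle is observed and cleared
    pos = drive(pos, step)
    trace.append(pos)
```

Each stub would be an entire subsystem in a real vehicle; the point of the sketch is only how the six functions interlock.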
Perception

Perception subsystems in autonomous vehicles are used primarily for path detection (3-5,10), position location (10,11), and mapping (5,10). Path detection includes detection of obstacles and roadways. Perceptual position locating can be accomplished by map matching and landmark recognition. Mapping activities build and improve the vehicle's assessment of the environment.

Obstacle Detection. Obstacles can be detected and located with direct-ranging sensors (e.g., acoustic ranging sensors) or with a variety of vision techniques (e.g., simple two-dimensional vision, stereo vision, motion stereo, and optical flow).

Acoustic ranging sensors can detect and locate both obstacles and free space. In one technique raw sensor returns are thresholded and clustered; then probability functions of range and azimuth are assigned to each filtered sensor reading. Maps are generated by superpositioning the sensor-reading probability distributions onto the floor plane (10).

SHAKY located free space and obstacles on a flat floor with a single camera's image. The raw image was first reduced to a line representation using a gradient operator, and then object-finding and boundary operations were applied to the connected components of the edge map. A decision tree guided the image search for obstacles (3). HILARE uses a two-dimensional camera image together with a laser range finder to develop three-dimensional world representations. An adjacency matrix that represents each region in the image is constructed by following edges detected by nearest-neighbor analysis. The matrix is pruned using region dimensions and inclusion- and object-contrast constraints; then a computer-controlled laser range finder obtains the range information for each region in the scene (12).

As an example of stereo vision (qv), the SU Cart took nine pictures at different positions and used an interest operator on one of them to identify features for tracking. A correlator looked for those features in the remaining images. Features were stored as several different-sized windows, and the correlator used a coarse-to-fine strategy to match the features. A camera solver took the information from the correlator and computed the relative feature positions. The camera solver superpositioned the normal error curves of the feature position estimates from each image and chose the peak value as the feature position. Features that were not reacquired after several successive frames were forgotten, and new features were added to the feature list using the interest operator. Objects were modeled as clouds of features approximated by spheres. This system did not see bland objects, and the long processing time caused it to become confused by the moving shadows of outdoor situations (7).

Recent work has extended the SU Cart work. This work, embodied in a system called FIDO, uses imaging and motion geometry constraints to reduce the correlator search window and to improve the accuracy of the vision. Imaging geometry constraints include near and far limits and epipolar constraints. Motion constraints use estimated vehicle motion to limit the search area and to gauge the reasonableness of a stereo match. FIDO reduces computational complexity by restricting vehicle motion to a plane (8). Experience has provided the following observations: epipolar constraints are the single most powerful constraints, more features improve vision accuracy, and geometric constraints tend to limit the search area too much (8).

Optical flow (qv) analysis can also locate the obstacles near a vehicle. One technique assumes that the scene contains visible vertical edges and that the floor is almost flat. Information from a camera tilt sensor constrains the search for the vanishing point in an image. The exact camera tilt angles are computed from the vanishing point location. Knowing the camera angles reduces the optical flow equations to just the translational components. The optical flow equations are used to track features found in the neighborhood of vertical lines using an interest operator through successive images (13).

Road Detection. Road detection is an alternative to obstacle detection if roads are available. In one technique the edges of an image are detected with a model-directed gradient operator, and the edge map is corrected using a camera model and assuming a flat world. Roads are detected by rotating the edge-filtered image 45° and applying a Hough transform (qv) to detect path edges. This technique works well when the vehicle is close to the road center and degrades near the edges (14). In another technique visual road detection is performed in two phases, bootstrap and feed forward. The bootstrap phase operates in situations when no prior scene information is known. Dominant linear features are extracted by region growing (qv) using edge-preserving smoothing filters. The resulting features are consistently labeled by geometric and rule-based reasoning modules (see Rule-based systems). The feed-forward phase uses information from previous imagery to constrain the image search to a small region of the total image. Accurate predictions significantly reduce the window size. Substantial processing savings are available if the absolute camera orientation is known (<1°) and position is known (<0.25 m) (15).

Landmark Recognition. Landmark recognition provides additional critical position-locating information. One technique for landmark recognition and location uses a matcher, a finder, and a selector. The selector supplies landmark subsets to the finder, which returns the likely landmark positions. The selector then computes the vehicle's actual position and the new position uncertainty. The finder locates possible landmarks from a set of candidates using geometric constraint propagation and directs the matcher to find possible landmark positions in a frame. The matcher detects image edges (see Edge detection), matches landmark templates (see Hough transform), and interprets the uncertainty of the match. It also uses gradient direction informativeness to reduce the false peaks in Hough space (16).
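The Hough transform invoked for road-edge and landmark detection can be sketched in a few lines: each edge point votes for every (θ, ρ) line parameterization passing through it, and the accumulator peak gives the dominant line. The synthetic edge points below stand in for a real edge map; bin counts and resolutions are arbitrary.

```python
import math

# Minimal Hough transform sketch: vote in (theta, rho) space for each
# edge pixel and take the accumulator peak as the dominant line.

edge_points = [(x, 2 * x + 1) for x in range(8)]   # points on y = 2x + 1

def hough_peak(points, n_theta=180, rho_step=1.0):
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (t, round(rho / rho_step))        # quantize into a bin
            acc[key] = acc.get(key, 0) + 1
    (t, rho_bin), votes = max(acc.items(), key=lambda kv: kv[1])
    return math.pi * t / n_theta, rho_bin * rho_step, votes

theta, rho, votes = hough_peak(edge_points)
```

Every collinear point lands in the same (θ, ρ) bin, so the peak vote count equals the number of points on the line; scattered clutter spreads its votes thinly, which is the robustness the road-detection technique relies on.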
Sensor uncertainty can leave new observations misregistered with the partial model of the world and can produce hopelessly fragmented polygonal representations (11,19). Techniques for dealing with sensor uncertainty have been suggested (10,17,19). The knowledge in the world model can be labeled with measures of uncertainty (10,17,19). Uncertainty in the three-dimensional world can be represented by uncertainty manifolds. The complexity of these manifolds can be reduced significantly by projecting them into the ground plane (19).

Vehicle Control

In general, vehicle control can be adequately formulated using automatic control techniques, which are not discussed here. However, a few techniques that warrant mention have been developed to coordinate vehicle control with knowledge-based planning.

In an early technique for planner-control coordination the planner issued motion and control commands to the controller through a FIFO store. The controller removed and executed those commands one after another. Commands were executed until a termination condition was satisfied. The transition between successive motion commands was coordinated by maintaining uniform acceleration from one state to another. Control commands facilitated variable sharing between planner and controller (21). Others have suggested extending production rule concepts for planner-control coordination (22,23).

In a recent approach, control information is communicated from planner to vehicle control as an action plan with initiation, trigger, and termination conditions. Plans control both vehicle actions and reporting actions. World model data is communicated through reports from sensor and control subsystems to the planner. Plan conditions enable the planner to download an entire coordinated plan, thereby giving considerable autonomy to the vehicle controller and enabling complex, real-time knowledge-directed control. This arrangement loosens the coupling between the planner and the vehicle controller and eases their coordination by freeing the planner from the event-driven details of real-time control. With this freedom the planner can operate predictively and can prepare the vehicle sensor and control modules for anticipated events by downloading additions and modifications to the global plan structure. In this concept the vehicle controller must recognize unpredicted hazardous situations and stabilize the vehicle condition to give the slower planner time to reassess the prevalent situation and modify the plan structure (23).
SensorMapping. Some autonomous vehicles started their tasks with incomplete or inaccurate maps (A,4) and some required no map at atl (S-7). In both of these situations the vehicles used sensor perceptions to create or correct maps. Many researchers have demonstrated sensor mapping from autonomous vehicles. Many different space representations have been adoptedto accommodatemeasurementsfrom different sensor sources. Quadtrees provide a simple yet powerful representationof surface space(9,16).Othershave used grids in which eachcell is labeled (4,b,10).Cell labelings range from a probability of whether the cell is occupied(10)to a projection map of the passable,impassable,and ,rttkttown regions within the cell (5). HILARE organizesspaceinto a hiera".rry of places and representsthis spaceat a topological level and a geometric level. Polygonal objectsand walls define convex cells of empty space(6,11). Another similar representation labels wall and obstacle segments with time- urh observation-dependent uncertainty measures(17). The SU Cart representedobstacles in three-dimensional space as spheres enclosing clusters of the features (7). Most representations of spacehave described indoor space.A few techniques have beln developed for representing actual terrain information. Some of these techniques include dividing free space into generahzed,cones (qv) and convexpolygons(18) or freewaysand meadows(1g) and dividing terrain into centrally symmetric convex patches and sin- Positionlocation gular objects(e.g.,roads, bridges, rivers, fences) and then la- The position-locating function identifies a vehicle's position in beling the terrain patches *ittt measures of traversability some absolute coordinate reference frame (usualiy (2U. that in which the goal is stated). 
position location is necess Uncertainty in sensor maps arises from inherent ary to de_ sensor termine the relative distance to the goal and to construct a errors, from robot-position-locatingerrors and from sensorcor- map in the absolute reference frame. Accurate position locarelation erro-rsif independent sensor measurements must be tion is crucial to mapmaking becausecumulative worst-case combined. Without careful filtering and model-directed corre- errors in vehicle position estimation lation, uncertainty can render sensormaps useless. can lead to global models Most sen- that do not_correspond to perceptions (19). Common tech_ sor mapping techniques project three-dimlnsional spaceinto a niques for absolute position locaiirtg include reference beaplane for simplicity. However, the mapping between the pro- cons, inertial navigation systems (tNSrl, dead reckonirg, jection world model and the perceived-repilsentation can be- landmark recognition, and map matching. comeunstable over slight ehangesin position, orientation, and Referencebeaconsrange from indoor acoustic illumination. This causesdifficulty in matching or ir beacons new percep- placed at known locations (6) to global-positioningsatellites
(GPSs). Reference beacons are readily available worldwide for airborne and surface environments. Reference beacons for undersea environments are more difficult to implement and have relatively short ranges. Most beacon-tracking systems use various forms of triangulation to determine position. Where reference beacons are not applicable alone, an INS can be used. These systems use vehicle-dynamic states to predict location from an established reference point. INSs are relatively expensive, and the errors accumulate to intolerable levels over time.
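Beacon triangulation of the kind mentioned above can be sketched as follows. This is a minimal two-beacon bearing fix; the function name and the noise-free geometry are illustrative assumptions, not any particular system's implementation.

```python
import math

def fix_position(beacon1, bearing1, beacon2, bearing2):
    """Estimate the vehicle (x, y) from absolute bearings (radians)
    to two beacons at known positions. Solves, by Cramer's rule, for
    the ranges r1, r2 such that the vehicle lies on both bearing rays.
    Real beacon trackers must also handle noise, extra beacons, and
    degenerate geometry."""
    a, c = math.cos(bearing1), math.sin(bearing1)
    b, d = -math.cos(bearing2), -math.sin(bearing2)
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("beacons are collinear with the vehicle")
    e = beacon1[0] - beacon2[0]
    f = beacon1[1] - beacon2[1]
    r1 = (e * d - b * f) / det          # range to beacon 1
    return (beacon1[0] - r1 * math.cos(bearing1),
            beacon1[1] - r1 * math.sin(bearing1))

# A vehicle at the origin sees a beacon at (10, 0) at bearing 0
# and a beacon at (0, 10) at bearing pi/2.
x, y = fix_position((10.0, 0.0), 0.0, (0.0, 10.0), math.pi / 2)
```

With noisy bearings the two rays no longer intersect exactly, which is why practical systems fuse beacon fixes with dead reckoning rather than rely on either alone.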
Dead Reckoning. The simplest navigation technique is dead reckoning, where the vehicle position is computed from measuring its accumulated rotational and translational motion. Different researchers have used different techniques to monitor vehicle motion. SHAKY (3) and JASON (4) used the drive-stepper-motor count, and HILARE uses more accurate and reliable shaft encoders (6). The JPL Rover further improved its dead-reckoning navigation by monitoring vehicle motion with an odometer and a gyrocompass (5). Moravec used visual stereo to estimate vehicle motion, although the navigation proved fragile (7). In recent experiments motion stereo was found to be not as accurate as dead reckoning in translation estimates and only in the best of cases as accurate in rotation estimates (8). Dead reckoning is severely limited in dynamic environments (e.g., underwater and airborne domains) and is hampered by gradually increasing position errors if not updated from an independent source of vehicle position information. HILARE overcomes the limitations of dead reckoning by using three independent modules for position location: absolute position measurement using ir beacons and triangulation, dead reckoning and shaft encoders, and relative positioning using environmental features (11).

Map Matching. Sensor-mediated map matching has been suggested for position location (11,16,19). Acoustic proximity sensors have been used to map room interiors and to locate the vehicle on a given map (10,17). The best acoustic technique involved organizing maps into a hierarchy of reduced resolutions and then proposing trial transforms until a match is found. Maps have been matched with an accuracy of 0.15 m (0.5 ft) in displacement and 3° in rotation using this technique (10). Map matching has also been approached with backward reasoning (19) and graph-matching (24) techniques. Map matching can become very difficult and expensive when the map and/or sensor data are complex.

Obstacle Avoidance

In order to actively avoid one or more obstacles, a robot must first detect the obstacles and locate them in space relative to itself. Detection and location sensors include touch, ranging sensors, and various forms of vision. Once nearby obstacles have been located, the vehicle path must be appropriately altered to avoid collisions.

Detection and Location. Several different nonvisual obstacle detection and location techniques have been tried. The simplest technique is to use touch sensors to detect contact. Touch sensors have been used by several indoor robots (3,4,6). Obstacle detection with touch sensors necessarily limits vehicle speed because obstacles are not detected until vehicle contact. Several other sensors can be used to detect obstacle proximity and measure range. Most efforts have employed multiple-sensor systems (see Multisensor integration) cooperatively for obstacle avoidance [e.g., acoustic ranging and touch (6), ir proximity and acoustic ranging (4), and laser range finder and stereo vision (5)]. High-speed ranging sensors are quite versatile. HILARE's acoustic ranging sensors enable avoidance of fixed, movable, and mobile obstacles (11) as well as wall following (6). Visual obstacle location has been and continues to be of considerable interest, and various visual techniques have been explored. Both the JPL Rover (5) and the SU Cart (7) used stereo vision for obstacle avoidance. Successive images from a single camera fixed to the robot have been used to detect and locate obstacles using motion stereo (14) and optical-flow techniques (13), although both assume a nearly flat world. Since roadways tend to be obstacle-free paths, some work has been done to detect and follow roadways (14,15).

Avoidance Control. Once the nearby obstacles have been located, the vehicle's dynamic behavior must be altered to avoid them and preserve the integrity of the existing goal structure. A simple rule-based pilot (25) and a potential field control technique (26,27) have been suggested to solve this problem. The rule-directed vehicle pilot uses a frontal two-dimensional visual image and ranging proximity sensors to transit an obstacle field (25). The pilot applies the decoupling, linearization, and pole assignment (DLPA) algorithm to optimize the local path according to real-time sensor information and to the constraints assigned by the higher level planning modules. The DLPA algorithm performs optimization procedures based on Pontriagin's minimum principle. This controller performs common maneuvers by using scriptlike structures (this concept is discussed in Ref. 20).

The potential field control technique is elegant and robust and involves growing attractive and repulsive fields around goals and obstacles, respectively. A composite field can be simply computed from the sum of individual obstacle and goal fields. Two very similar approaches to this technique have been developed simultaneously (26,27). In one of these approaches vehicle acceleration and direction are determined directly from the location of nearby and distant goals and obstacles using generalized potential fields (GPFs). This technique assumes the vehicle to be a point and the environment to be divided into convex regions. To minimize deadlock and instability problems, only the fields of obstacles within the detection range are considered, and all fields are made dependent on the relative position and velocity. In addition, phantom subgoals are formed at the edges of obstacles to further minimize deadlock (26). In the other approach to potential field control, equations of motion are computed using the operational space formulation of Lagrangian mechanics. The attractive and repulsive forces of goals and obstacles influence the operational force vector. Dissipative forces that are directly proportional to vehicle velocity are added to the resultant force vector for stability and to prevent generating velocities beyond vehicle limits. Known obstacles are modeled as points, lines, planes, ellipsoids, cones, parallelepipeds, and cylinders. However, obstacle primitives are not necessary to potential field control if sensor values of the shortest relative distances are directly available.
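A minimal sketch of the composite-field idea follows: an attractive pull toward the goal summed with repulsive pushes from obstacles within detection range, with the vehicle stepping along the resultant. The gains, ranges, and repulsive profile are illustrative assumptions, not either published controller.

```python
import math

def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=25.0,
                         detect_range=5.0, step=0.1):
    """One motion step down a composite potential field. Obstacles
    beyond detect_range contribute nothing, mirroring the deadlock
    precaution described in the text."""
    fx = k_att * (goal[0] - pos[0])          # attractive force toward goal
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < detect_range:           # repulsion only within range
            push = k_rep * (1.0 / d - 1.0 / detect_range) / d ** 2
            fx += push * dx / d
            fy += push * dy / d
    mag = math.hypot(fx, fy)
    if mag < 1e-9:
        return pos                           # local minimum (deadlock)
    return (pos[0] + step * fx / mag, pos[1] + step * fy / mag)

# Steer around a single obstacle sitting between start and goal; the
# small initial y offset breaks the symmetric deadlock case.
pos = (0.0, 0.2)
for _ in range(600):
    pos = potential_field_step(pos, goal=(10.0, 0.0), obstacles=[(5.0, 0.0)])
```

Note that an exact head-on start (y = 0) would stall at a local minimum in front of the obstacle, which is the deadlock problem the phantom subgoals of the GPF approach are meant to relieve.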
This approach to vehicle obstacle avoidance has been extended to moving obstacles and has been demonstrated successfully in a manipulator implementation (27).

Route Planning

Vehicle route planning involves several complex planning issues. Route planning for surface transit best illustrates this complexity, although similar techniques can be applied for other modes of mobility. Route planning usually begins with the transformation of a space map produced by the sensor-mediated perception system into a representation more suitable for planning. The planner then searches this representation for a good route based on a cost function.

Map Transforms. Most autonomous vehicles transform the free space map obtained by the sensors into a path graph. In most planners three-dimensional space is projected into the ground plane, and the area is divided into passable and impassable regions. For route-planning purposes obstacles are physical obstructions or other forms of impassable areas (5). In existing planners these regions are either circles (7) or convex polygons (3,5,10,17,18,20,28). The geometric problem of finite vehicle size and shape can be solved by shrinking the vehicle to a point and expanding the extent of all obstacles by the radius of a circle enclosing the vehicle (3,5,7,17). In addition, phantom obstacles can be used to accommodate any significant vehicle-turn radius (7).

Path graph nodes can represent the corners (3,5) or tangents (7) of the expanded obstacles or the centers of line segments connecting obstacles (17,18) or defining unknown areas (6,10). Nodes that represent space between obstacles are called entry points (18), passage points, and adits (17), and the line segments defining unknown areas and doorways are called frontier segments (6). In HILARE a hierarchical graph represents places in an area as nodes. Places contain landmarks, work stations, and internal topological models that themselves are graphs representing frontier segments (11). Entry points can be identified by using a Voronoi diagram (see Texture analysis) of the space and the neighborhood graph to form a connectivity graph. Then the connectivity graph can be used to construct the generalized cones that define entry points (18). In approaches that plan using projection maps of actual terrains, nodes represent the reference centers of convex map regions (20,28). In all approaches the path graph links represent the traversable paths between nodes (3-7,17,18).

Path graphs can be labeled with information to aid the path search. Most path graph links are labeled with the distance between nodes (3,4,6,7,17,18,20,22,29). In HILARE links are given width, and nodes are given finite-sized boundaries. With this labeling only those links with width greater than the robot are defined as traversable (11). Links can be assigned other measures of the path quality [e.g., energy expended for transit (5), estimated transit time (20), and local terrain factors such as slope and ground cover (28)]. Nodes can also represent such local factors as sensor visibility measures (20).

Route Search. Several route search techniques have been developed. In simple cases a shortest distance search suffices (7,17). However, where more complex criteria are necessary, the A* search algorithm (qv) is most widely employed (3,5,6,18). Other path graph search techniques include dynamic programming and relaxation techniques (22,28) and the overlapping n-tuple search algorithm augmented by shortcut heuristics (20). Other route search approaches have been developed for spatial information that is not codified into a graph format. Detailed planning was accomplished in JASON using a combination of procedural network planning and cooperating plan critics. In this approach the plan is represented as a graph of processes related by explicit timing restrictions (4). The JPL Rover used special-purpose move generators in conjunction with its graph search technique to deal with confined spaces and special goals. For instance, the planner would use wall following to return the vehicle to a known area after an excursion into a previously unknown cul-de-sac (5). A set of special problem solvers based on dynamic programming techniques augmented by geometric reasoning heuristics have been developed to find paths that meet different path criteria (i.e., the shortest path, paths that avoid observation, and feasible paths in complex terrain) (22). A technique similar to the potential field obstacle-avoidance techniques has been suggested for global route planning. With this technique routes are found by searching for local minima in the composite field (27). In another approach route plans are generated by a backward-chaining search mechanism that has been extended to resolve goal conflicts, for iteration, and for efficient retrieval (29).

Many route search techniques (e.g., A* and dynamic programming) use evaluation or cost functions to guide the search. Several path cost function criteria have been suggested. Several approaches have used cost functions that reflect distance and uncertainty (3). Uncertainty can represent both unknown space and unfavorable obstacle clustering (6). Others account for expenditures in energy (5,6), heuristic estimates of the remaining path quality (16), estimated transit time (20,28), fuel consumption, detectability, and visibility (28).

Learning

Most complex mobile robots use sensor information to improve their models of the world (3-7) (see also Learning). Strategies have been developed to guide the robot in collecting map information autonomously (5,11) and with human assistance (17). HILARE operates in an unknown environment using space structuring and path search. In space structuring the vehicle systematically searches unknown space for traversable paths, and in path search a graph is constructed that represents frontier regions and is used to guide the search for a low-cost traversable route (11). New information is incorporated into HILARE's map by decomposing the spatial representation (see Reasoning, spatial) into graph sets. These graph sets can be matched to decomposed sensor data to recognize places in the network hierarchy. In this technique geometric models are transformed into concepts that have unique characteristics (24).

A generalized learning mechanism has been demonstrated on a mobile robot. In this system sensory and motor kernels flow into discrete time slots in a short-term memory queue. The set of kernels associated with a given time slot is called an event. Schema are defined in the form "event implies event." Each kernel has a measure of the certainty that it is associated with a particular schema, and each schema has a measure of
its own correctness. Events are predicted using a linear prediction queue, the schema from a world model, and the events in the short-term memory queue. Goal-seeking behavior can be generated by backward chaining through the schema in the world model. Learning (qv) is implemented by generating new schema from unpredicted observations and by updating the certainty values of the existing schema based on observation (for partially matched schema) (30).

Implementation
The essential functions described above (i.e., perception, vehicle control, position locating, obstacle avoidance, route planning, and learning) have a cost in terms of memory and processing time. The total computational burden is increased by the computation to support the implementation and coordination of the resources to perform the essential functions. This entire burden must be borne by sufficient computing and communications hardware to permit vehicle speeds adequate for the desired mission. Layering the functional elements, support, and hardware decouples them sufficiently to be treated separately. Since functional elements are discussed above, only hardware and support issues are discussed below.

Intricate autonomous vehicles clearly need considerable computational hardware if they are to travel faster than continental drift. Early vehicle experiments relied on remote mainframe computing capability linked to the vehicle through rf communications (3,7). The continuing rapid development of microelectronics technology has enabled progressively more of the autonomous vehicle's computing load to be transferred to onboard hardware (4-6), and recently a vehicle has been developed that is totally independent of remote computers (23). Most autonomous vehicles with onboard multiprocessor systems have the processors organized loosely into a star, with a single processor coordinating both internal and external communication (4,6,31). A more recent vehicle effort has coupled multiple onboard processors through a local area network in a bus architecture (23). All of these onboard multiprocessor systems fix function to specific processor units rather than use distributed computing techniques that support resource sharing.

A hierarchical structure has been proposed that is composed of a planner, a navigator, and a pilot all fed with map information from a cartographer. In this structure perception is organized into long-, medium-, and short-range sensors. Each level of the hierarchy has direct access to sensor data at the appropriate level as well as to the composite picture constructed and maintained by the cartographer. Progressively more constrained command information is communicated from planner through the navigator to the pilot (20). HILARE uses a more flexible hierarchical structure in which the essential functions are organized as cooperating experts activated by a high-level coordinator driven by means-ends analysis. Each module can also access the other modules as primitive functions (6). An even looser hierarchy has been developed in which planner, perception, and control subsystems are regarded as a community of cooperating entities. These entities are coordinated through the exchange of plans and reports. Intelligent communications interfaces (ICIs) for each module use reports to maintain the consistency of local copies of a distributed blackboard. In this structure plans can be exchanged between any of the modules, although the highest level plans are generated only by the planner (32).

Applications

Autonomous vehicles have many future military, industrial, and domestic applications. The first practical implementations are likely to be for military surveillance, nuclear plant maintenance and decommissioning, and space and undersea construction and exploration. These are applications for which the human costs are very large because of the hazardous environment. More distant potential applications include commercial and domestic maintenance, public transportation and delivery, farming, and fire fighting (33,34) (see also Manipulators).

BIBLIOGRAPHY

1. B. J. Schacter, G. E. Tisdale, and G. R. Jones, Robot Vehicles: A Survey and Proposed Test-Bed Facility, Westinghouse Electric, Defense and Electronics Center, Baltimore, MD, 1984.
2. G. Giralt, Mobile Robots, Proceedings of the NATO Advanced Study Institute on Robotics and Artificial Intelligence, Barga, Italy, June-July 1983, Springer-Verlag, Berlin, FRG.
3. N. J. Nilsson, A Mobile Automaton: An Application of Artificial Intelligence Techniques, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, May 1969, pp. 509-520.
4. M. H. Smith, R. P. Sobek, L. S. Coles, D. A. Hodges, A. M. Robb, and P. L. Sinclair, The System Design of JASON, A Computer Controlled Mobile Robot, Proceedings of the International Conference on Cybernetics and Society, IEEE, New York, September 1975, pp. 72-75.
5. A. M. Thompson, The Navigation System of the JPL Robot, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977, pp. 749-757.
6. G. Giralt, R. Sobek, and R. Chatila, A Multi-Level Planning and Navigation System for a Mobile Robot: A First Approach to HILARE, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, August 1979, pp. 335-337.
7. H. P. Moravec, Rover Visual Obstacle Avoidance, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, University of British Columbia, Vancouver, Canada, August 1981, pp. 785-790.
8. C. Thorpe, L. Matthies, and H. Moravec, Experiments and Thoughts on Visual Navigation, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 830-833.
9. A. Rosenfeld (ed.), Final Report on Workshop on Autonomous Ground Vehicles, Center for Automation Research, University of Maryland, College Park, MD, October 1983.
10. H. P. Moravec and A. Elfes, High Resolution Maps from Wide Angle Sonar, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 116-121.
11. G. Giralt, R. Chatila, and M. Vaisset, An Integrated Navigation and Motion Control System for Autonomous Multisensory Mobile Robots, Proceedings of the First International Conference on Robotics Research, Bretton Woods, NH, August/September 1983.
12. M. Ferrer, M. Briot, and J. C. Talou, Study of a Video Image Treatment System for the Mobile Robot HILARE, Proceedings of
the First Conference on Robot Vision and Sensory Control, Stratford-upon-Avon, UK, April 1981, pp. 47-58.
13. S. Tsuji, Y. Yagi, and M. Asada, Dynamic Scene Analysis for a Mobile Robot in a Man-Made Environment, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 850-855.
14. R. M. Inigo, E. S. McVey, B. J. Berger, and M. J. Wirtz, "Machine Vision Applied to Vehicle Guidance," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-6(6), 820-826 (November 1984).
15. A. M. Waxman, J. LeMoigne, and B. Srinivasan, Visual Navigation of Roadways, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 862-867.
16. F. P. Andresen, L. S. Davis, R. D. Eastman, and S. Kambhampati, Visual Algorithms for Autonomous Navigation, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 856-861.
17. J. L. Crowley, Navigation for an Intelligent Mobile Robot, Proceedings of the First Conference on Artificial Intelligence Applications, December 1984, 84CH2107-1, IEEE, New York, pp. 79-84.
18. D. T. Kuan, R. A. Brooks, J. C. Zamiska, and M. Das, Automatic Path Planning for a Mobile Robot Using a Mixed Representation of Free Space, Proceedings of the First Conference on Artificial Intelligence Applications, IEEE, Denver, Colorado, December 1984, pp. 70-74.
19. R. A. Brooks, Visual Map Making for a Mobile Robot, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 824-829.
20. R. Chavez and A. Meystel, Structure of Intelligence for an Autonomous Vehicle, Proceedings of the International Conference on Robotics, March 1984, 84CH2008-1, IEEE, New York, pp. 584-591.
21. J. Iijima, Y. Kanayama, and S. Yuta, A Locomotion Control System for Mobile Robots, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, August 1981, University of British Columbia, Vancouver, BC, pp. 779-784.
22. D. Keirsey, J. Mitchell, B. Bullock, T. Nussmeier, and D. Tseng, Autonomous Vehicle Control Using AI Techniques, Proceedings of the Seventh International Computer Software and Applications Conference, November 1983, 83CH1940-6, IEEE, New York, pp. 173-178.
23. S. Harmon, Coordination between Control and Knowledge Based Systems for Autonomous Vehicle Guidance, Proceedings of Trends and Applications, May 1983, 83CH1887-9, IEEE, New York, pp. 8-11.
24. J. P. Laumond, Model Structuring and Concept Recognition: Two Aspects of Learning for a Mobile Robot, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, 1983, pp. 839-841.
25. C. Isik and A. Meystel, Knowledge-Based Pilot for an Intelligent Mobile Autonomous System, Proceedings of the First Conference on Artificial Intelligence Applications, December 1984, 84CH2107-1, IEEE, New York, pp. 57-68.
26. B. H. Krogh, A Generalized Potential Field Approach to Obstacle Avoidance Control, MS84-484, Robotics Research Conference Papers, Robotics International of SME, Dearborn, MI, 1984.
27. O. Khatib, Real-Time Obstacle Avoidance for Manipulators and Mobile Robots, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 500-505.
28. A. Parodi, Multi-Goal Real-Time Global Path Planning for an Autonomous Land Vehicle Using a High-Speed Graph Search Processor, Proceedings of the Conference on Robotics and Automation, March 1985, 85CH2152-7, IEEE, New York, pp. 161-167.
29. C. R. Weisbin, G. de Saussure, J. Barhen, T. Swift, and J. C. White, Strategy Planning by an Intelligent Machine, MS84-501, Robotics Research Conference Papers, Robotics International of SME, Dearborn, MI, 1984.
30. A. H. Bond and D. H. Mott, Learning of Sensory-Motor Schemas in a Mobile Robot, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, August 1981, University of British Columbia, Vancouver, BC, pp. 159-161.
31. D. R. Blidberg, An Underwater Automation Employing Distributed Microcomputers, Proceedings of the Second International Computer Engineering Conference, Vol. 2, Robots and Robotics, ASME, New York, 1982, pp. 27-31.
32. S. Y. Harmon, D. W. Gage, W. A. Aviles, and G. L. Bianchini, Coordination of Intelligent Subsystems in Complex Robots, Proceedings of the First Conference on Artificial Intelligence Applications, December 1984, 84CH2107-1, IEEE, New York, pp. 64-69.
33. S. Y. Nof (ed.), Handbook of Industrial Robotics, Wiley, New York, 1985.
34. J. Vertut, "Experience and Remarks on Manipulator Evaluation, Performance Evaluation of Programmable Robots and Manipulators," NBS Special Publication 459, 97-112 (Oct. 1976), U.S. Department of Commerce, Washington, DC.
S. Y. Harmon
Robot Intelligence International
BACKTRACKING

Concepts

Almost all problems of search and optimization (qv) can be formulated as follows: Compute a set of values a1, . . . , an (with ai from the domain Di) that satisfy a given constraint (see Constraint propagation). In other words, find an optimal point in the product space D1 x . . . x Dn. The most popular example of this problem is the n-queens problem. For n = 8, the formulation is: Put eight queens on a chessboard such that no queen can attack another. Each variable ai identifies a position on the board; the domains are identical and consist of the 64 possible positions (see Chess-playing programs). For special product spaces there exist special techniques to solve problems, for example, linear programming, differential calculus, and dynamic programming. However, backtracking constitutes a completely general approach, which works by continually trying to extend a partial solution (a1, . . . , ak). The first general description was given in Ref. 1. Reference 2 has a large list of useful references.

Backtracking. The brute force approach to the problem considers all points in the solution space D1 x . . . x Dn. The size of the search space is then |D1| x . . . x |Dn|, with |Di| the number of elements in Di. The basic idea of backtracking, however, is to construct the solution vector (a1, . . . , an) a component at a time and to test whether a partial solution (a1, . . . , ak) still has a chance to satisfy the constraint in n components. The constraint restricted to the first k components is often called the "modified constraint." A large part of the brute force search space can be eliminated (|Dk+1| x . . . x |Dn| points) when the test fails. In the case of the n-queens problem, the modified constraint consists of verifying whether the k queens already placed on the board do not attack each other. Thus, each time a queen is added, a test is done to see whether it attacks one of the already placed queens. In general, the modified constraint can be formalized as a predicate P(x1, . . . , xk). Sometimes it is possible to do better than simply restricting the original constraint to the k known components; however, to avoid the loss of solutions, the modified constraint P(x1, . . . , xk) must be implied by the original constraint. The stronger the modified constraint, the smaller the search space. The best possible one is the constraint allowing only partial solutions (a1, . . . , ak) that can be extended to a solution (a1, . . . , an). Such a modified constraint is not possible without first finding all solutions to the stated problem. The size of the backtrack search space consists of all considered partial solutions (a1, . . . , ak), k ≤ n. This search space is potentially larger than the brute force search space; with sufficiently strong modified constraints, however, it will be smaller. The estimation of the size of the backtrack search space is discussed elsewhere (2).

Advantages and Disadvantages of Backtracking

The great advantage of backtracking is its universal applicability; the great disadvantage is its potential inefficiency. A simple example can clarify this. Assume a1 and a2 range over the domain {a, b, c}, ak ranges over {a, b}, and there is a set of components a3, . . . , ak-1. Part of the constraints are the conditions C1, stating that the values of a1 and a2 have to be different, and C2, that ak has to be different from both a1 and a2. A backtrack search starts with a1 = a and, because of C1, a2 = b. Then a solution for a3, . . . , ak-1 is searched for. Attempting to find a value for ak fails due to C2. At this point an exhaustive search over the subproblem a3, . . . , ak-1 is started; for each solution another attempt to find ak is made and fails. This behavior is known as thrashing. Several domain-independent approaches to remedy this behavior are known.

Reordering the Variables. Extending the partial solution in the order a1, a2, ak, a3, . . . solves the problem with the above example. For large problems selecting a different order can result in extremely large differences. Actually, the ordering can be dynamic and can be different in different branches of the search space. This ordering can be done at search time using a look ahead and selecting the variable that gives the smallest branching factor in the search tree (4,5).

Constraint Propagation. Sets of m components are analyzed and stronger constraints derived. In the above example analyzing a1, a2, and ak (m = 3) results in a stronger constraint which excludes the combinations a1 = a, a2 = b and a1 = b, a2 = a. Now the search fails for these combinations without thrashing over the set a3, . . . , ak-1 (6).

Intelligent Backtracking. In the example above the failure at ak is due to a1 and a2. The intelligent backtracking approach avoids thrashing and backtracks directly to a2. Moreover, if the search over a3, . . . , ak-1 is independent of a1 and a2, this search need not be repeated.

Effect of Problem Formulation on Search Space

Whether using a brute force approach or a backtracking approach, a good problem formulation can have a dramatic effect on the size of the search space. Consider the eight-queens problem: an extremely naive formulation associates 64 values with each domain, a search space of 64^8 points. By realizing that a solution can neither have two queens on the same row nor on the same column, the search space is reduced to the 8! permutations of the vector (1, 2, . . . , 8). The other extreme is having full knowledge about the problem: The size of the search space is equal to the number of solutions. Similarly, knowledge can be added to the modified constraints; for example, in the four-queens problem no solution exists with a queen in a corner. All partial solutions having a queen in a corner can be eliminated. The idea that a modification of the formulation can reduce the size of the search space underlies a number of optimization methods. A domain-dependent approach to reduce the search space is to exploit symmetry (3). In the queens problem rotation and reflection of the board gives different states but preserves the existence of a solution: Only one of these equivalent states needs to be in the search space.
BACKTRACKING, DEPENDENCY DTRECTED 47 search will be saved; the components are dynamically reordered (7) (seealso Dependency-directedbacktracking). All of the above-mentioned methods cannot prevent that the solution of someproblems still requires an extremely large computation time. Applicationin ProgrammingLanguages Floyd (8) was the first to proposea language construct facilitating the writing of backtrack programs and to point out the mechanical translation in constructs of conventional languages.A survey of useful primitive language constructs can be found in Ref. 2. Backtracking plays a fundamental role in a number of AI programming languag€s, most notably pLANNER (9) and PROLOG (10).In these languagesthe basic operation consistsof nondeterministically applying an operator on a state to derive a new state. Backtracking is used to exhaustively explore all possibilities. Programming consistsof formulating the problem in such a way that the backtracking search turns out to perform an efficient computation (see also Problem solving).
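The backtrack schema just described can be made concrete. The following sketch (an illustration added here, not part of the original article) solves the eight-queens problem in the permutation formulation; restricting the search to permutations of the columns already encodes the no-shared-row/column knowledge, and a diagonal test plays the role of the modified constraint Pk:

```python
# Illustrative backtrack program for the eight-queens problem.
# partial[i] is the column of the queen in row i; searching only over
# permutations encodes "no two queens share a row or column", so the
# modified constraint P_k only needs to check the diagonals.

def modified_constraint(partial):
    """P_k: the newest queen attacks no earlier queen diagonally."""
    k = len(partial)
    return all(abs(partial[i] - partial[k - 1]) != k - 1 - i
               for i in range(k - 1))

def queens(n=8):
    solutions = []
    def extend(partial):
        if len(partial) == n:
            solutions.append(tuple(partial))
            return
        for col in range(n):
            if col not in partial:               # permutation formulation
                partial.append(col)
                if modified_constraint(partial):  # test the partial solution
                    extend(partial)               # extend it further
                partial.pop()                     # backtrack
    extend([])
    return solutions

print(len(queens()))  # 92 solutions for n = 8
```

A stronger modified constraint or a better variable ordering would prune this tree further, exactly as the sections above describe.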
BIBLIOGRAPHY

1. R. L. Walker, "An Enumerative Technique for a Class of Combinatorial Problems," in Combinatorial Analysis (Proceedings of Symposium in Applied Mathematics, Vol. X), Amer. Math. Soc., Providence, RI, pp. 91-94, 1960.
2. J. Cohen, "Non-deterministic algorithms," Comput. Surv., 11(2), 79-94 (1979).
3. J. R. Bitner and E. M. Reingold, "Backtrack programming techniques," CACM, 18(11), 651-656 (1975).
4. P. W. Purdom, C. A. Brown, and E. L. Robertson, "Backtracking with multilevel dynamic search rearrangement," Acta Inf., 15(2), 99-113 (1981).
5. E. C. Freuder, "A sufficient condition for backtrack-free search," JACM, 29(1), 24-32 (1982).
6. E. C. Freuder, "Synthesizing constraint expressions," CACM, 21(11), 958-966 (1978).
7. M. Bruynooghe, "Solving combinatorial problems by intelligent backtracking," Inf. Proc. Lett., 12(1), 36-39 (1981).
8. R. W. Floyd, "Nondeterministic algorithms," JACM, 14(4), 636-644 (1967).
9. D. G. Bobrow and B. Raphael, "New programming languages for artificial intelligence research," Comput. Surv., 6(3), 153-174 (1974).
10. R. A. Kowalski, Logic for Problem Solving, North Holland, Amsterdam, 1979.

M. Bruynooghe
Katholieke Universiteit Leuven, Belgium
R. Venken
BIM, Belgium

BACKTRACKING, DEPENDENCY DIRECTED

Dependency-directed backtracking is a problem-solving (qv) technique for efficiently evading contradictions. It is invoked whenever the problem solver discovers that its current state is inconsistent. The goal is, in a single operation, to change the problem solver's current state to one that contains neither the contradiction just uncovered nor any contradiction encountered previously. This is achieved by consulting records of the inferences (see Inference) the problem solver has performed (called dependencies) and records of previous contradictions (called nogoods), which dependency-directed backtracking has constructed in response to previous contradictions.

Contrast to Chronological Backtracking

Dependency-directed backtracking was developed to avoid two deficiencies of chronological backtracking. Consider the application of chronological backtracking to the following task (see Fig. 1): First do one of A or B, then one of C or D, and then one of E or F. Assume that each step requires significant problem-solving effort and that A and C together or B and E together produce a contradiction that is only uncovered after significant effort. Figure 1 illustrates the sequence of problem-solving states that chronological backtracking goes through to find all solutions (6, 7, 11, and 14).

0 - ( )
1 - (A)
2 - (A, C)
3 - (A, C, E)
4 - (A, C, F)
5 - (A, D)
6 - (A, D, E)
7 - (A, D, F)
8 - (B)
9 - (B, C)
10 - (B, C, E)
11 - (B, C, F)
12 - (B, D)
13 - (B, D, E)
14 - (B, D, F)

Figure 1. Chronological backtracking state sequence.

Backtracking to an Appropriate Choice. The first deficiency of chronological backtracking is illustrated by the unnecessary state 4. The contradiction discovered in state 3 depends on choices A and C and not E. Therefore, replacing the choice E with F and working on state 4 is futile, as this change does not remove the contradiction. Unlike chronological backtracking, which replaces the most recent choice, dependency-directed backtracking replaces a choice that caused the contradiction. The discovery that state 3 is inconsistent causes immediate backtracking to state 5. To be able to determine which choices underlie the contradiction requires that the problem solver store dependency records with every datum that it infers. When an inconsistency is encountered, these dependencies are consulted to determine which choices contribute to the inconsistency.

Avoiding Rediscovering Contradictions. The second deficiency of chronological backtracking is illustrated by the unnecessary state 13. The contradiction discovered in state 10 depends on B and E. As E is the most recent choice, chronological and dependency-directed backtracking are indistinguishable, both backtracking to state 11. However, as B and E are known to be inconsistent with each other, there is no point in rediscovering this contradiction by working in state 13. Dependency-directed backtracking avoids this. Whenever a contradiction is discovered, the set of choices (called a nogood) contributing to the contradiction is added to a nogood database. Then, whenever backtracking introduces a new choice to the current state, this set is checked for whether it contains any known nogood before problem solving is permitted to proceed. The two nogoods in this example are {A, C} (discovered in state 3) and {B, E} (discovered in state 10).

Some Disadvantages

For some applications the disadvantages of dependency-directed backtracking may outweigh its advantages. Dependency-directed backtracking incurs a significant time and space overhead as it requires the maintenance of dependency records and an additional nogood database. Thus, the effort required to maintain the dependencies may be more than the problem-solving effort saved. If the problem solver is logically complete (see Completeness) and finishes all work on a state before considering the next, the problem of backtracking to an inappropriate choice cannot occur (in this example, the contradiction found in state 3 would instead be found in state 2). In such cases much of the advantage of dependency-directed backtracking is irrelevant. However, most practical problem solvers are neither logically complete nor finish all possible work on a state before considering another.

Historical Perspective

The idea of dependency-directed backtracking grew out of applications of constraint propagation (qv) to electronic circuit analysis (1). However, dependency-directed backtracking, like the chronological backtracking upon which it is an improvement, also suffers from avoidable deficiencies. For example, for both backtracking schemes inferences made in state 3 based on choices C and E are rederived in state 10. Fortunately, with more extensive use of dependency records, these and similar inefficiencies are avoided. The result is a truth maintenance system (2), which is a general problem-solving tool. Reference 2 contains the best description of dependency-directed backtracking, although it, as all the references, presents dependency-directed backtracking in the context of a larger truth maintenance system. Reference 3 describes the use of dependency-directed backtracking in the problem-solving system AMORD. Reference 4 describes the use of dependency-directed backtracking in another kind of truth maintenance system. Reference 5 includes extensive discussions and examples integrating dependency-directed backtracking into a constraint language. Reference 6 presents a truth maintenance system and an approach to problem solving that achieves most of the goals of dependency-directed backtracking but without backtracking (see also Belief revision).

BIBLIOGRAPHY

1. R. M. Stallman and G. J. Sussman, "Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis," Artif. Intell. 9(2), 135-196 (1977).
2. J. Doyle, "A truth maintenance system," Artif. Intell. 12(3), 231-272 (1979).
3. J. de Kleer, J. Doyle, G. L. Steele, and G. J. Sussman, "Explicit Control of Reasoning," in P. H. Winston and R. H. Brown (eds.), Artificial Intelligence: An MIT Perspective, Vol. 1, MIT Press, Cambridge, MA, pp. 94-116, 1979; also in R. J. Brachman and H. J. Levesque (eds.), Readings in Knowledge Representation, Morgan Kaufmann, Los Altos, CA, pp. 345-355, 1986.
4. D. McAllester, An Outlook on Truth Maintenance, MIT Artificial Intelligence Laboratory, AIM-551, Cambridge, MA, 1980.
5. G. L. Steele, The Definition and Implementation of a Computer Programming Language Based on Constraints, MIT Artificial Intelligence Laboratory, TR-595, Cambridge, MA, 1979.
6. J. de Kleer, "An assumption-based TMS," Artif. Intell. 28(2), 127-162 (1986).

J. de Kleer
Xerox PARC

BACKWARD CHAINING. See Processing, bottom-up and top-down.

BASEBALL

Two distinct AI systems have chosen the name Baseball. One is a program, written in the early 1960s, that answers questions posed in English about baseball scores. It syntactically parses the sentences into templates (or specification lists) for the processor to look up in a data base (see B. F. Green, A. K. Wolf, C. Chomsky, and K. Laughery, "Baseball: An Automatic Question Answerer," in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 207-216, 1963).

The second system, written by Elliot Soloway, is a learning program that uses "snapshots" as its training instances. It uses domain knowledge (the rules of baseball), knowledge of physics, and the goals of the players to process these snapshots (see E. Soloway, "Learning = Interpretation + Generalization: A Case Study in Knowledge-Directed Learning," Report No. COINS-TR-78-12, Computer and Information Sciences Department, University of Massachusetts, Amherst, 1978; also see E. Soloway and E. M. Riseman, "Levels of Pattern Description in Learning," Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 801-811, 1977).

J. Rosenberg
SUNY at Buffalo

BAYESIAN DECISION METHODS

Basic Formulation

Bayesian methods provide a formalism for reasoning about partial beliefs under conditions of uncertainty. In this formalism propositions are quantified with numerical parameters signifying the degree of belief accorded them by some body of knowledge, and these parameters are combined and manipulated according to the rules of probability theory. For example, if A stands for the statement "Ted Kennedy will seek nomination in 1988," then P(A|K) stands for a person's subjective belief in A given a body of knowledge K that may include that person's assumptions about American politics, specific proclamations made by Kennedy, an assessment of Kennedy's past and personality, and so on. The symbol K, indicating the source of the belief in A, is often suppressed from belief expressions, and one simply writes P(A) or P(¬A). This is justified when K remains constant, since the main purpose of the quantifier P is to summarize K without explicating its details. However, when this background information undergoes changes, one needs to identify specifically which assumptions account for one's beliefs, and an explicit mentioning of K or some of its elements is then required.

In Bayesian formalism belief statements obey the three basic assumptions of probability theory:
0 ≤ P(A) ≤ 1    (1)

P(sure proposition) = 1    (2)

P(A or B) = P(A) + P(B)  if A and B are incompatible    (3)

Thus, a proposition and its negation must be assigned a total belief of unity,

P(¬A) = 1 - P(A)    (4)

to account for the fact that one of the two is certain to be true.

The heart of Bayesian techniques lies in the celebrated inversion formula

P(H|e) = P(e|H)P(H) / P(e)    (5)

stating that the belief one accords a hypothesis H upon obtaining evidence e can be computed by multiplying one's prior belief P(H) and the likelihood P(e|H) that e will materialize assuming H is true. The denominator P(e) of Eq. 5 hardly enters into consideration because it is merely a constant that can always be computed if one requires that P(H|e) and P(¬H|e) sum to unity.

Whereas a formal mathematician will dismiss Eq. 5 as a straightforward identity stemming from the definition of conditional probabilities,

P(A|B) = P(A, B) / P(B),  P(B) > 0    (6)

the Bayesian subjectivist regards Eq. 5 as a normative rule for updating beliefs in response to evidence. The left side of Eq. 5 expresses a quantity P(H|e) that people often find hard to assess, in terms of more readily judged quantities, often available directly from the way experiential knowledge is encoded. For example, if a person at the next gambling table declares an outcome "twelve" and one wishes to know whether he was rolling a pair of dice or turning a roulette wheel, the quantities P(twelve|dice) and P(twelve|roulette) are readily known from the model of the gambling devices (giving 1/36 to the former and 1/38 to the latter). Similarly, one can judge the prior probabilities P(dice) and P(roulette) by estimating the number of roulette wheels and dice-rolling tables at the gambling casino. However, issuing a direct judgment of P(dice|twelve) is a much harder mental task, which could not be rendered reliably except by a specialist trained at making such guesses at that very casino.

Combining Prospective and Retrospective Supports

The essence of the rule in Eq. 5 is conveniently portrayed using the odds and likelihood ratio parameters. Dividing Eq. 5 by the complementary form for P(¬H|e), one obtains

P(H|e) / P(¬H|e) = [P(e|H) / P(e|¬H)] [P(H) / P(¬H)]    (7)

Defining the prior odds on H to be

O(H) = P(H) / P(¬H) = P(H) / [1 - P(H)]    (8)

and the likelihood ratio to be

L(e|H) = P(e|H) / P(e|¬H)    (9)

the posterior odds

O(H|e) = P(H|e) / P(¬H|e)    (10)

is given by the product

O(H|e) = L(e|H)O(H)    (11)

Thus, Bayesian rule dictates that the overall strength of belief in a hypothesis H based on both one's previous knowledge K and a given evidence e should be the product of two factors: the prior odds O(H) and the likelihood ratio L(e|H). The former measures the causal or prospective support accorded to H by the background knowledge alone, and the latter represents the diagnostic or retrospective support given to H by the evidence actually observed.

Strictly speaking, the likelihood ratio L(e|H) may also depend on other propositions in the tacit knowledge base K. However, the power of Bayesian techniques comes primarily from the fact that in causal reasoning (qv) the relation P(e|H) is fairly local; namely, given that H is true, the probability of e can be estimated fairly naturally and is not dependent on many other propositions in the database. For example, once it is established that a patient suffers from a given disease, it is fairly natural to estimate the probability that the patient will develop a certain symptom. This is what physicians learn in medical schools; a symptom is considered a stable characteristic of the disease and, therefore, should be fairly independent of other factors such as epidemic conditions, previous diseases, the tests that help identify the disease, and so on. It is for this reason that the conditional probabilities P(e|H) can meet the modularity requirements of rule-based expert systems (qv) in that they can serve to quantify confidence in rules such as "if H, then e" and retain their viability regardless of other rules or facts that may reside in the knowledge base at any given time.

Example 1. Imagine being awakened one night by the shrill sound of your burglar alarm system. What would be your degree of belief that a burglary attempt has taken place? For illustrative purposes, the following judgments are made: (a) There is a 95% chance that an attempted burglary will trigger the alarm system, P(alarm|burglary) = 0.95; (b) there is a slight (0.01) chance that the alarm sound would be triggered by a mechanism other than an attempted burglary; thus, P(alarm|no burglary) = 0.01; (c) previous crime patterns indicate that there is a 1 in 10,000 chance that a given house will be burglarized on any given night; that is, P(burglary) = 10^-4.
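The judgments (a)-(c) can be combined mechanically. A minimal illustrative sketch (added here, not from the article) using the odds-likelihood form and the conversion P = O/(1 + O):

```python
# Sketch of the odds-likelihood update applied to the judgments of Example 1.
p_alarm_given_burglary = 0.95     # judgment (a)
p_alarm_given_no_burglary = 0.01  # judgment (b)
p_burglary = 1e-4                 # judgment (c)

prior_odds = p_burglary / (1 - p_burglary)                             # Eq. 8
likelihood_ratio = p_alarm_given_burglary / p_alarm_given_no_burglary  # Eq. 9
posterior_odds = likelihood_ratio * prior_odds                         # Eq. 11
posterior = posterior_odds / (1 + posterior_odds)  # odds back to probability

print(round(posterior_odds, 4), round(posterior, 4))  # 0.0095 0.0094
```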
Putting these assumptions together, using Eq. 11, one has

O(burglary|alarm) = L(alarm|burglary)O(burglary) = (0.95/0.01) x 10^-4 ≈ 0.0095

and so, from

P(A) = O(A) / [1 + O(A)]    (12)

one has

P(burglary|alarm) = 0.0095 / (1 + 0.0095) = 0.00941

Thus, the retrospective support imparted to the burglary hypothesis by the alarm evidence has increased its degree of belief from 1 in 10,000 to 94.1 in 10,000. Note that it was not necessary to estimate the absolute values of the probabilities P(alarm|burglary) and P(alarm|no burglary); only their ratio enters the calculation, and therefore, a direct estimate of this ratio could have been used instead.

Pooling of Evidence

Assume that the alarm system consists of not one but a collection of N burglary detection devices, each sensitive to a different physical mechanism (e.g., air turbulence, temperature variations, pressure, sound, etc.) and each producing a distinct sound. Let H stand for the event that a burglary took place and let ek stand for the evidence obtained from the kth detector, with an activated and a silent detector as its two possible readings. The reliability (and sensitivity) of each detector is characterized by the probabilities P(ek|H) and P(ek|¬H) or, more parsimoniously, by their ratio:

λk = L(ek|H) = P(ek|H) / P(ek|¬H)    (13)

If some detectors are triggered while others remain deactivated, there is conflicting evidence, and the combined belief in the hypothesis H would be computed by Eq. 11:

O(H|e1, e2, . . . , eN) = L(e1, e2, . . . , eN|H)O(H)    (14)

Strictly speaking, Eq. 14 requires an enormous database because one needs to specify the probabilities of activation for every subset of detectors conditioned on H and on ¬H. Fortunately, reasonable assumptions of independence can drastically cut this storage requirement. Assuming that the state of activation of each detector depends only on whether a burglary took place but is thereafter independent of the activation of other detectors, one can write

P(e1, e2, . . . , eN|H) = Π(k=1 to N) P(ek|H)    (15)

and

P(e1, e2, . . . , eN|¬H) = Π(k=1 to N) P(ek|¬H)    (16)

which lead to

O(H|e1, . . . , eN) = O(H) Π(k=1 to N) L(ek|H)    (17)

Thus, the individual characteristics of each detector are sufficient for determining the combined impact of any group of detectors.

Multihypothesis Variables

The assumptions of conditional independence in Eqs. 15 and 16 will be justified if the failure of a detector to react to an attempted burglary and the factors that may cause it to fire prematurely both depend solely on mechanisms intrinsic to the individual detection systems, such as insufficient sensitivity, internal noise, and so on. However, if these can be caused by external circumstances affecting a selected group of sensors, such as a power failure or an earthquake, the two hypotheses H = burglary and ¬H = no burglary may be too coarse to induce the sensors' independence, and an additional refinement of the hypotheses' space may be necessary. This usually happens when the negation of a proposition entails several possible states of the world, each having its own distinct characteristics. For example, the state of no burglary entails the possibilities of an "ordinary peaceful night," a "night with earthquake," an "attempted entry by the neighbor's dog," and so on, each influencing the sensors present in a unique way. Equation 16 might hold with respect to each one of these conditions but not with respect to their aggregate, no burglary. For this reason it is often necessary to refine the hypotheses' space beyond that of binary propositions and group the hypotheses into multivalued variables, where each variable consists of a set of exhaustive and mutually exclusive hypotheses.

Example 2. One may choose to assign the variable name H to the following set of conditions {H1, H2, H3, H4}:

H1: no burglary, equipment malfunction (¬b, m)
H2: attempted burglary, no malfunction (b, ¬m)
H3: attempted burglary combined with equipment malfunction (b, m)
H4: no burglary, no malfunction (¬b, ¬m)

Each evidence variable ek can also be multivalued (e.g., ek1 = no sound, ek2 = low sound, ek3 = high sound), in which case the causal link between H and ek will be quantified by an m x n matrix Mk, where m and n are the number of values that H and ek might take, respectively, and the (i, j)th entry of Mk stands for

Mkij = P(ekj|Hi)    (18)

For example, the matrix below could represent the various sensitivities of the kth detector to the four conditions in H:

                    H1     H2     H3     H4
ek1 (no sound)      0.5    0.06   0.5    1
ek2 (low sound)     0.4    0.5    0.1    0
ek3 (high sound)    0.1    0.44   0.4    0

Given a set of evidence readings e1, e2, . . . , eN, the overall belief in the ith hypothesis is given by

P(Hi|e1, . . . , eN) = αP(e1, . . . , eN|Hi)P(Hi)    (19)

where α = [P(e1, . . . , eN)]^-1 is a normalizing constant to be computed by requiring that Eq. 19 sum to unity (over i).
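Equation 19, under the conditional-independence assumption of Eqs. 15 and 16, reduces to componentwise products of likelihood vectors followed by normalization. A sketch (an illustration added here, not from the article) using the detector matrix above and, for illustration, the prior vector used later in Example 3:

```python
# Sketch of Eq. 19 with conditionally independent detector readings.
# Rows of M: readings (no, low, high sound); entries: P(reading | H1..H4).
M = {
    "no sound":   (0.5, 0.06, 0.5, 1.0),
    "low sound":  (0.4, 0.50, 0.1, 0.0),
    "high sound": (0.1, 0.44, 0.4, 0.0),
}
prior = (0.099, 0.009, 0.001, 0.891)   # P(Hi), the priors of Example 3

def posterior(readings, prior):
    belief = list(prior)
    for r in readings:                  # componentwise likelihood products
        belief = [b * l for b, l in zip(belief, M[r])]
    alpha = 1.0 / sum(belief)           # normalize so the beliefs sum to 1
    return [alpha * b for b in belief]

# Detector 1 issues a high sound, detector 2 stays silent:
print([round(p, 3) for p in posterior(["high sound", "no sound"], prior)])
# → [0.919, 0.044, 0.037, 0.0]
```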
Assuming conditional independence with respect to each Hi, one obtains

P(Hi|e1, . . . , eN) = αP(Hi) Π(k=1 to N) P(ek|Hi)    (20)

Thus, one sees that the matrices P(ek|Hi) now play the role of the likelihood ratios in Eq. 17. If, for each detector reading ek, the likelihood vector is defined as

λk = (λk1, λk2, . . . , λkm)    (21)

λki = P(ek|Hi)    (22)

Eq. 20 is computed by a simple vector product process. First, the individual likelihood vectors are multiplied together, term by term, to form an overall likelihood vector Λ = λ1λ2 · · · λN, namely,

Λi = Π(k=1 to N) P(ek|Hi)    (23)

Then the overall belief vector P(Hi|e1, . . . , eN) is obtained by the product

P(Hi|e1, . . . , eN) = αP(Hi)Λi    (24)

reminiscent of Eq. 17. Note that only the relative magnitudes of the conditional probabilities in Eq. 22 need be estimated; their absolute magnitudes do not affect the result because α is to be determined by the requirement Σi P(Hi|e1, . . . , eN) = 1.

Example 3. Assume that the system contains two detectors having identical characteristics, given by the matrix above. Further, let the prior probabilities for the hypotheses in Example 2 be represented by the vector P(H) = (0.099, 0.009, 0.001, 0.891), and assume that detector 1 was heard to issue a high sound while detector 2 remained silent. From Eq. 22 one has

λ1 = (0.1, 0.44, 0.4, 0)
λ2 = (0.5, 0.06, 0.5, 1)
Λ = λ1λ2 = (0.05, 0.026, 0.2, 0)
P(Hi|e1, e2) = α(4.95, 0.238, 0.20, 0) x 10^-3 = (0.919, 0.0439, 0.0375, 0)

Thus, the chance of attempted burglary (H2 or H3) is 0.0439 + 0.0375 = 8.14%.

The updating of belief need not wait, of course, until all the evidence is collected but can be carried out incrementally. For example, if one first observes e1 = high sound, the belief in H calculates to

P(Hi|e1) = α(0.0099, 0.00396, 0.0004, 0) = (0.694, 0.277, 0.028, 0)

This now serves as a prior belief with respect to the next datum, and after observing e2 = no sound, it updates to

P(Hi|e1, e2) = α′λ2i P(Hi|e1) = α′(0.347, 0.0166, 0.014, 0) = (0.919, 0.0439, 0.0375, 0)

as before. Thus, the quiescent state of detector 2 lowers the chance of an attempted burglary from 30.5 to 8.14%.

Uncertain Evidence (Cascaded Inference)

One often hears the claim that Bayesian techniques cannot handle uncertain evidence because the relation P(A|B) requires that the conditioning event B be known with certainty. To see the difficulties that led to this myth, consider a slight modification of the story of the alarm system:

Example 4. Mr. Holmes receives a telephone call from his neighbor Dr. Watson stating that he hears a burglar alarm sound from the direction of Mr. Holmes's house. Preparing to rush home, Mr. Holmes recalls that Dr. Watson is known to be a tasteless practical joker, and he decides to first call his other neighbor, Mrs. Gibbons, who, despite occasional drinking problems, is far more reliable.

Since the evidence variable S = sensor output is now uncertain, it cannot be used as evidence in Eq. 11; rather, Eq. 11 must be applied to the actual evidence at hand, W = Dr. Watson's testimony:

O(H|W) = L(W|H)O(H)    (25)

Unfortunately, the task of estimating L(W|H) will not be as easy as that of estimating L(S|H) because the former requires the mental tracing of a two-step process, as shown in Figure 1. Moreover, even if L(W|H) could be obtained, one would not be able to combine it with other possible testimonies, say Mrs. Gibbons's (G), by a simple process of multiplication (Eq. 23) because those testimonies would no longer be conditionally independent with respect to H. What Mrs. Gibbons is about to say depends only on whether an alarm sound can be heard in the neighborhood, not on whether a burglary actually took place. Thus, it would be wrong to assume P(G|burglary, W) = P(G|burglary) because the joint event of a burglary together with Watson's testimony constitutes a stronger evidence for the occurrence of the alarm sound than the burglary alone.

Given the level of detail used in the story, it is more reasonable to assume that the testimonies W and G and the hypothesis H are independent of each other once one knows whether the alarm sensor was actually triggered. In other words, each testimony depends directly on the alarm system (S) and is only indirectly influenced by the possible occurrence of a burglary (H) or by the other testimony (see Fig. 1).

[Figure 1 shows the causal chain: Burglary (H) → Alarm sound (S) → Dr. Watson's testimony (W) and Mrs. Gibbons's testimony (G).]

Figure 1. A diagram illustrating cascaded inference through an intermediate variable S.

These considerations can easily be incorporated into the Bayesian formalism; using Eq. 3, Eq. 19 is simply conditioned and summed over all possible states of the intermediate variable S:

P(Hi|G, W) = αP(G, W|Hi)P(Hi) = αP(Hi) Σj P(G, W|Hi, Sj)P(Sj|Hi)    (26)
where Sj (j = 1, 2) stands for the two possible activation states of the alarm system, namely, S1 = alarm triggered and S2 = alarm not triggered. Moreover, the conditional independence of G, W, and Hi with respect to the mediating variable S yields

P(G, W|Hi, Sj) = P(G|Sj)P(W|Sj)    (27)

and Eq. 26 becomes

P(Hi|G, W) = αP(Hi) Σj P(G|Sj)P(W|Sj)P(Sj|Hi)    (28)

The computation in Eq. 28 can be interpreted as a three-step process: First, the local likelihood vectors P(G|Sj) and P(W|Sj) are multiplied together, componentwise, to obtain the likelihood vector Λj(S) = P(e|Sj), where e stands for the total evidence collected, G and W. Second, the vector P(e|Sj) is multiplied by the link matrix Mij = P(Sj|Hi) to form the likelihood vector of the top hypothesis, Λi(H) = P(e|Hi). Finally, using the product rule of Eq. 5 (see also Eq. 19 or 24), Λi(H) is multiplied by the prior P(Hi) to give the overall belief in Hi.

This process demonstrates the psychological and computational role of the mediating variable S. It permits one to use local chunks of information taken from diverse domains [e.g., P(Hi), P(G|Sj), P(W|Sj), and P(Sj|Hi)] and fit them together to form a global, cross-domain inference P(H|e) in stages, using simple and local vector operations. It is this role that prompted some philosophers to posit that conditional independence is not an accident of nature for which one must passively wait but rather a psychological necessity that one actively dictates, as the need develops, by, for example, coining names for new, hypothetical variables. In medical diagnosis, for instance, when some symptoms directly influence each other, the medical profession invents a name for that interaction (e.g., complication, pathological state, etc.) and treats it as a new auxiliary variable that induces conditional independence; knowing the exact state of the auxiliary variable renders the interacting symptoms independent of each other.

Virtual (Intangible) Evidence

Imagine the following development in the story of Mr. Holmes:

Example 5. When Mr. Holmes calls Mrs. Gibbons, he soon realizes that she is somewhat tipsy. Instead of answering his question directly, she goes on and on describing her latest operation and how terribly noisy and crime ridden the neighborhood has become. When he finally hangs up, all Mr. Holmes can make out of the conversation is that there is probably an 80% chance that Mrs. Gibbons did hear an alarm sound from her window.

The Holmes-Gibbons conversation is the kind of evidence that is hard to fit into any formalism. If one tries to estimate the probability P(e|alarm sound), one gets ridiculous numbers because doing so entails anticipating, describing, and assigning probabilities to all possible courses the conversation with Mrs. Gibbons might have taken under the circumstances. These difficulties arise whenever the task of gathering evidence is delegated to autonomous interpreters who, for various reasons, cannot explicate their interpretive process in full detail but, nevertheless, often produce informative conclusions that summarize the evidence observed. In this case Mr. Holmes's conclusion is that, on the basis of his judgmental interpretation of Gibbons's testimony (alone!), the hypothesis alarm sound should be accorded a confidence measure of 80%. The task is to integrate this probabilistic judgment into the body of hard evidence previously collected.

In the Bayesian formalism the integration of virtual evidence is straightforward. Although the evidence e cannot be articulated in full detail, one interprets the probabilistic conclusion as conveying likelihood ratio information. In the story, for example, identifying e with G = Gibbons's testimony, Mr. Holmes's summary attributing 80% credibility to the alarm sound event will be interpreted as the statement P(G|alarm sound):P(G|no alarm sound) = 4:1. More generally, if the variable upon which the tacit evidence e impinges most directly has several possible states S1, S2, . . . , Sj, . . . , the interpreter would be instructed to estimate the relative magnitudes of the terms P(e|Sj) [e.g., by eliciting estimates of the ratios P(e|Sj):P(e|S1)], and since the absolute magnitudes do not affect the calculations, one can proceed to update beliefs as if this likelihood vector originated from an ordinary, logically crisp event e. For example, assuming that Dr. Watson's phone call already contributed a likelihood ratio of 9:1 in favor of the hypothesis alarm sound, the combined weight of Watson's and Gibbons's testimonies would yield a likelihood vector Λj(S) = P(W, G|Sj) = (36, 1).

This vector can be integrated into the computation of Eq. 28, and using the numbers given in Example 1, one gets

Λi(H) = Σj Λj(S)P(Sj|Hi) = (0.95 x 36 + 0.05 x 1, 0.01 x 36 + 0.99 x 1) = (34.25, 1.35)    (29)

P(Hi|G, W) = αΛi(H)P(Hi) = α(34.25, 1.35)(10^-4, 1 - 10^-4) = (0.00253, 0.99747)    (30)

Note that it is important to verify that Mr. Holmes's 80% summarization is indeed based only on Mrs. Gibbons's testimony and does not include prejudicial beliefs borrowed from previous evidence (e.g., Watson's testimony or crime rate information); otherwise one is in danger of counting the same information twice. The likelihood ratio is, indeed, unaffected by such information. Bayesian practitioners claim that people are capable of retracing the origins of their beliefs and of answering hypothetical questions such as "What if you didn't receive Watson's call?" or "Estimate the increase in belief due to Gibbons's testimony alone."

An effective way of eliciting pure likelihood ratio estimates unaffected by previous information would be to first let one imagine that, prior to obtaining the evidence, one is in a standard state of total ignorance and then estimate the final degree of belief given to a proposition as a result of observing the evidence. In this example, if prior to conversing with Mrs. Gibbons Mr. Holmes had a "neutral" belief in S, that is, P(alarm) = P(no alarm) = 1/2, the postconversation estimate P(alarm|G) = 80% would indeed correspond to a likelihood ratio of 4:1 in favor of alarm.

Predicting Future Events

One of the attractive features of causal models in the Bayesian formulation is the ease they lend to the prediction of yet-unobserved events such as the possible denouements of social episodes, the outcomes of a given test, prognoses of a given disease, and so on. The need to facilitate such predictive tasks may, in fact, be the very reason that human beings have adopted causal schemata for encoding experiential knowledge.

Example 6. Immediately after his conversation with Mrs. Gibbons, as Mr. Holmes is preparing to leave his office, he recalls that his daughter is due to arrive home any minute and, if confronted by an alarm sound, would probably (0.7) phone him for instructions. Now he wonders whether he shouldn't wait a few more minutes in case she calls.

To estimate the likelihood of the new target event D = daughter will call, one has to add a new causal link to the graph of Figure 1. Assuming that hearing an alarm sound is the only event that would induce the daughter to call, the new link should emanate from the variable S and be quantified by the following P(D|S) matrix:

            D (will call)    ¬D (will not call)
S = on          0.7                0.3
S = off         0                  1

Accordingly, P(D|all evidence) is given by

P(D|e) = Σj P(D|Sj)P(Sj|e)

which means that all the episodes with Dr. Watson and Mrs. Gibbons impart their influence on D only via the belief they induce on S, P(Sj|e). It is instructive to see now how P(Sj|e) can be obtained from the previous calculation of P(Hi|e). A natural temptation would be to use the updated belief P(Hi|e) and the link matrix P(Sj|Hi) and, through rote, write the conditioning equation
53
Thus, together, one has P(S;le) _ a(36, 1X0.0101,0.9899)_ (0.268G, a.7gr4) (gb) which gives the event sr - alarm-sound-on a credibility of 26.86Vo and predictsthat the event D - daughter-will-call will occur with the probability of
P(Dld f
: (0.2686 ,0.7s14)(ool) : 0.188
(36)
MultipfeCauses and"Explaining Away', TYeestructures like the one used in the preceding section require that only one variable be considereda causeof any other variable. This structure simplifies computations,but its representational power is rather limited because it forces one to group together all causal factors sharing a common consequenceinto a single node. By contrast, when peopleassociatea given observation with multiple potential causes,they weigh one causal factor against another as independent variables, each pointing to a specializedarea of knowledge. As an illustration, consider the following situation: Example 7. As he is pondering this question, Mr. Holmes remernbershaving read in the instruction manual of his alarm system that the device is sensitive to earthquakes and can be triggered (A.D by one accidentally. He realizes that if an earthquake had occurred,it would surely (0.g) be on the news. So, he turns on his radio and waits around for either an announcement or a call from his daughter.
Mr. Holmes perceives two episodesthat may be potential causes for the alarm sound, an attempted burglary and an P(S;le)_ T p(S;lH)P(H;le) (32) earthquake. Even though burglaries can safely be assumed independent of earthquakes, stilt a positive radio announcement would reduce the likelihood of a burglary, as it "explains also known as Jeffrey's rule of updating (1). This equation, away" the alarm sound. Moreover, the two causal u.r.rt, are however, is only valid in a very special set of circumstances.It perceived as individual variables (seeFig. 2); general knowlwill be wrong in the example becausethe changesin the belief edge about earthquakes rarely intersects knowledge about of H actually originated from the correspondingchangesin S; burglaries. reflecting these back to S would amount to counting the same This interaction among multiple causesis a prevailing patevidencetwice. Formally, this objection is reflected by the intern of human reasoning. When a physician discoversevidence equality P(S; lH) + P(SilHt, e), stating that the evidenceobin favor of one disease,it reduces the credibility of other distained affects not only the belief in H and s but also the eases,although the patient may as well be suffering from two strength of the causal link between H and S. On the surface, or more disorders simultaneously. A suspectwho provides an this realization may seem detrimental to the usefulness of alternative explanation for being present at the ,r.r," of the Bayesian methods in handling a large number of facts; having to calculate all links' parameters each time a new piece of evidence arrives would be an insurmountable computational burden. Fortunately, there is a simple way of updating beliefs ( B u r g l a r y , n o b u r g l a r y) that circumvents this difficulty and uses only the original link matrices (2). The calculation of P(S; le), for instance, can be performed as follows. Treatittg S as an intermediate hypothe-E) {Earthquake, sis, Eq. 
5 dictates P(S; le) _ oP(elSj )P(Sj)
(33)
The term P(elS;) is the likelihood vector Aj(S), which was calculated earlier to (36, 1), and the prior p(S; ) is given by the matrix multiplication
P(s;) :
)
rrs; |H)P(H,)- (10-4,1
- (0.0101, 0.g8gg)
0.01\ 1o-4)(0.e5 \0.01 0.99/ (34)
(Report, -R
( A l a r mn, o a l a r m)
\
t\r-.
( W i l l c a l l ,w i l f n o t )
Watson's call - true
T9 Gibbons's testimony
Figure 2. A diagram representing the causal dependencies among the variables in Examples l-7.
crime appearsless likely to be guilty even though the explanation furnished does not preclude his committing the crime. To model this "sideways" interaction a matrrx M should be assessedgiving the distribution of the consequencevariable as a function of every possible combination of the causal variables. In the example one should specify M _ P(SIE, H), where E stands for the variable E - {earthquake, no earthquake). Although this matrix is identical in form to the one describedin Example 2, Eq. L8, where the two causal variables were combined into one compoundvariable {f/1 , Hz, Hs, Hq}, treatin g E and H as two separateentities has an advantage in that it allows one to relate each of them to a separate set of evidencewithout consulting the other. For example, the relation betweenD andft (the radio announcement)can be quantified by the probabilities P(RIE) without having to consider the irrelevant event of burglary, as would be required by compounding the pair (8, R) into one variable. Moreover, having received a confirmation of R, the beliefs of E and f/ can be updated in two separatesteps,mediated by updating S, closely resembling the processused by people. An updating scheme for networks with multiple-parent nodesis describedin Refs.3 and 4. If the number of causal factors ft is large, estimating M may be troublesomebecause,in principle, it requires a table of size 2k. In practice, however, people conceptualizecausal relationships by creating hierarchies of small clusters of variables, and moreover, the interactions among the factors in each cluster are normally perceived to fall into one of a few prestored, prototypical structures each requiring about k parameters. Common examples of such prototypical structures are: noisy OR gates (i.e., &Dy one of the factors is likely to trigger the effect), noisy AND gates, and various enabling mechanisms (i.e., factors identified as having no influence of their own except enabling other influences to becomeeffective).
Bayesian Networks

In the preceding discussion diagrams such as Figures 1 and 2 were used not merely for mnemonic or illustrative purposes. They in fact convey important conceptual information, far more meaningful than the numerical estimates of the probabilities involved. The formal properties of such diagrams, called Bayesian networks (4), are discussed below.

Bayesian networks are directed acyclic graphs in which the nodes represent propositions (or variables), the arcs signify the existence of direct causal influences between the linked propositions, and the strengths of these influences are quantified by conditional probabilities (Fig. 3). Thus, if the graph contains the variables x_1, . . . , x_n, and S_i is the set of parents for variable x_i, a complete and consistent quantification can be attained by specifying, for each node x_i, a subjective assessment P'(x_i|S_i) of the likelihood that x_i will attain a specific value given the possible states of S_i. The product of all these assessments,

P(x_1, . . . , x_n) = Π_i P'(x_i|S_i)

constitutes a joint-probability model that supports the assessed quantities. That is, if the conditional probabilities P(x_i|S_i) dictated by P(x_1, . . . , x_n) are computed, the original assessments are recovered. Thus, for example, the distribution corresponding to the graph of Figure 3 can be written by inspection:

P(x_1, x_2, x_3, x_4, x_5, x_6) = P(x_6|x_5)P(x_5|x_2, x_3)P(x_4|x_1, x_2)P(x_3|x_1)P(x_2|x_1)P(x_1)

An important feature of a Bayesian network is that it provides a clear graphical representation for many independence relationships embedded in the underlying probabilistic model. The criterion for detecting these independencies is based on graph separation: namely, if all paths between x_i and x_j are "blocked" by a subset S of variables, x_i is independent of x_j given the values of the variables in S. Thus, each variable x_i is independent of both its grandparents and its nondescendant siblings, given the values of the variables in its parent set S_i. For this blocking criterion to hold in general, one must provide a special interpretation of separation for nodes that share common children: a pathway along arrows meeting head to head at node x_p is blocked only as long as neither x_p nor any of its descendants is in S. In Figure 3, for example, x_2 and x_3 are independent given S_1 = {x_1} or S_2 = {x_1, x_4} because the two paths between x_2 and x_3 are blocked by both sets. However, x_2 and x_3 may not be independent given S_3 = {x_1, x_6} because x_6, as a descendant of x_5, unblocks the head-to-head connection at x_5, thus opening a pathway between x_2 and x_3.

Figure 3. A typical Bayesian network with six variables.

Belief Propagation in Bayesian Networks

Once a Bayesian network is constructed, it can be used to represent the generic causal knowledge of a given domain and can be consulted to reason about the interpretation of specific input data. The interpretation process involves instantiating a set of variables corresponding to the input data and calculating its impact on the probabilities of a set of variables designated as hypotheses. In principle, this process can be executed by an external interpreter that may have access to all parts of the network, may use its own computational facilities, and may schedule its computational steps so as to take full advantage of the network topology with respect to the incoming data. However, the use of such an interpreter seems foreign to the reasoning process normally exhibited by humans. One's limited short-term memory and narrow focus of attention, combined with the resistance to shifting rapidly between alternative lines of reasoning, seem to suggest that one's reasoning process is fairly local, progressing incrementally along prescribed pathways. Moreover, the speed and ease with which one performs some of the low-level interpretive functions, such as recognizing scenes, comprehending text, and even understanding stories, strongly suggest that these processes involve a significant amount of parallelism and that most of the processing is done at the knowledge level itself, not external to it.

A paradigm for modeling such an active knowledge base would be to view a Bayesian network not merely as a passive, parsimonious code for storing factual knowledge but also as a computational architecture for reasoning about that knowledge. That means that the links in the network should be treated as the only pathways and activation centers that direct and propel the flow of data in the process of querying and updating beliefs. Accordingly, one can imagine that each node in the network is designated a separate processor that both maintains the parameters of belief for the host variable and manages the communication lines to and from the set of neighboring, logically related variables. The communication lines are assumed to be open at all times; that is, each processor may at any time interrogate the belief parameters associated with its neighbors and compare them to its own parameters. If the compared quantities satisfy some local constraints, no activity takes place. However, if any of these constraints is violated, the responsible node is activated to revise its violating parameter and set it straight. This, of course, will activate similar revisions at the neighboring processors and will set up a multidirectional propagation process, which will continue until equilibrium is reached.

The fact that evidential reasoning involves both top-down (predictive) and bottom-up (diagnostic) inferences (see Processing, bottom-up and top-down) has caused apprehensions that, once the propagation process is allowed to run its course unsupervised, pathological cases of instability, deadlock, and circular reasoning will develop (5). Indeed, if a stronger belief in a given hypothesis means a greater expectation for the occurrence of its various manifestations and if, in turn, a greater certainty in the occurrence of these manifestations adds further credence to the hypothesis, how can one avoid infinite updating loops when the processors responsible for these propositions begin to communicate with one another? It can be shown that the Bayesian network formalism is supportive of self-activated, multidirectional propagation of evidence that converges rapidly to a globally consistent equilibrium (4). This is made possible by characterizing the belief in each proposition as a vector of parameters similar to the likelihood vector of Eq. 20, with each component representing the degree of support that the host proposition obtains from one of its neighbors. Maintaining such a breakdown record of the origins of belief facilitates a clear distinction between beliefs based on ignorance and those based on firm but conflicting evidence. It is also postulated as the mechanism that permits people to trace back evidence and assumptions for the purpose of either generating explanations or modifying the model.

As a computational architecture, singly connected Bayesian networks exhibit the following characteristics: New information diffuses through the network in a single pass; that is, equilibrium is reached in time proportional to the diameter of the network. The primitive processors are simple and repetitive, and they require no working memory except that used in matrix multiplication. The local computations and the final belief distribution are entirely independent of the control mechanism that activates the individual operations; they can be activated by either data-driven or goal-driven (e.g., requests for evidence) control strategies, by a clock, or at random.

Thus, this architecture lends itself naturally to hardware implementation capable of real-time interpretation of rapidly changing data. It also provides a reasonable model of neural nets involved in cognitive tasks such as visual recognition, reading comprehension, and associative retrieval, where unsupervised parallelism is an uncontested mechanism.

Rational Decisions and Quality Guarantees

Bayesian methods, unlike many alternative formalisms of uncertainty, provide coherent prescriptions for choosing actions and meaningful guarantees of the quality of these choices. The prescription is based on the realization that normative knowledge, that is, judgments about values, preferences, and desirability, represents a valuable abstraction of actual human experience and that, like its factual-knowledge counterpart, it can be encoded and manipulated to produce useful recommendations. Whereas judgments about the occurrence of events are quantified by probabilities, the desirability of action consequences is quantified by utilities (also called payoffs, or values) (6). Choosing an action amounts to selecting a set of variables in a Bayesian network and fixing their values unambiguously. Such a choice normally alters the probability distribution of another set of variables, judged to be consequences of the decision variables. If to each configuration of the consequence set C a utility measure u(C) is assigned that represents its degree of desirability, the overall expected utility associated with action a is given by

U(a) = Σ_C u(C)P(C|a, e)    (37)

where P(C|a, e) is the probability distribution of the consequence set C conditioned upon selecting action a, given the evidence e. Bayesian methodologies regard the expected utility U(a) as a figure of merit of action a and treat it, therefore, as a prescription for choosing among alternatives. Thus, if one has the option of choosing either action a_1 or a_2, one can calculate both U(a_1) and U(a_2) and select the action that yields the higher value. Moreover, since the value of U(a) depends on the evidence e observed up to the time of decision, the outcome of the maximum-expected-utility criterion will be an evidence-dependent plan (or decision rule) of the form: if e_1 is observed, choose a_1; if e_2 is observed, choose a_2; and so on (see Decision theory).

The same criterion can also be used to rate the usefulness of various information sources and to decide which piece of evidence should be acquired first. The merit of querying variable x can be decided prior to actually observing its value by the following consideration. If one queries x and finds the value v_x, the utility of action a will be U(a|v_x); one is able, at this point, to choose the best action among all pending alternatives and attain the value

U(v_x) = max_a U(a|v_x)    (39)
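The maximum-expected-utility choice of Eq. 37 can be sketched in a few lines. The two actions, their consequence distributions, and the utilities below are hypothetical illustrations invented for this sketch (the value 0.00253 merely echoes the posterior of Eq. 30; the decision model around it is not from the article).

```python
# Sketch of a maximum-expected-utility choice, U(a) = sum_C u(C) P(C|a, e).
# The actions, consequence probabilities, and utilities are hypothetical.

def expected_utility(p_consequence, utility):
    """U(a): sum over consequence configurations C of u(C) * P(C | a, e)."""
    return sum(utility[c] * p for c, p in p_consequence.items())

# P(C | a, e) for two candidate actions, e.g. "leave the office" vs. "wait".
actions = {
    "leave": {"burglary_unattended": 0.00253, "no_incident": 0.99747},
    "wait":  {"burglary_unattended": 0.0,     "no_incident": 1.0},
}
utility = {"burglary_unattended": -1000.0, "no_incident": 0.0}  # assumed u(C)

scores = {a: expected_utility(p, utility) for a, p in actions.items()}
best = max(scores, key=scores.get)  # the action with the highest U(a)
```

An evidence-dependent plan, as described above, amounts to recomputing `scores` whenever the conditioned distribution P(C|a, e) changes with new evidence e.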
However, since one is not sure of the actual outcome of querying x, one must average U(v_x) over all possible values of v_x, weighted by their appropriate probabilities. Thus, the utility of querying x calculates to

U_x = Σ_{v_x} P(x = v_x|e)U(v_x)    (40)

where e is the evidence available so far. This criterion can be used to schedule many control functions in knowledge-based systems. For example, it can be used to decide what to ask the user next, what test to perform next, or which rule to invoke next. The expert system PROSPECTOR (7) employed a scheduling procedure (called J*) based on similar considerations (see Rule-based systems). If the consequence set is well defined and not too large, this information-rating criterion can also be computed distributedly, concurrent with the propagation of evidence. Each variable x in the network stores an updated value of U_x, and as more evidence arrives, each variable updates its U_x parameter in accordance with those stored at its neighbors. At query time, attention will be focused on the observable node with the highest U_x value.

It is important to mention that the maximum-expected-utility rule was not chosen as a prescription for decisions for sheer mathematical convenience. Rather, it is founded on pervasive patterns of psychological attitudes toward risk, choice, preferences, and likelihoods. These attitudes are captured by what came to be known as the axioms of utility theory (8). Unlike the case of repetitive long series of decisions (e.g., gambling), where the expected-value criterion is advocated on the basis of a long-run accumulation of payoffs, the expected-utility criterion is applicable to single-decision situations. The summation operation in Eq. 37 originates not with additive accumulation of payoffs but, rather, with the additivity axiom of probability theory (Eq. 3). In summary, the justification of decisions made by Bayesian methods can be communicated in intuitively meaningful terms, and the assumptions leading to these decisions can be traced back with ease and clarity.

This work was supported in part by the National Science Foundation, Grant #DSR 83-13875.

BIBLIOGRAPHY

1. R. Jeffrey, The Logic of Decision, McGraw-Hill, New York, Chapter 11, 1965.
2. J. Pearl, Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, Proceedings of the Second AAAI Conference on Artificial Intelligence, Pittsburgh, PA, pp. 133-136, 1982.
3. J. Kim and J. Pearl, A Computational Model for Combined Causal and Diagnostic Reasoning in Inference Systems, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 190-193, 1983.
4. J. Pearl, Fusion, Propagation and Structuring in Belief Networks, Technical Report CSD-850022, Cognitive Systems Laboratory, UCLA, June 1985; Artif. Intell. 29(3), 241-288 (Sept. 1986).
5. J. Lowrance, Dependency-Graph Models of Evidential Support, COINS Technical Report 82-26, University of Massachusetts at Amherst, 1982.
6. H. Raiffa, Decision Analysis: Introductory Lectures on Choices under Uncertainty, Addison-Wesley, Reading, MA, 1968.
7. R. O. Duda, P. E. Hart, and N. J. Nilsson, "Subjective Bayesian methods for rule-based inference systems," Proc. 1976 Natl. Comput. Conf. (AFIPS Conference Proceedings) 45, 1075-1082 (1976).
8. J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, 2nd ed., Princeton University Press, Princeton, NJ, 1947.

General References

Bayesian Methodology

R. O. Duda, P. E. Hart, P. Barnett, J. Gaschnig, K. Konolige, R. Reboh, and J. Slocum, Development of the PROSPECTOR Consultant System for Mineral Exploration, Final Report for SRI Projects 5821 and 6915, Artificial Intelligence Center, SRI International, 1978.
M. Ben-Bassat, R. W. Carlson, V. K. Puri, E. Lipnick, L. D. Portigal, and M. H. Weil, "Pattern-based interactive diagnosis of multiple disorders: The MEDAS system," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-2, 148-160 (1980).
J. Kim, CONVINCE: A CONVersational INference Consolidation Engine, Ph.D. Dissertation, University of California, Los Angeles, 1983.
D. J. Spiegelhalter and R. P. Knill-Jones, "Statistical and knowledge-based approaches to clinical decision-support systems, with an application to gastroenterology," J. R. Stat. Soc. A 147, 35-77 (1984).
G. F. Cooper, NESTOR: A Computer-Based Medical Diagnostic Aid that Integrates Causal and Probabilistic Knowledge, Report No. STAN-CS-84-1031, Stanford University, November 1984.

Quasi-Bayesian Methods

E. H. Shortliffe, Computer-Based Medical Consultation: MYCIN, Elsevier, New York, 1976.
C. Kulikowski and S. Weiss, Representation of Expert Knowledge for Consultation: The CASNET and EXPERT Projects, in P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview Press, Boulder, CO, pp. 21-55, 1982.
R. A. Miller, H. E. Pople, and J. P. Myers, "INTERNIST-1, an experimental computer-based diagnostic consultant for general internal medicine," N. Engl. J. Med. 307(8), 468-470 (1982).
J. R. Quinlan, INFERNO: A Cautious Approach to Uncertain Inference, Rand Note N-1898-RC, September 1982.

J. PEARL
UCLA
BEAM SEARCH
Beam search is a heuristic search technique in which a number of nearly optimal alternatives (the beam) are examined in parallel. Beam search is a heuristic technique because heuristic rules are used to discard nonpromising alternatives in order to keep the size of the beam as small as possible. Some of the successful applications of beam search include speech recognition (1), job shop scheduling (2), vision (3), and learning (4).

Beam search can easily be explained by using a search space described by a directed graph in which each node is a state and each arc represents the application of an operator that transforms a state into a successor state. A solution is a path from an initial state to a goal state. A few operators are necessary: an operator (NEXT) to expand a state, that is, generating all the successor nodes of a given node; an operator (SCORE) to evaluate a state, that is, generating the likelihood that a node belongs to the optimal solution; an operator (PRUNE) to select the alternatives that are most promising, that is, choosing the best nodes; and an operator (FOUND) to check if the goal has been reached. The operation implemented by PRUNE is often called forward pruning. Beam search also requires two data structures: one that contains the set of states that are being extended (called CURRENT.STATES) and one that contains the set of new states that is being created (called CANDIDATE.STATES). At each iteration of the algorithm a new set of states is generated and becomes the current set of states for the next iteration. Given these operators and data structures, beam search can be expressed by this simple program:

Start:
    CURRENT.STATES := initial.state
    while (not FOUND(CURRENT.STATES)) do
        CANDIDATE.STATES := NEXT(CURRENT.STATES)
        SCORE(CANDIDATE.STATES)
        CURRENT.STATES := PRUNE(CANDIDATE.STATES)
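The program above can be rendered directly in Python. Everything in the example below is an illustrative assumption, not part of the article: the toy graph of strings, the scoring function, and the beam width. The loop itself mirrors the NEXT/SCORE/PRUNE/FOUND cycle, with PRUNE realized as "keep the best-scored states up to the beam width."

```python
# A runnable sketch of the beam search loop above; the graph, scoring
# function, and beam width used in the example are hypothetical.

def beam_search(initial_state, next_states, score, is_goal, beam_width):
    current = [initial_state]                       # CURRENT.STATES
    while not any(is_goal(s) for s in current):     # FOUND
        # NEXT: expand every current state into its successors.
        candidates = [t for s in current for t in next_states(s)]
        if not candidates:
            return None                             # search failed
        # SCORE + PRUNE: rank candidates and keep only the beam.
        current = sorted(candidates, key=score, reverse=True)[:beam_width]
    return [s for s in current if is_goal(s)]

# Tiny example: states are strings, each step appends 'a' or 'b',
# the score favors 'a's, and the goal is any string of length 4.
result = beam_search(
    initial_state="",
    next_states=lambda s: [s + "a", s + "b"] if len(s) < 4 else [],
    score=lambda s: s.count("a"),
    is_goal=lambda s: len(s) == 4,
    beam_width=2,
)
```

With a beam width of 2 the search keeps only the two most 'a'-rich strings at each depth, so it reaches length-4 goals after expanding far fewer than the 2^4 possible strings, at the risk (discussed below) of pruning a path a different scoring function might have preferred.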
The algorithm is started by providing an initial state (e.g., the initial node of the graph to be searched). Then the NEXT and SCORE operators are applied to generate all the possible new states and give them a score. When all the new states have been generated, the PRUNE operator is applied to the set of new states, and the unpromising alternatives are discarded. The algorithm iterates until the goal has been reached.

For example, beam search is used in the Harpy speech recognition system (1) to search a graph that embodies the syntactic and vocabulary constraints of the language as a sequence of basic speech sounds (allophones). This graph is such that any path from the initial state to the final state represents a pronunciation of a legal sentence. Given an unknown utterance, Harpy segments the signal and computes the likelihood that each segment represents an allophone. The sequence of labeled segments is then compared against each of the alternative paths in the graph that represent acceptable allophone sequences in the language. The operator NEXT extracts from the graph all the nodes that can follow the nodes in CURRENT.STATES. The operator SCORE compares the allophone in each node with a segment of speech and returns a value that indicates how well they match. The PRUNE operator computes a threshold score as a function of the best score and then discards all the nodes that have a score worse than the threshold. Therefore, in the Harpy system the pruning is anchored to the best path, and all the nodes that are close enough to the best node to have a chance to be on the best path are kept. The FOUND operator simply triggers when all the input speech data have been evaluated. At this point, if the search was successful, the set CANDIDATE.STATES contains the last node in the network, and the correct utterance can be retrieved by tracing the best path backward (a simple lookup operation if the pointers for each path in the beam are kept until the end of the search). Note that the best node at each segment during the search is not necessarily on the globally best path discovered at the end of the search. Thus, local errors, for example, errors due to errorful acoustic data, are recovered by delaying commitment to a particular path until the end.

As one can see from the Harpy system example, the NEXT and SCORE operators depend on the problem being searched and do not directly influence the performance of the search. The PRUNE operator instead influences the performance both in terms of how expensive the search is and in terms of the ability of the algorithm to reach the goal. In general, a "permissive" PRUNE will reach the goal most of the time at the expense of examining many unpromising paths (in the extreme case, beam search simply becomes a breadth-first search). On the contrary, a very "strict" PRUNE will limit the amount of computation but will increase the risk of pruning the path that leads to the goal. Therefore, one would like to use the strictest PRUNE that does not prevent the algorithm from finding the optimal solution. How well (if at all) such a compromise can be reached is a function of the domain being searched and of the quality of the scoring function. For example, in a speech system, if the SCORE operator generates high scores for only a few allophones (including the correct one) and low scores for the other allophones, the algorithm will tolerate a very narrow beam without losing accuracy. In general, the pruning function is no substitute for the quality of the scores, since poor and confused scores will generate sets of states for which the score does not truly reflect the likelihood that a state is on the correct path. Finally, it should be noted that although beam search is a very cost-effective search method, because it only examines some of the alternatives, it does not guarantee that the optimal solution is found.

One of the reasons that beam search is attractive is that it reduces computation by reducing the number of states that have to be examined. The amount of saving depends on the specific search domain; experiments with speech recognition programs showed an improvement of a few orders of magnitude over an exhaustive search. Nevertheless, the large size of some search spaces requires even higher performance. To this end, the design of parallel beam search algorithms has been investigated. Although it would appear that parallelism could be readily exploited by performing the NEXT and SCORE operators in parallel, it has been found (5) that beam search needs to be partitioned into such small components that their synchronization, using the primitives available on general-purpose multiprocessors, results in too much overhead. This problem can be solved by designing special architectures for beam search. For example, the Harpy machine (6), a five-processor architecture using small microprocessors, was able to execute the beam search for a speech recognition application in real time and twice as fast as a large mainframe. Another example, described in Ref. 7, is a custom VLSI architecture that can execute beam search three orders of magnitude faster than a million-instruction-per-second general-purpose processor.

BIBLIOGRAPHY

1. B. T. Lowerre and R. D. Reddy, The Harpy Speech Understanding System, in W. A. Lea (ed.), Trends in Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1980, pp. 340-360.
2. M. S. Fox, Constraint-Directed Search: A Case Study of Job-Shop Scheduling, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, Computer Science Department, December 1983.
3. S. Rubin, The ARGOS Image Understanding System, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, Computer Science Department, November 1978.
4. T. G. Dietterich and R. S. Michalski, "Inductive learning of structural descriptions: Evaluation criteria and comparative review of selected methods," Artif. Intell. 16, 257-294 (November 1981).
5. P. N. Oleinick, The Implementation and Evaluation of Parallel Algorithms on C.mmp, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, Computer Science Department, 1978.
6. R. Bisiani, H. Mauersberg, and R. Reddy, "Task-Oriented Architectures," Proceedings of the IEEE, 885-896, July 1983.
7. T. Anantharaman and R. Bisiani, "Hardware Accelerators for Speech Recognition Algorithms," in Proceedings of the 13th International Symposium on Computer Architecture, IEEE 14(2), 216-223 (June 1986).

R. BISIANI
Carnegie-Mellon University
BELIEF REVISION

The ability to reason about and adapt to a changing environment is an important aspect of intelligent behavior. Most computer programs constructed by researchers in AI maintain a model of their environment (external and/or internal environment) that is updated to reflect the perceived changes in the environment. One reason for model updating is the detection of contradictory information about the environment. The conventional approach to handling contradictions consists of changing the most recent decision made [chronological backtracking (qv)]. An alternative solution [dependency-directed backtracking (qv)] consists of changing not the last choice made but an assumption that provoked the unexpected condition. This second approach generated a great deal of research in one area of AI, which became loosely called belief revision.

Belief revision is an area of AI research concerned with the issues of revising sets of beliefs when new information is found to contradict old information. Research topics in belief revision include the study of the representation of beliefs, in particular how to represent the notion of belief dependency; the development of methods for selecting the subset of beliefs responsible for contradictions; and the development of techniques to remove some subset of beliefs from the original set of beliefs. The research on belief revision is related to the research on nonmonotonic logic, which aims at capturing parts of the logic of belief revision systems (see Reasoning, nonmonotonic).

The field of belief revision is usually recognized to have been initiated by J. Doyle, who, based on the work of Stallman and Sussman (1), developed an early domain-independent belief-revision system (2,3), although a system that performs belief revision was developed at approximately the same time by P. London (5). Following Doyle, several researchers pursued this topic, most of them building on the system of Doyle.
Some of the important systems developed for belief revision are TMS (2,3), RUP (6,7), MBR (8,9), and ATMS (10,11). In the last few years some commercial systems that perform belief revision have become available, for example, DUCK (from Smart Systems Technology), ART (12) (from Inference Corporation), and LOOPS (from Xerox).

Roots of the Problem in AI

Belief-revision systems are AI programs that deal with contradictions. They work with a knowledge base containing propositions about the state of the environment, performing reasoning from the propositions in the knowledge base and "filtering" the propositions in the knowledge base so that only
part of the knowledge base is perceived: the set of propositions under consideration. This set of propositions is usually called the set of believed propositions. When the belief-revision system switches from one of these sets to another, we say that it changes its beliefs. Typically, belief-revision systems explore alternatives, make choices, explore the consequences of the choices, and compare the results obtained when using different choices. If during this process a contradiction is detected, the belief-revision system revises the knowledge base, "erasing" some propositions so that it gets rid of the contradiction.

Belief-revision systems have their roots both in the problems raised during search and in the frame problem of McCarthy and Hayes (13). The frame problem (13-15) is the problem of deciding which conditions change and which conditions do not change when a system undergoes some modification. The basis of the problem is that although it is possible to specify the ways in which a system's environment might change in terms of the effects of actions, it still remains to specify some way of deciding what stays unchanged in the face of the actions. Early systems approaching these problems [e.g., STRIPS (16) and PLANNER (17,18)] basically worked in the same way: for each of the actions allowed there was a list of conditions that were deleted by the action and a list of conditions that were added by the action. When an action was executed, the conditions associated with these lists would be added to and deleted from the knowledge base.
As far as revising the model of the environment is concerned, this approach presents two problems: the conditions to be added and deleted have to be carefully tailored as a set to avoid unintended infinite loops of adding and deleting information to the knowledge base; and if a proposition depends on another one that is deleted by some action, then the former may be kept in the knowledge base if it is not part of the set of propositions explicitly deleted by the action.

An alternative approach, context-layered knowledge bases, divides the knowledge base into smaller knowledge bases so that the consequences of the effect of an action can be grouped with a reference back to a causing action. Such an approach was taken by Fikes (19), who stores the situations of a model in a tree, the context tree, in which each node represents a situation. The root of the context tree represents the initial situation. Since most of the information in a given situation is the same as the information in the previous situation, as a matter of space efficiency only the differences between the new situation and the old one are actually stored in the node of the context tree representing the new situation. Actions have the effect of creating a new situation in the context tree or returning to some previous situation. Fikes's approach presents the following drawbacks: the propositions about a given situation of the model are scattered along a path in the context tree, and there is no record of the sequence of actions performed. Similar approaches were taken in Refs. 20-23.

A new research direction was created by Stallman and Sussman, who designed a system called EL, in which dependencies of propositions are permanently recorded (1). EL maintains a complete record (trace) of its reasoning, using it both to decide which alternative choices to make when something goes wrong and to explain its line of reasoning.
Along with each derived proposition, EL stores the set of all propositions directly used in its derivation and the rule of inference used to derive it, the dependency record of the proposition.
EL solves electric circuit problems. While searching for the values of the circuit parameters, EL may have to "guess" the operating range of some devices. Later, if an inconsistency is found, EL knows that somewhere along its way it guessed a wrong state for some device. The novelty of EL's approach to backtracking is that the assumption changed during backtracking does not necessarily correspond to the last choice made but rather to an assumption that provoked the inconsistency [dependency-directed backtracking (qv)]. When an inconsistency is detected, EL searches through the chain of dependency records of the inconsistent propositions until it finds all the assumptions upon which the inconsistent propositions depend. This set of assumptions is recorded as leading to a contradiction and is never tried again. Then heuristics are used to select one of them to rule out. Stallman and Sussman's work (1) had two major influences in AI: it opened a new perspective on the handling of alternatives (dependency-directed backtracking), and it triggered the research on belief-revision systems.
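The dependency-recording strategy just described can be sketched abstractly. The following Python fragment is a generic illustration, not Stallman and Sussman's EL; the circuit facts and names are invented for the example. Every derived fact carries the set of assumptions it rests on, and a contradiction records the union of the culprits' assumption sets so that combination is never tried again.

```python
# Minimal sketch of dependency-directed backtracking (illustrative,
# not EL itself): every derived fact carries the set of assumptions
# it rests on; contradictions yield recorded "nogood" sets.

class Fact:
    def __init__(self, name, assumptions):
        self.name = name
        self.assumptions = frozenset(assumptions)  # underlying guesses

nogoods = set()  # assumption sets known to lead to contradiction

def derive(name, *premises):
    # A derived fact depends on the union of its premises' assumptions.
    deps = frozenset().union(*(p.assumptions for p in premises))
    return Fact(name, deps)

def contradiction(f1, f2):
    # Record the culprit set; dependency-directed backtracking revises
    # one of these assumptions rather than the most recent choice.
    culprits = f1.assumptions | f2.assumptions
    nogoods.add(culprits)
    return culprits

# Example: two hypothetical guesses about device operating ranges
a1 = Fact("Q1 active", {"Q1 active"})
a2 = Fact("Q2 saturated", {"Q2 saturated"})
v = derive("v=3.2V", a1, a2)
clash = derive("v=5.0V", a2)
print(contradiction(v, clash))  # the set of all guilty assumptions
```

Note how the culprit set is computed directly from the dependency records rather than from the chronological order of the choices.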
Explicit Concern about Revising Beliefs

Building upon Stallman and Sussman's work, Doyle (2,3) designed the truth-maintenance system (TMS), the first domain-independent belief-revision system. TMS maintains a knowledge base of propositions, each of which is explicitly marked as believed or disbelieved. TMS may be told that some propositions are contradictory, in which case it automatically revises its beliefs so that no inconsistent propositions are simultaneously believed.

TMS is based on the definition of two kinds of objects: propositions and justifications. Justifications represent the reasons that TMS believes or disbelieves a certain proposition. Attached to each proposition in the knowledge base there is one (or more) justification(s) that supports TMS's belief or disbelief in the proposition. Although Doyle points out the usefulness of four kinds of justifications (4), he mainly implemented one of them, the SL (support list) justification. This type of justification contains two lists of propositions, the inlist and the outlist. The proposition supported by an SL justification is believed if and only if every proposition in its inlist is believed and every proposition in its outlist is disbelieved. Whenever one proposition is derived, it is justified by an SL justification containing all the propositions directly used in its derivation and the rule of inference used to derive it.

Based on the SL justifications, there are two distinguished types of propositions in TMS: premises are propositions whose current SL justification has empty inlist and empty outlist (premises are always believed), and assumptions are propositions whose current SL justification has nonempty outlist. Assumptions are propositions whose belief depends on the disbelief in other propositions.

TMS may be asked to add a new proposition to the knowledge base or to change (add or retract) a justification for a proposition. In either case TMS tries to find disbelieved propositions that will be believed by such addition or retraction and tries to find believed propositions that will be disbelieved by the addition or retraction.

In addition, TMS may be told that two believed propositions are contradictory. In this case the dependency-directed backtracking mechanism is invoked, which searches through the inlists of the propositions in the knowledge base, starting with the SL justifications of the contradictory propositions, until it finds all the assumptions considered by the contradictory propositions. One of those assumptions is selected as the culprit for the contradiction and is disbelieved. To disbelieve this assumption, TMS believes one of the propositions referenced in the outlist of the assumption and justifies this proposition with an SL justification whose inlist contains the proposition representing the contradiction.

After selecting the culprit for the contradiction, it is necessary to disbelieve all the propositions depending on it. This is done by following the chain of dependency records and disbelieving each proposition that has no SL justification other than the one that includes the selected culprit in its inlist.

This "disbelieving process" is not as simple as it may seem owing to the possibility of circular proofs. Suppose, following an example from Ref. 25, that the knowledge base contains the following propositions:

(∀x)[Man(x) → Person(x)]
(∀x)[Person(x) → Human(x)]
(∀x)[Human(x) → Person(x)]

Adding Man(Fred) to the knowledge base will cause the derivation of Person(Fred), which in turn will cause the derivation of Human(Fred). The addition of Human(Fred) causes Person(Fred) to be rederived. Figure 1 represents the dependencies among the propositions in the knowledge base.

Figure 1. Knowledge base dependencies: PR = premise; C = conclusion.

In this figure two directed arcs (labeled PR, for premises) pointing to a circle mean that the two propositions at the ends of the arcs were combined to produce the proposition pointed to by the arc leaving that circle (labeled C, for conclusion): the inlist of the SL justification of a proposition pointed to by a conclusion arc contains the propositions at the ends of the premises arcs leading to that proposition. If there exists a path of arcs from proposition A to proposition B, it means that B depends on A. In Figure 1 Human(Fred) depends on
Person(Fred), which in turn depends on Human(Fred). This is called a circular proof. Suppose now that Man(Fred) is disbelieved. The dependency arcs leaving Man(Fred) lead to Person(Fred). However, Person(Fred) has another justification, and one is faced with the problem of whether to disbelieve Person(Fred) since, although one of its justifications is no longer valid, Person(Fred) may still be believed owing to the other justification. Handling circular proofs raises several problems. A discussion of the possible solutions to those problems can be found in Refs. 3 and 24.

Doyle's research triggered the development of several belief-revision systems (6,26-29). These systems share two characteristics: they are mainly concerned with implementation issues, paying no special attention to the logic underlying the system, and each proposition is justified by the propositions that directly originated it.
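The SL-justification rule and the circular-proof problem can be made concrete. The sketch below is a simplified illustration, not Doyle's TMS: a node is believed exactly when some justification has all its inlist nodes believed and all its outlist nodes disbelieved, and statuses are grown monotonically from premises, so a pair of nodes that merely support each other does not stay believed.

```python
# Simplified SL-justification evaluation (illustrative, not Doyle's
# TMS). A node is believed iff some justification has every inlist
# node believed and every outlist node disbelieved. Belief is grown
# from premises, so circularly supported nodes are not kept believed.

justifications = {}  # node -> list of (inlist, outlist) pairs

def justify(node, inlist=(), outlist=()):
    justifications.setdefault(node, []).append((list(inlist), list(outlist)))

def labels():
    believed = set()
    changed = True
    while changed:
        changed = False
        for node, justs in justifications.items():
            if node in believed:
                continue
            for inlist, outlist in justs:
                if all(n in believed for n in inlist) and \
                   all(n not in believed for n in outlist):
                    believed.add(node)
                    changed = True
                    break
    return believed

# A premise (empty inlist and outlist) and a circular support pair:
justify("Man(Fred)")                       # premise: always believed
justify("Person(Fred)", ["Man(Fred)"])
justify("Person(Fred)", ["Human(Fred)"])   # second, circular support
justify("Human(Fred)", ["Person(Fred)"])
print(sorted(labels()))
```

If the premise Man(Fred) is removed, Person(Fred) and Human(Fred) support only each other, and the well-founded propagation above correctly leaves both disbelieved.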
Concerns for Foundations

The early 1980s saw the development of new research directions in belief-revision systems, characterized by an explicit concern about the foundations of the systems independent of their implementations (8,9,30,31) and by the use of a new type of justification (8-11,32). One such system, the MBR (multiple belief reasoner) system of Martins (8,9), is described here.

There are two distinct aspects to consider concerning MBR: the logic underlying the system and the way the propositions in the knowledge base (generated according to the rules of inference of the logic) are interpreted by MBR. Any logic underlying belief-revision systems has to keep track of and propagate propositional dependencies. The concern for this problem is shared, although for different reasons, with the relevance logicians, whose main goal is to avoid the paradoxes of implication. Relevance logicians developed logics that keep track of and propagate propositional dependencies. The logic underlying MBR, the SWM system, was influenced by the relevance logic work of Shapiro and Wand (33) and by the FR system of Anderson and Belnap (34). The SWM system associates each proposition with one (or more) triple(s), its support, which justifies the existence of the proposition. Each triple contains the following information:

1. The origin tag (OT) tells how the proposition was obtained. Propositions can be hypotheses, normally derived propositions, or specially derived propositions (propositions whose derivation sidesteps some of the relevance logic assumptions). This latter case is not discussed here; see Ref. 8 for further details.

2. The origin set (OS) contains all the hypotheses that were really used in the derivation of the proposition.

3. The restriction set (RS) contains every set known to be inconsistent with the proposition's origin set. A set is known to be inconsistent with another if their union is inconsistent and a contradiction was in fact derived from that union.
If the same proposition is derived in multiple ways, its support contains multiple triples. The OT and the OS reflect the way the proposition was derived. The RS, on the other hand, reflects the current knowledge about how the hypotheses underlying that proposition relate to the other propositions. Once
a proposition is derived, its OT and OS remain constant, whereas its RS may change as contradictions are uncovered. The rules of inference of SWM use the RSs to prevent the derivation of propositions whose OSs would be known to be inconsistent.

MBR is a belief-revision system that works with a knowledge base containing propositions generated according to the rules of inference of SWM. In this knowledge base each proposition is associated with a support (in SWM's sense). MBR relies on the notions of context and belief space. A context is any set of hypotheses. A context determines a belief space, the set consisting of every proposition whose OS is a subset of the context that defines that belief space. At any moment there is one active context, the current context, and the knowledge base retrieval operations are defined such that they only retrieve the propositions in the belief space defined by the current context.

Figure 2 shows MBR's knowledge base originated by the example of the last section. In this figure a circle pointed to by an arc labeled DO (derivation origin) represents the support of the proposition at the end of the arc. Note that Person(Fred) has two supports. The arcs labeled OS leaving the support point to the hypotheses from which the proposition was derived. Since each proposition is directly connected with the hypotheses that underlie it, there are no circular proofs.

When a contradiction is detected, the origin sets of the contradictory propositions are inspected and their union becomes a set known to be inconsistent. Every proposition in the knowledge base whose origin set is not disjoint from this newly discovered inconsistent set has its restriction set updated in order to reflect the current knowledge about inconsistent sets in the knowledge base.
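The context and belief-space machinery can be illustrated with a small sketch. This is an illustration of the ideas in the spirit of MBR/SWM, not Martins's implementation, and the propositions and hypothesis names are invented: each proposition carries one or more origin sets of hypotheses, retrieval filters by subset of the current context, and a contradiction makes the union of the two origin sets a known-inconsistent set recorded in the restriction set of every non-disjoint proposition.

```python
# Illustrative sketch of origin sets, contexts, and belief spaces
# (in the spirit of MBR/SWM, not the actual system).

origin = {}       # proposition -> set of origin sets (frozensets of hypotheses)
restriction = {}  # proposition -> set of known-inconsistent hypothesis sets

def add(prop, origin_set):
    origin.setdefault(prop, set()).add(frozenset(origin_set))
    restriction.setdefault(prop, set())

def belief_space(context):
    # A proposition is in the belief space iff one of its origin sets
    # is a subset of the current context (a set of hypotheses).
    ctx = frozenset(context)
    return {p for p, os_set in origin.items()
            if any(os <= ctx for os in os_set)}

def report_contradiction(p, q):
    # The union of the contradictory origin sets is now known to be
    # inconsistent; update the restriction set of every proposition
    # whose origin set is not disjoint from it.
    for os_p in origin[p]:
        for os_q in origin[q]:
            bad = os_p | os_q
            for r, os_set in origin.items():
                if any(os & bad for os in os_set):
                    restriction[r].add(bad)

add("A", {"h1"})
add("not-A", {"h2"})
add("B", {"h1", "h3"})
print(belief_space({"h1", "h3"}))   # contains A and B but not not-A
report_contradiction("A", "not-A")  # {h1, h2} becomes known inconsistent
```

Changing beliefs here is just a matter of calling `belief_space` with a different context; nothing in the knowledge base is marked or unmarked.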
In MBR's implementation there is a considerable amount of sharing between knowledge base structures, namely, origin sets and restriction sets, which is possible since SWM's formalism guarantees that two propositions with the same OS have the same RS as well.

Justification-Based versus Assumption-Based Systems

Any belief-revision system must keep a record of where each proposition in the knowledge base came from. These records are inspected while searching for the culprit of a contradiction. There are two ways to record the origin of propositions, corresponding to justification-based and to assumption-based systems (32). In justification-based systems each proposition contains information about the propositions that directly originated it. This approach was used in Refs. 2, 3, 6, 26-29, and 31. In assumption-based systems each proposition contains information about the hypotheses (nonderived propositions) that originated it. This approach was taken in Refs. 8-11 and 32.

Assumption-based systems present several advantages over justification-based systems. These advantages are summarized by a comparison of the two systems discussed in this entry, TMS and MBR. An excellent comparison of the two approaches can be found in Ref. 32. The advantages of assumption-based systems over justification-based systems are presented as follows:

1. Changing sets of beliefs. In TMS changing one set of beliefs into another can only be accomplished upon detection of a contradiction, in which case the dependency-directed backtracking goes through the entire knowledge base, marking
Figure 2. Knowledge base dependencies: DO = derivation origin; OS = origin set.
and unmarking propositions. In MBR changing sets of beliefs is done by changing the current context. Afterward the knowledge base retrieval operations will only consider the propositions in the new belief space. There is no marking or unmarking of any kind.

2. Comparing sets of beliefs. In TMS it is impossible to examine two sets of beliefs simultaneously. This may be important when one must weigh the outcomes of several possible solutions. In MBR several sets of beliefs may coexist; thus, it is simple to compare two solutions.

3. Backtracking. TMS relies on the dependency-directed backtracking mechanism, which follows the dependency records, identifying all the assumptions leading to a given contradiction. In MBR there is no backtracking of any kind. Upon detection of a contradiction, all the assumptions underlying that contradiction are directly identifiable (they are the union of the origin sets of the contradictory propositions).

4. Finding faulty assumptions. In MBR, upon detection of a contradiction, the hypotheses underlying it are immediately identified, making it easy to compare sets of hypotheses underlying contradictions.

However, using only assumptions as support disables the explanation of the reasoning sequence followed by the program. The system of Refs. 10, 11, and 32 uses both assumptions and justifications, offering the advantages of both approaches.

Applications

The capability of determining the source of information coupled with the possibility of changing beliefs are essential features of any intelligent system. In general, any system that has to choose among alternatives can use (and benefit from) the techniques developed by belief-revision systems. However, there are some areas in which the techniques discussed in this entry are of paramount importance, some of which are listed below.

1. Reasoning based on partial information, default assumptions, and potentially inconsistent data. This kind of reasoning is likely to generate contradictions. Thus, it is of primary importance that the system be able to determine the causes of contradictions, remove them, and, after doing so, be able to find every proposition in the knowledge base depending on the selected culprit (see Reasoning, default).

2. Learning. A potential source of learning (qv) consists of analyzing mistakes so that the same mistake is not made twice. This calls for belief revision and assignment of credit to the source of the mistake.

3. Replanning from failures. In any planning (qv) system there should be a component that analyzes sources of problems and prevents the generation of a plan that leads to trouble. Again, belief-revision techniques can be used to detect the source of the problems and prevent the generation of ill-formed plans.

4. Reasoning about the beliefs of other agents. Any program that reasons about the beliefs of other agents (see Belief systems) should maintain a clear-cut distinction between its beliefs and the beliefs of the others. Belief-revision techniques contribute to this application through their concern with the changing of belief spaces. The program must be able to change belief spaces, must know which belief space is being considered, and must fail to consider the information from the other(s) belief space(s).

5. Systems for natural-language understanding (qv) (in which one needs to consider several competing interpretations of a sentence) and vision (qv) (in which one needs to revise hypotheses about the contents of images).
6. Qualitative reasoning (qv), a kind of reasoning that requires making choices among alternatives (see, for example, Ref. 35).

7. Systems that select between design alternatives, which may have to change choices made.

8. Diagnoses (see Medical Advice Systems).

It should be kept in mind, however, that belief revision is only applicable in cases where credit for the consequences of choices is assignable.

References to other work in the domain of belief revision (both in AI and in other disciplines) can be found in Ref. 36, which presents an extensive reference list. References 32 and 37 present an excellent discussion of belief-revision techniques and problems. References 3 and 8 give overviews of the field and discuss in detail the two systems presented here, TMS and MBR, respectively.
BIBLIOGRAPHY

1. R. M. Stallman and G. J. Sussman, "Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis," Artif. Intell. 9, 135-196 (1977).
2. J. Doyle, Truth Maintenance Systems for Problem Solving, Technical Report AI-TR-419, MIT AI Laboratory, Cambridge, MA, 1978.
3. J. Doyle, "A truth maintenance system," Artif. Intell. 12, 231-272 (1979).
4. Reference 3, pp. 239-244.
5. P. London, Dependency Networks as Representation for Modelling in General Problem Solvers, Technical Report 698, Department of Computer Science, University of Maryland, College Park, MD, 1978.
6. D. McAllester, An Outlook on Truth Maintenance, AI Memo 551, MIT AI Laboratory, Cambridge, MA, 1980.
7. D. McAllester, "A Widely Used Truth Maintenance System," unpublished, MIT, Cambridge, MA, 1985.
8. J. Martins, Reasoning in Multiple Belief Spaces, Technical Report 203, Department of Computer Science, State University of New York at Buffalo, Buffalo, NY, 1983.
9. J. Martins and S. C. Shapiro, "Reasoning in Multiple Belief Spaces," Proceedings of the Eighth IJCAI, Karlsruhe, FRG, 1983, pp. 370-373.
10. J. de Kleer, "An Assumption-Based TMS," Artificial Intelligence 28 (1986).
11. J. de Kleer, "Problem Solving with the ATMS," Artificial Intelligence 28 (1986).
12. B. D. Clayton, "ART Programming Primer," Inference Corporation, April 1985.
13. J. McCarthy and P. Hayes, Some Philosophical Problems from the Standpoint of Artificial Intelligence, in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, U.K., pp. 463-502, 1969.
14. P. J. Hayes, The Frame Problem and Related Problems in Artificial Intelligence, in A. Elithorn and D. Jones (eds.), Artificial and Human Thinking, Jossey-Bass, San Francisco, CA, pp. 45-59, 1973.
15. B. Raphael, The Frame Problem in Problem Solving Systems, in N. Findler and B. Meltzer (eds.), Artificial Intelligence and Heuristic Programming, American Elsevier, New York, pp. 159-169, 1971.
16. R. Fikes and N. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artif. Intell. 2, 189-208 (1971).
17. C. Hewitt, Description and Theoretical Analysis of PLANNER: A Language for Proving Theorems and Manipulating Models in a Robot, Technical Report TR-258, MIT, Cambridge, MA, 1972.
18. G. Sussman, T. Winograd, and E. Charniak, MICRO-PLANNER Reference Manual, Technical Report Memo 203, MIT, Cambridge, MA, 1971.
19. R. Fikes, Deductive Retrieval Mechanisms for State Description Models, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 99-106, 1975.
20. S. Fahlman, "A planning system for robot construction tasks," Artif. Intell. 5, 1-49 (1974).
21. P. J. Hayes, A Representation for Robot Plans, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 181-188, 1975.
22. D. McDermott and G. Sussman, The CONNIVER Reference Manual, Technical Report Memo 259, MIT, Cambridge, MA, 1972.
23. J. Rulifson, J. Derksen, and R. Waldinger, QA4: A Procedural Calculus for Intuitive Reasoning, Technical Report Note 73, SRI International, Menlo Park, CA, 1972.
24. E. Charniak, C. Riesbeck, and D. McDermott, Artificial Intelligence Programming, Lawrence Erlbaum Associates, Hillsdale, NJ, 1980.
25. Reference 24, p. 197.
26. J. Goodwin, An Improved Algorithm for Non-Monotonic Dependency Net Update, Technical Report LITH-MAT-R-82-23, Department of Computer and Information Science, Linköping University, Linköping, Sweden, 1982.
27. D. McDermott, Contexts and Data Dependencies: A Synthesis, Department of Computer Science, Yale University, New Haven, CT, 1982.
28. H. Shrobe, Dependency-Directed Reasoning in the Analysis of Programs which Modify Complex Data Structures, Proceedings of the Sixth IJCAI, Tokyo, Japan, pp. 829-835, 1979.
29. A. Thompson, Network Truth Maintenance for Deduction and Modeling, Proceedings of the Sixth IJCAI, Tokyo, Japan, pp. 877-879, 1979.
30. J. Doyle, Some Theories of Reasoned Assumptions, Carnegie-Mellon University, Pittsburgh, PA, 1982.
31. J. Goodwin, WATSON: A Dependency Directed Inference System, Proceedings of the Non-monotonic Reasoning Workshop, AAAI, Menlo Park, CA, pp. 103-114, 1984.
32. J. de Kleer, Choices without Backtracking, Proceedings of the Fourth AAAI, Austin, TX, 1984.
33. S. C. Shapiro and M. Wand, The Relevance of Relevance, Technical Report 46, Computer Science Department, Indiana University, Bloomington, IN, 1976.
34. A. Anderson and N. Belnap, Entailment: The Logic of Relevance and Necessity, Vol. 1, Princeton University Press, Princeton, NJ, 1975.
35. B. C. Williams, "Qualitative Analysis of MOS Circuits," Technical Report TR-567, MIT AI Laboratory, Cambridge, MA, 1983.
36. J. Doyle and P. London, "A selected descriptor-indexed bibliography to the literature on belief revision," SIGART Newslett. 71, 7-23 (1980).
37. J. de Kleer and J. Doyle, "Dependencies and Assumptions," in A. Barr and E. Feigenbaum (eds.), The Handbook of Artificial Intelligence, Vol. 2, William Kaufmann, Inc., Los Altos, CA, 1982, pp. 72-76.
J. Martins
Instituto Superior Técnico, Lisbon
BELIEF SYSTEMS
A belief system may be understood as a set of beliefs together with a set of implicit or explicit procedures for acquiring new beliefs. The computational study of belief systems has focused on building computer systems for representing or expressing beliefs or knowledge and for reasoning (qv) with or about beliefs or knowledge. Such a system is often expressed in terms of a formal theory of the syntax and semantics of belief and knowledge sentences.

Reasons for Studying Such Systems. There are several distinct, yet overlapping, motivations for studying such systems. As McCarthy and Hayes, two of the earliest contributors to this field, have explained (1),

A computer program capable of acting intelligently in the world must have a general representation of the world. . . . [This] requires commitments about what knowledge is and how it is obtained. . . . This requires formalizing concepts of causality, ability, and knowledge.

Thus, one motivation is as a problem in knowledge representation (see Representation, knowledge). In the present context this might less confusingly be referred to as "information representation" since not only knowledge but also beliefs are represented. A second motivation is as a component of computational studies of action. Subcategories of the latter include planning systems (e.g., Ref. 2), systems for planning speech acts (e.g., Ref. 3), and systems for planning with multiple agents (e.g., Ref. 4). These systems frequently involve representing and reasoning about other notions as well (such as can, wants, etc.).

A third motivation is the construction of AI systems that can interact with human users, other interacting AI systems, or even themselves (e.g., Refs. 5 and 6). Among the subcategories here are the study of user models for determining appropriate output (e.g., Refs. 7 and 8) and the prediction of others' behavior and expectations on the basis of their beliefs (e.g., Ref. 9). A fourth motivation is directly related to such interaction: the study of AI systems that can converse in natural language (e.g., Ref. 10), either with users or with a "knowledge base" (e.g., Ref. 11). A fifth motivation is the study of reasoning: how a particular individual reasons (Ref. 12) or how reasoning can be carried out with incomplete knowledge (e.g., Ref. 13) or in the face of resource limitations (e.g., Ref. 14). Finally, there is the ever-present motivation of modeling a mind (e.g., Refs. 15 and 16) or providing computational theories of human reasoning about beliefs (e.g., Refs. 17 and 18).

Types of Theories. There are four overlapping types of theories identifiable by research topics or by research methodologies. One is belief revision (qv), which is concerned with the problem of revising a system's database in light of new, possibly conflicting information; such theories are dealt with in another entry. The other types of theory can be usefully categorized [by augmenting the scheme of McCarthy and Hayes (1)] as (a) epistemological theories, concerned primarily with representational issues [e.g., McCarthy (9)]; (b) formal heuristic theories, concerned primarily with the logic of belief and knowledge, that is, with reasoning in terms of a formal representation [e.g., Moore (2)]; and (c) psychological heuristic theories, also concerned with reasoning but using techniques that make some explicit claim to psychological adequacy; such theories typically are not concerned with representational issues per se [e.g., Colby and Smith (19) and Wilks and Bien (20)].

Philosophical Background

Much of the data, problems, and theories underlying AI research on formal belief systems has come from philosophy, in particular epistemology, philosophy of language, and logic (especially modal and intensional logics).

Philosophical Issues. There are several philosophical issues (logical, semantic, and ontological) that have been faced by AI researchers working on belief systems.

1. The problem of the relationship between knowledge and belief. This problem, dating back to Plato's Theaetetus, is usually resolved by explicating knowledge as justified true belief (see Ref. 21 for the standard critique of this view and Ref. 22 for a discussion in the context of AI).

2. The problem of the nature of the objects of belief, knowledge, and other intentional (i.e., cognitive) attitudes: Are such objects extensional (e.g., sentences, physical objects in the external world) or intensional (i.e., nonextensional, e.g., propositions, concepts, mental entities)?

3. Problems of referential opacity: the failure of substitutability of co-referential terms and phrases in intentional contexts. This can best be illustrated as a problem in deduction: From

Susan believes that the Morning Star is a planet

and

The Morning Star is a planet if and only if the Evening Star is a planet,

it does not logically follow that

Susan believes that the Evening Star is a planet.

Nor from

Ruth believes that Venus is a planet

and

Venus = the Evening Star

does it logically follow that

Ruth believes that the Evening Star is a planet.

4. The problem of quantifying in (i.e., into intentional contexts): From

Carol believes that the unicorn in my garden is white,

it does not logically follow that

There is a unicorn in my garden such that Carol believes that it is white.
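One common computational response to the opacity problems illustrated above can be sketched generically. The fragment below is an invented illustration, not drawn from any particular system discussed in this entry: beliefs are stored as relations to unanalyzed sentence terms, so substituting co-referential names is simply not licensed inside a belief context, while the names still co-refer in transparent contexts.

```python
# Illustrative: belief contexts are referentially opaque, so known
# equality of referents does not license substitution inside
# "believes". All names and sentences here are invented examples.

beliefs = {("Susan", "Planet(MorningStar)")}
equalities = {("MorningStar", "EveningStar")}  # co-referential names

def believes(agent, sentence):
    # Opaque lookup: the sentence term must match exactly.
    return (agent, sentence) in beliefs

def denotation(term):
    # Transparent contexts may normalize a term through equalities.
    for a, b in equalities:
        if term == b:
            return a
    return term

print(believes("Susan", "Planet(MorningStar)"))  # True
print(believes("Susan", "Planet(EveningStar)"))  # False: no substitution
# Outside belief contexts the two names co-refer:
print(denotation("EveningStar") == denotation("MorningStar"))  # True
```

The design choice is that the belief relation ranges over representations rather than referents, which is one simple way of blocking the invalid inferences shown above.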
5. Problems of logical form (or semantic interpretation, or "knowledge representation" in the sense of AI): How should the following kinds of sentences be understood, and what are their relationships with simpler cases of belief and knowledge?

Margot knows whether Ben's phone number is the same as Ariana's.

Mike knows who Sally is.

Jan believes that Stu believes that he is a philosopher.

Frank and Harriet mutually believe that the movie at Loew's starts at 9 p.m.

6. The problem of the distinction between de re and de dicto beliefs: When a person's belief is a cause of the person's actions, one is not only interested in what the person believes but also in how the person believes it. That is, one is interested not only in a third-person characterization of the agent's beliefs but also in the agent's own characterization of those beliefs. Suppose that Ralph sees the person whom he knows to be the janitor stealing some government documents, and suppose (unknown to Ralph) that the janitor has just won the lottery. Then Ralph believes de dicto that the janitor is a spy, and he believes de re that the lottery winner is a spy. That is, if asked, Ralph would assent to the proposition "The janitor is a spy"; but he merely believes of the man known to the hearer as the lottery winner that he is a spy: Ralph would not assent to "The lottery winner is a spy." Traditionally viewed, a belief de dicto is a referentially opaque context, whereas a belief de re is referentially transparent. Thus, the inference

Ralph believes [de dicto] that the janitor is a spy.
The janitor = the lottery winner.
Ralph believes [de dicto] that the lottery winner is a spy.

is invalid. Moreover, its conclusion not only represents a loss of information, namely, information about the propositional "content" of Ralph's belief, but it also presents false information about the belief. On the other hand,

Ralph believes [de dicto] that the janitor is a spy.
The janitor = the lottery winner.
Ralph believes [de re] of the lottery winner that he is a spy.

is valid. But the conclusion conveys just as little of the information about Ralph's actual belief de dicto as does the first premise. An AI system that is capable of explaining or recommending behavior must be able to distinguish between these two kinds of belief reports by having two distinct linguistic means of representing them.

Epistemic Logic. Of central importance from the point of view of AI have been the logics of belief and knowledge proposed by Hintikka (23). The propositional fragment of Hintikka's epistemic logic includes, among others, the following axioms:

(A4) ⊢(Kap → p).
(A5) ⊢(Kap → KaKap).
(A6) ⊢([Kap ∧ Ka(p → q)] → Kaq).

Roughly, (A3) says that a knows all theorems, (A4) says that what is known must be true (recall that knowledge is generally considered to be justified true belief), (A5) says that what is known is known to be known, and (A6) says that what is known to follow logically from what is known is itself known. A (propositional) logic of belief (a propositional doxastic logic) can be obtained by using operators Ba and deleting (A4); other epistemic and doxastic logics can be obtained by taking similar variants of other modal logics.

Possible-worlds semantics for epistemic and doxastic logics can be provided as in ordinary modal logics by interpreting the accessibility relation between possible worlds as a relation of epistemic or doxastic alternativeness. Thus, for example, Kap is true in possible world w if and only if p is true in possible world w′ for all w′ that are epistemic alternatives to w. Intuitively, a knows that p if and only if p is compatible with everything that a knows [see Hintikka (23,24) for details].

Various restrictions on the alternativeness (or accessibility) relation yield correspondingly different systems. Thus, S4 can be characterized semantically by requiring the relation to be only reflexive and transitive. If symmetry is allowed, the semantics characterizes the stronger system S5 = S4 + ~Kap → Ka~Kap. (Roughly, what is unknown is known to be unknown.) Note that none of these systems is psychologically plausible. For example, no one knows or believes all tautologies or all logical consequences of one's knowledge or beliefs, as suggested by (A6). Nor is it clear how (A5) is to be interpreted (is the consequent to be read as "a knows that a knows that p" or as "a knows that he (or she) knows that p"?) or whether it is plausible. Indeed, some philosophers feel that there are no axioms that characterize a psychologically plausible theory of belief. There is a large philosophical literature discussing these issues [e.g., Ref. 25, the special issues of Noûs 1 (1967) and Synthèse 21 (1970)]. Other formalizations of epistemic logics that are of relevance to AI are to be found in Sato (26) and McCarthy et al. (27). Further discussion of the philosophical issues may be found in Ref. 28, The Encyclopedia of Philosophy (29), and through The Philosopher's Index. Interesting recent work on the semantics of belief sentences dealing with computational and linguistic issues may be found in Refs. 30-33.
known.) Note that none of these systems is psychologically plausible. For example, no one knows or believes aII tautologies or all logical consequencesof one's knowledge or beliefs as suginfalse presents gested by (A6). Nor is it clear how to interpret (A5)-is the is invalid. Moreover, its conclusion not only namelY, information' of loss a consequentto be read as"aknows that o knows that p" or as"a, formation but it also represents Ralph's of "content" propositional the knows that he (or she) knows that p,,?-rtot whether it is of the information aboui hand, plausible. Indeed, some philosophers feel that there are no other the belief. On axioms that charac tertze u pry.hologicalty plausible theory of spy. a is janitor he that the of rel believes Ld,e Ralph belief. There is a large philosophical literature discussing (1967), these issues [e.g.,Ref i5, thu special issuesof Noas 1 epistemic a spy' and synthdse 2r (1970)1.other formalizations of Ralph believes lde ref ofthe lottery winner that he is (26) logics that are of relevance to AI are to be found in Sato philosophijust information little the of as and McCarthy et al. Qn Further discussion is valid. But the conclusion conveys the first premise. cal issuu, *uy be found in Ref 28, The Encyclopediaof Philosoabout Ralph's actual belief d,ed,ictoas does recommending phy (2g), an&through The Philosopher'sIndex.Interesting reAn AI system that is capable of explaining or with two kinds these between aistinguish to able be must cent work on the semantics of betief sentencesdealing behavior Refs in representing found of be means may linguistic and computational issues of belief reports by having two distinct 30-33. them. point of EpistemicLogic. of central importance from the proknowledge and view of AI have been the logics of belief Hinof fragment propositional posed by Hintikka (23). 
The logic) can tikka's logic of knowledge (propositional epistemic modal logic s4 the of variant notational a as be axiomatized family (seeModal logic), replacing the necessityoperator by a a individual each for Ko, of proposition-forming operators are axioms The p"). that knows tK"p is to be read"a (A1) If P is a tautologY, then FP' (A2) If rP and '(P - g), then Fg' (A3) If vp, then vKoP'
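The possible-worlds clause for Kₐ can be made concrete: a knowledge operator is evaluated by checking its argument at every world accessible to the agent. The following is a minimal illustrative sketch (the worlds, the agent, and the `holds` valuation are invented for the example, not any system from the literature); the accessibility relation is chosen reflexive and transitive, as S4 requires.

```python
# Toy Kripke model: worlds assign truth values to atomic propositions,
# and each agent has an epistemic alternativeness (accessibility) relation.
worlds = {
    "w1": {"p": True, "q": True},
    "w2": {"p": True, "q": False},
}

# S4 requires the relation to be reflexive and transitive.
access = {
    "a": {("w1", "w1"), ("w2", "w2"), ("w1", "w2")},
}

def holds(world, formula):
    """Evaluate an atom, ('not', f), ('implies', f, g), or ('K', agent, f)."""
    if isinstance(formula, str):
        return worlds[world][formula]
    op = formula[0]
    if op == "not":
        return not holds(world, formula[1])
    if op == "implies":
        return (not holds(world, formula[1])) or holds(world, formula[2])
    if op == "K":  # K_a f: f is true in every world accessible to the agent
        _, agent, f = formula
        return all(holds(w2, f) for (w1, w2) in access[agent] if w1 == world)
    raise ValueError(op)

# (A4), "what is known is true," is guaranteed here by reflexivity.
print(holds("w1", ("K", "a", "p")))                    # True: p holds in w1 and w2
print(holds("w1", ("K", "a", "q")))                    # False: q fails in w2
print(holds("w1", ("implies", ("K", "a", "p"), "p")))  # True: an instance of (A4)
```

Dropping reflexivity from the relation would turn this into a doxastic (belief) evaluator, mirroring the deletion of (A4) described above.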
Survey of Theories and Systems

In this section the major published writings on belief systems are surveyed following the three-part categorization of types of theories and by lines of research within the types. The reader is reminded that the categorization is highly arbitrary and that virtually all of the research falls into more than one category.

Epistemological Theories

Early Work. One of the earliest works on AI belief systems, by McCarthy and Hayes (1), begins by considering a system of interacting automata whose states at a given time are determined by their states at previous times and by incoming signals from the external world (including other automata). A person p is considered to be a subautomaton of such a system. Belief is represented by a predicate B, where Bₚ(s, w) is true if p is to be regarded as believing proposition w when in state s. Four sufficient conditions for a "reasonable" theory of belief are given:

1. p's beliefs are consistent and correct.
2. New beliefs can arise from reasoning on the basis of other beliefs.
3. New beliefs can arise from observations.
4. If p believes that it ought to do something, then it does it.
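Conditions 2 and 3 admit a minimal operational reading: a belief store that grows both by applying rules to existing beliefs and by accepting observations. The sketch below is a toy illustration with invented structures, not McCarthy and Hayes's formalism.

```python
# Toy belief store satisfying conditions 2 and 3: new beliefs arise from
# reasoning over old ones and from observations (invented representation).
class Believer:
    def __init__(self, rules):
        self.beliefs = set()
        self.rules = rules            # list of (premise, conclusion) pairs

    def observe(self, fact):          # condition 3: beliefs from observation
        self.beliefs.add(fact)
        self.reason()

    def reason(self):                 # condition 2: beliefs from reasoning
        changed = True
        while changed:
            changed = False
            for premise, conclusion in self.rules:
                if premise in self.beliefs and conclusion not in self.beliefs:
                    self.beliefs.add(conclusion)
                    changed = True

p = Believer(rules=[("raining", "streets wet"), ("streets wet", "driving slow")])
p.observe("raining")
print(sorted(p.beliefs))   # ['driving slow', 'raining', 'streets wet']
```

Note that nothing in this sketch enforces condition 1 (consistency and correctness), which, as discussed next, is the psychologically implausible one.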
However, criterion 1 is psychologically implausible and seems to better characterize knowledge; criterion 4 is similarly too strong. Knowledge is represented by a version of Hintikka's system (23): the alternativeness relation, shrug(p, s1, s2), is true if and only if: if p is in fact in situation s2, then for all he knows he might be in situation s1. (A "situation" is a complete, actual or hypothetical state of the universe.) Kₚq is true (presumably at s) if and only if ∀t[shrug(p, t, s) → q(t)], where q(t) is a "fluent" (a Boolean-valued function of situations) that "translates" q, and where shrug is reflexive and transitive. Although this paper is significant for its introduction of philosophical concepts into AI, it discusses only a minimal representation of knowledge and belief.

A more detailed representation is offered by McCarthy (5,9), in which individual concepts, that is, intensional entities somewhat like Fregean senses, are admitted as entities on a par with extensional objects, to allow for first-order expression of modal notions without problems of referential opacity. Notationally, capitalized terms stand for concepts, lowercase terms for objects. Thus, know(p, X) is a Boolean-valued (extensional) function of a person p (an extensional entity) and a concept X (an intensional entity), meaning "p knows the value of X," defined as true Know(P, X), where true is a Boolean-valued function of propositions and where Know(P, X) is a proposition-valued (i.e., concept-valued) function of a person concept P and a concept X. Nested knowledge is handled by Know rather than know; thus, "John knows whether Mary knows the value of X" is Know(John, Know(Mary, X)). Hintikka-style knowledge ("knowledge-that") is represented by a function K(P, Q), defined as (Q And Know(P, Q)); thus, "John knows that Mary knows the value of X" is K(John, Know(Mary, X)). A denotation function maps intensional concepts to extensional objects, and a denotation relation, denotes, is introduced for concepts that lack corresponding objects. An existence predicate can be defined in terms of the latter: true Exists X if and only if ∃x[denotes(X, x)]. Belief is not treated in nearly as much detail. Functions Believe and believe are introduced, though so are functions believespy and notbelievespy [to handle a celebrated puzzle of referential opacity concerning spies; see Linsky (28)], yet no axioms are provided to relate them to each other or to the ordinary belief functions. [A similar theory in the philosophical literature was described in Rapaport (34).]

Creary (17) extended McCarthy's theory to handle concepts of concepts. According to Creary, McCarthy's notation cannot represent three distinct readings of

Pat believes that Mike wants to meet Jim's wife

(generated by the de re/de dicto distinction) because it does not allow for the full hierarchy of Fregean senses (35). The three readings are:

believes(pat, Wants{Mike, Meets{Mike$, Wife$ Jim$}})
believes(pat, Exist P$.Wants{Mike, Meets{Mike$, P$}} And Conceptof{P$, Wife Jim})
∃P$ ∃P.believes(pat, Wants{Mike, Meets{Mike$, P$}}) ∧ conceptof(P$, P) ∧ conceptof(P, wife jim)

Here, if mike is the name of a person whose concept is Mike, then Mike is the name of that concept and its concept is Mike$, etc. It is not clear, however, that such a hierarchy is needed at all (cf. Ref. 37), nor whether McCarthy's notation is indeed incapable of representing the ambiguity. Creary does, however, discuss reasoning about propositional attitudes of other agents by "simulating" them using "contexts": temporary databases consisting of the agent's beliefs plus common beliefs and used only for reasoning, not for representation [thus escaping certain objections to "database approaches" raised by Moore (see Ref. 2)]. Creary's system was subjected to criticism and refinement by Barnden (32).

Belief Spaces. The problems of nested beliefs and of the de re-de dicto distinction suggest that databases containing representations of beliefs should be partitioned into units (often called "contexts," "spaces," or "views") for each believer. One of the earliest discussions of these issues in a computational framework was by Moore (36), who developed a LISP-like language, D-SCRIPT, that evaluates objects of belief in different environments (see also Ref. 2). Another early use of such units was Hendrix's (38) partitioning of semantic networks into "spaces" and "vistas": the former can be used to represent the propositions that a given agent believes; the latter are unions of such spaces. Similarly, Schneider (39) introduced "contexts" to represent different views of a knowledge base, and Covington and Schubert (40) used "subnets" to represent an individual's conception of the world. Filman et al. (41) treat a context as a theory of some domain, such as an agent's beliefs, with the ability to reason with the agent's beliefs in the context and about them by treating the context as an object in a metacontext.

Fully Intensional Theories. The notions of intensional entities and belief spaces come together in the work of Shapiro and his associates. Maida and Shapiro (16) go a step beyond the approach of McCarthy by dropping extensional entities altogether. Their representational scheme uses a fully intensional semantic network in which all nodes represent distinct concepts, all represented concepts are represented by distinct nodes, and arcs represent binary relations between nodes but cannot be quantified over (they are "nonconceptual"). The entire network is considered to model the belief system of an intelligent agent: nondominated propositional nodes represent the agent's beliefs, and "base" nodes represent individual concepts. [Similar philosophical theories are those of Meinong (42) and Castañeda (43); see Rapaport (44).] Two versions of 'know' are treated (both via agent-verb-object case frames): know1 for "knows that" and know2 for "knows by acquaintance." There are corresponding versions of 'believe' (though it is not clear what believe2 is); the fundamental principle connecting knowledge and belief is that the system believes1 that an agent knows1 that p only if the system believes1 both that the agent believes1 that p and that the agent believes1 that p
for the right reasons. Unlike other belief systems, their system can handle questions, as queries about truth values (which are represented by nodes). Thus, whereas most systems represent "John knows whether p" as "John knows that p or John knows that ¬p," Maida and Shapiro (16) consider these to be merely logically equivalent but not intensionally identical; instead, they represent it as "John knows2 the truth value of p." Among the consequences of the fully intensional approach are (1) the ability to represent nested beliefs without a type hierarchy [see Maida (18)], (2) the need for a mechanism of coreferentiality (actually, their "a EQUIV b" represents that the system believes that a and b are coreferential), (3) the dynamic introduction of new nodes, through user interaction, in the order they are needed (which sometimes requires node merging by means of EQUIV arcs), and (4) the treatment of all transitive verbs as referentially opaque unless there is an explicit rule to the contrary.

Rapaport and Shapiro (45) [see also Rapaport (46)] make essential use of the notion of a "belief space" to represent the distinctions between de re and de dicto beliefs. In dynamically constructing the system's belief space, they follow the principle that if there is no prior knowledge of coreferentiality of concepts in the belief spaces of agents whose beliefs are being modeled by the system, then those concepts must be represented separately. This has the effect of reintroducing a kind of hierarchy [see the discussion of Creary (17), above], but there is a mechanism for "merging" such entities later as new information warrants. Thus, the conjunctive de dicto proposition "John believes that Mary is rich and Mary believes that Lucy is rich" requires four individuals: the system's John, the system's John's Mary, the system's Mary, and the system's Mary's Lucy. But the de re proposition "John believes of Mary that she is not rich" requires only two: the system's John and the system's Mary. This technique is used to represent quasi-indicators: virtually all other systems fail to distinguish between "John believes that he* is rich" and "John believes that John is rich" [although Moore (47) briefly discusses this]; the starred, quasi-indexical occurrence of "he" is the system's way of depicting John's use of 'I' in John's statement, "I am rich." This is represented as a de dicto proposition requiring two individuals: the system's John and the system's John's representation of himself (which is distinct from the system's John's John).

Other Theories. Among other theories that may be classified as epistemological (though some have considerable overlap with formal heuristic theories) are the important early work of Konolige (48), a series of papers by Kobsa and his colleagues (49-52), Xiwen and Weide (53), and Soulhi (54).

Konolige. Konolige (48) is concerned with the other side of the coin of knowledge: ignorance. In order to prove ignorance based on knowledge limitations ["circumscriptive ignorance"; see McCarthy (55)], he uses a representation scheme based on a logic called KI4, an extension of the work of Sato (26). KI4 has two families of modal operators: knowledge operators, [S], for each agent S, and (what might be called "context") operators, [α], for each proposition α; and it has an agent 0 ("fool"), where [0]α means "α is common knowledge." The axioms and rules of KI4 include analogs of (A1)-(A6) (system K4), plus:

(A7) ⊢[0]α → [0][S]α.
(A8) If α ⊢_K4 β, then ⊢_KI4 [α]β.
(A9) If not-(α ⊢_K4 β), then ⊢_KI4 ¬[α]β.

Roughly, (A7) says that if α is common knowledge, then it is common knowledge that S knows it; (A8) says that if β follows from α in K4, then β is true in the context of α in KI4; and (A9) says that if β does not follow from α in K4, then it is not true in the context of α in KI4. The context operator may be explained as follows: if α = [S]γ, then [α] identifies S's theory whose axiom is γ. Thus, "all S knows about p is that q1 or q2" can be represented as [α][S]p, where α = [S]q1 ∨ [S]q2.

Kobsa and Trost. Kobsa and Trost (51) use the KL-ONE knowledge representation system, augmented by their version of partitions: "contexts," collections of "nexus" nodes linked to "concept" nodes, representing that the agent modeled by the context containing the nexus nodes believes propositions about the concepts. There is a system context and separate contexts for each agent whose beliefs are modeled, with explicit (coreferential-like) links between isomorphic structures in the different contexts (instead of structure sharing or pattern matching). Of particular interest is their use of "embedded" (i.e., nested) beliefs to represent recursive beliefs (the special case of nesting where a lower level context models a higher level one, as in the system's beliefs about John's beliefs about the system's beliefs) and mutual beliefs (by linking the context for one agent embedded in the context for another with the embedding context).

Formal Heuristic Theories

Moore. One of the most influential of the formal theories (both epistemological and heuristic) has been that of Moore (2,47,56). His was the first AI theory to offer both a representational scheme and a logic and to show how they can interact with other notions to reason about action. For his representation, Moore uses a first-order axiomatization of the possible-worlds semantics of Hintikka's S4 [rather than the modal axiomatic version; it should be noted that Moore (2) erroneously added the S5 rule]. Specifically, he introduces a predicate T(w, p) to represent that the object-language formula p is true in possible world w, and the predicate K(A, w1, w2) to represent that w2 is possible according to what A knows in w1. "A knows that p" is then represented by Know(A, p), which satisfies the axiom T(w1, Know(a1, p1)) ≡ ∀w2(K(a1, w1, w2) → T(w2, p1)). Since Moore is concerned with using knowledge to reason about actions, he formulates a logic of actions, where complex actions are built out of sequences, conditionals (defined in terms of Know), and loops, and a logic for "can," understood as "knowing how to do." The criticisms one can offer of Moore's work are both two-sided: (1) its psychological inadequacy (primarily due to his reliance on Hintikka's system), but, of course, this is shared by most other formal theories; and (2) its similarity to much work that had been going on in philosophy during the 1960s and 1970s, but here it must be noted that one advantage of (some) AI theories over (some) philosophical theories is the former's attention to detail, which can often indicate crucial gaps in the latter. (Moore's critique of the database approach is discussed below.)

Konolige. Konolige and Nilsson (6) consider, from a formal point of view, a planning system involving cooperating agents. Each agent is represented by a first-order language, a "simulation structure" (a partial model of the language), a set of facts (expressed in the language and including descriptions of other agents), a "goal structure" (consisting of goals and plans), a deduction system, and a planning system. An agent uses a formal metalanguage to describe the languages of other agents and can use its representation of other agents (or it-
self, but not quasi-indexically) to reason by simulation about their plans and facts in order to take them into account when making its own plans. Belief, rather than knowledge, is taken as the appropriate cognitive attitude, to allow for the possibility of error [not allowed by axiom (A4), above], and "agent A0 believes that agent A1 believes that agent A0 is holding object B" is represented by FACT(A1, 'HOLDING(A0, B)') (true) appearing in A0's FACT list. Although an analog of axiom (A5) is taken as an axiom here, the analog of (A6) is not, since (1) their system allows different agents to have different deduction systems and (2) the deductive capabilities of the agents are considered to be limited.

This theory was made more rigorous in Konolige (14) [see also Ref. 57]. Here, a planning system with multiple agents has a "belief subsystem" consisting of (1) a list of "base" sentences (about a situation) expressed in a formal language with a modal belief operator and a Tarski-like truth-value semantics; (2) a set of deduction processes (or deduction rules) that are sound, effectively computable, have "bounded" input, and are, therefore, monotonic; and (3) a control strategy (for applying the rules to sentences). Belief derivation is "total"; that is, all queries are answered in a bounded amount of time. The system is deductively consistent (i.e., a sentence and its negation are not simultaneously believed), but it is not logically consistent (i.e., there might not be a possible world in which all beliefs are true). Thus, some measure of psychological plausibility is obtained. A system can be deductively though not logically consistent if there are resource limitations on deductions; that is, the deductive processes might be incomplete because of either weak rules or a control strategy that does not perform all deductions. Konolige uses the former (though his sample of a weak rule, modus ponens weakened by conjoining a "derivation depth" to each sentence, seems to require a nonstandard conjunction in order to prevent ordinary modus ponens from being derivable). The system satisfies two properties: closure (sentences derived in the system are closed under the deduction rules; i.e., all deductions are made) and recursion (the belief operator [S] is interpreted as another belief system). Thus, [S]α means that α is derivable in S's belief system. A "view" [similar to Hendrix's "vista" (38)] is a belief system as "perceived through a chain of agents"; for example, v = John, Sue is John's perception of Sue's beliefs. To bound the recursive reasoning processes, the more deeply nested a system is, the weaker are its rules. Konolige presents a Gentzen-style propositional doxastic logic B consisting of the axioms and rules of propositional logic; a set of rules for each view v; and, for each v, (1) a rule (essentially modus ponens) that implements closure, (2) a rule that formalizes an agent Si's deductive system in view v (roughly, the rule is that if a sentence δ can be inferred from a set of sentences Γ using the rules of the view v,Si, where the members of Γ are believed by Si, then [Si]δ can be inferred from [Si]Γ using the rules of v), and (3) a rule that says that anything can be derived from logically inconsistent beliefs. B is stronger than might be desired, since, if the v rules are complete and recursion is unbounded, B is equivalent to S5 minus (A4). Konolige points out, however, that it can be weakened to S4 minus (A4).

Levesque. A very different approach was taken by Levesque in a series of papers (11,58,59) on knowledge bases. The problem he confronts is that of treating a knowledge base that is incomplete (i.e., that lacks some information needed to answer queries) as an abstract data type. However, his use of epistemic logic is not as a representation device within the knowledge base but as a query language. He defines a first-order language L that has its singular terms partitioned by means of a relation u into equivalence classes of coreferential terms; the classes are referred to by numerical "parameters" (for the knowledge base to be able to answer wh-questions). L has a truth-value semantics based on a set s of "primitive" sentences, and L is said to describe a "world structure" (s, u). Levesque argues that although L may be sufficient to query the knowledge base about the world, it is not sufficient to query it about itself. For this, L is extended to a language KL, containing a knowledge operator K and satisfying two principles: (1) "every logical consequence of what is known is also known," but not everything is known (i.e., the knowledge base is an incomplete picture of a possible world); and (2) a pure sentence (i.e., one that is about only the knowledge base) "is true exactly when it is known" (i.e., the knowledge base is an accurate picture of itself). The operator K satisfies slightly modified axioms for L (which are like those for a typical first-order logic), plus:

If ⊢_L α, then ⊢_KL Kα.
⊢_KL ((Kα ∧ K(α → β)) → Kβ).
⊢_KL (∀xKα → K∀xα).
If α is pure, then ⊢_KL (α ≡ Kα).

The first of these says, roughly, that if α is provable in L, then "α is known" is provable in KL; the second is similar to (A6); the third says, roughly, that if everything is such that α is known to hold of it, then it is known that everything is such that α holds of it; and the fourth says, roughly, that the K operator is redundant in pure sentences. Semantically, if k is a set of world structures (i.e., those compatible with the knowledge base), then Kα is true on (s, u, k) if and only if α is true on all (s′, u′) in k. It should be observed that K is more like a belief operator, since Kα → α is not a theorem, whereas K(Kα → α) is. Two operations on an abstract data type KB can then be defined roughly as follows: (I) ASK: KB × KL → {yes, no, unknown}, where ASK = yes if Kα is true in KB, ASK = no if K¬α is true in KB, and ASK is unknown otherwise. (II) TELL: KB × KL → KB, where TELL = the intersection of KB with the set of all world structures on which the query is true. Although the query language is epistemic, Levesque proves a representation theorem stating that the knowledge in KB is representable using L [essentially by trading in Kα for ⊢_L(k → α), where k may be thought of as the conjunction of sentences in KB].

In Ref. 59, principle 1 is weakened, for several psychologically interesting reasons: (a) it ignores resource limitations; (b) it requires belief of all valid sentences; (c) it ignores differences between logically equivalent, yet distinct, sentences; and (d) it requires belief of all sentences if inconsistent ones are believed. To achieve an interpretation sensitive to these, two belief operators are used: Bα for "α is explicitly (or actively) believed" and Lα for "α is implicit in what is believed." To distinguish (A) situations in which only α and α → β are believed from (B) those in which they are believed together with β, without being forced to distinguish (C) situations in which only α ∨ β is believed from (D) those in which only β ∨ α is believed, Levesque uses "partial possible worlds" in which not all sentences get truth values. A formal logic is defined in which L is logically "omniscient" (much like Levesque's earlier K), but B is not. More precisely: (i) Bα → Lα is valid, but its converse is not; (ii) B is not closed under logical consequence; (iii) B need not apply to all valid sentences or to both of two logically equivalent ones; and (iv) B allows inconsistent beliefs. A theorem of great philosophical interest is that Bα → Bβ is valid if and only if α "entails" β, where entails comes from relevance logic (see Ref. 60).
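Levesque's ASK and TELL can be illustrated over an explicit finite set of world structures, with TELL as intersection and ASK answering yes, no, or unknown. This is a toy enumeration with invented worlds; Levesque's actual account is symbolic, not enumerative.

```python
# Toy knowledge base as a set of world structures (here, frozensets of
# the atomic sentences true in each world), in the spirit of Levesque's
# ASK and TELL operations on an incomplete knowledge base.
ALL_WORLDS = [frozenset(s) for s in ([], ["p"], ["q"], ["p", "q"])]

def tell(kb, sentence):
    """TELL: intersect the KB with the worlds where the sentence is true."""
    return [w for w in kb if sentence(w)]

def ask(kb, sentence):
    """ASK: yes if known, no if the negation is known, unknown otherwise."""
    if all(sentence(w) for w in kb):
        return "yes"
    if all(not sentence(w) for w in kb):
        return "no"
    return "unknown"

kb = ALL_WORLDS                                  # initially, nothing is known
kb = tell(kb, lambda w: "p" in w or "q" in w)    # told: p or q
print(ask(kb, lambda w: "p" in w))               # unknown: the KB is incomplete
kb = tell(kb, lambda w: "q" not in w)            # told: not q
print(ask(kb, lambda w: "p" in w))               # yes: only p-worlds remain
```

The "unknown" answer after the first TELL is exactly the incompleteness Levesque is after: the KB knows the disjunction without knowing either disjunct.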
Nilsson (61) attempts a formalization of "knowledge" (actually, belief) without a K operator, using what appears to be a notational variant of an accessibility relation defined, however, not between possible worlds but between possible sets of answers to a question. Halpern and McAllester (13) add knowledge operators to a logic (due to Halpern and Rabin) for reasoning about likelihood. [Halpern and his colleagues (62,63, and 65) have extended these formalisms into these and other logics.]

Psychological Heuristic Theories

This category of research, which attempts to be more psychologically realistic than either of the preceding two, may be further subdivided along a spectrum ranging from the more formal to the more psychological.

More Formal than Psychological. There are two major, and related, topics investigated under this heading: speech act theory and mutual belief.

Speech Act Theory. Speech act theory, developed by the philosophers Austin, Grice, and Searle, considers the basic unit of linguistic communication to be the rule-governed production of a token of a sentence (or word) in the performance of a speech act (such as the illocutionary act of making a statement). Roughly, speaker S means something by his or her utterance U addressed to hearer H if and only if S intends the utterance of U to produce a certain effect in H in part by H's recognition of this intention (see the references for further details and Ref. 66).

Cohen and Perrault. Cohen and Perrault (66) attempt to provide "a theory that formally models the possible intentions underlying speech acts" by treating intentions as plans, involving "the communication of beliefs." "Plans" are made up of "action" operators (rather than prespecified sequences, in order to illustrate more generality), which consist in part of preconditions, bodies, and effects and are evaluated relative to a model of the planner's world (including the planner's beliefs). When the action is a speech act, it takes beliefs and goals and returns plans for the speech act. Their criteria of adequacy for the theory are that it must (1) distinguish agent AGT1's beliefs from AGT1's beliefs about AGT2's beliefs and (2) be able to represent (a) that AGT2 knows whether P, without AGT1's having to know which of P and ¬P AGT2 believes, and (b) that AGT2 knows what the x such that Rx is, without AGT1's having to know what AGT2 thinks the x such that Rx is. Their logic takes BELIEVE as a relation (though they call it an operator) between an agent and a proposition, satisfying the following axioms (for each agent a):

(B1) If P is an axiom of first-order logic, then ⊢aBELIEVE(P).
(B2) ⊢aBELIEVE(P) → aBELIEVE(aBELIEVE(P)).
(B3) ⊢aBELIEVE(P) ∨ aBELIEVE(Q) → aBELIEVE(P ∨ Q).
(B4) ⊢aBELIEVE(P & Q) → aBELIEVE(P) & aBELIEVE(Q).
(B5) ⊢aBELIEVE(P) → ¬aBELIEVE(¬P).
(B6) ⊢aBELIEVE(P → Q) → (aBELIEVE(P) → aBELIEVE(Q)).
(B7) ⊢∃x[aBELIEVE(P(x))] → aBELIEVE(∃xP(x)).
(B8) ⊢All agents believe that all agents believe (B1)-(B7).

They admit that this is too strong to be psychologically plausible. Agents' wants are also represented but not axiomatized. The speech act of informing is presented (stated in the first person): for S to inform H that he (S) is tired, there must be two preconditions, that S believe that he (S) is tired and that S intend that H believe that he (S) is tired; and there should be the effect that H believe that S is tired. Their methodology is as follows: (1) There are planning rules; for example, one applies when an agent wants to achieve P and does not know whether P is true. (2) There are inference rules for inferring an interlocutor's goals from knowledge of planning and his or her beliefs about the agent's actions; for example, corresponding to the planning rule above, if S believes that A has a goal of knowing whether P is true, then S may believe that A has a goal of achieving P, or S may believe that A has a goal of achieving ¬P.

Cohen and Levesque. Cohen and Levesque (67) claim that illocutionary act definitions can be derived from statements describing the recognition of shared plans and that this requires a definition of mutual beliefs. They offer a perhaps more honest, if not most psychologically plausible, representation of belief: (BEL x p) is true if and only if p follows from what x believes. (KNOW x p) is defined as (AND p (BEL x p)), and (KNOWIF x p) as (OR (KNOW x p) (KNOW x (NOT p))). The latter is used to define an if-then-else rule, along the lines of Moore (2). Mutual belief (discussed in more detail below) is characterized by two axioms:

If ⊢p, then ⊢(MB x y p).
⊢(MB x y p) ≡ (BEL x (AND p (MB y x p))).

Roughly, x believes that p0 implies that the result of x doing a is q1 and that pi implies that x's making qi-1 true thereby makes qi true (for i = 1, . . ., n); various illocutionary operators are then characterized using notions such as these.

Allen and Perrault. This research program was continued by Allen and Perrault (68) in order to model "helpful" linguistic behavior, that is, appropriate responses by a hearer (much in the manner of user modeling; see below). They offer a similar logic of belief.
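Axiom (B6) has a natural forward-chaining reading: from a stored belief P → Q and a stored belief P, attribute the belief Q to the agent. The following is a minimal sketch with an invented representation, not Cohen and Perrault's (or Allen and Perrault's) actual machinery.

```python
# Toy belief-closure step in the spirit of (B6): an agent's belief set
# is closed under modus ponens on believed implications.
def close_beliefs(beliefs):
    """beliefs: set of formulas; an implication is a tuple ('->', p, q)."""
    beliefs = set(beliefs)
    changed = True
    while changed:
        changed = False
        for f in list(beliefs):
            if isinstance(f, tuple) and f[0] == "->" \
                    and f[1] in beliefs and f[2] not in beliefs:
                # aBELIEVE(P -> Q) and aBELIEVE(P) yield aBELIEVE(Q)
                beliefs.add(f[2])
                changed = True
    return beliefs

agent_a = {("->", "P", "Q"), "P", ("->", "Q", "R")}
closed = close_beliefs(agent_a)
print("Q" in closed and "R" in closed)   # True: Q, and then R, are derived
```

As the surrounding discussion notes, such unbounded closure is exactly what makes these axioms psychologically implausible; a resource-limited variant would cap the number of passes.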
Their inference schemata are of the form (though in different notation) (B_A(P → Q) ∧ B_A P) → B_A Q, although their commentary suggests that such schemata are really of the form B_S(B_A(P → Q) ∧ B_A P) → B_S B_A Q. Knowledge is defined as true belief: K_A P = (P ∧ B_A P), interpreted as B_S K_A P if and only if B_S(S and A agree that P). Knowing-whether and knowing-who are defined as follows:

KNOWIF_A P = (P ∧ B_A P) ∨ (¬P ∧ B_A ¬P).
KNOWREF_A P = ∃y[y = the x such that D(x) ∧ B_A(y = the x such that D(x))].

There are also numerous rules relating these forms of belief and knowledge to wants and actions.

Other Theories. Other theories include those of Allen, Sidner, and Israel. Allen (69) continued this line of research, embedding it in a theory of action and time; here, BELIEVES(A, p, Tb, Tp) is taken to mean that A believes during time interval Tb that p holds during time interval Tp. Sidner and Israel (70) and Sidner (71) attack similar problems, treating the "intended meaning" of utterance U by speaker S for hearer H as a set of pairs of propositional attitudes (beliefs, wants, intentions, etc.) and propositional "contents" that are such that S wants H to hold the attitude toward the content by means of U.

Mutual Belief. The problems of mutual belief and mutual knowledge, notions generally accepted to be essential to research programs such as these, are most clearly stated by Clark and Marshall (72). They raise a paradox of mutual knowledge: To assure a successful definite reference by speaker S to hearer H that term t refers to referent R, a doubly infinite sequence of conditions must be satisfied: K_S(t is R), K_S K_H(t is R), K_S K_H K_S(t is R), . . ., and K_H(t is R), K_H K_S(t is R), . . . But each condition takes a finite amount of time to check, yet successful reference does not require an infinite time.
Their solution is to replace the infinite sequences by mutual knowledge defined in terms of "copresence": S and H mutually know that t is R if and only if there is a state of affairs G such that S and H have reason to believe that G holds, G indicates to them that they have such reason, and G indicates to them that t is R. Typically, G will be either (1) community membership (i.e., shared world knowledge), for example, when t is a proper name; (2) physical copresence (i.e., a shared environment), for example, where t is an indexical; or (3) linguistic copresence (i.e., a shared discourse), for example, where t is anaphoric (see Ref. 73 for a critique). Mutual knowledge has been further investigated by Appelt (4,74) and Nadathur and Joshi (75). Appelt's planning system is an intellectual descendant of the work of Allen, Cohen, Perrault, and Moore. It reasons about A's and B's mutual knowledge by reasoning about the knowledge of a (virtual) agent, the "kernel," whose knowledge is characterized by the union of sets of possible worlds that are consistent with A's and B's knowledge. Nadathur and Joshi replace Clark and Marshall's (72) requirement of mutual knowledge for successful reference by a weaker criterion: if S knows or believes that H knows or believes that t is R, and if there is no reason to doubt that this is mutual knowledge, then S conjectures that it is mutual knowledge. This is made precise by using Konolige's belief logic (14) to formulate a sufficient condition for S's using t to refer to R.

Other Theories. Other formal psychological-heuristic work has been done by Taylor and Whitehill (76) on deception and by Airenti et al. (77) on the interaction of belief with conceptual and episodic knowledge.
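Clark and Marshall's finite check can be illustrated with a small sketch; the class, helper names, and example data below are invented for illustration and come from none of the systems discussed:

```python
# Instead of verifying the infinite chain Ks(t is R), KsKH(t is R), ...,
# ground mutual knowledge in a single shared "copresence" state of affairs G.
from dataclasses import dataclass, field

@dataclass
class CopresenceEvent:
    """A state of affairs G shared by speaker and hearer."""
    kind: str                    # "community", "physical", or "linguistic"
    participants: frozenset      # agents with reason to believe G holds
    indicates: set = field(default_factory=set)  # propositions G indicates

def mutually_know(speaker, hearer, proposition, events):
    """S and H mutually know P iff some copresence event involves both
    and indicates P: a finite check replacing the infinite K-chain."""
    return any({speaker, hearer} <= g.participants and proposition in g.indicates
               for g in events)

g = CopresenceEvent("linguistic", frozenset({"S", "H"}), {"t is R"})
print(mutually_know("S", "H", "t is R", [g]))   # True
```

The point of the sketch is that the test runs in constant time per copresence event, however deeply the corresponding knowledge chain would nest.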
More Psychological than Formal

Wilks and Bien. The various logics of nested beliefs in general and of mutual beliefs in particular each face the threat of infinite nestings or combinatorial explosions of nestings. Wilks and Bien (10,20) have attempted to deal with this threat by using what might be called psychological heuristics. Their work is based on Bien's (78) approach of treating natural-language utterances as programs to be run in "multiple environments" (one of the earliest forms of belief spaces): a global environment would represent a person P, and local environments would represent P's models of his or her interlocutors. The choice of which environment within which to evaluate a speaker's utterance U depends on P's attitude toward the discourse: if P believes the speaker, then U would be evaluated in P's environment, else in P's environments for the speaker and hearer. Wilks and Bien use this technique to provide an algorithm for constructing nested beliefs, given the psychological reality of processing limitations. They offer two general strategies for creating environments: (1) "Presentation" strategies determine how deeply nested an environment should be to represent information about someone. The "minimal" presentation strategy, for simple cases, constructs a level only for the subject of the information but none for the speaker; the "standard" presentation strategy constructs levels for both speaker and subject; and "reflexive" presentation strategies construct more complex nestings. (2) "Insertional" strategies determine where to store the speaker's information about the subject; for example, the "scatter-gun" insertion strategy would be to store it in all relevant environments. A local environment is represented as a list of statements indexed by their believers and nested within a relatively global environment: A{B} represents A's beliefs about B; A{B{C}} represents A's beliefs about B's beliefs about C. Suppose a USER informs the SYSTEM about person A.
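The push-down and percolation operations described below can be reduced to a toy sketch; the beliefs, function names, and trivial contradiction test are invented for illustration and are not the authors' implementation:

```python
# Environments as sets of statements: SYSTEM{A} and SYSTEM{USER{A}}.
system_A = {"A is a doctor", "A lives abroad"}        # SYSTEM{A}
system_user_A = {"A is a doctor", "A is retired"}     # SYSTEM{USER{A}}

def push_down(own_view, users_view, contradicts):
    """Contradiction heuristic: the SYSTEM's beliefs about A are assumed
    inside the nested SYSTEM{USER{A}} view unless explicitly contradicted."""
    merged = set(users_view)
    merged |= {b for b in own_view
               if not any(contradicts(b, u) for u in users_view)}
    return merged   # the temporary nested environment

def percolate(own_view, temp_view):
    """Percolation heuristic: uncontradicted beliefs in the temporary
    environment remain in SYSTEM{A} after the environment is discarded."""
    return own_view | temp_view

contradicts = lambda b, u: False   # placeholder contradiction test
temp = push_down(system_A, system_user_A, contradicts)
system_A = percolate(system_A, temp)
```

The sketch also makes the article's worry concrete: after `percolate` runs, `system_A` no longer records which beliefs came from the USER.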
To interpret the USER's utterance, a nested environment within which to run it is constructed, only temporarily, as follows: SYSTEM{A} and SYSTEM{USER} are constructed, and the former is "pushed down into" the latter to produce SYSTEM{USER{A}}. Pushing is done according to several heuristics: (1) "Contradiction" heuristics: The SYSTEM's beliefs about the USER's beliefs about A are assumed to be the SYSTEM's beliefs about A unless there is explicit evidence to the contrary. (2) Pragmatic inference rules change some of the SYSTEM's beliefs about A into the SYSTEM's beliefs about A's beliefs about A. (3) "Relevance" heuristics: Those of the SYSTEM's beliefs about the USER's beliefs that explicitly mention or describe A become part of the SYSTEM's beliefs about A. (4) "Percolation" heuristics: Beliefs in SYSTEM{USER{A}} that are not contradicted remain in SYSTEM{A} when the temporary nested environment is no longer needed for evaluation purposes. Thus, percolation seems to be a form of learning by means of trustworthiness, though there is no memory of the source of the new beliefs in SYSTEM{A} after percolation has occurred; that is, the SYSTEM changes its beliefs about A by merely contemplating its beliefs about the USER's beliefs. Other difficulties concern "self-embedded" beliefs: In SYSTEM{SYSTEM}, there are no beliefs that the SYSTEM has about the SYSTEM that are not its own beliefs, but surely a SYSTEM might believe things that it does not believe that it believes; and there are potential problems about quasi-indicators when SYSTEM{A} is pushed down into itself to produce SYSTEM{A{A}}.

Colby. Although the work of Wilks and Bien has a certain
formality to it, they are not especially concerned with the explicit logic of a belief operator, an accessibility relation, or a formal logic. The lack of concern with such issues may be taken to be the mark of the more psychological approaches. The pioneers of this approach were Colby and Abelson and their co-workers. Colby and Smith (19) constructed an "artificial belief system," ABS1. ABS1 had three modes of operation: During "talktime" a user would input sentences, questions, or rules; these would be entered on lists for that user (perhaps like a belief space; but see below). If the input were a question, ABS1 would either search the user's statement list for an answer (taking the most recent if there were more than one answer), or deduce an answer from the statement list by the rules, or else generate an answer from other users' lists. During "questiontime" ABS1 would search the user's statement list for similarities and ask the user questions about possible rules; the user's replies would enable ABS1 to formulate new rules. ABS1 would also ask the user's help in categorizing concepts. During "thinktime" ABS1 would infer new facts (assigned to a "self" list) and compute "credibility" weightings for the facts, rules, and user. It should be noted that beliefs in this system are merely statements on a user's list, which makes this approach seem very much like the database approach criticized by Moore (2). Moore's objections are as follows: (1) If the system does not know which of two propositions p or q a user believes, then it must set up two databases for the user, one containing p and one containing q, leading to combinatorial explosion. (2) The system cannot represent that the user does not believe that p, since neither of the two database alternatives, omitting p or listing ¬p, is an adequate representation. Although these are serious problems, Colby and Smith's ABS1 seems not to have them.
First, ABS1 only reasons about explicit beliefs; thus, it would never have to represent the problematic cases. Of course, a more psychologically adequate system would have to. Second, ABS1 does not appear to reason about the fact that a user believes a statement but only about the statement and ABS1's source for its believing the statement. In Colby (79) a belief is characterized as an individual's judgment of acceptance, rejection, or suspended judgment toward a conceptual structure consisting of concepts (representations of objects in space and time, together with their properties) and their interrelations. A statement to the effect that A believes that p is treated dispositionally (if not actually behavioristically) as equivalent to a series of conditionals asserting what A would say under certain circumstances. More precisely, "U Believes C, T" if and only if experimenter E takes the linguistic reaction (i.e., judgment of credibility) of language user U to an assertion conceptualized as C as an indicator of U's belief in C during time T. Thus, what is represented are the objects of a user's beliefs, not the fact that they are believed. Various psychologically interesting types of belief systems (here understood as sets of interacting beliefs), neurotic, paranoid, and so on, can then be investigated by "simulating" them. The most famous such system is Colby's PARRY (80,81), which has been the focus of much controversy [see Colby (82) and Weizenbaum's (83) critique].

Abelson. A similar research program has been conducted by Abelson and co-workers (12,15). Underlying their work is a theory of "implicational molecules," that is, sets of sentences that "psychologically" (i.e., pragmatically) imply each other;
for example, a "purposive-action" molecule might consist of the sentence forms "person A does action X," "X causes outcome Y," and "A wants Y." The key to their use in a belief system is what Abelson and Reich consider a Gestalt-like tendency for a person who has such a molecule to infer any one of its members from the others. Thus, a computer simulation of a particular type of belief system can be constructed by identifying appropriate molecules, letting the system's beliefs be sentences connected in those molecules (together with other structures, such as Schank's "scripts") and then having the system understand or explicate input sentences in terms of its belief system. A model of a right-wing politician was constructed in this manner [see also the discussions of Colby's as well as Abelson's work in Boden (84)].

User Models. An extended, database type of belief system is exemplified by user models such as those investigated by Rich (7,8). Here, instead of the system being a model of a mind, the system must construct a model of the user's mind, yet many of the techniques are similar in both cases. A user model consists of properties of the user ("facts") ranked in terms of importance and by degree of certainty (or confidence) together with their justifications. The facts come from explicit user input, from inferences based on these and on "stereotypes" (so that only minimal explicit user input is needed), and from the user's behavior (so that the model is not merely the user's self-model). The user model is built dynamically during interaction with the user.
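A Rich-style user model can be sketched roughly as follows; the attribute names, confidence scale, and stereotype contents are invented for illustration and do not come from Rich's systems:

```python
# Facts carry a confidence rating and a justification; a stereotype seeds
# the model, and explicit input or observed behavior can override it.
from dataclasses import dataclass

@dataclass
class Fact:
    attribute: str
    value: object
    confidence: int       # degree of certainty, e.g. 0-1000
    justification: str    # "stereotype", "explicit input", or "behavior"

STEREOTYPES = {
    "engineer": [Fact("likes-detail", True, 600, "stereotype"),
                 Fact("math-background", True, 800, "stereotype")],
}

def build_model(stereotype, observations):
    """Seed from the stereotype, then let equally or more confident
    observations (explicit input, behavior) replace the defaults."""
    model = {f.attribute: f for f in STEREOTYPES.get(stereotype, [])}
    for obs in observations:
        current = model.get(obs.attribute)
        if current is None or obs.confidence >= current.confidence:
            model[obs.attribute] = obs
    return model

m = build_model("engineer",
                [Fact("likes-detail", False, 900, "explicit input")])
```

Keeping the justification on each fact is what lets such a model explain, and later retract, a conclusion when the stereotype turns out not to fit the user.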
Discussion and Conclusions

If there is any criticism to be leveled at the wide variety of current research, it is that the formal systems have not been sufficiently informed by psychology (and, hence, behave more like logicians than like ordinary people), and the psychological theories have not been flexible enough to handle some of the logical subtleties (which ordinary people, perhaps with some instruction, are certainly capable of). What is needed is a robust system whose input-output performance (if not the intervening algorithms) is psychologically plausible but whose underlying logic is competent, if needed, to handle the important (if often ignored) formal subtleties. In spite of radically differing approaches and terminology, it seems clear that AI research into belief systems shares common issues and goals. This can be brought out by discussing Abelson's (85) characterization of a belief system. For Abelson, a "system" is a "network of interrelated concepts and propositions" and rules, with procedures for accessing and manipulating them. Such a system is a "belief system" if:

1. The system's elements are not consensual. This can be taken, perhaps, either as a rejection of Bp → p or as Wilks and Bien's heuristics. By contrast, a "knowledge system" would be consensual. Abelson urges that 1 be exploited by AI belief systems even though it makes them nongeneralizable.

2. The system is concerned with existence questions about certain conceptual objects. The need to have a logic of the intensional objects of belief may be seen as a version of 2, even though 1 and 2 make it difficult to deal with beliefs that are held in common.
3. The system includes representations of "alternative worlds." This desideratum may be taken as covering the notions of possible worlds and of nested and mutual beliefs.

4. The system relies on evaluative and affective components.

5. The system includes episodic material. A "knowledge system" would rely more on general knowledge and principles. Clearly, though, a full system would need both.

6. The system's boundaries are vague.

7. The system's elements are held with different degrees of certitude.

Although these criteria are psychologically oriented, many of them are also applicable to formal approaches. In particular, 1-3 and 7 are relevant to logical issues; 4-7 are relevant to psychological issues. Indeed, except for the choice of underlying logic, most of the systems discussed here seem compatible, their differences arising from differences in aim and focus. For instance, Abelson and Reich's implicational molecules could be among the inference rules in Konolige's system. Note that the rules do not have to be "logical" if they do not need to be consistent; moreover, as mentioned earlier, there might not be any (psychologically plausible) logic of belief. As a consequence, a psychologically plausible belief system, whether "formal" or not, must be able to deal with incompatible beliefs. This could be done by a belief revision mechanism or by representational or reasoning techniques that prevent the system from becoming "aware" of its inconsistencies (with, of course, occasional exceptions, as in real life). It is, thus, the general schemes for representation and reasoning that seem most important and upon which, as a foundation, specific psychological heuristics may be built. In this way, too, it may be possible to overcome the computational complexity that is inevitably introduced when the underlying inference package is made to be as powerful as envisaged by, say, Konolige or when the underlying representational scheme is made to be as complete as proposed by, say, Rapaport and Shapiro.
A psychologically adequate "shell" that would be efficient at handling ordinary situations could be built on top of a logically adequate "core" that was capable of overriding the shell if necessary for correct interpretation. The trade-offs between psychological and logical adequacy that have been made in most current systems can, in principle, be overcome. (They have, after all, been overcome in those humans who study the logic of belief yet have not been hindered from interacting in ordinary conversational situations.) Whether it is more feasible to make a formally adequate system psychologically adequate or to "teach" a psychologically adequate system to be logically subtle remains an interesting research issue.
BIBLIOGRAPHY

1. J. McCarthy and P. J. Hayes, "Some philosophical problems from the standpoint of artificial intelligence," in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, pp. 463-502, 1969; reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, pp. 431-450, 1981.
2. R. C. Moore, "Reasoning about knowledge and action," Proc. of the Fifth IJCAI, Cambridge, MA, 223-227 (1977); reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, pp. 472-478, 1981.
3. P. R. Cohen and C. R. Perrault, "Elements of a plan-based theory of speech acts," Cognitive Science 3, 177-212 (1979); reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, pp. 428-445, 1981.
4. D. E. Appelt, "A planner for reasoning about knowledge and action," Proc. of the First AAAI, Stanford, CA, 131-133 (1980).
5. J. McCarthy, "Epistemological problems of artificial intelligence," Proc. of the Fifth IJCAI, Cambridge, MA, 1038-1044 (1977).
6. K. Konolige and N. J. Nilsson, "Multiple-agent planning systems," Proc. of the First AAAI, Stanford, CA, 138-144, 1980.
7. E. Rich, "Building and exploiting user models," Proc. of the Sixth IJCAI, Tokyo, Japan, 720-722, 1979.
8. E. Rich, "User modeling via stereotypes," Cognitive Science 3, 329-354 (1979).
9. J. McCarthy, "First-order theories of individual concepts and propositions," in J. E. Hayes, D. Michie, and L. I. Mikulich (eds.), Machine Intelligence, Vol. 9, Ellis Horwood, Chichester, pp. 129-147, 1979.
10. Y. Wilks and J. Bien, "Speech acts and multiple environments," Proc. of the Sixth IJCAI, Tokyo, Japan, 968-970, 1979.
11. H. J. Levesque, "Foundations of a functional approach to knowledge representation," Artif. Intell. 23, 155-212 (1984).
12. R. P. Abelson and C. M. Reich, "Implicational molecules: A method for extracting meaning from input sentences," Proc. of the First IJCAI, Washington, D.C., 641-642, 1969.
13. J. Y. Halpern and D. A. McAllester, Likelihood, Probability, and Knowledge, IBM Research Report RJ 4313 (47141), 1984; shorter version in Proc. of the Fourth AAAI, 137-141, 1984.
14. K. Konolige, "A deductive model of belief," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 377-381, 1983.
15. R. P. Abelson, "The structure of belief systems," in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, CA, pp. 287-339, 1973.
16. A. S. Maida and S. C. Shapiro, "Intensional concepts in propositional semantic networks," Cognitive Science 6, 291-330 (1982).
17. L. G. Creary, "Propositional attitudes: Fregean representation and simulative reasoning," Proc. of the Sixth IJCAI, Tokyo, Japan, 176-181, 1979.
18. A. S. Maida, "Knowing intensional individuals, and reasoning about knowing intensional individuals," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 382-384, 1983.
19. K. M. Colby and D. C. Smith, "Dialogues between humans and an artificial belief system," Proc. of the First IJCAI, Washington, D.C., 319-324, 1969.
20. Y. Wilks and J. Bien, "Beliefs, points of view, and multiple environments," Cognitive Science 7, 95-116 (1983).
21. E. L. Gettier, "Is justified true belief knowledge?," Analysis 23, 121-123 (1963); reprinted in A. P. Griffiths (ed.), Knowledge and Belief, Oxford University Press, Oxford, 1967.
22. J. H. Fetzer, "On defining 'knowledge'," AI Mag. 6, 19 (Spring 1985).
23. J. Hintikka, Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, Ithaca, NY, 1962.
24. J. Hintikka, "Semantics for propositional attitudes," in J. W. Davis et al. (eds.), Philosophical Logic, D. Reidel, Dordrecht, pp. 21-45, 1969; reprinted in Ref. 28, pp. 145-167.
25. H.-N. Castañeda, Review of Ref. 23, J. Symbolic Logic 29, 132-134 (1964).
26. M. Sato, A Study of Kripke-Type Models for Some Modal Logics by Gentzen's Sequential Method, Kyoto University Research Institute for Mathematical Sciences, Kyoto, 1976.
27. J. McCarthy, M. Sato, T. Hayashi, and S. Igarashi, On the Model Theory of Knowledge, Stanford Artificial Intelligence Laboratory Memo AIM-312, Stanford University, 1978.
28. L. Linsky (ed.), Reference and Modality, Oxford University Press, Oxford, 1977, corrected edition.
29. P. Edwards (ed.), Encyclopedia of Philosophy, Macmillan and Free Press, New York, 1967.
30. B. H. Partee, "The semantics of belief-sentences," in K. J. J. Hintikka, J. M. E. Moravcsik, and P. Suppes (eds.), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, D. Reidel, Dordrecht, pp. 309-330, 1973.
31. B. H. Partee, "Belief-sentences and the limits of semantics," in S. Peters and E. Saarinen (eds.), Processes, Beliefs, and Questions: Essays on Formal Semantics of Natural Language and Natural Language Processing, D. Reidel, Dordrecht, pp. 87-106, 1982.
32. J. Moravcsik, "Comments on Partee's paper," in K. J. J. Hintikka, J. M. E. Moravcsik, and P. Suppes (eds.), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, D. Reidel, Dordrecht, pp. 349-369, 1973.
33. R. C. Moore and G. G. Hendrix, "Computational models of belief and the semantics of belief sentences," in S. Peters and E. Saarinen (eds.), Processes, Beliefs, and Questions: Essays on Formal Semantics of Natural Language and Natural Language Processing, D. Reidel, Dordrecht, pp. 107-127, 1982.
34. W. J. Rapaport, "Meinongian theories and a Russellian paradox," Noûs 12, 153-180 (1978); errata, 13, 125 (1979).
35. G. Frege, "On sense and reference" (1892), translated by M. Black, in P. Geach and M. Black (eds.), Translations from the Philosophical Writings of Gottlob Frege, Basil Blackwell, Oxford, U.K., pp. 56-78, 1970.
36. R. C. Moore, "D-SCRIPT: A computational theory of descriptions," Proc. of the Third IJCAI, Stanford, CA, 223-229, 1973.
37. J. A. Barnden, "Intensions as such: An outline," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 280-286, 1983.
38. G. G. Hendrix, "Encoding knowledge in partitioned networks," in N. V. Findler (ed.), Associative Networks, Academic Press, New York, pp. 51-92, 1979.
39. P. F. Schneider, "Contexts in PSN," Proc. CSCSI 3, 71-78 (1980).
40. A. R. Covington and L. K. Schubert, "Organization of modally embedded propositions and of dependent concepts," Proc. CSCSI 3, 87-94 (1980).
41. R. E. Filman, J. Lamping, and F. S. Montalvo, "Meta-language and meta-reasoning," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 365-369, 1983.
42. A. Meinong, "Über Gegenstandstheorie" (1904), in R. Haller (ed.), Alexius Meinong Gesamtausgabe, Vol. 2, Akademische Druck- u. Verlagsanstalt, Graz, pp. 481-535, 1971; English translation ("The Theory of Objects") by I. Levi et al., in R. M. Chisholm (ed.), Realism and the Background of Phenomenology, Free Press, New York, pp. 76-116, 1960.
43. H.-N. Castañeda, "Thinking and the structure of the world," Philosophia 4, 3-40 (1974); originally written in 1972; reprinted in 1975 in Crítica 6, 43-86.
44. W. J. Rapaport, "Meinongian semantics for propositional semantic networks," Proc. ACL 23, 43-48 (1985).
45. W. J. Rapaport and S. C. Shapiro, "Quasi-indexical reference in propositional semantic networks," Proc. COLING-84, 65-70, 1984.
46. W. J. Rapaport, "Logical foundations for belief representation," Cognitive Science 10, 371-422 (1986).
47. R. C. Moore, Reasoning about Knowledge and Action, Technical Note No. 191, SRI International, Menlo Park, CA, 1980.
48. K. Konolige, "Circumscriptive ignorance," Proc. of the Second AAAI, Pittsburgh, PA, 202-204, 1982.
49. A. Kobsa and H. Trost, "Representing belief models in semantic networks," Cybern. Sys. Res. 2, 753-757 (1984).
50. A. Kobsa, "VIE-DPM: A user model in a natural-language dialogue system," in Proc. 8th German Workshop on Artificial Intelligence, Berlin, 1984.
51. A. Kobsa, "Three steps in constructing mutual belief models from user assertions," in Proc. 6th European Conference on Artificial Intelligence, Pisa, Italy, 1984.
52. A. Kobsa, "Generating a user model from wh-questions in the VIE-LANG system," in Proc. GLDV Meeting on Trends in Linguistischer Datenverarbeitung, 1984.
53. M. Xiwen and G. Weide, "W-JS: A modal logic of knowledge," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 398-401, 1983.
54. S. Soulhi, "Representing knowledge about knowledge and mutual knowledge," Proc. COLING, 194-199, 1984.
55. J. McCarthy, "Circumscription: A form of non-monotonic reasoning," Artif. Intell. 13, 27-39 (1980).
56. R. C. Moore, "Problems in logical form," Proc. ACL 19, 117-124 (1981).
57. K. Konolige, Belief and Incompleteness, CSLI Report No. CSLI-84-4, Stanford University, 1984.
58. H. J. Levesque, "The interaction with incomplete knowledge bases: A formal treatment," Proc. of the Seventh IJCAI, Vancouver, Brit. Col., 240-245, 1981.
59. H. J. Levesque, "A logic of implicit and explicit belief," Proc. of the Fourth AAAI, Austin, TX, 198-202, 1984.
60. A. R. Anderson and N. D. Belnap, Jr., Entailment: The Logic of Relevance and Necessity, Princeton University Press, Princeton, NJ, 1975.
61. M. Nilsson, "A logical model of knowledge," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 374-376, 1983.
62. J. Y. Halpern and Y. Moses, Knowledge and Common Knowledge in a Distributed Environment, IBM Research Report RJ 4421 (47909), 1984.
63. J. Y. Halpern, Towards a Theory of Knowledge and Ignorance: Preliminary Report, IBM Research Report RJ 4448 (48136), 1984.
64. J. Y. Halpern and M. O. Rabin, A Logic to Reason about Likelihood, IBM Research Report RJ 4136 (45774), 1983.
65. R. Fagin, J. Y. Halpern, and M. Y. Vardi, A Model-Theoretic Analysis of Knowledge: Preliminary Report, IBM Research Report RJ 4373 (47631), 1984; also in Proc. 25th IEEE Symposium on Foundations of Computer Science, 1984.
66. J. R. Searle, "What is a speech act?," in M. Black (ed.), Philosophy in America, Allen and Unwin, London, pp. 221-239, 1965; reprinted in J. R. Searle (ed.), The Philosophy of Language, Oxford University Press, Oxford, pp. 39-53, 1971.
67. P. R. Cohen and H. J. Levesque, "Speech acts and the recognition of shared plans," Proc. CSCSI 3, pp. 263-271, 1980.
68. J. F. Allen and C. R. Perrault, "Analyzing intention in utterances," Artif. Intell. 15, 143-178 (1980).
69. J. F. Allen, "Towards a general theory of action and time," Artif. Intell. 23, 123-154 (1984).
70. C. L. Sidner and D. J. Israel, "Recognizing intended meaning and speaker's plans," Proc. of the Seventh IJCAI, Vancouver, Brit. Col., 203-208, 1981.
71. C. L. Sidner, "What the speaker means: The recognition of speakers' plans in discourse," in N. Cercone (ed.), Computational Linguistics, Pergamon Press, Oxford, pp. 71-82, 1983.
72. H. H. Clark and C. R. Marshall, "Definite reference and mutual knowledge," in A. Joshi, B. Webber, and I. Sag (eds.), Elements of Discourse Understanding, Cambridge University Press, Cambridge, U.K., pp. 10-63, 1981.
73. C. R. Perrault and P. R. Cohen, "It's for your own good: A note on inaccurate reference," in A. Joshi, B. Webber, and I. Sag (eds.), Elements of Discourse Understanding, Cambridge University Press, Cambridge, U.K., pp. 217-230, 1981.
74. D. E. Appelt, "Planning natural-language utterances," Proc. AAAI, Pittsburgh, PA, 59-62, 1982.
75. G. Nadathur and A. K. Joshi, "Mutual beliefs in conversational systems: Their role in referring expressions," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 603-605, 1983.
76. G. B. Taylor and S. B. Whitehill, "A belief representation for understanding deception," Proc. of the Seventh IJCAI, Vancouver, Brit. Col., 388-393, 1981.
77. G. Airenti, B. G. Bara, and M. Colombetti, "Knowledge and belief as logical levels of representation," Proc. Cogn. Sci. Soc. 4, 212-214 (1982).
78. J. S. Bień, "Towards a multiple environments model of natural language," Proc. of the Fourth IJCAI, Tbilisi, Georgia, 379-382, 1975.
79. K. M. Colby, "Simulations of belief systems," in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, CA, pp. 251-286, 1973.
80. K. M. Colby, S. Weber, and F. D. Hilf, "Artificial paranoia," Artif. Intell. 2, 1-25 (1971).
81. K. M. Colby, F. D. Hilf, S. Weber, and H. C. Kraemer, "Turing-like indistinguishability tests for the validation of a computer simulation of paranoid processes," Artif. Intell. 3, 199-221 (1972).
82. K. M. Colby, "Modeling a paranoid mind," Behav. Brain Sci. 4, 515-560 (1981).
83. J. Weizenbaum, "Automating psychotherapy," ACM Forum 17, 543 (1974); reprinted with replies, CACM 26, 28 (1983).
84. M. Boden, Artificial Intelligence and Natural Man, Basic Books, New York, 1977.
85. R. P. Abelson, "Differences between belief and knowledge systems," Cognitive Science 3, 355-366 (1979).
86. B. C. Bruce, Belief Systems and Language Understanding, BBN Report No. 2973, 1975.
87. T. D. Parsons, "Frege's hierarchies of indirect senses and the paradox of analysis," in P. A. French et al. (eds.), Midwest Studies in Philosophy 6, 3-57 (1981).

W. J. Rapaport
SUNY Buffalo
BELLE

A chess-playing system (see Computer chess methods) developed at Bell Laboratories by Joe Condon and Ken Thompson, BELLE won the World Computer Chess Championship in 1980 and was rated at the master level. The system contains specialized hardware [see P. Frey (ed.), Chess Skill in Man and Machine, 2nd ed., Springer-Verlag, New York, 1983].
BLACKBOARD SYSTEMS

Blackboard systems are domain-specific problem-solving (qv) systems that exploit the blackboard architecture and exhibit a characteristically incremental and opportunistic problem-solving style. The blackboard architecture was developed by Erman, Hayes-Roth, Lesser, and Reddy (1) for the HEARSAY-II speech-understanding system. Since then, it has been exploited in a wide range of knowledge-based systems (2-9) (see Expert systems) and psychological simulations (10-14). Four illustrative blackboard systems, HEARSAY-II, HASP, CRYSALIS, and OPM, and important architectural variations they introduce, are described below. Three blackboard system-building environments, HEARSAY-II, AGE, and BB1, are also described.

Motivating Objectives for the Blackboard Architecture

The blackboard architecture was designed to achieve several objectives that emerged in the HEARSAY-II speech-understanding project and reappear in a broad range of problem-solving domains:

1. To reduce the combinatorics of search (qv): Even with a restricted vocabulary and domain of discourse, the speech-understanding problem entailed a space of utterances too large for conventional search techniques.

2. To incorporate diverse sorts of knowledge in a single problem-solving system: The speech-understanding problem brought with it several sorts of knowledge (e.g., syntax, phonetics, word transition probabilities) but no method for integrating them in a single program.

3. To compensate for unreliability in the available knowledge: Much of the available speech-understanding knowledge was heuristic (qv).

4. To compensate for uncertainty in the available data: The acoustic signal for speech is inherently ambiguous, occurs against a noisy background, and incorporates idiosyncrasies in the speaker's articulation, diction, grammar, and conceptualization of utterances.

5.
To apply available knowledge intelligently in the absence of a known problem-solving algorithm: Much of the available speech-understanding knowledge was simultaneously applicable, supporting multiple potential inferences from each intermediate problem-solving state but providing no known algorithm to guide the inference process.

6. To support cooperative system development among multiple system builders: Approximately seven individuals cooperated to design and implement HEARSAY-II.

7. To support system experimentation, modification, and evolution: Because HEARSAY-II was an experimental research effort, all aspects of the system evolved gradually over a period of several years.

The Blackboard Architecture: Defining Features and Characteristic Behavior
K. S. Anone SUNY at Buffalo
BIT-MAP DISPLAY. See Visual-depth map.
Defining Features. The blackboard architecture has three defining features: a global database called the blackboard, independent knowledge sources that generate solution elements on the blackboard, and a scheduler to control knowledge
source activity. These features are described directly below and illustrated with examples from HEARSAY-II. HEARSAY-II is discussed in more detail in a later section. All solution elements generated during problem solving are recorded in a structured, global database called the blackboard. The blackboard structure organizes solution elements along two axes, solution intervals and levels of abstraction. Different solution intervals represent different regions of the solution on some problem-specific dimension, for example, different time intervals in the speech signal. Different levels of abstraction represent the solution in different amounts of detail, for example, the phrases, words, and syllables entailed in the speech signal. Solution elements at particular blackboard locations are linked to supporting elements in the same solution interval at lower levels. For example, the phrase "Are any by Feigenbaum and Feldman" in interval 1-225 in the speech signal might be supported by the word "Feigenbaum" in interval 70-150 and the syllable "Fa" in interval 70-95. Solution elements are generated and recorded on the blackboard by independent processes called knowledge sources. Knowledge sources have a condition-action format. The condition describes situations in which the knowledge source can contribute to the problem-solving process. Ordinarily, it requires a particular configuration of solution elements on the blackboard. The action specifies the knowledge source's behavior. Ordinarily, it entails the creation or modification of solution elements on the blackboard. Only knowledge sources whose conditions are satisfied can perform their actions. For example, the knowledge source MOW's condition requires the appearance of new syllable hypotheses on the blackboard. MOW's action generates new word hypotheses encompassing sequential subsets of the syllables. Knowledge sources may exploit both top-down and bottom-up inference methods (see Processing, bottom up and top down).
For example, MOW generates new word hypotheses bottom up by integrating syllable hypotheses. The knowledge source PREDICT generates new word hypotheses top down by extending phrase hypotheses.

Knowledge sources are independent in that they do not invoke one another and ordinarily have no knowledge of each other's expertise, behavior, or existence. They are cooperative in that they contribute solution elements to a shared problem. They influence one another only indirectly, by anonymously responding to and modifying information recorded on the blackboard.

Although implementations vary, in most blackboard systems knowledge source activity is event driven. Each change to the blackboard constitutes an event that, in the presence of specific other information on the blackboard, can trigger (satisfy the condition of) one or more knowledge sources. Each such triggering produces a unique knowledge source activation record (KSAR) representing a unique triggering of a particular knowledge source by a particular blackboard event. Because several KSARs may be triggered simultaneously and compete to execute their actions, a scheduler selects a single KSAR to execute its action on each problem-solving cycle. The scheduler may use a variety of criteria, such as the credibility of a KSAR's triggering information, the reliability of its knowledge source, or the importance of the solution element it would generate. When a KSAR is scheduled, its knowledge source action executes in the context of its triggering information, typically producing new blackboard events.
These events may trigger knowledge sources, creating new KSARs to compete for scheduling priority with previously triggered, not yet executed KSARs (see Agenda-based systems).

Characteristic Behavior. Blackboard systems construct solutions incrementally. On each problem-solving cycle a single KSAR executes, generating or modifying a small number of solution elements in particular blackboard locations. Along the way some elements are assembled into growing partial solutions; others may be abandoned. Eventually a satisfactory configuration of solution elements is assembled into a complete solution, and the problem is solved.

Blackboard systems apply knowledge opportunistically. On each problem-solving cycle the scheduler uses a set of heuristic criteria to select a KSAR to execute its action. Depending on the heuristics available to the scheduler, this may produce a more or less orderly approach to solving the problem. At one extreme the scheduler may follow a rigorous procedure, scheduling a planned sequence of KSARs that monotonically assemble compatible solution elements. At the other extreme it may apply many conflicting heuristics that are extremely sensitive to unanticipated problem-solving states, scheduling KSARs that assemble disparate, competing solution elements out of which a complete solution only gradually emerges.

The Blackboard Architecture's Approach to the Objectives

Each feature of the blackboard architecture is designed to address one or more of the seven objectives introduced above.

1. To reduce the combinatorics of search: First, the blackboard architecture integrates reasoning (qv) at multiple levels of abstraction. An application system can solve a simplified version of a problem and then use that solution to guide and limit exploration of a larger space of more detailed solutions (15,16). Second, the blackboard architecture provides independent knowledge sources and opportunistic scheduling.
As a consequence, an application system can generate and merge independent solution "islands," potentially reducing the search space dramatically (17,18).

2. To incorporate diverse sorts of knowledge in a single problem-solving system: The blackboard architecture preserves the distinctions among knowledge sources. It permits different knowledge sources to embody qualitatively different sorts of expertise, applying idiosyncratic processes to idiosyncratic representations. It permits them to operate independently, contributing solution elements when and where they can. Thus, the blackboard architecture finesses the problem of integrating different sorts of knowledge per se. Instead, it integrates the results of applying different sorts of knowledge.

3. To compensate for unreliability in the available knowledge: The blackboard architecture permits multiple knowledge sources to operate redundantly upon the same subproblem. An application system can combine the implications of several unreliable, but redundant knowledge sources to converge upon the most credible solution elements.

4. To compensate for uncertainty in the available data: The blackboard architecture permits different knowledge sources to embody top-down and bottom-up inference methods. An application system can exploit top-down knowledge sources to prune solution elements generated by bottom-up knowledge sources operating upon uncertain data. Conversely, it can exploit bottom-up knowledge sources to prune solution elements generated top down from uncertain expectations (see Processing, bottom up and top down).

5. To apply available knowledge intelligently in the absence of a known problem-solving algorithm (see Problem solving): The blackboard architecture provides an opportunistic scheduler that decides, on each problem-solving cycle, which potential action is most promising. The scheduler can integrate multiple, heuristic scheduling criteria. Its decisions depend on the available criteria and the current problem-solving situation.

6. To support cooperative system development among multiple system builders: The blackboard architecture permits functionally independent knowledge sources. Once a blackboard structure and representation of solution elements have been agreed upon, individual system builders can design and develop knowledge sources independently.

7. To support system modification and evolution: First, the blackboard architecture permits functionally independent knowledge sources, which can be added, removed, or modified individually. Second, the architecture makes a sharp distinction between domain knowledge and scheduling (see Domain knowledge). Modifications to knowledge sources need not affect the scheduler. Conversely, experimentation with different scheduling heuristics need not affect any knowledge sources.
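The event-driven control cycle described above (blackboard events trigger KSARs; a scheduler selects one KSAR per problem-solving cycle, whose action posts new events) can be sketched in Python. This is a minimal illustration of the generic architecture, not the code of any particular system; all class and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Event:
    """A change to the blackboard: a hypothesis at some level and interval."""
    level: str
    interval: tuple
    hypothesis: object

@dataclass
class KnowledgeSource:
    name: str
    condition: Callable   # (blackboard, event) -> bool: does this event trigger us?
    action: Callable      # (blackboard, event) -> list[Event]: new solution elements
    reliability: float = 1.0

@dataclass
class KSAR:
    """Knowledge source activation record: one triggering of one
    knowledge source by one blackboard event."""
    ks: KnowledgeSource
    event: Event

class Blackboard:
    def __init__(self, levels):
        self.hypotheses = {level: [] for level in levels}

    def post(self, event):
        self.hypotheses[event.level].append(event.hypothesis)

def run(blackboard, knowledge_sources, initial_events, max_cycles=100):
    """On each cycle the scheduler selects a single pending KSAR; its action
    posts new events, which may trigger further KSARs (agenda-based control)."""
    pending = [KSAR(ks, ev) for ev in initial_events
               for ks in knowledge_sources if ks.condition(blackboard, ev)]
    for _ in range(max_cycles):
        if not pending:
            break
        # Opportunistic selection; here the sole criterion is KS reliability.
        ksar = max(pending, key=lambda k: k.ks.reliability)
        pending.remove(ksar)
        for ev in ksar.ks.action(blackboard, ksar.event):
            blackboard.post(ev)
            pending += [KSAR(ks, ev) for ks in knowledge_sources
                        if ks.condition(blackboard, ev)]
    return blackboard
```

For instance, a MOW-like knowledge source whose condition tests for syllable-level events would post word hypotheses one level up, and those new events would in turn trigger any knowledge sources watching the word level.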
Four Illustrative Blackboard Systems

This section describes four blackboard systems: HEARSAY-II (1), HASP (2), CRYSALIS (3), and OPM (4). These systems illustrate the range of problems attacked within the blackboard architecture and important variations on the architecture's major components.

HEARSAY-II. HEARSAY-II interprets single spoken sentences drawn from a 1000-word vocabulary that request information from a database. As discussed above, it operates on an ambiguous signal in the presence of acoustic noise complicated by idiosyncrasies in the vocabulary, syntax, pronunciation, and conceptual style of individual speakers. Given training with a speaker's voice, HEARSAY-II interprets requests with 90% accuracy within a factor of 10 of real time.

HEARSAY-II begins with a parameterized representation of the speech signal and attempts to generate a coherent semantic interpretation of it. Between these two extremes, parameter and database interface, HEARSAY-II generates hypotheses at five additional levels of abstraction: segment, syllable, word, word sequence, and phrase. The blackboard's solution intervals represent different time intervals within the speech signal (see also Parsing; Phonemes; Semantics; Speech understanding).

HEARSAY-II has 12 knowledge sources. Most knowledge sources operate bottom up, inferring hypotheses at one level of abstraction from data or hypotheses at lower levels. For example, the knowledge source MOW hypothesizes all words that encompass sequential subsets of previously generated syllable hypotheses. A few knowledge sources operate top down. For
example, PREDICT hypothesizes all words that might syntactically precede or follow a given phrase hypothesis. Finally, some knowledge sources operate within a single level of the blackboard. For example, RPOL rates the credibility of each new or modified hypothesis at every level.

In HEARSAY-II knowledge source conditions and actions are implemented as programs. Because they can be very large programs, both condition matching and action execution are scheduled. When a blackboard event occurs at a knowledge source's blackboard level of interest, it generates a "condition KSAR." When the condition KSAR is scheduled for execution, it runs the knowledge source's condition program. If the condition program concludes successfully, it generates an "action KSAR." When the action KSAR is scheduled for execution, it runs the knowledge source's action program and produces changes on the blackboard.

HEARSAY-II pursues a two-stage strategy. During phase 1 it schedules a sequence of KSARs that operate bottom up until it has generated all word-level hypotheses supported by the data. During phase 2 it opportunistically schedules competing KSARs. However, HEARSAY-II's scheduler has no explicit representation of the two-phase strategy. It applies a uniform set of control heuristics throughout the problem-solving process. The two-phase strategy is implicit in the engineering of different knowledge sources (see also Control structures).

During phase 1 three knowledge sources process the data bottom up to the word level. The knowledge source SEG is triggered by input of data at the parameter level and hypothesizes all encompassing segments. POM is triggered by the segment hypotheses and hypothesizes all encompassing syllables.
MOW is triggered by the syllable hypotheses and hypothesizes all encompassing word hypotheses. Each of these knowledge sources is triggered exactly once during phase 1, produces the single KSAR available for scheduling on its problem-solving cycle, and generates all possible hypotheses at its target level. Thus, although the scheduler knows nothing about phase 1, it has no alternative but to schedule SEG, POM, and MOW in sequence.

During phase 2 multiple knowledge sources are triggered on each problem-solving cycle, accumulating in a growing list of pending KSARs. The scheduler assigns each KSAR a priority based on its required computing resources, the credibility of its triggering events, the reliability of its knowledge source, and its potential to extend high-credibility partial solutions already on the blackboard. In general, on each problem-solving cycle the scheduler selects the single, highest priority KSAR to execute its action. However, if several pending KSARs propose to extend existing hypotheses of equal credibility, the scheduler selects all of them, effecting a breadth-first interlude in an otherwise depth-first search.

Processing halts when the system has pursued all credible partial hypotheses or when the system runs out of computing resources (time or space). In the former case the system produces the most complete and credible solution. In the latter case it may produce several equally complete and credible partial solutions.

As the first blackboard system, HEARSAY-II introduces the basic architectural features and the first specification of knowledge sources and scheduler. Regarding knowledge sources, HEARSAY-II specifies an unstructured, procedural representation for knowledge source conditions and actions. Both condition and action procedures produce KSARs for
scheduling. This specification allows individual system builders to tailor appropriate representations for different knowledge sources. It permits knowledge sources to examine all blackboard contents and perform any desired computations during both triggering and action execution. On the other hand, this specification entails computationally expensive methods for triggering and executing knowledge sources. Regarding scheduling, HEARSAY-II defines a sophisticated scheduler that incorporates multiple criteria to make purely opportunistic scheduling decisions. It exhibits the power of a global control strategy and implements it in the engineering of individual knowledge sources. These specifications allow HEARSAY-II to make intelligent scheduling decisions in the absence of a known algorithm for speech understanding. However, the combination of an opportunistic scheduler and carefully engineered knowledge sources is an unprincipled approach to scheduling.

HASP. HASP (2) interprets sonar signals from a circumscribed area of the ocean in real time. Given the locations, ranges, and coded descriptions of the outputs of several hydrophone arrays, it detects, identifies, localizes, groups, and characterizes the movement of each ship or other vessel in the area. Some of these vessels are friendly or neutral, and others are wary and elusive. In addition, HASP must perform its interpretation against the background noise and distortions of the ocean environment. Finally, because the ocean scene is dynamic, with many ships coming and going and changing their behavior, HASP must "solve" its interpretation problem repeatedly. Its output is a series of reports presenting "snapshots" of the changing scene. These reports also contain explanations justifying their constituent hypotheses (see Military, applications in).

HASP begins with a line representation of the sonar signal and attempts to characterize the situation it represents. Between these two extremes, Line and Situation Board, HASP generates hypotheses at three additional levels: harmonics in the signal, sources such as engines or propellers, and vessels such as submarines or aircraft carriers. Its solution intervals categorically distinguish different ocean regions.

HASP has approximately 40 knowledge sources. Most of them operate bottom up, inferring hypotheses at one level of abstraction from data or hypotheses at lower levels. For example, the knowledge source CROSS.ARRAYRULES hypothesizes sources that encompass hypothesized harmonics. However, some knowledge sources operate top down, confirming expectations implicit in hypotheses at higher levels of abstraction. For example, the knowledge source SOURCE.INCORPORATIONRULES hypothesizes sources that are implicit in vessel hypotheses.

HASP uses a uniform condition-action syntax for all knowledge sources. Knowledge source conditions specify one or more predefined event labels representing classes of anticipated blackboard events. Actions are production systems whose rules generate, categorize, and label blackboard events. Rules categorize events as simple, clock, or expected events. Simple events add or modify hypotheses on the blackboard and can be processed by triggered knowledge sources at any time. Clock events also add or modify hypotheses, but they must be processed at particular times. Expected events describe expected blackboard modifications. Rules label events with the predefined labels used for triggering (see Rule-based systems).

HASP's scheduler iterates a hierarchical procedure that sequentially selects all currently due clock events in LIFO order, sequentially selects all confirmed expected events in LIFO order, and selects the highest priority simple event by the LIFO rule. For each selected event the scheduler executes a predetermined sequence of knowledge sources triggered by the event's label. HASP explains solution elements recorded on its blackboard by reviewing the sequence of knowledge source rules that produce them.

HASP introduces variations on both knowledge source specification and scheduling. Regarding knowledge source specification, HASP constrains the syntax of both condition and action components. The restriction of conditions to event labels provides an efficient mechanism for triggering knowledge sources. However, it requires coordination of all knowledge sources to produce and respond to a manageably small set of event labels. The production system representation used for knowledge source actions is conceptually neat. Regarding scheduling, HASP's hierarchical, event-based procedure is computationally efficient, but it severely limits flexibility in both the selection and sequencing of KSARs for execution.

CRYSALIS. CRYSALIS determines the spatial locations of a protein's constituent atoms. It uses two kinds of information, a description of the protein's amino acid sequence and its electron density map (EDM). An EDM is a function giving the density of the protein's electron cloud, often represented as a three-dimensional contour map. Peaks, or local maxima, in the EDM correspond to atoms or groups of atoms, with peak height providing an approximate function of their atomic number. Stripping away the low-density peaks on the EDM reveals its skeleton, a line graph structure approximating the connectivity among identifiable groups of atoms. Finally, segments of the skeleton represent meaningful components of protein structure (e.g., the backbone or side chains). Using the amino acid sequence and these features of the EDM, CRYSALIS can solve a medium-sized protein in a day. Like human protein crystallographers, it locates about 75% of the nonhydrogen atoms in the protein with an accuracy of 8 nm (see also Chemistry, AI in; Medical advice systems).

CRYSALIS uses an expanded blackboard. As discussed above, the EDM data themselves support hierarchical analysis independent of any efforts to interpret them. Accordingly, the CRYSALIS blackboard has two separate "panels," one for the EDM data and one for hypotheses. Each blackboard panel embodies different levels and solution intervals. The EDM panel has four levels: points, peaks, nodes, and segments. Its solution intervals represent spatial location in the EDM. The hypothesis panel has three levels: atoms, superatoms (meaningful groups of atoms), and stereotypes (larger structures, like alpha-helices or anticipated beta-sheets). Its solution intervals represent different spatial locations in the protein. The blackboard permits interpanel links between related data and hypothesis elements as well as the conventional vertical links.

CRYSALIS's knowledge sources are structured like HASP's. They exploit predefined event labels and a production system representation for actions. However, CRYSALIS production rules are semantically more complex, referring to 250
LISP functions that define a crystallographic language for manipulating data and hypotheses.

CRYSALIS uses a knowledge-intensive scheduling procedure. The scheduler uses a domain-specific strategy in conjunction with global solution state to sequence domain-specific problem-solving tasks. It uses each task, in conjunction with local solution state, to select individual blackboard events. For each selected event it executes a predetermined sequence of knowledge sources triggered by the selected event's label.

CRYSALIS introduces variations on blackboard specification and scheduling. Regarding blackboard specification, CRYSALIS introduces different panels to distinguish reasoning about data from reasoning about interpretations of the data. (HEARSAY-II and HASP effectively finessed this problem by operating upon hand-coded data.) CRYSALIS introduces a domain-specific scheduling procedure. By exploiting this knowledge, CRYSALIS further improves scheduling efficiency. Its knowledge-based scheduling procedure also provides a perspicuous framework for interpreting system behavior. Of course, this approach is possible only when an effective scheduling procedure is known.

OPM. OPM plans multiple-task sequences in a context of conflicting goals and constraints. Given a list of desirable tasks and a map of the region in which tasks can be performed, OPM plans which tasks to perform, how much time to allocate for each task, in what order to perform tasks, and by what routes to travel between successive tasks. The problem is complicated by differences in task priorities and time requirements, constraints on when tasks can be performed, intertask dependencies, and limitations on the time available for performing tasks.

OPM's blackboard has four levels of abstraction: outcomes (tasks) the plan should achieve, designs for the general spatial-temporal layout of the plan, procedures that sequence individual tasks, and operations that sequence task components.
Its solution intervals represent different plan execution time intervals. Two coordinated blackboard panels with parallel levels of abstraction record reasoning about data and planning heuristics. Each decision on the plan panel depends on a coordinated set of decisions on these other two panels; for example:

Heuristic: Perform the closest task in the right direction next.
Data: The closest task in the right direction is the newsstand.
Plan: Go to the newsstand next.

OPM has about 50 knowledge sources. Some operate bottom up. For example, the knowledge source NOTICE-PATTERN detects spatial configurations of tasks at the design level from individual task locations at the procedure level on the data plane. Other knowledge sources operate top down. For example, the knowledge source REFINE-DESIGN expands designs as sequences of procedures on the plan plane.

OPM uses a two-part condition structure for knowledge sources. A condition's trigger is an event-based test of knowledge source relevance. Its precondition is a state-based test of the knowledge source's current applicability. Satisfaction of a knowledge source's trigger generates a KSAR, but a KSAR can be executed only at times when its precondition is true. Both triggers and preconditions may contain arbitrary LISP code as long as they can be evaluated true or false. As in HEARSAY-II, knowledge source actions are arbitrary programs that produce blackboard events.

OPM uses a uniform blackboard mechanism for reasoning about control. Control knowledge sources dynamically generate, modify, and execute a control plan out of modular control heuristics on the control blackboard. The control blackboard has different levels to represent control heuristics of varying scope. Its solution intervals represent different problem-solving time intervals. For example, at an intermediate point in the problem-solving process, OPM's control plan might contain this partial plan: Solve problem P by generating an outcome-level plan and successively refining it at lower levels of abstraction. Begin by generating an outcome-level plan. Always prefer KSARs with credible triggering information and reliable actions.

OPM's scheduler has no control knowledge of its own. Instead, it adapts its scheduling behavior to whatever heuristics are recorded on the control blackboard.

OPM introduces variations in blackboard structure, knowledge source specification, and scheduling. Regarding blackboard structure, OPM distinguishes reasoning about problem data, planning (qv) heuristics, and the plan itself on separate blackboard panels. It also provides a separate blackboard panel for reasoning about scheduling. Thus, OPM introduces explicit representation of all aspects of the problem-solving process. Regarding knowledge sources, OPM introduces a two-part condition structure that combines an efficient event-based triggering mechanism with a precondition mechanism for restricting execution of triggered KSARs to appropriate contextual conditions. Finally, OPM introduces a simple scheduler that adapts to a dynamic control plan and a uniform blackboard mechanism for generating the control plan.
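The two mechanisms just summarized, the event-based trigger plus state-based precondition, and a scheduler that merely applies heuristics recorded in a control plan, can be sketched as follows. This is an illustrative reconstruction in Python, not OPM's actual LISP interface; every identifier is invented.

```python
class KnowledgeSource:
    def __init__(self, name, trigger, precondition, action):
        self.name = name
        self.trigger = trigger            # event -> bool: cheap, event-based test
        self.precondition = precondition  # blackboard -> bool: state-based test
        self.action = action              # (blackboard, event) -> None

class KSAR:
    def __init__(self, ks, event):
        self.ks, self.event = ks, event

    def executable(self, blackboard):
        # A triggered KSAR may oscillate between triggered and executable
        # as transient blackboard states come and go.
        return self.ks.precondition(blackboard)

def on_event(event, knowledge_sources, agenda):
    """Triggering is event based: run once per blackboard event."""
    for ks in knowledge_sources:
        if ks.trigger(event):
            agenda.append(KSAR(ks, event))

def schedule(agenda, blackboard, control_plan):
    """The scheduler has no control knowledge of its own: it rates only the
    currently executable KSARs against the modular heuristics (functions from
    KSAR to score) recorded in the control plan."""
    candidates = [k for k in agenda if k.executable(blackboard)]
    if not candidates:
        return None
    return max(candidates, key=lambda k: sum(h(k) for h in control_plan))
```

A KSAR triggered while its precondition is false simply waits on the agenda; it becomes eligible again the moment the blackboard state satisfies the precondition.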
This enables OPM to integrate the opportunistic and strategic scheduling heuristics. Further, OPM need not commit to any particular combination of heuristics but can dynamically adapt its control plan to unanticipated problem-solving situations. The control blackboard provides a perspicuous framework in which to interpret system behavior. ThreeBlackboardSystem-Building Environments This section describesthree blackboard system-building environments: AGE, HEARSAY-III, and BB1. A11three environments provide the basic architectural components:blackboard, knowledge sources, and scheduler, which a system builder must specify with LISP expressions.In general, AGE is the most constrained of the three systems and, as a consequence, provides the strongest guidance in system design. HEARSAYIII is the least constrainedand, as a consequence, providesthe greatest freedom in system design.BB1, which was developed several years after AGE and HEARSAY-III, adoptsand elaborates upon selectedfeatures of both systems and incorporates them with new features of its own. Age. AGE permits a userr,todefine a blackboard with any number of named levels and associatedattributes. Anv solu-
78
BLACKBOARD SYSTEMS composeseach blackboard into any desired lower level panels as well as desired levels and attributes. Knowledge source conditions specify a triggering pattern and immediate code. The user must express a knowledge source's triggering pattern as a predicate on AP3 fact templates and any LISP predicates composedwith AND and OR operators (seeANDi OR graphs). Whenever one of the constituent APB fact templates is modified, the entire pattern is evaluated. If it is evaluated as true, HEARSAY-III createsa KSAR that includes the knowledge source'sname, the AP3 context in which the pattern matched, and the values of variables instantiated by the match. At the same time the knowledge source's immediate code,which may be any LISP code,is executed.It records potentially useful scheduling information in the KSAR and places the activation record at a particular level of the scheduling blackboard. Knowledge-sourceactions are arbitrary LISP programs. The default scheduler simply selects any KSAR from the scheduling blackboard and executesits action program. However, the system builder can replace it with another scheduler tailored to the application. The scheduling blackboard provides an environment for explicit control reasoning through the activities of control knowledge sources. In illustrative HEARSAY-III systems the control blackboard typically partitions pending KSARs into different priority levels. Control knowledge sourcestypically assign KSARs to particular levels, adjust KSAR priorities within a level, and generate lists of KSARs for sequential execution by the scheduler. However, HEARSAY-III doesnot place any constraints on the structure of the control blackboard or the activities of control knowledge sources.The system builder can use them in whatever manner appears useful. HEARSAY-III is the least constrained of the three blackboard environments. 
It provides only the defining features of the architecture: the blackboard, condition-action knowledge sources,and a scheduler.But it imposesalmost no restrictions at all on their specification.The knowledge source conditions and actions and the scheduler can be arbitrary programs. This guidSelect a blackboard event according to a function specified gives the system builder great freedom but very little most HEARSAY-III's system. application an designing in ance by the system builder. domain between distinction its in lies specification important sethe Retrieve the list of knowledge sourcestriggered by and control blackboards and its suggestionthat control knowllected event. edge sources should record information on the control blackExecute each triggered knowledge source'slocal production board to influence the scheduler. However, HEARSAY-III system. leaves the productive use of this specification to the system
tion element created at a given level of the blackboard assumes the associatedattributes. Although AGE does not explicitly distinguish multiple blackboard panels, it permits the system builder to distinguish panels implicitly in the behavior of specific knowledge sources. Knowledge source conditions are lists of event labels that correspondto anticipated blackboard events. When an event with one of these labels is selectedby the scheduler, as discussedbelow, the knowledge source is triggered. Knowledge sourceactions are local production systems.The left side of a rule specifiespredicates that determine its applicability. The right side instantiates a template specifying a change to the blackboard and a label for that blackboard event. AGE provides a variety of blackboard accessfunctions for use in the rules. The system builder can define parameters that determine how many times individual rules can fire, how many rules can fire on each triggering of the knowledge source, and how predicates in the left sides of rules combine to invoke their right sides. These restrictions on knowledge source specification have advantages and disadvantages. First, the use of event labels permits an efficient table-lookup method for knowledge source triggering. On the other hand, it requires that the system builder anticipate all important blackboard events and the distinctive contexts in which they may occur. Knowledge sourcesthat generate and respond to events must be coordinated to use the same labels. Second,AGE's production system representation for actions and its blackboard modification templates provides a neat, uniform syntax with detailed code hidden in referencedfunctions. They also provide a foundation for AGE's explanation (qv) capability (in which it reiterates the sequenceof fired rules that produced a particular hypothesis) and for its elaborate interface for creating and editing knowledge sources. 
On the other hand, these restrictions sometimes hinder specification of complex knowledge source actions. AGE's scheduler iterates the following procedure:
Efficiency is the primary advantage of this scheduler. How- builder. ever, it severely restricts system behavior and the system BBI . BB1 (21) supports blackboard systemsthat explicitly builder's control over system behavior. The system builder can (seePlanning) their own problem-solvsupply only the event selectionfunction. The scheduleralways and dynamically plan (see Explanation) their behavior in explain predeblhavior, ing a opliates by first choosingan event and then executing plan, and learn (seeLearning) control underlying an of the terms by triggered source termined sequence of knowledge BB1 implements the experience. from event's label. It cannot incorporate heuristics for selecting new control heuristics in Ref. 22, which defined architecture control blackboard among or ordering knowledge sources. makes a sharp distinction between domain problems and the of its potential actions should a system HEARSAY-I|I.Erman, London, and Fickas (19) developed control problem: Which cycle? problem-solving execute on each HEARSAY-III, a general-purposeblackboard architecture. It control blackboards to reand domain (20) explicit defines 88L is built upon the relational database system called AP3 control problems. The and domain for elements solution cord searching and exploits APB's capabilities for representing and domain blackboard the of the structure directed graph structures, defining and preserving context, system builder defines BB1 definesthe levels. within attributes ., ,r"*ed levels and and triggering knowledge sources with a demon mechanism. problem to be the distinguish levels whose blackboard, control and HEARSAY-il partitions its blackboard into domain problem-solving strategies, local attenscheduling blackboards.The system builder hierarchically de- solved, sequential
BTACKBOARDSYSTEMS
tional foci, general scheduling policies, to-do sets of feasible actions, and chosen actions selectedfor execution. It also defines the attributes used to specify control decisions at each level. For example, a focus decision'sgoal attribute describes desirable actions, such as "generate solution elements at the outcome level." Its criterion describes the goal's expiration condition, such as "there is a complete and satisfactory solution at the outcome level." The control blackboard's solution intervals distinguish different problem-solving time intervals in terms of problem-solving cycles. 8BL definesexplicit domain and control knowledge sources. Domain knowledge sourcesoperate primarily on the domain blackboard to solve the domain problem. They are domain specificand defined by the system builder. Control knowledge sources operate primarily on the control blackboard to solve the control problem. Some control knowledge sourcesare domain independent and provided by BB1. For example, the knowledge source implement strategy incrementally refines a stratery decision as a series of prescribed focus decisions.The system builder may define additional domain-specificcontrol knowledge sources.All knowledge sourcesare data structures that can be interpreted or modified. A knowledge source'scondition comprises a trigger and a precondition. The trigger is a set of event-basedpredicates. When all of them are true in the context of a single blackboard event, the knowledge source is triggered and generates a representative KSAR. When running an application system, BB1 generates and uses a discrimination net of trigger predicates used in the system's knowledge sources.The precondition is a set of state-basedpredicates.When all of them are true, which may occur after an arbitrary delay, the triggered KSAR is executable. If the preconditions describe transient states, the KSAR may oscillate between triggered and executablestates. 
This specification of knowledge source conditions provides an efficient event-based triggering mechanism with a state-based mechanism for restricting action execution to appropriate contexts.

A knowledge source's action is a local production system. The left sides of rules determine under what conditions they fire. The right sides instantiate blackboard modification templates. Control parameters determine how many times individual rules can fire, how many rules can fire on each triggering of the knowledge source, and how multiple left-side predicates are integrated to fire rules.

In addition to its condition and action, each knowledge source has descriptive attributes that are potentially useful in scheduling. These include the blackboard panels and levels at which its triggering events and actions occur, its computational cost, its relative importance compared to other knowledge sources, and its reliability in producing correct results.

BB1 provides a variety of functions for inspecting the blackboard, knowledge sources, and blackboard events for use in defining knowledge sources. It also provides a simple menu-driven facility for creating and editing knowledge sources.

BB1 defines a simple scheduler that adapts to foci and policies recorded on the control blackboard and schedules the execution of both domain and control knowledge sources. On each problem-solving cycle the scheduler rates executable KSARs against operative foci and policies. It applies a scheduling rule, which is also recorded on the control blackboard and modifiable by control knowledge sources, to the KSAR ratings to select one for execution.
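As a concrete illustration of the scheduling cycle just described, the following Python sketch rates executable KSARs against operative foci and policies and applies a replaceable scheduling rule to the ratings. This is an invented simplification, not BB1's actual code: the KSAR fields, the weight/match representation of foci and policies, and the additive rating scheme are all assumptions.

```python
# Hypothetical sketch of a BB1-style scheduling cycle (not BB1 itself).
# Foci and policies are dicts with an importance "weight" and a "match"
# function scoring how well a KSAR serves that control decision.

def rate_ksar(ksar, foci, policies):
    """Rate one KSAR: weighted sum of its match against each operative
    focus and policy."""
    return sum(f["weight"] * f["match"](ksar) for f in foci + policies)

def schedule(ksars, foci, policies, rule=max):
    """One problem-solving cycle: rate every executable KSAR, then apply
    the (modifiable) scheduling rule to the ratings to pick one."""
    executable = [k for k in ksars if k["executable"]]
    if not executable:
        return None
    return rule(executable, key=lambda k: rate_ksar(k, foci, policies))

# Example: a focus favoring cheap actions, a policy favoring reliability.
foci = [{"weight": 2.0, "match": lambda k: 1.0 / k["cost"]}]
policies = [{"weight": 1.0, "match": lambda k: k["reliability"]}]
ksars = [
    {"name": "ks-a", "executable": True, "cost": 1.0, "reliability": 0.5},
    {"name": "ks-b", "executable": True, "cost": 4.0, "reliability": 0.9},
    {"name": "ks-c", "executable": False, "cost": 0.1, "reliability": 0.9},
]
chosen = schedule(ksars, foci, policies)
print(chosen["name"])  # ks-a: rating 2.5 beats ks-b's 1.4; ks-c not executable
```

Because the rule and the weights are plain data, control knowledge sources could modify them between cycles, which is the point of recording them on the control blackboard.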
BB1 provides a graphical run-time interface with capabilities for inspecting knowledge sources, blackboard contents, blackboard events, or pending KSARs; enumerating pending KSARs; recommending a KSAR for execution; explaining a recommendation; accepting a user's recommendation; executing a recommended KSAR; and running without user intervention until a specified condition occurs.

The specification of BB1's knowledge sources and control mechanism underlies its capabilities for control, explanation, and learning. BB1 provides a general blackboard mechanism for reasoning about control, incorporating any strategic or opportunistic scheduling heuristics specified by the user. Moreover, it can construct situation-specific control plans dynamically out of modular control heuristics, avoiding the need to enumerate important problem-solving contingencies or to predefine an effective control plan. BB1 explains its problem-solving actions by showing how they fit into the underlying control plan and by recursively explaining the control plan itself. BB1 learns new control heuristics when a domain expert overrides its scheduling recommendations. It identifies the critical features distinguishing the expert's preferred action from the scheduler's recommended action and generates a heuristic favoring actions with those features.

Research Issues

Two research issues dominate current studies of blackboard systems: effective scheduling and parallel computing. Effective scheduling is crucial to the speed and accuracy with which blackboard systems solve problems. Of the three defining architectural components (blackboard, knowledge sources, and scheduler), the scheduler shows the greatest variability among application systems and system-building environments. There is a general trend toward making scheduling decisions and the reasoning underlying them explicit.
In addition to improving performance, explicit control reasoning appears essential for automatic acquisition of more effective scheduling heuristics and for strategic explanation.

Blackboard systems appear to have great potential for exploiting parallel computing environments. The modularity of knowledge sources makes them ideal candidates for distribution among multiple processors. In addition, knowledge source triggering, KSAR execution, and blackboard modification could operate in parallel. There has been some exploratory work in this area (8,23,24), but the potential gains from a parallel blackboard architecture remain largely unexplored.
BIBLIOGRAPHY

1. L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy, "The Hearsay-II speech-understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12, 213-253 (1980).
2. H. P. Nii, E. A. Feigenbaum, J. J. Anton, and A. J. Rockmore, "Signal-to-symbol transformation: HASP/SIAP case study," AI Mag. 3, 23-35 (1982).
3. A. Terry, Hierarchical Control of Production Systems, Ph.D. Thesis, University of California, Irvine, 1983.
4. B. Hayes-Roth, F. Hayes-Roth, S. Rosenschein, and S. Cammarata, "Modelling Planning as an Incremental, Opportunistic Process," Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 375-383, 1979.
5. D. D. Corkill, V. R. Lesser, and E. Hudlicka, "Unifying Data-Directed and Goal-Directed Control: An Example and Experiments," Proceedings of the Second AAAI, Pittsburgh, PA, pp. 143-147, 1982.
6. A. Hanson and E. Riseman, "VISIONS: A Computer System for Interpreting Scenes," in A. Hanson and E. Riseman (eds.), Computer Vision Systems, Academic Press, New York, 1978.
7. E. Hudlicka and V. R. Lesser, Meta-Level Control Through Fault Detection and Diagnosis, Technical Report, University of Massachusetts, Amherst, MA, 1984.
8. V. R. Lesser and D. Corkill, "Functionally accurate cooperative distributed systems," IEEE Trans. Syst. Man Cybern. SMC-11, 81-96 (1981).
9. M. Nagao, T. Matsuyama, and H. Mori, "Structured Analysis of Complex Photographs," Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 610-616, 1979.
10. B. Hayes-Roth, The Blackboard Architecture: A General Framework for Problem-Solving?, Technical Report HPP-83-30, Stanford University, Stanford, CA, 1983.
11. B. Hayes-Roth and F. Hayes-Roth, "A cognitive model of planning," Cogn. Sci. 3, 275-310 (1979).
12. J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception: Part 1. An account of basic findings," Psychol. Rev. 88, 375-407 (1981).
13. M. Rose, The Composition Process, Ph.D. Thesis, University of California at Los Angeles, 1981.
14. D. E. Rumelhart and J. L. McClelland, "An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model," Psychol. Rev. 89, 60-94 (1982).
15. A. Newell, J. C. Shaw, and H. A. Simon, "Report on a General Problem-Solving Program," Proceedings of the International Conference on Information Processing, UNESCO House, Paris, France, 1959.
16. M. Stefik, J. Aikens, R. Balzer, J. Benoit, L. Birnbaum, F. Hayes-Roth, and E. Sacerdoti, "The organization of expert systems: A prescriptive tutorial," Artif. Intell. 18, 135-173 (1982).
17. M. Minsky, "Steps Toward Artificial Intelligence," in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 406-450, 1961.
18. M. Stefik and L. Conway, "Towards the principled engineering of knowledge," AI Mag. 3, 4-16 (1982).
19. L. D. Erman, P. E. London, and S. F. Fickas, "The Design and an Example Use of Hearsay-III," Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, pp. 409-415, 1981.
20. N. M. Goldman, AP3 Reference Manual, Technical Report, Information Sciences Institute, Los Angeles, CA, 1982.
21. B. Hayes-Roth, BB1: An Architecture for Blackboard Systems that Control, Explain, and Learn about Their Own Behavior, Technical Report HPP-84-16, Stanford University, Stanford, CA, 1984.
22. B. Hayes-Roth, "A blackboard architecture for control," Artif. Intell. 26, 251-321 (1985).
23. R. Fennell and V. Lesser, "Parallelism in AI problem-solving: A case study of HSII," IEEE Trans. Comput. C-26, 98-111 (1977).
24. V. R. Lesser and L. D. Erman, "Distributed interpretation: A model and experiment," IEEE Trans. Comput. C-29, 1144-1163 (1980).

B. Hayes-Roth
Stanford University

BOLTZMANN MACHINE

The Boltzmann machine (1) is a massively parallel architecture that uses simple on-off processing units and stores all its long-term knowledge in the strengths of the connections between processors. Its main difference from other connectionist architectures (2-4) (see Connectionism; Connection Machine) is that the units use a probabilistic decision rule to decide which of their two states to adopt at any moment. The network computes low-cost solutions to optimization problems by settling to thermal equilibrium with some of the units clamped into their on or off states to represent the current task. For a perceptual interpretation task the clamped units would represent the perceptual input; for a memory retrieval task they would represent a partial description of the item to be retrieved. At thermal equilibrium the units continue to change their states, but the relative probability of finding the network in any global configuration is stable and is related to the cost of that configuration by a Boltzmann distribution:

    P_a / P_b = e^(−(E_a − E_b)/T)    (1)

where P_a is the probability of being in the ath global configuration and E_a is the cost of that configuration.

Cooperative Computation of Best Fits by Energy Minimization

Tasks like perceptual interpretation (see Vision) and content-addressable memory can be formulated as optimization problems in which there are massive numbers of plausible constraints (see Constraint propagation), and low-cost solutions typically satisfy most but not all of the constraints (5,6). The Boltzmann machine allows the constraints to be implemented directly as interactions between units. If these interactions are symmetrical, it is possible to associate an energy E with each global configuration (7,8):

    E = −Σ_{i<j} w_ij s_i s_j + Σ_i θ_i s_i    (2)

where w_ij is the weight of the connection from the jth to the ith unit, s_i is the state of the ith unit (0 or 1), and θ_i is a threshold. Each unit can compute the difference in the global energy for its off and on states given the current states of all the other units. This energy gap is simply the sum of the weights on the connections coming from other on units. So to monotonically reduce the global energy, units should adopt their on state if and only if their energy gap is positive (7).

Searches for minima of an energy function can be improved by adding thermal noise to the decision rule (9). The thermal noise allows the network to escape from local minima and to pass through higher energy configurations. By giving each of the very large number of higher energy configurations a small chance of being sampled, it effectively removes energy barriers between minima. In the Boltzmann machine the probabilistic decision rule used to simulate thermal noise is

    p_k = 1 / (1 + e^(−ΔE_k/T))    (3)

where p_k is the probability that the kth unit adopts the on state, ΔE_k is its energy gap, and T is the temperature. If each unit is sampled with finite probability and if time delays are negligible, this decision rule will cause the whole network to approach thermal equilibrium. The fastest way to approach a low-temperature equilibrium (at which low-cost configurations are far more probable than high-cost ones) is to start with a high temperature and to gradually reduce it, a process called simulated annealing (9,10).

Representing Probabilities

In a Boltzmann machine the probability that an atomic hypothesis is correct is represented by the probability of finding the corresponding unit in the on state. This allows the machine to correctly represent the probabilities of complex hypotheses that correspond to configurations of on and off states over many units. Systems that use real numbers to directly represent the probabilities of the atomic hypotheses (see Reasoning, plausible) have great difficulty representing the higher order statistical structure correctly, and so they cannot implement Bayesian inference (see Bayesian decision methods) unless exponentially many numbers are used or very strong independence assumptions are made (11). In a Boltzmann machine the weights implicitly encode the a priori probabilities of an exponential number of configurations.
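The probabilistic decision rule of Eq. (3), applied under a gradually falling temperature schedule as in simulated annealing, can be sketched in a few lines of Python. This is an illustrative toy, not a reference implementation: the 3-unit network, the weights and thresholds, the temperature schedule, and the number of samples per temperature are all invented for the example.

```python
# Toy sketch of Eq. (3) with simulated annealing. The energy gap follows
# Eq. (2): incoming weights from "on" units minus the unit's threshold.
import math
import random

def energy_gap(k, state, w, theta):
    # Delta-E_k: decrease in global energy (Eq. 2) for turning unit k on
    return sum(w[k][j] * state[j] for j in range(len(state)) if j != k) - theta[k]

def anneal(state, w, theta, temps, rng):
    for T in temps:                  # start hot, gradually cool (annealing)
        for _ in range(100):         # sample randomly chosen units at each T
            k = rng.randrange(len(state))
            p_on = 1.0 / (1.0 + math.exp(-energy_gap(k, state, w, theta) / T))
            state[k] = 1 if rng.random() < p_on else 0
    return state

# Two mutually supporting units (0 and 1) that both inhibit unit 2.
w = [[0, 2, -2], [2, 0, -2], [-2, -2, 0]]
theta = [0.5, 0.5, 0.5]
rng = random.Random(0)
final = anneal([0, 0, 1], w, theta, temps=[4.0, 2.0, 1.0, 0.5, 0.1], rng=rng)
print(final)
```

At high T the flips are nearly random; at the final low temperature the rule approximates the deterministic "on iff gap positive" rule, so the network tends to settle into a low-energy configuration such as units 0 and 1 on with unit 2 off.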
Learning

There is a simple but powerful learning (qv) algorithm (1,11) that allows a Boltzmann machine to learn weights that constitute an internal, generative model of the structure of an environment in which it is placed. The environment clamps configurations of on and off states on a "visible" subset of the units. The learning algorithm modifies the weights so as to maximize the likelihood that the same probability distribution of configurations will occur over the visible units when the machine is run without environmental input. The learning works in two phases. In the positive phase the environment clamps the visible units, the network settles to thermal equilibrium at a finite temperature, and the weights between units are increased by an amount proportional to how often the units are both on together at equilibrium. In the negative phase the visible units are unclamped, the network settles to equilibrium, and the weights are decreased by an amount proportional to how often the two units are on together (12). The result of repeatedly applying this procedure is that the network turns its nonvisible units into feature detectors that allow it to represent the structure of its environment in the weights.

Problems

Several obstacles currently prevent Boltzmann machines from being of practical use. It can take a long time to reach thermal equilibrium (10), so if the weights are hand coded, care must be taken to avoid energy barriers that are too high for annealing searches to cross. If the weights are learned, equilibrium must be reached many times to know how to change the weights, and the weights must be changed many times to construct good models, so even very simple learning tasks require many hours of CPU time.

BIBLIOGRAPHY

1. D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cogn. Sci. 9, 147-169 (1985).
2. S. E. Fahlman, G. E. Hinton, and T. J. Sejnowski, "Massively parallel architectures for A.I.: NETL, Thistle, and Boltzmann machines," Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, pp. 109-113, 1983.
3. J. A. Feldman and D. H. Ballard, "Connectionist models and their properties," Cogn. Sci. 6, 205-254 (1982).
4. G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Erlbaum, Hillsdale, NJ, 1981.
5. D. H. Ballard, G. E. Hinton, and T. J. Sejnowski, "Parallel visual computation," Nature 306, 21-26 (1983).
6. D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1, Foundations, MIT Press, Cambridge, MA, 1986.
7. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA 79, 2554-2558 (1982).
8. R. A. Hummel and S. W. Zucker, "On the foundations of relaxation labeling processes," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 267-287 (1983).
9. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science 220, 671-680 (1983).
10. S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 721-741 (1984).
11. G. E. Hinton and T. J. Sejnowski, "Optimal perceptual inference," Proc. IEEE Conf. Comput. Vision Pattern Recog., Washington, DC, pp. 448-453, 1983.
12. F. Crick and G. Mitchison, "The function of dream sleep," Nature 304, 111-114 (1983).

G. E. Hinton
Carnegie-Mellon University

BORIS

An understanding program written by Michael Dyer at Yale in 1982, BORIS can read and then answer questions about several complex narrative texts (see Question answering), and it uses the approach of integrating parsing and inferencing (see M. Dyer, In-Depth Understanding, MIT Press, Cambridge, MA, 1983).

K. S. Arora
SUNY at Buffalo

BOTTOM-UP PROCESSING. See Processing, bottom-up and top-down.

BRANCHING FACTOR

A branching factor is a parameter that measures the effective complexity of a problem or a search (qv) algorithm, especially those characterized by an exponentially growing complexity. The term branching factor has evolved from the metaphor of a uniform tree where each internal node sprouts exactly b branches and the total number of nodes up to depth d is (b^(d+1) − 1)/(b − 1). Thus, if an algorithm searches such a tree and generates every node up to depth d, the complexity of that algorithm will be roughly [b/(b − 1)]b^d, with b measuring the relative increase in complexity due to each additional level of search (see Problem solving).
This growth rate measurement can be extended to algorithms whose search spaces are nonuniform trees. If d stands for the maximal depth reached by an algorithm A, and N_A stands for the number of nodes generated during the search, then the effective branching factor, B_A, can be defined by

    B_A = (N_A)^(1/d)    (1)

Indeed, when applied to a uniform tree, this formula gives

    B_A = [(b^(d+1) − 1)/(b − 1)]^(1/d)    (2)

which, for large d, reduces to

    B_A = b    (3)

In general, the complexity N_A may vary significantly from one problem instance to another and may be a complex function of d. Therefore, the definition of B_A is usually applied to the average number, I_A, of nodes generated by algorithm A and usually invokes the limit as d → ∞:

    B_A = lim_{d→∞} [I_A(d)]^(1/d)    (4)

This definition extracts the basis of the dominant exponential term in the expression of I_A(d). In summary, the branching factor measures the relative increase in average complexity due to extending the search depth by one extra level or, equivalently, it measures the average number of branches explored by an algorithm from a typical node of the search space (1).

Applications

The primary usage of the branching factor has been in comparing the pruning power of various game-playing strategies (see Game playing; Game trees). Theoretical analysis of these strategies usually assumes uniform, b-ary game trees, searched to depth d, with random values assigned to nodes at the search frontier (1). Based on this model, it can be shown (2,3) that the branching factor of the alpha-beta pruning (qv) algorithm (as well as that of SCOUT and SSS*) is given by

    B = ξ_b/(1 − ξ_b) ≈ b^(3/4)    (5)

where ξ_b is the unique positive root of the equation

    x^b + x − 1 = 0    (6)

Moreover, this branching factor is the best achievable by any game-searching algorithm (see Alpha-beta pruning). Roughly speaking, a fraction of only B/b = b^(−1/4) of the b legal moves available from each game position is explored by alpha-beta. Alternatively, for a given search time allotment, alpha-beta pruning allows the search depth to be increased by a factor of log b/log B = 4/3 over that of an exhaustive minimax search. Under perfect ordering of successors, alpha-beta examines a total of 2b^(d/2) − 1 game positions; thus,

    B = b for exhaustive search,
    B = b^(3/4) for alpha-beta with random ordering,
    B = b^(1/2) for alpha-beta with perfect ordering.

It is important to mention that the branching factor only captures the asymptotic growth rate of a search strategy as the search depth increases indefinitely; it does not reflect the size of nonexponential factors in I(d), regardless of how large they are. However, an exact evaluation of the average performances of three game-searching strategies shows that the ratio I(d)/B^d is fairly small (3); it remains below 5 over wide ranges of b and d (b ≤ 20, d ≤ 20).

BIBLIOGRAPHY

1. D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artif. Intell. 6(4), 293-326 (1975).
2. J. Pearl, "The solution for the branching factor of the alpha-beta pruning algorithm and its optimality," CACM 25(8), 559-564 (1982).
3. J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, MA, Chapters 8 and 9, 1984.

J. Pearl
UCLA
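Eqs. (5) and (6) above are easy to check numerically. The sketch below finds ξ_b by bisection (any root finder would do; bisection is our choice for illustration, not the article's method) and compares B = ξ_b/(1 − ξ_b) with the b^(3/4) approximation.

```python
# Numerical check of Eqs. (5)-(6): xi_b is the positive root of
# x**b + x - 1 = 0, and the alpha-beta branching factor is
# B = xi_b / (1 - xi_b), which is close to b**0.75.

def xi(b, tol=1e-12):
    lo, hi = 0.0, 1.0            # f(0) = -1 < 0 and f(1) = 1 > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**b + mid - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for b in (2, 8, 32):
    B = xi(b) / (1 - xi(b))
    print(b, round(B, 3), round(b**0.75, 3))
```

For b = 2 this gives ξ_2 = (√5 − 1)/2 ≈ 0.618 and B ≈ 1.618, against 2^(3/4) ≈ 1.682; the agreement with b^(3/4) is asymptotic in b.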
CADUCEUS

An expert system for medical diagnosis (see Medical advice systems) developed by Jack Myers and Harry Pople at the University of Pittsburgh and completed in 1985. This system is an enhancement of INTERNIST (qv) in that it incorporates causal relationships in its diagnosis (see P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview, Boulder, CO, 1982).

K. S. Arora
SUNY at Buffalo

CHARACTER RECOGNITION: THE READING OF TEXT BY COMPUTER

The reading of text by computer is an AI topic that has been investigated for more than 25 years. An early example is the work of Bledsoe and Browning (1). The objective of work in this area is to develop the ability to convert an image (a two-dimensional array of intensity values known as pixels) of text into a computer-interpretable form, such as ASCII code, with the same fluency and accuracy that a human could read the same material.
Currently, the most frequently used methodology for the design of reading algorithms has the three stages illustrated in Figure 1. The image preprocessing stage determines which areas of the image are text and isolates images of individual characters within the words of the text. These character images are then passed to a character recognition algorithm that identifies one or more letters that match each one. These decisions are then passed to a contextual postprocessing algorithm that resolves ambiguities or corrects errors in the character decisions.

The following sections of this entry survey the character recognition and contextual postprocessing aspects of reading algorithms. The basic strategy of each area is discussed, and several notable AI approaches to each one are presented. An analogy is developed between these methods and explanations of human reading. The large gap that exists between the performance of current algorithms and human fluency is shown. The benefits to be gained by adapting results from studies of human reading to the development of reading algorithms are speculated about, and preliminary efforts in this area are discussed.

Character Recognition

Character recognition techniques associate a symbolic identity with the image of a character. These methods can be generally classified as either template matching or feature analysis (see Matching) algorithms.

Template Matching. Template matching techniques directly compare an input character image to a stored set of prototypes. The prototype that matches most closely provides recognition. The comparison method can be as simple as a one-to-one comparison of the input and prototype images or as complex as a decision tree analysis in which only selected pixels are tested (2). Template matching is suitable for an application where a limited number of character images have to be recognized (3).
However, it suffers from a lack of robustness because of a sensitivity to noise in the image and an inability to adapt to differences in character style. It is interesting from an AI perspective that template matching has been ruled out as an explanation for human performance for similar reasons (4).

Feature Analysis. Feature analysis techniques are more frequently used for character recognition. In this approach significant features are extracted from a character image and compared to the feature descriptions of ideal characters. The description that matches most closely provides recognition. A comparison procedure favored by several AI researchers is based on the size and relative placement of strokes. Strokes in this context are somewhat analogous to the strokes made by a
person when a letter is drawn. For example, an F is composed of three strokes: one long vertical stroke and two short horizontal strokes. An advantage of feature analysis is its ability to adapt to new characters and its tolerance of noise in an image. Thus, the capabilities of a human reader are captured more accurately by feature analysis than by template matching. This has caused feature analysis to be proposed as a model for human letter recognition (4).

Many feature analysis techniques have been developed and applied to character recognition. Most of these are examples of traditional pattern recognition methods and are usually suitable for application to constrained domains. Some AI methodologies have been incorporated in such traditional techniques [e.g., a rule-based system (5)] (see Rule-based systems). In particular, the use of a semantic network (qv) is discussed below. Two additional AI approaches are also presented that have utilized analogies to the human character recognition process.

The feature analysis technique developed by Krumme (6) is an example of a traditional solution to the character recognition problem that uses AI techniques. It uses a semantic network to encode knowledge about strokes. The network is also used to direct the analysis of a character image. The Krumme network is made up of many types of directed arcs and nodes, only a small portion of which are described here. The subset arc s states that the node at its tail is a subset of the node at its head. The property arc p states that the node at its tail has the property at its head. Terminal nodes represent a primitive property of the image, and nonterminal nodes represent a set-theoretic property about the image. A node with outgoing s and p arcs represents the largest subset of the set at the head of the s arc with the property at the head of the p arc. A node with more than one outgoing s arc represents the intersection of the sets at the heads of the s arcs.

Description of F.
The example description of the capital F shown in Figure 2 illustrates these concepts.Node 2 represents the subset of all the input with a major vertical line on the left. Note that this includes many letters such as B, D, E, F, H, and so on. Nodes 4 and 5 represent the strokes near the top and middle of the major vertical line, and node 6 represents the conceptthat there is no other stroke near its bottom. Nodes 7 and 8 represent the concept that the horizontal line near the top of the major vertical line is on its top and to its right. Nodes 9 and 10 represent a similar conceptfor the horizontal line near the middle of the character. Finally, node 11 represents F as the intersection of the sets represented by n o d e s6 , 7 , 8 , 9 , a n d 1 0 . This is not only a description of F but also a plan to follow for its recognition. The terminal node input reads a character image and begins recognition. The major vertical line is then tested for, and if it is located, additional tests are carried out to locate the appropriately oriented strokes near the top and mid-
Segmented word images
lmage preprocessing
83
Character recognition
Character decisions
Coded word decisions
aI Contextu postprocessing
Figure 1. Methodolory of most current reading algorithms.
A S CII, EBCDIC, etc.
B4
CHARACTER RECOGNITION:THE READINGOF TEXTBY COMPUTER
mine the functional attribute at the pivot of the ambiguity (7). An intermediate skeletal level provided a description that distinguished characters from everything else as well as from characters in other families of type fonts (8). This level of description was implemented as a set of graphs, one for each character in each font family. The lowest physical level in this hierarchy is where actual character images were placed. This representational system can be used for recognition in several ways. Functional descriptionscan be used directly if O nr i g h t H o r i z o n t aI procedures are developedto detect the features they specify. p line This would be appropriate if it were known a priori that only character images (not graphics, halftones, etc.) might be preFocustop sented to a recognition system since functional descriptions can only distinguish one character from another. Otherwise, a On top/ bottom skeletal representation would be a better choice since it can 1 1 discriminate characters from everything else. This corre10 sponds more closely to the way people read letters; however, its font-specific nature loses some robustness. The main advantage of this line of research is its acknowledgementof the On right complexity of the character recognition task and the necessity Horizontal l i n e to incorporate knowledge about human character recognition p in algorithms. Knowledge Source Organization. The robustness of human Ietter recognition and its place in a more complex reading processthat involves the syntax and semanticsof an input text In p u t was acknowledged by Brady and Wielinga (9). They studied the organization of knowledge sources needed to read handprinted FORTRAN coding sheets.As part of this project they developedalgorithms for the recognition of isolated characters that could utilize syntactic and semantic information provided by a FORTRAN reasoner. 
Clearbottom A stroke-based character recognition scheme similar to of F in the Krummenetwork(adaptedfrom that of Krumme was implemented in an early version of this Figure 2. Representation system. In this method strokes and relations between strokes Ref.6). were used to specify models for characters. For example, the model for an F shown in Figure 3 specifiesthat it must contain one vertical stroke and two horizontal strokes. One of the horidle of the image. If any of these tests fail, backtracking takes zontal strokes must be above both the other strokes, and the place and the presenceof primitives from other characters is two horizontal strokes must be to the right of the vertical determined. For example, in the complete syst€ffi,if the major stroke. The junctions between strokes were also specified. vertical line cannot be located,a loop, such as occursin an O, is There must be an Z junction between the vertical stroke and tested for next. Advantages of this approach include its use of one of the horizontal strokes, and there must also be a 7 junca more flexible control structure than most traditional meth- tion between the vertical stroke and the other horizontal ods. Disadvantagesinclude its application to a limited alpha- stroke. bet of only 20 uppercaseletters. Although many casesof disRecognition was determined by the features specified in torted input were recognizedcorrectly, the robustnessof this such models. The relations and junctions between strokes aptechnique remains unclear. pear to have been particularly important features. It is interIsolated Characfers.The development of an algorithm for the recognition of isolated characters that overcomesthe constraints of traditional techniques was pursued by the MIT research group composedof M. Eden and B. Blesser, among STROKES 3 [ V HT HB ] ; Vertical, Horizontal Top, and Horizontal Bottom others. 
As part of this work they developed a character de- D IR EC T ION S hor i z ontat I H T H B I v er ti c al t V l scription schemethat was basedon human experiments.The people to use features if the was that approach this behind idea R ELAT IO N S [ ( ABO VE H T H B) (ABOVE HT V) recogntzeLettersare properly describedand used in a charac(RIGHT HT V) perform as ter recognition algorithm, the algorithm should (RIGHT HB V) I well as a human. Functional,Skeletal,and PhysicalLevels.Three levels of de- PO SIT ION S [((HH- -TMOIPD DHLTE) HB) scription were distinguished. The abstract or functional level ( V- LEF T V) ] defined the essential meaning of letters in terms of a set of ruNcrIoNS tli-iUilSl3il features or functional attributes. These were determined by a XilI]r procedure that included the presentation of ambiguous characfor F (adapted from Ref. 9). representation Figure 3. Brady's ters to human subjects and the use of their responsesto deter-
RECOCNITION:THE READINCOF TEXTBY COMPUTER CHARACTER
8s
esting that FORTRAN knowledge could be used to modify the search for these features. For example, if Figure 4 was input and a FORMAT statement was expected, an F would be sought in the first position. At some point in this process confirming evidence would be sought for the L junction at the top of the character. However, since this junction is not physically connected, additional image processing could be invoked to increase the length of the strokes to see if they could be connected. If this occurred, the junction was confirmed; otherwise the presence of an F was denied.

Figure 4. Example of handprinted FORMAT statement.

The sensitivity of this approach and its dependence on many empirically determined thresholds, such as how far to increase the length of strokes, was acknowledged. An alternative representation based on a two-dimensional version of generalized cylinders was proposed to better capture the characteristics of letters that are used by human readers. However, it was not extensively described in published accounts. The important point of this work is its acknowledgment of the complexity of reading and the necessity to use many knowledge sources to adequately recognize isolated characters. On the detrimental side, one knowledge source that apparently was not incorporated is very important to human readers. This is the dependencies between letters of an input vocabulary. This knowledge source has been extensively investigated and used in contextual postprocessing techniques.

Contextual Postprocessing

Contextual postprocessing techniques utilize a knowledge source one step above the level of individual characters to resolve ambiguities and correct errors in character recognition. These methods use information about other characters that have been recognized in a word as well as knowledge about the text in which the word occurs to carry out this task. Typically, the knowledge about the text takes the form of a dictionary (a list of words that occur in the text). For example, a character recognition algorithm may not be able to reliably distinguish between a u and a v in the second position of "quote." A contextual postprocessing technique would determine that u is correct since it is very unlikely that "qvote" would be in an English-language dictionary. Thus, some of the knowledge possessed by human readers is incorporated in such methods.

Methods of contextual postprocessing differ in their manner of knowledge representation (qv). Some methods use an approximation to a dictionary that often takes the form of probabilities of letter transitions (10). Other approaches use an exact representation such as a serial representation (11), a hash table (12), or a graph structure (13).

Binary n-Grams. The method of binary n-grams is one approach that uses an approximate representation (14). In this method a set of n-dimensional binary arrays represents a dictionary. Each of the dimensions can take on one of m values, where m is the number of letters in the alphabet, and the binary data in the matrix indicate whether or not the letter combination that specifies its location occurs in the dictionary. A 1 (logical true value) indicates the occurrence of the letter combination, and a 0 (logical false value) indicates its nonoccurrence. Typically other n values (position indices) are associated with each array. These tell the positions in which the letter combinations occur within dictionary words.

This method can be used to detect as well as correct errors in the output of a character recognition algorithm. Many error types can be handled by this approach; however, only the substitution of one character for another is described here since this is the most common error in character recognition. A word is considered correct only if the intersection of all its appropriate n-gram entries is nonzero. Otherwise, it must contain an error. The position of the error is determined by intersecting the sets of position indices that returned zero in the detection phase. If there is only a single position in this intersection, it contains the error. Vectors from all the arrays that involve that position, given that the other positions are correct, are then intersected. If there is only a single letter in that intersection, it can be substituted in the error position to produce a word that is acceptable to the n-gram arrays.

An example illustrates these points. Figure 5 shows a dictionary of the three three-letter words {cat, cot, tot} and the three binary digram (n = 2) arrays that represent it. If a character recognition technique outputs the string coo, detection of the error would be done by

d1,2(c,o) ∩ d1,3(c,o) ∩ d2,3(o,o)

This would return 0 from both d1,3 and d2,3. Since the intersection of {1, 3} and {2, 3} yields {3}, correction is done by intersecting the vectors

d1,3(c,·) ∩ d2,3(o,·)

The resulting vector has only one nonzero element, corresponding to a t. Therefore coo is corrected to cot.

This short example illustrates several of the advantages and disadvantages of this method. The computations to locate and correct errors are relatively simple and involve only binary comparisons. Hence they can be economically implemented. However, the potential storage costs are also apparent from the sparseness of the arrays. This is a major weakness of this method, as discussed in Ref. 15, where the binary n-gram technique is compared to an approach that uses an approximate statistical representation of a dictionary. The binary n-gram method is shown to be good for error detection; however, the statistical approach is better for error correction, especially because of the large amount of memory
Figure 5. Example dictionary {cat, cot, tot} and its representation by binary digram arrays d1,2, d1,3, and d2,3.
needed to yield good correction performance with the binary n-gram method.

DVA. The dictionary Viterbi algorithm (DVA) is a contextual postprocessing technique that uses an exact representation for a dictionary (see Viterbi algorithm). A graph of letter alternatives produced by a character recognition algorithm is first set up. An example of such a graph is shown in Figure 6. The string tof at the top of the graph is assumed to be input from a character recognition algorithm, and {a, c, o, t} is the alphabet of the source text. Each node is labeled with a letter of the alphabet and has a cost associated with it that is the probability that the letter on the node is confused with the corresponding letter of the input word. Each arc in the graph also has a cost associated with it that is the probability that the letter at its head follows the letter at its tail in the source text. A path is traced through this graph in a left-to-right manner one column at a time. The costs of all the ways of reaching a node from nodes in the previous column are computed, and only the partial path with the best cost is retained. Each time the cost of an arc is evaluated, the presence in the dictionary of the substring composed of the letters on the path from the beginning of the graph to the node at the head of the arc is determined. If it does not occur in the dictionary, this partial path is discarded from future consideration. This evaluation process is performed once for every node in the graph of alternatives. The letters on the best path from the first node to the last node are output.

Trie. The simultaneous searching of the graph of alternatives and the dictionary is done with a data structure for the dictionary known as a trie.
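A simplified sketch of this kind of dictionary-constrained search follows. It keeps one best partial path per live trie node and scores each letter with a confusion probability; the omission of arc (letter-transition) costs and the toy confusion model are illustrative assumptions, not the published algorithm.

```python
# Sketch: trie-constrained best-path search over letter alternatives.
import math

def build_trie(words):
    """Nested-dict trie: each node maps a letter to a child node."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
    return root

def dva(observed, confusion, trie):
    """Return the dictionary word whose trie path best explains `observed`
    under the confusion(seen, actual) probability model."""
    # One beam entry per live trie node: (cost, letters so far, node).
    beams = {id(trie): (0.0, "", trie)}
    for seen in observed:
        nxt = {}
        for cost, prefix, node in beams.values():
            for ch, child in node.items():  # only dictionary-legal extensions
                c = cost - math.log(confusion(seen, ch))
                if id(child) not in nxt or c < nxt[id(child)][0]:
                    nxt[id(child)] = (c, prefix + ch, child)
        beams = nxt
    done = [(c, s) for c, s, _ in beams.values()]
    return min(done)[1] if done else None

def confusion(seen, actual):
    """Toy confusion model: the correct reading is most likely."""
    return 0.7 if seen == actual else 0.1

trie = build_trie(["cat", "cot", "tot"])
```

With this example dictionary, `dva("tof", confusion, trie)` prefers tot, matching the behavior described for the graph of alternatives below.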
An example trie for the dictionary {cat, cot, tot} is shown in Figure 7. If the graph of alternatives shown in the previous figure was evaluated with this trie, only the c and t nodes in the first column would be considered since these are the only two letters at the first level of the trie. At the next step only one path to each of the a and o nodes in the second column would be retained. These partial paths would most likely be ca and to. At the next step, only cat and tot would be considered because of the absence of any other paths in the trie. Most probably tot would be output because it is most like the input.

The DVA, like other techniques that use an exact representation for a dictionary, is more accurate than methods that use an approximate representation. It is shown in Ref. 13 that its performance is better than a similar technique that uses an approximation. However, methods based on exact representations incur additional processing costs. The acceptability of these costs should be determined by the application and the need for improved performance.

Figure 6. Example graph of alternatives for DVA.

Figure 7. Trie representation for {cat, cot, tot}.

Applications

Reading technology has many examples of practical applications in the marketplace. Small desk-top character readers that cost about $10,000 each and can typically recognize up to six fonts have recently been appearing in offices. The Kurzweil Corporation manufactures medium-sized character readers that cost about $35,000 each but can recognize a wide range of character fonts (16). The United States and other countries have recently installed large postal address-reading machines that cost about $500,000 each; however, they must meet more stringent performance requirements than most other character readers (17).

The performance of all these machines is controlled by many constraints. Deviations from these constraints can cause a large deterioration in performance (18). In most cases individual characters must not touch one another, and text must be clearly printed in dark ink on a lightly colored background (19). In some units the location of individual characters must fall within prespecified limits. Such constraints are present even in products manufactured by the Kurzweil Corporation. Even these machines require that characters be unsmudged and that adjacent characters not touch one another. Furthermore, although the ability to read many different fonts is claimed for the Kurzweil Data Entry Machine, this capability is achieved by requiring an operator to train the machine on new fonts. This constraint frequently causes the machine to misrecognize text printed in a font that it has not previously seen (20). The mere presence of such constraints in even the most sophisticated reading machines illustrates that the ability to read text automatically with the same fluency as a human remains an unachieved goal.
This is further evidenced by the performance of postal address-reading machines that have been the subject of much research and development and are designed to read relatively unconstrained text. These machines can correctly read over 90% of the addresses that appear on machine-printed first-class mail. However, they can only read about 34% of the addresses on mail from collection boxes. Overall, 62% of the addresses on mail processed by postal reading machines are correctly recognized (21). These percentages are based on mail samples that were readable by a human operator. This shows that even the most expensive commercial equipment is not nearly as fluent as a human reader. Obviously much work is needed if a program is to reach levels of human competence.
Conclusions

Levels of performance comparable to human capabilities are thus unachieved. Although some notable efforts exist (discussed above) for studying the human process of character recognition and applying the results of those studies to the analogous machine process, no such effort has been carried out at the word level. The potential success of this approach is clear when the basic strategy of current algorithms is compared to explanations of human performance in word recognition. The recognition of individual characters followed by postprocessing with a dictionary as an explanation for human performance was rejected in 1886 (22). Although some interesting similarities exist between the relaxation-based word recognition system of Hayes (23) and the contemporary theory of word perception proposed by McClelland and Rumelhart (24), the algorithm is different from the theory in essential places. The development of a synergism between algorithms and theories is essential if algorithms are to reach levels of human competence.

A preliminary study of word perception by human and computer was carried out by Brady (25). He used a computational simulation of human early visual processing to show that previous psychological results that were attributed to higher level processing could in fact be accounted for by visual processing. He also speculated on the importance of such an investigation to the development of an understanding of human and machine reading.

The relationship between the shape of words and their recognition by humans and computers has been investigated (26). Word shape (the pattern of ascenders, descenders, and normal-height characters in a lowercase word) is a visual cue that has been known for many years to be useful for word recognition by humans. Several alternative representations for word shape are looked into, and a representation is found that produces a small search space in a large dictionary.
This representation is based on features that can be reliably extracted from word images and does not require the segmentation of words into characters. This avoids the major pitfall of current reading algorithms and more closely reflects the way visual information is used in the early stages of word recognition by humans.

These efforts are just the beginning of what is needed to develop a fluent reading ability for computers. The background material discussed in this article points out the great amount of effort already expended in the development of reading algorithms and shows several notable approaches suitable for application to limited domains. However, the many constraints imposed on implementations of these techniques and their lack of demonstrable general-purpose performance illustrate a large gap between human and machine reading capabilities. Only if further efforts are made to apply results from studies of human reading to the design of algorithms will this gap be bridged.
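The word-shape cue discussed above can be illustrated with a small sketch. The ascender/descender coding and the tiny dictionary are assumptions for the example; a real system would extract the pattern from word images rather than from letter identities.

```python
# Sketch: filter a dictionary by the word-shape pattern of ascenders (A),
# descenders (D), and normal x-height letters (x). Coding is an assumption.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def shape(word):
    """Map a lowercase word to its ascender/descender/x-height pattern."""
    return "".join("A" if c in ASCENDERS else "D" if c in DESCENDERS else "x"
                   for c in word.lower())

def candidates(observed_shape, dictionary):
    """All dictionary words whose shape matches the observed pattern."""
    return [w for w in dictionary if shape(w) == observed_shape]
```

For example, `shape("quote")` is `"DxxAx"`, and matching on that pattern alone already narrows a small dictionary to a short candidate list without segmenting the word into characters.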
BIBLIOGRAPHY

1. W. W. Bledsoe and I. Browning, "Pattern recognition and reading by machine," Proc. Eastern Joint Comput. Conf. 16, 225-232 (1959).
2. K. Y. Wong, R. G. Casey, and F. M. Wahl, "Document analysis system," IBM J. Res. Develop. 26(6), 647-656 (November 1982).
3. R. O. Sheppard, Jr., Feasibility and Implementation of an Adaptive Recognition Technique, PTR Research Report, USPS Research and Development Laboratories, January 1978.
4. R. N. Haber and L. R. Haber, "Visual components of the reading process," Visible Lang. XV(2), 147-181 (1981).
5. D. D'Amato, L. Pintsov, H. Koay, D. Stone, J. Tan, K. Tuttle, and D. Buck, "High speed pattern recognition system for alphanumeric handprinted characters," Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Las Vegas, Nevada, July 1982, pp. 165-170.
6. D. W. Krumme, Theory and Implementation of a Network Representation of Knowledge: Application to Character Recognition, Ph.D. Thesis, University of California, Berkeley, June 1979.
7. R. J. Shillman, Character Recognition Based on Phenomenological Attributes: Theory and Methods, Ph.D. Thesis, Massachusetts Institute of Technology, August 1974.
8. C. H. Cox, III, P. Coueignoux, B. Blesser, and M. Eden, "Skeletons: A link between theoretical and physical letter descriptions," Pattern Recog. 15(1), 11-22 (1982).
9. J. M. Brady and B. J. Wielinga, Reading the Writing on the Wall, in A. R. Hanson and E. Riseman (eds.), Computer Vision Systems, Academic Press, New York, pp. 283-299, 1978.
10. R. Shinghal and G. T. Toussaint, "Experiments in text recognition with the modified viterbi algorithm," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1(2), 184-192 (April 1979).
11. R. Shinghal and G. T. Toussaint, "A bottom-up and top-down approach to using context in text recognition," Int. J. Man-Mach. Stud. 11, 201-212 (1979).
12. W. Doster, "Contextual postprocessing system for cooperation with a multiple-choice character-recognition system," IEEE Trans. Comput. C-26(11) (November 1977).
13. J. J. Hull, S. N. Srihari, and R. Choudhari, "An integrated algorithm for text recognition: Comparison with a cascaded algorithm," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5(4), 384-395 (July 1983).
14. A. R. Hanson, E. Riseman, and E. G. Fisher, "Context in word recognition," Pattern Recog. 8, 35-45 (1976).
15. J. J. Hull and S. N. Srihari, "Experiments in text recognition with binary n-gram and viterbi algorithms," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-4(5), 520-530 (September 1982).
16. R. C. Kurzweil, "Artificial intelligence program at core of scanning system," Graphic Arts Monthly 56, 564-566 (July 1984).
17. J. J. Hull, G. Krishnan, P. Palumbo, and S. N. Srihari, Optical Character Recognition Techniques in Mail Sorting: A Review of Algorithms, Technical Report 214, State University of New York at Buffalo, Department of Computer Science, June 1984.
18. J. Schurmann, Reading Machines, Proceedings of the Sixth International Conference on Pattern Recognition, Munich, FRG, October 1982, pp. 1031-1044.
19. Automation: A Guide to Business Mail Preparation, Publication 25, United States Postal Service, March 1984.
20. H. Brody, "Machines that read move up a grade," High Technol. 3(2), 35-40 (February 1983).
21. USPS, Report on the Field Testing of Commercial OCR's, USPS Research and Development Laboratories, 1980.
22. J. M. Cattell, "The time it takes to see and name objects," Mind 11, 63-65 (1886).
23. K. C. Hayes, Jr., Reading Handwritten Words Using Hierarchical Relaxation, Ph.D. Thesis, TR-783, Computer Vision Laboratory, University of Maryland, College Park, Maryland, July 1979.
24. J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception: Part 1. An account of the basic findings," Psychol. Rev. 88(5), 375-407 (September 1981).
25. M. Brady, "Toward a computational theory of early visual processing in reading," Visible Lang. XV(2), 188-215 (Spring 1981).
26. J. J. Hull, Word Shape Analysis in a Knowledge-Based System for Reading Text, Proc. of the Second IEEE Conference on Artificial Intelligence Applications, Miami Beach, Florida, December 1985, pp. 114-119.
General References

G. Nagy, Optical Character Recognition: Theory and Practice, in P. R. Krishnaiah and L. N. Kanal (eds.), Handbook of Statistics, Vol. 2, pp. 621-649, 1982, is a survey of statistical feature analysis techniques for character recognition.

E. Reuhkala, "Recognition of strings of discrete symbols with special application to isolated word recognition," Acta Polytech. Scand. Ma 38, 1-92 (1983) contains a brief survey of methods for contextual postprocessing and an exhaustive bibliography.

S. N. Srihari, Computer Text Recognition and Error Correction, IEEE Computer Society Press, Silver Spring, MD, 1984, is a tutorial on the reading of text by computer. Twenty basic papers and an extensive bibliography are given.

C. Y. Suen, M. Berthod, and S. Mori, "Automatic recognition of handprinted characters-the state of the art," Proc. IEEE 68(4), 469-487 (April 1980) is a survey of techniques developed for the recognition of isolated handprinted characters.

I. Taylor and M. M. Taylor, The Psychology of Reading, Academic Press, Orlando, FL, 1983, is an overview of research about human reading. Contains a comprehensive bibliography.

J. R. Ullmann, Advances in Character Recognition, in K. S. Fu (ed.), Applications of Pattern Recognition, CRC Press, Boca Raton, FL, pp. 197-236, 1982, is a general overview of character recognition techniques oriented toward practical applications. A comprehensive bibliography including many U.S. and U.K. patents is given.

J. J. Hull
SUNY at Buffalo

CHECKERS-PLAYING PROGRAMS

Game-Playing Programs

Programming computers to play games is one of the earliest areas of AI research (1,2). As it did in the past, it continues today to attract workers for a number of reasons. The first and most obvious of these is that the ability to play complex games appears to be the province of the human intellect. It is therefore challenging to write programs that match or surpass the skills humans have in planning (qv), reasoning, and choosing among several options in order to reach their goal. Another motivation for this research is that the techniques developed while programming computers to play games may be used to solve other complex problems in real life, for which games serve as models. Finally, games provide researchers in AI in particular and computer science in general with a medium for testing their theories on various topics ranging from knowledge representation (qv) and the process of learning (qv) to searching algorithms (see Search) and parallel processing. The game of checkers was one of the first for which a program was written. This entry describes the early and important work of Samuel (3,4) as well as more recent efforts by Griffith (5) and Akl and Doran (6) (see also Game playing).

The Game of Checkers

Checkers is an old board game believed to have originated in ancient Egypt (7). It is played by two persons and involves no element of chance. The presence of clear rules and goals makes it a game of strategy. Also, the game is one of perfect information in the sense that at any given time both players have complete knowledge of all the previous moves and the current board situation. Finally, the outcome of a game is either a win for one of the two players and a loss for the other or a draw: Checkers is therefore a zero-sum game.

Like most other game-playing programs, all known programs for playing checkers search a game tree (qv), an example of which is shown in Figure 1. In such a tree nodes correspond to board positions and branches correspond to moves. The root node represents the board position from which the player whose turn it is to play is required to make a move. A node is at ply (or depth) k if it is at a distance of k branches from the root. A node at ply k, which has branches leaving it and entering nodes at ply k + 1, is called a nonterminal node; otherwise the node is terminal. A nonterminal node at ply k is connected by branches to its offspring at ply k + 1. Thus, the offspring of the root represent positions reached by moves from the initial board; offspring of these represent positions reached by the opponent's replies; offspring of these represent positions reached by replies to the replies, and so on. The number of branches leaving a nonterminal node is the fan-out of that node. The term branching factor (qv) is used to denote the average fan-out for a given tree over all nonterminal nodes.

Figure 1. A game tree: P, Q, and E are board positions. Number 9 is the value of the alpha-beta search of position P.

A complete game tree represents all possible plays of the game. Each path from the root to a terminal node corresponds to a complete game, with the terminal nodes representing a win, loss, or draw. It has been estimated that a complete game tree of checkers contains approximately 10^40 nonterminal nodes (3). Assuming that a program is capable of generating 3 billion (10^9) such nodes per second, it would still require in the vicinity of 10^21 centuries in order to generate the whole tree. Instead, checkers-playing programs, like programs for playing most other similarly challenging games, search an incomplete tree. The depth of such a tree is limited and, in addition, it is often the case that not all paths are explored. In an incomplete tree terminal nodes are those appearing at some predefined ply k or less and do not necessarily represent positions for which the game ends. A static evaluation function is used to assign a value to each of the positions represented by terminal
nodes. The alpha-beta algorithm (a refined version of minimax (qv) analysis) is then used to back up these values up the tree (see Alpha-beta pruning). When all the offspring of the root have been assigned backed-up values representing their "goodness," the program chooses the move that appears to be best (in light of this incomplete information). Once this move is made and the opponent has replied, the program generates and searches a new tree from the current position to determine its next move. Note that game trees are generated while they are searched. A so-called depth-first search (qv) is usually followed: It starts by generating a complete path from the root to a terminal node; search then resumes from the latest nonterminal node on the path whose offspring have not all been generated or eliminated by the alpha-beta algorithm. Search continues until all nodes, up to some depth k, have been either generated or eliminated.
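The backing up of static values described above can be sketched as a depth-limited alpha-beta search. The `moves` and `evaluate` callbacks are assumed interfaces for generating offspring and scoring terminal positions; this is a generic sketch, not Samuel's actual routine.

```python
# Sketch: depth-limited alpha-beta search over an abstract game tree.
import math

def alphabeta(position, depth, alpha, beta, maximizing, moves, evaluate):
    """Back up static values from the depth limit to the root."""
    offspring = moves(position)
    if depth == 0 or not offspring:
        return evaluate(position)  # static evaluation at terminal nodes
    if maximizing:
        best = -math.inf
        for child in offspring:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, moves, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:  # cutoff: this line cannot affect the root value
                break
        return best
    else:
        best = math.inf
        for child in offspring:
            best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                       True, moves, evaluate))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# Toy usage: interior nodes are lists of offspring; leaves are static values.
moves = lambda p: p if isinstance(p, list) else []
evaluate = lambda p: p
```

For the two-ply tree `[[3, 5], [2, 9]]` with the maximizing player to move, the backed-up root value is max(min(3, 5), min(2, 9)) = 3.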
Figure 2. The standard 8 x 8 checkerboard, with WHITE at the top, BLACK at the bottom, and the black squares numbered.
Samuel's Work
The best documented checkers-playing program was written by Samuel in the period 1947-1967 (3,4). The program plays at a very high level: It can win against most players and loses only to the best. In 1962 it managed to win a game against a former champion of Connecticut and drew another with the world champion in 1965. The purpose of Samuel's work was to use the game of checkers to perform experiments in machine learning. The result was one of the earliest and most successful game-playing programs that could learn from its own mistakes to improve its play.

Representation. The program was written in assembly language for the IBM 700 series computers, whose word length was 36 bits. A clever technique that saved both time and space was used to represent a board position and generate all possible next moves. Consider the standard 8 x 8 checkerboard shown in Figure 2, where black squares are numbered, and recall that checker pieces can be placed exclusively on these squares. Thus, only four computer words, each with bits numbered 1-36, are needed to represent a given position. In the first word bit i is set to 1 if square i holds a black piece (man or king); otherwise, it is set to 0. In the second word bit i is set to 1 if square i holds a black king; otherwise, it is set to 0. The third and fourth words are defined similarly for the white pieces. Note that bits 9, 18, 27, and 36 correspond to squares not appearing on the board and are therefore unused in all four words.

To see how all the possible next moves can be generated quickly from a given position, assume that it is Black's turn to play. Ignoring kings and jumps for the moment, the rules of checkers specify that pieces are only allowed to move forward to a diagonally adjacent square. Hence a black piece on square i can go either to square i + 4 (by a right move) or to square i + 5 (by a left move), provided that such a numbered square appears on the board. For example, if there are four black men on squares 5, 13, 15, and 26, as shown in Figure 3, the squares reachable by these men are 10, 17, 19 and 20, and 30 and 31, respectively. By shifting the contents of the word in Figure 3 four positions to the right, one obtains a representation of the squares potentially occupied by right moves, namely, 9, 17, 19, and 30, as shown in Figure 4 (square 9, of course, is not on the board). Similarly, by shifting five positions to the right, one obtains a representation of the squares potentially occupied by left moves, namely 10, 18, 20, and 31, as shown in Figure 5 (square 18 is also not on the board). Now let EMPTY
Figure 3. Four black men on squares 5, 13, 15, and 26.

Figure 4. The word in Figure 3 is shifted four positions to the right to obtain all potential right moves.

Figure 5. The word in Figure 3 is shifted five positions to obtain all potential left moves.
be a word such that bit i is set to 1 if square i in the current board position is unoccupied; otherwise, if square i is occupied by either a black or a white piece, bit i is set to 0 (note that bits 9, 18, 27, and 36 are also set to 0). By taking a bit-by-bit logical AND of the word in Figure 4 with EMPTY, it is possible to obtain simultaneously all right moves available to the four black men. Similarly, a logical AND of the word in Figure 5 with EMPTY yields all left moves. Backward king moves, jumps, and multiple jumps are handled by simple modifications to this approach. In terms of storage, five words are needed to represent the moves: one word for each of the jump, forward right, forward left, backward right, and backward left moves. The various rules of checkers, such as crowning, recognizing a win, loss, or draw, and so on, are incorporated in the program within this representation in a straightforward way.

Search. The program uses the alpha-beta algorithm to search trees up to a maximum depth of 20 moves. Instead of holding the actual depth used while searching from a move to a constant, it is allowed to vary according to the position under consideration. Typically, the program begins by looking ahead three moves. Nodes at that level are evaluated directly if neither the last nor the next moves are jumps and no exchange offer is possible. If any of these conditions is satisfied for a given node, however, search proceeds from that node. For nodes at depth 4 search terminates if neither a jump nor an exchange is possible from that position. From ply 5 to ply 10 look-ahead is interrupted if no jump is possible. Search terminates at nodes at ply 11 or greater if one side is ahead by more than two kings.

In many situations during the search the program needs to estimate the value of a node directly without having generated or examined its offspring. A static evaluation function, discussed in more detail below, is used for that purpose. It consists of a computational procedure that assigns a numerical value to the position the node represents based on various parameters such as the number and worth of the pieces the program has, the mobility of these pieces and their potential for capturing opponent pieces, their situation on the board, and so on. The primary application of a static evaluation function is in assigning scores to terminal nodes. It can also be used to enhance the alpha-beta algorithm through ordering and pruning of moves.

Fixed Ordering. When depth-first search (qv) coupled with the alpha-beta algorithm is used to generate and search a game tree, the order in which the offspring of a node are examined is of great importance. A perfect ordering of moves is defined as one in which, for any node in the tree, the first move generated is the best for the player whose turn it is. Then for a tree of depth D and branching factor B, the total number of terminal nodes generated by the alpha-beta algorithm is approximately 2B^(D/2) instead of the full B^D. This represents a significant savings in time due to the large number of nodes eliminated by the alpha-beta algorithm that the program therefore need not examine. Consequently, for a constant number of terminal nodes, search depth can be almost doubled. Of course, there is no way of guaranteeing such an ordering, and many heuristics (qv) exist that attempt to approximate it. One such heuristic used by Samuel's program is to perform a shallow look-ahead from a given node and use the static evaluation function to assign values to the resulting terminal nodes. These values are backed up by the alpha-beta algorithm to the offspring of the original node. The backed-up values are now used to order the offspring, and this order is to be respected in the search that follows. This method was called plausibility analysis by Samuel, and it ordered the available moves based on their promise.

Dynamic Ordering. Samuel also introduced a technique that allowed the program to revise the ordering of moves arrived at by the plausibility analysis. Suppose that the offspring of a node have been ordered as above; a search up to a limited depth is now started from the offspring ranked "best." At the end of this search the backed-up value is compared to that of the offspring ranked earlier as "second best." If the former is better, the search continues to a greater depth; otherwise, it is interrupted and a new limited search is started from the current best offspring. The method can be repeated as many times as needed and to any required depth.

Forward Pruning. Both fixed and dynamic ordering are simply time-saving heuristics and do not in any way affect the overall outcome of the search. Another way of reducing the number of nodes examined by the alpha-beta algorithm proceeds as follows. First, plausibility analysis is used to order all the legal moves from a node; then the best few of these are retained and the others discarded. The number of moves retained is inversely proportional to the depth at which they are generated. A variant of this method is used at a later stage when a node is chosen to begin a search. If the value assigned earlier to that node by plausibility analysis falls outside the range currently set by the alpha-beta algorithm, the node is discarded. Neither of these two forms of forward pruning is guaranteed not to discard a good move.

Learning. Two learning mechanisms were provided in the program to constantly better the quality of its play. The first was the ability to memorize moves (or rote learning); the second was a variable static evaluation function that could be improved through training (or learning by generalization) (see Learning).

Figure 6.

Rote Learning. In rote learning the program memorized the boards and their evaluations that were encountered during the course of previous games. Assume that a good static evaluation function has already been constructed and that at some point during the game it is the program's turn to move from board position P. The program generates the game tree in Figure 1 and determines using its static evaluation function that the value of position P is 9, say. At this point the program makes the move suggested by the search and stores position P together with value 9. Now suppose the situation depicted in Figure 6 were to arise in a later game. Rather than invoking the static evaluation function to assign a value to position P, the program could use
CHECKERS-PLAYING PROGRAMS
stored value of P. This would have two advantages. First, if the time required to retrieve the value of P from storage is much smaller than that required to compute the static evaluation function, time is saved that could be used to search deeper somewhere else in the tree. Second, and more important, the value assigned to P in this manner was obtained by searching to depth 3 below P (Fig. 1) and is therefore more accurate than the static value that would otherwise be computed. The net effect therefore is an improvement of the look-ahead ability of the program.

In addition to the board position and its value, the length of the path followed in the game tree to compute this value was also stored. Subsequently, when the program had to choose between two or more moves leading to positions with equal values, it favored the position whose value had been reached by the shortest search. A sense of direction was thus acquired by the program, which was able in this way to progress quickly toward its goal (e.g., a win in the end game).

The board positions and their associated values were saved in a large file that was stored on magnetic tape due to central-memory limitations. The file was organized so as to achieve storage efficiency and fast retrieval. In order to use as little space as possible, all the positions were saved as though White is to move, various rotational symmetries were exploited, and least-used positions were deleted periodically. Quick access to stored values was made possible by indexing board positions according to some important characteristics (e.g., number of pieces) and by keeping them on the tape in approximately the order in which they might occur in actual play. In order to study the effect of rote learning, the program was trained by playing against itself and against humans (including masters) and by following many book games between masters.
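The storage-and-retrieval scheme just described can be sketched in a few lines. A Python dictionary stands in for Samuel's tape file; the board encoding, the `canonical` normalization, and all names are illustrative, not Samuel's:

```python
# Sketch of rote learning as described above: each entry maps a position
# (stored as though White is to move) to its backed-up value and the
# length of the search that produced it.
memory = {}  # canonical board -> (value, path_length)

def canonical(board, white_to_move):
    # Normalize so every position is stored from White's point of view
    # (boards here are tuples of signed ints, one per square).
    return board if white_to_move else tuple(-sq for sq in reversed(board))

def remember(board, white_to_move, value, path_length):
    memory[canonical(board, white_to_move)] = (value, path_length)

def lookup(board, white_to_move):
    return memory.get(canonical(board, white_to_move))

def choose_move(successors, white_to_move, static_value):
    # Prefer the highest value; among equal values, favor the position
    # whose stored value was reached by the shortest search.
    def rank(item):
        _move, board = item
        stored = lookup(board, white_to_move)
        if stored is None:
            return (static_value(board), float("-inf"))
        value, path_length = stored
        return (value, -path_length)
    return max(successors, key=rank)[0]

remember((1, 0, 0, -1), True, 9, 5)
remember((0, 1, 0, -1), True, 9, 2)   # same value, shorter search
best = choose_move([("a", (1, 0, 0, -1)), ("b", (0, 1, 0, -1))],
                   True, static_value=lambda b: 0)
print(best)  # b: equal values, but reached by the shorter search
```

The tie-breaking tuple mirrors the "sense of direction" described above: value first, then the negated path length.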
It was noticed that rote learning is particularly useful in improving the program's play steadily during the opening and end games but not so much during the middle game, where the number of possible moves from a given position is fairly large by comparison. The program reached a better-than-average novice level, having stored over 53,000 positions. Samuel pointed out a limitation of rote learning if it were to be used alone: A program would need to accumulate an estimated number of about 1 million positions to play at the master level. He concluded that this would be too impractical, requiring an inordinate amount of playing time, not to speak of the storage and retrieval problems. Other learning processes are therefore needed.

Learning by Generalization. Samuel experimented with two static evaluation functions: a linear polynomial and a nonlinear signature table method. As with rote learning, two training methods were used: in the first, the program played either against itself or against a human, and in the second it learned by following book moves.

The Linear Polynomial Approach. Here the static value assigned to a position is obtained from the polynomial w1p1 + w2p2 + . . . + wnpn, where the parameter pi is a numerical measure of some feature of the board and wi is a real-valued weight indicating the worth of pi. The larger the value of the polynomial, the more attractive the position is to the player moving to it. Typical parameters used are ADV (advancement), EXCH (exchange), MOB (total mobility), and THRET (threat). THRET, for example, is defined as the number of squares to which the player whose turn it is can move a piece and in so doing threaten to capture an opponent piece on a subsequent move. Assume now that pi corresponds to THRET for some i. If board position Q leads to board position R by a move as shown
in Figure 1, the value of pi is equal to the value of THRET for R minus the value of THRET for Q.

There are two decisions to be made when designing such an evaluation function: which parameters to use and what values the weights are to take. In this case the first of these decisions was made in part by Samuel himself. He initially selected a set of 38 board features. It was then left to the program to choose the 16 best of these as well as the values of the associated coefficients.

To begin, the program selected arbitrarily 16 parameters, p1, p2, . . . , p16. Two versions of the program were then created, call them X and Y, each with 16 arbitrary weights, w1, w2, . . . , w16. Version X played a sequence of games against Y. During any given game X learned by generalizing on its experience and changed its coefficients correspondingly, while the coefficients for Y remained constant. At each move X computed two evaluations for the current board position: the static value given by the polynomial and a backed-up value obtained by looking ahead a few ply in the game tree. On the assumption that the second value ought to be more accurate than the first, X adjusted its coefficients in order for the static value to better match the backed-up value. Whenever X won a game, its polynomial was used by Y in the next game. If X lost a sequence of games, it would change its coefficients at random in order to move away from the current local optimum. This technique is sometimes referred to as hill climbing or local neighborhood search.

Parameter selection proceeded in conjunction with the adjustment of coefficients. Starting with the 16 arbitrarily chosen parameters, the program keeps a count of the number of times each parameter is assigned the lowest coefficient. Following each move by X, this count is incremented until, for some parameter, it exceeds 32. This parameter is then removed from the polynomial and placed at the end of a queue formed by the currently unused parameters.
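The coefficient adjustment can be sketched with a delta-rule-style update. The update rule below is a common illustrative choice, not Samuel's exact procedure, and all numbers are made up:

```python
# Sketch of nudging the polynomial's coefficients so the static value
# better matches the backed-up value obtained by look-ahead.
def static_value(weights, features):
    # w1*p1 + w2*p2 + ... + wn*pn
    return sum(w * p for w, p in zip(weights, features))

def adjust(weights, features, backed_up, rate=0.01):
    # Move each weight in the direction that shrinks the error between
    # the static value and the (assumed more accurate) backed-up value.
    error = backed_up - static_value(weights, features)
    return [w + rate * error * p for w, p in zip(weights, features)]

weights = [0.5, -0.2, 1.0, 0.1]   # arbitrary initial coefficients
features = [2.0, 1.0, 3.0, 0.0]   # e.g. measures of ADV, EXCH, MOB, THRET
backed_up = 4.0                   # value obtained by looking ahead a few ply
for _ in range(200):
    weights = adjust(weights, features, backed_up)
print(round(static_value(weights, features), 3))  # 4.0
```

Repeated small corrections drive the static value toward the backed-up one, which is the sense in which X "learns by generalizing on its experience."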
The first element in that queue is now added to the polynomial. Similar approaches were used to select parameters and adjust weights in playing against humans. Samuel observed that the set of parameters and their weights reached a stable state after several games. Considerable, though slow, improvement in the quality of the program's play was obtained by this method, particularly during the middle game, where it attained a better-than-average level.

An alternative to actual play, book learning, proved to be a more efficient approach for adjusting the coefficients. Approximately 250,000 different board positions together with the move recommended by an expert for each of them were stored on tape. The program was then asked to produce, for each position, all possible next positions and the associated values of the 16 parameters. Then the coefficient wi for every parameter pi was obtained from (L - H)/(L + H), where L is the overall number of positions for which the value of the parameter was lower than its value for the recommended position and H is the number of times it was higher.

The major drawback of the polynomial approach is its linear nature. Two techniques were used in one version of the program to overcome this weakness. One was to introduce new parameters that were logical combinations of earlier ones; the other was to divide the game into six phases each employing an entirely different polynomial.

Signature Tables. A third method for obtaining a nonlinear function of the parameters was suggested by crimtrr (b) and used by Samuel very successfully in a later version of his
checkers-playing program. Here each of the parameters measuring a board feature is restricted to take values from a small set of integers. Typically, the parameter GUARD is 0 if both or neither of the two players have complete control of their back rows, +1 if the player whose turn it is controls his back row while the opponent does not, and -1 if the latter condition is reversed. An n-dimensional table is then created (conceptually) with one dimension per parameter. Entries in this table represent static evaluations corresponding to various combinations of parameter values. Thus, if n = 2, for example, and the two parameters are GUARD and MOB, taking values from {-1, 0, 1} and {-2, -1, 0, 1, 2}, respectively, then the signature table is as shown in Figure 7. If for the board under consideration MOB = 1 and GUARD = 0, this corresponds to cell (1, 0) in the table. Since this is a desirable situation, a relatively high value is found in (1, 0), and this signature is assigned to the position as its static evaluation.

Signature tables therefore have the potential to produce a fairly accurate estimate of the worth of a position, as they express the various dependencies among the parameters. Their major disadvantage, if implemented as described above, would be their inordinate storage space and learning time requirements for any nontrivial n. Samuel dealt with these problems as follows. First, 18 of the 24 chosen parameters were restricted to the values {-1, 0, 1}, and the remaining parameters took their values from {-2, -1, 0, 1, 2}. Next, a hierarchy of signature tables was constructed. In the first level of the hierarchy the parameters were divided into six subsets each containing one five-valued and three three-valued parameters. Six signature tables, one per subset, were constructed, with entries chosen from {-2, -1, 0, 1, 2}. For each three of these tables there is one second-level table with entries from -7 to 7. The third level consists of just one table.
When a position is to be evaluated, the parameters are measured and used to index the first-level tables. The program then moves up in the hierarchy, with values read from tables at one level giving access to tables at the next level. Finally, the entry obtained from the single third-level table is the static evaluation of the board position under consideration. As with the polynomial evaluation function, the game was divided into six phases with a different three-level signature table set for each phase, and the program was trained by following book moves.

Two cumulative totals A (agree) and D (differ), initially set to zero, are associated with each cell in the hierarchy. As before, the program is made to follow book games. For any given position there is a number of next positions, one of which is recommended by the book. Each of these positions corresponds to one cell in each of the three levels of the signature table hierarchy. A 1 is added to the D totals of all
Figure 7. A signature table for n = 2, indexed by MOB (-2 to 2) and GUARD (-1 to 1).
such cells not representing the book move; for cells associated with the book move, however, the A count is incremented by the number of nonbook moves. Once in a while the correlation coefficient C = (A - D)/(A + D) would be computed for every cell as a measure of the goodness of the associated positions as book-recommended moves. The value obtained for C becomes the new cell entry after being adjusted to fall in the required range for every level in the hierarchy.

Book learning worked particularly well: After following 173,989 book moves the program was tested on 895 new positions. It was able to predict the best move recommended by the book 38% of the time and the second best 26% of the time. This performance was attained using only the evaluation function. When it conducted a tree search in addition, the program's ability to follow book moves was increased substantially. The signature table method was distinctly superior to the polynomial evaluation function in improving the quality of the program's play.

Samuel's work was one of the first successful contributions to machine learning and game playing (8-10). No program before his had reached a championship level of play in a nontrivial game of strategy. Few other game-playing programs today exhibit a better performance. It remains therefore one of the major achievements of AI research.

Simple Heuristics and the Phase Table Method

Additional experiments with various static evaluation functions were conducted by Griffith (5).
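The agree/differ bookkeeping and the correlation coefficient C = (A - D)/(A + D) can be sketched for a single book position; the cell identifiers are invented:

```python
# Each candidate move indexes one signature-table cell (toy ids here).
# Cells on non-book moves get D incremented by 1; the cell on the book
# move gets A incremented by the number of non-book moves. C then
# measures how well each cell's positions track the book.
def correlation(A, D):
    return (A - D) / (A + D) if A + D else 0.0

tallies = {}                            # cell -> [A, D]
book_cells = {"c7"}                     # cell indexed by the book move
move_cells = ["c7", "c2", "c9", "c2"]   # one cell per candidate move
nonbook = sum(1 for c in move_cells if c not in book_cells)
for cell in move_cells:
    a_d = tallies.setdefault(cell, [0, 0])
    if cell in book_cells:
        a_d[0] += nonbook
    else:
        a_d[1] += 1

print({cell: correlation(*ad) for cell, ad in sorted(tallies.items())})
# {'c2': -1.0, 'c7': 1.0, 'c9': -1.0}
```

C ranges from +1 (a cell's positions always agreed with the book) to -1 (they never did), which is why it can serve directly as a table entry after range adjustment.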
Using the book-learning approach, he showed that a very simple evaluator is better than the linear polynomial but not as good as signature tables in capturing checkers knowledge. This new method is based on four checkers-related heuristics: highest priority is given to moving a king and next highest to a move along the main diagonal and into the two central squares; third priority is given to all remaining moves, except those from specified squares in the first row and those leading to jumps, which are given lowest priority.

Similarly, a second static evaluation function proposed in Ref. 5 was found to be at least as good as signature tables and considerably simpler to implement. In this method the game is divided into six consecutive phases, and for each phase a table is created with 98 entries representing all legal moves in the game. When a position is to be evaluated, its goodness is determined by the value in the appropriate table corresponding to the move that leads to it.

Searching Checkers Trees in Parallel

Besides being used as an experimental ground for research on learning, the game of checkers served to test the applicability of parallel-processing ideas to AI. A parallel computer is one consisting of several processing units: Given a computational task, it is subdivided into subtasks each of which is assigned to a different processing unit. Such a computer is of particular use to a game-playing program, as the time required to search enormous trees could be significantly reduced through parallel processing. By speeding up the search, a program can examine deeper trees in a fixed amount of time and, as a consequence, improve the quality of its play. A number of experiments with parallel algorithms for searching game trees are described in Ref. 6. Two versions of a checkers-playing program are compared, each using a different parallel algorithm for tree search. The programs were tested on an experimental parallel computer. With the exception of the opening game, where all moves appear to be equally good, the results indicated that a parallel implementation of the alpha-beta algorithm was especially effective in reducing the running time as well as the total number of nodes examined and the total number of terminal nodes evaluated.
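The root-splitting idea behind such experiments can be sketched on a toy tree. Threads stand in for the processors, and the sketch omits the sharing of alpha-beta pruning bounds that real parallel implementations require:

```python
# Each root move is scored in its own worker, one subtree per task.
from concurrent.futures import ThreadPoolExecutor

def successors(n):          # toy tree: a node is an int with two children
    return [2 * n, 2 * n + 1]

def static_value(n):
    return n % 7

def minimax(node, depth, maximizing):
    if depth == 0:
        return static_value(node)
    vals = [minimax(c, depth - 1, not maximizing) for c in successors(node)]
    return max(vals) if maximizing else min(vals)

def parallel_root_search(root, depth):
    # One subtree per worker, then pick the best-scoring root move.
    moves = successors(root)
    with ThreadPoolExecutor(max_workers=len(moves)) as pool:
        scores = list(pool.map(lambda m: minimax(m, depth - 1, False), moves))
    best = max(range(len(moves)), key=scores.__getitem__)
    return moves[best], scores[best]

sequential = max(minimax(m, 3, False) for m in successors(1))
move, score = parallel_root_search(1, 4)
print(move, score)          # 3 2: same answer as the sequential search
```

The subtrees are independent, so the answer is unchanged; the experiments above show that most of the further gains come from letting the workers exchange pruning bounds.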
BIBLIOGRAPHY

1. P. C. Jackson, Introduction to Artificial Intelligence, Petrocelli, New York, 1974.
2. A. Barr and E. A. Feigenbaum, The Handbook of Artificial Intelligence, Vol. 1, Kaufmann, Los Altos, CA, 1981.
3. A. L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 71-105, 1963.
4. A. L. Samuel, "Some studies in machine learning using the game of checkers. II-Recent progress," IBM J. Res. Develop., 11(6), 601-617 (November 1967).
5. A. K. Griffith, "A comparison and evaluation of three machine learning procedures as applied to the game of checkers," Artif. Intell., 5, 137-148 (1974).
6. S. G. Akl and R. J. Doran, A Comparison of Parallel Implementations of the Alpha-Beta and Scout Tree Search Algorithms Using the Game of Checkers, in M. A. Bramer (ed.), Computer Game Playing: Theory and Practice, Ellis Horwood, Chichester, U.K., pp. 290-303, 1983.
7. W. F. Ryan, Play Winning Checkers, Coles, Toronto, Canada, 1978.
8. B. G. Buchanan, T. M. Mitchell, R. G. Smith, and C. R. Johnson, Jr., Models of Learning Systems, in J. Belzer, A. G. Holzman, and A. Kent (eds.), Encyclopedia of Computer Science and Technology, Vol. 11, Marcel Dekker, New York, pp. 24-51, 1978.
9. P. McCorduck, Machines Who Think, Freeman, San Francisco, pp. 149-153, 1979.
10. A. L. Samuel, AI, Where It Has Been and Where It Is Going, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, August 1983, pp. 1152-1157.
SELIM G. AKL
Queen's University

CHEMISTRY, AI IN

Chemistry was one of the first disciplines, aside from computer science, to actively engage in research on AI techniques. The first chemistry AI project was the DENDRAL project at Stanford University. This project began in 1964, involved more than 50 researchers, and produced more than 100 articles and 2 books. DENDRAL began with the goal of automatic interpretation of mass spectral data. However, during its 23-year history, it has also investigated other topics, including automated learning (qv), computerized representation of chemical structures, applications of graph theory, exhaustive chemical structure generation, and proton and 13C NMR interpretation. A detailed account of the project can be found in Refs. 1 and 2.

Defining which computer systems should be classified as AI systems has remained a problem throughout AI's history. In chemistry this problem is compounded because many AI applications rely heavily on numerical algorithms, and many applications that exhibit AI-like characteristics are completely numerical rather than symbolic in nature. Examples of the latter are pattern recognition (qv) techniques and learning machines. To avoid the difficult problem of defining AI exactly, this entry is limited to work that uses expert systems (qv) [logical inference (qv)], symbolic manipulation, and natural-language interpretation techniques (see Natural-language understanding). Within these three areas AI technologies have made many contributions to the practice of chemistry.

Work in applying AI technologies to chemistry has recently expanded beyond the traditional academic environment. Although academia continues to develop new techniques, industry has begun to apply older, more developed techniques to solve its problems. Vendors are using AI techniques to enhance existing products and develop new ones. Other industrial researchers are developing proprietary systems in an attempt to gain a competitive advantage. Endeavors covered in this article can be divided into six general categories: natural and chemical language, organic synthesis planning, chemical structure elucidation, improving chemical instrumentation, symbolic algebraic manipulation, and highly specific expert systems that do not fall into the above categories, as well as probable directions for future work.

Natural- and Chemical-Language Applications in Chemistry

Natural-language applications may be divided into two classes: the "language" of chemical structures, substructures, and reactions and the methods used to convert between that "language" and English sentences. Chemical structure language requires a method for representing molecules in the computer and a syntax for manipulating those representations (3,4). This language requirement was clearly defined by researchers who were storing chemical information in computer files. Wiswesser line notation (5,6) and its derivatives were developed to uniquely define chemical structures as a string of characters. Each character represents a specific fragment of a molecule, allowing the computer to "recognize" a molecule. An alternative approach uses graph theory, defining molecules as vertices and the connections between them (7). This approach creates a connection table, or matrix, whose rows and columns refer to atoms. The values stored in the matrix describe the type of connection between atoms. Syntax rules can be defined for manipulating these computer representations, allowing substructures of molecules to be defined and matched. Computer representations of molecules have led to chemical databases of molecular representations. Molecules may be graphically entered into a computer and the database searched for "substructures." The popularity of these databases has created an entire industry to fill the demand (8).

The second use of natural-language techniques is more familiar. Natural-language systems attempt to understand English sentences that contain chemical information. These systems function as user-friendly interfaces to other chemical expert systems or as intelligent interfaces to chemical databases. Understanding English-phrased questions about databases is a simpler problem than understanding problems relating to chemistry. Commercial natural-language systems can perform some of these tasks in chemical applications, but
several systems have been developed specifically for chemistry. Chemical Abstracts Service is investigating automatic keyword indexing of papers based on a computer interpretation of the text (9-11). Other work has focused on searching the chemical literature, based on an interpretation of the text (12).

Chemical Reaction Synthesis

Chemical reaction synthesis is one of the oldest applications of AI in chemistry, beginning in 1967 (13,14). These programs attempt to design a sequence of chemical reactions that would result in a "target" molecule. This early work was based on a chemical synthetic tree. The target molecule was decomposed into its potential precursors using every possible single-step chemical synthesis. Each precursor was further decomposed into earlier antecedents, thus creating a synthetic tree. This decomposition led to an impossibly large number of potential synthetic paths. The deeper into a synthetic tree one proceeded, the more the number of potential paths multiplied. Initially, the selection of the best branch at each junction in the tree required the chemist's intervention in the program. This was necessary to limit the number of pathways and was accomplished by using interactive computer graphics to display the potential paths, and the "best" precursor was selected by the chemist (15). The program Simulation and Evaluation of Chemical Synthesis (SECS) improved the interactive graphics by enabling the chemist to select the reaction path and broadened the computer knowledge base by adding stereochemical reactions and displays (3,16). One technique, which eliminated the need for a chemist's intervention to find a synthetic route, incorporated synthetic rules and heuristic programming (17). The program uses only the molecular substructures of the "target" molecule that participate in the available synthetic methods. Heuristic rules determine the correct sequence of reactions required to protect any reactive functional groups on the target.
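The synthetic-tree decomposition described above can be sketched with invented molecule names and reactions; a depth-first search expands precursors until purchasable starting materials are reached, pruning branches that lead nowhere:

```python
# A toy retrosynthetic tree: each "reaction" (invented, not real
# chemistry) maps a target to a tuple of precursors.
import itertools

REACTIONS = {
    "ester":   [("acid", "alcohol")],
    "acid":    [("aldehyde",)],
    "alcohol": [("aldehyde",), ("alkene",)],
}
PURCHASABLE = {"aldehyde", "alkene"}

def plans(target, depth=4):
    if target in PURCHASABLE:
        return [target]                  # a starting material: done
    if depth == 0 or target not in REACTIONS:
        return []                        # dead end: prune this branch
    found = []
    for precursors in REACTIONS[target]:
        # every precursor must itself be synthesizable
        for combo in itertools.product(*(plans(p, depth - 1) for p in precursors)):
            found.append(f"{target}<-({' + '.join(combo)})")
    return found

for route in plans("ester"):
    print(route)
# ester<-(acid<-(aldehyde) + alcohol<-(aldehyde))
# ester<-(acid<-(aldehyde) + alcohol<-(alkene))
```

Even this three-reaction toy already branches; with hundreds of known single-step syntheses per target, the combinatorial growth the text describes follows immediately.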
A second technique to eliminate a chemist's intervention is based on the principle of minimum chemical distance (18). This technique allows the program to eliminate synthetic routes that are not likely to be useful. Future developments for this method include the potential of predicting chemical reactions that are not known today (19). Expert systems techniques have also been applied to this problem. These methods, before further evaluation, reduce chemical reactions to more general axioms. SYNLMA (20) uses theorem-proving (qv) techniques to design organic syntheses. QED (21) applies multivalue-logic predicate calculus (qv) with axioms to select a plan to choose among the possible precursors.

Chemical Structure Elucidation

Structure elucidation is a prime area for AI applications because it requires both scientific expertise and problem-solving capabilities. Information on molecular formula and structural fragments generally comes from spectral interpretation but can come from any source at the chemist's disposal. Internal consistency checks combine data from multiple sources to resolve conflicting information about the presence or absence of particular fragments. Enumeration programs connect the remaining fragments to obtain all chemically possible molecules. Those structures are ranked based on such properties as comparison of predicted and observed spectra and steric
stresses. The chemist must then devise a way to distinguish between the remaining candidate structures.

Determination of Structural Fragments. There are three approaches to identifying compounds using spectroscopic data. The oldest method, library searching, compares the unknown spectrum with a collection of known spectra. This is straightforward, but it becomes impractical as the size of the reference libraries increases. In addition, library searching cannot identify compounds not in the library, such as newly synthesized compounds. The second approach, pattern recognition, compares the unknown spectrum with "patterns" that are characteristic of classes of compounds. This solves the two problems library searching presents but requires a substantial number of spectra for each class of compounds to be recognized. AI avoids these problems by interpreting spectra using the rules a spectroscopist would use. AI techniques have an advantage over spectroscopists because the computer system does not forget or confuse information. Unfortunately, AI systems do not have all the knowledge known to the spectroscopist. AI structure elucidation systems are comparable in performance to a postgraduate spectroscopist (22). Below, the approaches taken to interpret the various types of spectral data are described.

Infrared Spectroscopy. Spectroscopists have known for some time that certain functional groups and substitution patterns have characteristic absorptions in the ir. These patterns are documented in standard Colthup charts. Early work attempted to computerize these tables to automatically interpret ir spectra (23,24). These programs must be able to deal with the following problems: Many functional groups absorb at each frequency; functional groups can cause more than one absorption; and the solvents used can shift or mask peaks. Recent work Qil focused on reducing the task of codifying the rules required to identify the functional groups of interest. This
This approach extracts rules automatically from the spectraof known compounds. Mass Specfroscopy. The major work on MS interpretation was done through the DENDRAL project. The first step in interpreting ms data was to determine, basedon known masscharge ratios, the probable molecular formula of an ion. The rules determining how a molecule will fragment were also codified. Using these rules, the mass spectrum of a candidate structure could be predicted. These spectra are compared to the unknown's mass spectrum to determine the likelihood that the candidate structure was the correct one. Later work from the DENDRAL project automatically determined molecular fragmentation rules using the mass spectra of known compounds.This work was called meta-DENDRAL (26). A recent instrumental development is ms/ms which takes the ms of each peak in the original ms. Determination of the structure of each fragment from the original ms provides a unique way to preform internal consistency checking using the data from only one spectrometer (27). Nuclear Magnetic Resonance. Work in nmr has included both 1H and 13Canalysis. Proton nmr is similar to ir in that functional groups tend to absorb in certain regions of the spectrum. The ranges of possibte absorptions for each functional group makes lH nmr more suitable for eliminating functional groups determined from another sourcethan for generating a iirt of fragments (28). 13Cnmr, a more recent development,is very sensitive to the environment of the resonating carbon. This sensitivity causes every structurally unique carbon to resonate at a different frequency. The structural equivalence extends two or three bonds in all directions. Structurally
equivalent carbons in different molecules absorb at similar frequencies. 13C nmr is, therefore, very good at generating a list of molecular fragments that are present. Work has been done to determine 13C nmr interpretation rules (29); however, the general method used is a library search. The library, in this case, is composed of molecular fragments and their characteristic absorptions (30). Because the characteristic absorptions overlap, a list of possible functional groups is generated for each nmr peak.

X-Ray Powder Diffraction. Peak heights in x-ray powder diffraction are proportional to concentration, and therefore quantitative analysis is possible. Variations in relative peak heights from laboratory to laboratory and day to day preclude the use of simple least-squares fitting of reference spectra to the unknown. An expert system was developed to use the same knowledge that a mineralogist uses to solve this problem (31,32). This work was repeated using several different expert system development tools (EXPERT, UNITS, EMYCIN, OPS5) and LISP. The study concluded that all the development tools and LISP had their shortcomings. The convenience of the expert system development tools can lead to restrictions that prevent the program from completely solving the problem. LISP, on the other hand, requires a great deal of programming effort. Fortunately, the knowledge base is generally easier to translate from one system to another than it is to extract from the expert.

Internal Consistency Checks. Internal consistency checks eliminate fragments that are inconsistent with the other fragments present. This elimination process helps to reduce the combinatorial explosion of possible structures. One of the most effective ways to provide this check is to combine information from different sources. As an example, one possible source of a 13C nmr peak involves a carbonyl carbon, but there is no carbonyl absorption in the ir spectrum.
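This kind of cross-checking can be sketched as follows; the fragments, spectra, and the veto/confidence scheme are all illustrative:

```python
# A fragment proposed by one spectrum is vetoed if another spectrum
# contradicts it, and its confidence grows with the number of agreeing
# sources.
suggestions = {                 # fragment -> spectra suggesting it
    "carbonyl": {"13C-nmr"},
    "hydroxyl": {"ir", "1H-nmr"},
    "methyl":   {"1H-nmr", "13C-nmr", "ms"},
}
contradictions = {              # fragment -> spectra ruling it out
    "carbonyl": {"ir"},         # no carbonyl absorption in the ir
}

def consistent_fragments(suggestions, contradictions):
    kept = {}
    for fragment, sources in suggestions.items():
        if contradictions.get(fragment):
            continue                   # vetoed by another spectrum
        kept[fragment] = len(sources)  # more agreement, more confidence
    return kept

print(consistent_fragments(suggestions, contradictions))
# {'hydroxyl': 2, 'methyl': 3} - carbonyl has been eliminated
```

Eliminating the carbonyl fragment before enumeration is what keeps the combinatorial explosion of candidate structures in check.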
These cross-checks can also come from different peaks in the same spectrum. Conversely, consistency checks can also increase the confidence in the presence of a fragment if more than one source of data suggests its presence.

Structure Enumeration. Once the various spectroscopic techniques have generated a list of molecular fragments, they must be connected to form possible structures for the unknown compound. The generation of possible structures must be exhaustive and nonredundant. Several problems arise during this procedure. The most serious problem is combinatorial explosion. Sometimes, there may be millions of possible structures. Enumeration programs must be able to eliminate chemically impossible structures and allow the chemist to eliminate other highly unlikely structures. Elimination is usually done at each step of a depth-first search to "prune" the search tree when possible. Another problem is that the list of fragments may not be complete and/or may contain ambiguities. The programs must handle multiple possible starting points and recognize when and where the assembled fragments begin to overlap. Additionally, the input fragments themselves may overlap one another (33). The enumeration program must also recognize the existence of stereoisomers and be able to distinguish between stereoisomers (34-36).

Ranking of Candidate Structures. There are two methods of ranking candidate structures. The first method compares the unknown's spectrum with a predicted spectrum for the candidate structure. This is the approach taken by DENDRAL by
using mass spectra. The second method, which is generally not as useful, discriminates against structures that are highly strained, such as three-membered rings. In the end, the chemist must devise physical tests to distinguish between remaining candidates. Below is a comparison of packages developed to address various portions of the structure elucidation process.

DENDRAL: Uses mass spectral data and 13C nmr data. All other constraints on structures must be deduced by the chemist. It excels in the structure generation process, handling stereoisomers and overlapping substructures (37). The candidate-testing procedure is built into the program (1,2).

CHEMICS: Uses mass spectral, 1H and 13C nmr, and ir data. It is a fully integrated package written in FORTRAN. It is limited to molecules containing only carbon, hydrogen, and oxygen. The structure generator cannot handle overlapping substructures (38-42).

CASE: Uses 13C nmr and ir data. It is written in FORTRAN and cannot handle overlapping substructures (43-46).

SEAC: Uses ir, 1H nmr, and uv data. The structure generator cannot handle overlapping substructures (47,22,41).

STREC: Uses mass spectral, nmr, ir, and uv data. The program is written in FORTRAN and cannot handle overlapping substructures (48).

B. Curry: Uses mass spectral, ir, and uv data. The package concentrates on the internal consistency check and conflict resolution (49).

PAIRS: Is strictly an ir interpretation program, its strength being the ability to easily add new rules (25,32,50-55).

EXMAT: Uses ir and mass spectral data. Its strength is the ability to design the entire analysis and use other chemometric techniques to solve parts of the problem (56).
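The exhaustive, nonredundant generation described under Structure Enumeration can be sketched in miniature. A "structure" here is only a multiset of fragments matching the molecular formula, and the fragment names are illustrative; real enumerators also place the bonds:

```python
# Depth-first fragment assembly: prune any partial assembly whose
# formula already exceeds the target, and canonicalize (sorted tuples)
# so no structure is generated twice.
from collections import Counter

FRAGMENTS = {"CH3": Counter(C=1, H=3), "CH2": Counter(C=1, H=2),
             "OH": Counter(O=1, H=1), "CHO": Counter(C=1, H=1, O=1)}

def enumerate_structures(target, frags=None, partial=()):
    frags = list(FRAGMENTS) if frags is None else frags
    used = sum((FRAGMENTS[f] for f in partial), Counter())
    if used == target:
        yield tuple(sorted(partial))               # canonical: no duplicates
        return
    if any(used[e] > target[e] for e in used):
        return                                     # prune: formula exceeded
    for i, f in enumerate(frags):
        # allow repeats of f, but fix the order to avoid permutations
        yield from enumerate_structures(target, frags[i:], partial + (f,))

ethanol = Counter(C=2, H=6, O=1)                   # target formula C2H6O
print(sorted(set(enumerate_structures(ethanol))))  # [('CH2', 'CH3', 'OH')]
```

The two pruning devices, the formula bound and the fixed fragment order, are toy versions of the eliminations a real enumerator applies at each step of its depth-first search.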
AI IN CHEMISTRY

Improving Chemical Instrumentation

Infrared Spectroscopy. The IR interpretation program PAIRS has been incorporated into at least one vendor's Fourier transform IR spectrometer (54). This gives the spectroscopic structural information for those spectra not found in the limited spectral library available on the spectrometer. The program is also being made available through QCPE (55).

Mass Spectroscopy. One of the most complex mass spectrometers today is the triple quadrupole mass spectrometer. The spectrometer is completely computerized with more than 30 controllable parameters. Tuning the spectrometer for an optimum signal requires a high level of operator expertise. The MS signal must be maximized, and the peak shape must be evaluated. An expert system was developed using the KEE expert system development facility to automatically tune the instrument (57,32). The expert system is capable of outperforming a simplex optimization but is not quite as good as a competent operator.

Chromatography. Expert Chromatography Assistance Team (ECAT) is an expert system designed to aid chemists in developing liquid chromatography methods (32,58). Liquid chromatography design involves analyzing, optimizing, and troubleshooting a particular separation. The expert system includes general chromatographic knowledge, specific literature references, an experiment designer, and chromatography data analysis.

Ultracentrifugation. Ultracentrifugation is a technique for separating biological samples by their density. Most researchers consider it simply a tool and are not interested in the intricacies of the separation process. The expert system SpinPro questions the user on their research goals and recommends the optimum set of operating conditions (59). The operating parameters include rotor type, run speed, run time, gradient material, and gradient concentration. SpinPro also recommends the best set of conditions using only equipment available in the researcher's lab and details the results of those compromises. This expert system can be run on an IBM personal computer.

Process Control. The monitoring and control of chemical process systems is a new area for expert systems. Control systems are directly connected to instruments that monitor the temperatures, pressures, concentrations, and other variables of the process equipment. These measurements are used to predict the development of the chemical process and, if problems are discovered, modify the controlling instrumentation to correct the process. These control systems are specific to each process and require extensive measurements of the chemical system and knowledge of process behavior when the controlling instrumentation is varied (60,61).

Computer Algebra Applications

The fundamental theories in chemistry can be described by mathematical equations, which can be quite complex. Many chemical problems can be solved numerically using these equations (62,63). The solution of these equations has, however, been greatly simplified by the development of symbolic algebra packages (64). These packages solve complex equations analytically instead of using numerical approximations. As early as 1954 symbolic algebra techniques were applied to problems in quantum chemistry (65). However, only recently have symbolic algebra programs become popular (66).
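The analytic-versus-numerical distinction can be illustrated with SymPy, a present-day open-source package in the MACSYMA tradition; SymPy postdates this survey and is used here purely as a hedged, modern illustration of what such packages do.

```python
import sympy

x = sympy.symbols("x")

# Exact, analytic results rather than numerical approximations:
antideriv = sympy.integrate(x * sympy.exp(-x), x)  # antiderivative of x*e**(-x)
roots = sympy.solve(x**2 - 2, x)                   # exact roots, involving sqrt(2)

# Several of the packages named in this section can translate their
# answers into FORTRAN code for use by numerical programs; SymPy
# offers the same facility:
fortran_line = sympy.fcode(antideriv)
```

Here `solve` returns the exact irrational roots of x^2 - 2 rather than floating-point approximations, which is precisely the distinction the text draws between symbolic and numerical solution.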
This delay was caused by the high cost of computers powerful enough to run the software and the lack of commercial packages with full user support. The applications of symbolic algebra have become so diverse that an entire symposium was devoted to the subject at the August 1984 American Chemical Society national meeting. Twenty-two papers were presented at the symposium (67). There are five commercially available packages that perform algebraic, calculus, and differential manipulations. They are MACSYMA (68,69), also available through EDUNET and ARPANET; REDUCE (70); MAPLE (71); SMP (72), written in C for speed of execution; and muMath (66,73), a microcomputer version. A useful feature of several of these packages is their ability to translate their answers into FORTRAN code, which can be used by numerical programs.

Miscellaneous

Computer-Aided Education. A chemical tutor called GEORGE has been developed (74). GEORGE is an expert system being developed to understand dimensional analysis problems dealing with basic chemistry. The program knows how to manipulate the dimensions of physical properties (such as
moles, density, concentration) and conversion factors to different units of measure. Using these, the program can solve a wide variety of problems. The unique feature of this program is that the student specifies the problem, and GEORGE explains the solution with text and diagrams.

Formulation of Agricultural Chemicals. Biologically active chemicals must be combined with various other chemicals to make a commercial product with the desired application characteristics. An expert system developed to assist in this process takes into account cost, marketing, legal, chemical, and end-use considerations in determining the "best" formulation (75). The program also uses several FORTRAN programs that calculate chemical parameters necessary to the decision-making process.

Analysis of Water Chemistry in Steam Power Plants. Corrosion because of improper water and steam chemistry is a major cause of downtime at steam power plants. This corrosion can cost a company approximately $1 million (10^6) per day. An expert system has been developed that receives data from both the operator and remote chemical sensors and recommends corrective measures, if necessary, or the likely result if no action is taken (76,77).

Experimental Design. Deciding what experiments are required to answer a particular question is a pervasive problem in chemistry. Several expert systems have been developed to solve the following problems: determination of intracellular Mg2+ levels, deriving enzyme kinetic models to fit experimental data, design of experiments to determine safety and efficacy of drugs (78), determination of the number of analyses (including blanks) that must be done for environmental water analysis (79), and design of molecular genetics cloning experiments (80).

Macromolecular Structure Determination. A recently developed expert system allows the construction of molecules using heuristic rules. The system creates a three-dimensional protein based on the protein amino acid sequence (81).
This system includes heuristic rules that determine when the protein sequence turns on itself and which sequences form alpha- and beta-sheets (82). Similarly, another program, Artificial Intelligence in Model Building (AIMB), has been written for creating three-dimensional molecular models. This program can construct the three-dimensional model of a molecule from a two-dimensional drawing faster than a chemist can using mechanical models (83).

Future Applications

Computer software is dramatically increasing its penetration into the chemist's laboratory. The volume and sophistication of software is exceeding the chemist's desire and ability to keep current. However, the integration of numerical software, graphical displays, and expert systems promises to revolutionize the practice of chemistry. Expert systems will build on the vast library of existing chemical software and make these technologies available to the practicing chemist (84). Integration of these techniques will lead to "intelligent computer assistants" for every chemist. There will be structure elucidation assistants for analytical chemists, process control assistants for chemical engineers (85), experimental design assistants for organic chemists, and mathematical assistants for physical chemists. However, before these assistants can be built, each different rule base must be further developed. This collation and development of chemical relationships, expressible as heuristic rules, is under way today in both academia and industry. It is thought that the intelligent application of these rules through expert systems can reduce problems that lead to combinatorial explosions of possible solutions. Problem simplification of this type will be necessary before the huge amount of chemical information that exists today can be integrated into intelligent assistants.

BIBLIOGRAPHY

1. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill, New York, 1980.
2. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Chemical Inference: The DENDRAL Project, McGraw-Hill, New York, 1980.
3. W. T. Wipke, S. R. Heller, R. J. Feldman, and E. Hyde (eds.), Computer Representation and Manipulation of Chemical Information, Wiley, New York, pp. 147-174, 1974.
4. H. W. Whitlock, "An Organic Chemist's View of Formal Language," in T. W. Wipke and J. Howe (eds.), Computer Assisted Organic Synthesis, ACS Symposium Series 61, American Chemical Society, Washington, DC, 1977.
5. W. J. Wiswesser, A Line-Formula Chemical Notation, Thomas Y. Crowell Company, New York, 1954.
6. E. G. Smith, The Wiswesser Line-Formula Chemical Notation, McGraw-Hill, New York, 1968.
7. S. H. Bertz, W. C. Herndon, and G. Dabbagh, On the Similarity of Graphs and Molecules, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
8. J. E. Gordon and J. C. Brockwell, "Chemical Inference," J. Chem. Inf. Comput. Sci. 23, 117 (1983).
9. M. Moureau, A. Girard, and J. Delaunay, "Natural language bibliographic searches. PRETEXT program," Rev. Inst. Fr. Petrole Ann. Combust. Liq. 25(10), 1117-1143 (1970).
10. S. M. Cohen, D. L. Dayton, and R. Salvador, "Experimental algorithmic generation of articulated index entries from natural language phrases at Chemical Abstracts Service," J. Chem. Inf. Comput. Sci. 16(2), 93-99 (1976).
11. K. H. Baser, S. M. Cohen, D. L. Dayton, and P. B. Watkins, "Online indexing experiment at Chemical Abstracts Service: Algorithmic generation of articulated index entries from natural language phrases," J. Chem. Inf. Comput. Sci. 18(1), 18-25 (1978).
12. P. J. Smith, D. A. Krawczak, and S. Shute, EP-X: A Knowledge-Based System to Aid in Bibliographic Searches of the Environmental Pollution Literature, Artificial Intelligence Applications in Chemistry, American Chemical Society Meeting, Chicago, IL, Sept. 1985.
13. E. J. Corey, "General methods for the construction of complex molecules," Pure Appl. Chem. 14, 19 (1967).
14. E. J. Corey and T. W. Wipke, "Computer-assisted design of complex organic synthesis," Science 166, 128 (1969).
15. E. J. Corey, W. T. Wipke, and R. D. Cramer, III, "Computer-assisted synthetic analysis," J. Am. Chem. Soc. 94(2), 421 (1972); E. J. Corey, A. K. Long, and S. D. Rubenstein, "Computer-assisted analysis in organic synthesis," Science 228, 408 (1985).
16. T. Wipke and T. Dyott, "Simulation and evaluation of chemical synthesis," J. Am. Chem. Soc. 96(15), 4825 (1974).
17. P. E. Blower, Jr. and H. W. Whitlock, Jr., "An application of artificial intelligence to organic synthesis," J. Am. Chem. Soc., 98(6), 1499-1510 (1976).
18. C. Jochum, J. Gasteiger, and I. Ugi, "The principle of minimum chemical distance," Angew. Chem. Int. Ed., 19, 495 (1980).
19. J. Gasteiger, M. G. Hutchings, P. Low, and H. Saller, The Acquisition and Representation of Knowledge for Expert Systems in Organic Chemistry, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
20. T. Wang, I. Burnstein, S. Ehrlich, M. Evens, A. Gough, and P. Johnson, Using a Theorem Prover in the Design of Organic Synthesis, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
21. D. P. Dolata, QED: Automated Inference in Planning Organic Synthesis, Ph.D. Thesis, University of California, Santa Cruz, 1984.
22. Z. Hippe, "Problems in the application of AI in analytical chemistry," Anal. Chim. Acta, 150, 11-21 (1983); T. Monmaney, "Complex window on life's most basic molecules," Smithsonian, 114 (July 1985).
23. B. Schrade et al., "Automatic reduction and evaluation of IR and Raman spectra," F. Z. (Fresenius' Zeitschrift) Anal. Chem., 303, 337-348 (1980).
24. H. B. Woodruff and M. E. Munk, "Computer-assisted interpretation of IR spectra," Anal. Chim. Acta, 95, 13-23 (1977).
25. S. A. Tomellini, R. A. Hartwick, J. M. Stevenson, and H. B. Woodruff, "Automated rule generation for PAIRS," Anal. Chim. Acta, 162, 227-240 (1984).
26. B. G. Buchanan, D. H. Smith, W. C. White, R. J. Gritter, E. A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of artificial intelligence for chemical inference. 22. Automatic rule formation in mass spectrometry by means of the meta-DENDRAL program," J. Am. Chem. Soc., 98(20), 6168-6178 (1976); D. Lindsay et al., Applications of Artificial Intelligence in Organic Chemistry: The Dendral Project, McGraw-Hill, New York, 1981.
27. K. P. Cross, A. B. Giordani, H. R. Gregg, P. A. Hoffmann, C. F. Beckner, and C. G. Enke, "Automation of structure elucidation from mass spectrometry-mass spectrometry data," Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
28. H. Egli, D. H. Smith, and C. Djerassi, "Computer assisted structural interpretation of proton NMR spectral data," Helv. Chim. Acta, 65, 1898-1919 (1982).
29. T. M. Mitchell and G. M. Schwenzer, "Applications of artificial intelligence for chemical inference. XXV. A computer program for automated empirical 13C NMR rule formation," Org. Magn. Reson., 11(8), 378-384 (1978).
30. M. R. Lindley, N. A. B. Gray, D. H. Smith, and C. Djerassi, "Applications of AI for chemical inference. 40. Computerized approach to verification of 13C NMR spectral assignments," J. Org. Chem., 47, 1027-1035 (1982).
31. S. P. Ennis, Expert Systems: A User's Perspective of Some Current Tools, Proceedings of the Second National Conference on AI, Pittsburgh, PA, pp. 319-321, 1982.
32. R. E. Dessy (ed.), "Expert systems part II," Anal. Chem., 56(12), 1312A-1332A (1984).
33. R. E. Carhart, D. H. Smith, N. A. B. Gray, J. G. Nourse, and C. Djerassi, "GENOA: A computer program for structure elucidation utilizing overlapping and alternative substructures," J. Org. Chem., 46, 1708-1718 (1981).
34. J. G. Nourse, R. E. Carhart, D. H. Smith, and C. Djerassi, "Exhaustive generation of stereoisomers for structure elucidation," J. Am. Chem. Soc., 101, 1216-1228 (1979).
35. J. G. Nourse, "The configuration symmetry group and its application to stereoisomer generation, specification, and enumeration," J. Am. Chem. Soc., 101(5), 1210 (1979).
36. J. G. Nourse, D. H. Smith, R. E. Carhart, and C. Djerassi, "Computer assisted elucidation of molecular structure with stereochemistry," J. Am. Chem. Soc., 102, 6289-6295 (1980).
37. GENOA, Molecular Design Ltd., Hayward, CA, 1982.
38. I. Fujiwara, T. Okuyama, T. Yamasaki, H. Abe, and S. Sasaki, "Computer-aided structure elucidation of organic compounds with the CHEMICS system," Anal. Chim. Acta, 133, 527-533 (1981).
39. S. Sasaki, H. Abe, I. Fujiwara, and T. Yamasaki, "The application of 13C NMR in CHEMICS, the computer program system for structure elucidation," Stud. Theor. Chem., 16, 186-204 (1981).
40. S. Sasaki et al., "CHEMICS-F: A computer program system for structure elucidation of organic compounds," J. Chem. Inf. Comput. Sci., 18(4), 211 (1978).
41. S. Sasaki, H. Abe, I. Fujiwara, T. Yamasaki, Z. Hippe, B. Debska, J. Duliban, and B. Guzowska-Swider, "Recent problems of application of artificial intelligence in computer-aided elucidation of chemical structures," Chem. Anal. (Warsaw), 27(3-4), 171-181 (1982).
42. H. Abe, T. Yamasaki, I. Fujiwara, and S. Sasaki, "Computer aided structure elucidation methods," Anal. Chim. Acta, 133, 499-506 (1981).
43. M. E. Munk, C. A. Shelley, H. B. Woodruff, and M. O. Trulson, "Computer assisted structure elucidation," F. Z. Anal. Chem., 313, 473-479 (1982).
44. C. A. Shelley and M. E. Munk, "CASE, a computer model of the structure elucidation process," Anal. Chim. Acta, 133, 507-516 (1981).
45. M. O. Trulson and M. E. Munk, "Table driven procedure for IR spectrum interpretation," Anal. Chem., 56, 2137-2142 (1983).
46. A. H. Lipkus and M. E. Munk, "Combinatorial problems in computer assisted structural interpretation of C-13 NMR spectra," J. Chem. Inf. Comput. Sci., 25, 34-45 (1985).
47. B. Debska, J. Duliban, B. Guzowska-Swider, and Z.
Hippe, "Computer aided structural analysis of organic compounds by an AI system," Anal. Chim. Acta, 133, 303-318 (1981).
48. L. A. Gribov, M. E. Elyashberg, and V. V. Serov, "Computer system for structure recognition of polyatomic molecules by IR, NMR, UV, and MS methods," Anal. Chim. Acta, 95, 75-96 (1977).
49. B. Curry and J. A. Michnowicz, An Expert System for Organic Structure Determination, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
50. G. M. Smith and H. B. Woodruff, "Development of a computer language and compiler for expressing the rules of IR spectral interpretation," J. Chem. Inf. Comput. Sci., 24, 33-39 (1984).
51. S. A. Tomellini, J. M. Stevenson, and H. B. Woodruff, "Rules for computerized interpretation of vapor phase IR spectra," Anal. Chem., 56, 67-70 (1984).
52. H. B. Woodruff and G. M. Smith, "Generating rules for PAIRS: A computerized IR spectral interpreter," Anal. Chim. Acta, 133, 545-553 (1981).
53. H. B. Woodruff and G. M. Smith, "Computer program for the analysis of IR spectra," Anal. Chem., 52, 2321-2327 (1980).
54. H. B. Woodruff et al., "Automated interpretation of IR spectra with an instrument based minicomputer," Anal. Chem., 53, 2367-2369 (1981).
55. H. B. Woodruff and G. M. Smith, "Program for the analysis of IR spectra (PAIRS) (QCPE 426)," QCPE Bull. 1, 58 (1981).
56. S. A. Liebman, P. J. Duff, M. A. Schroeder, R. A. Fifer, and A. M. Harper, Concerted Organic Analysis of Materials and Expert System Development, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
57. C. Wong and S. Lanning, "AI in chemical analysis," Energ. Technol. Rev., Lawrence Livermore National Laboratory, Berkeley, CA, February 1984.
58. J. Karnicky, R. Bach, and S. Abbott, An Expert System for High Performance Liquid Chromatography Methods Development, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, Washington, DC, 1985.
59. P. R. Martz, M. Heffron, and O. M. Griffith, An Expert System for Optimizing Ultracentrifugation Runs, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
60. R. L. Moore, C. G. Knickerbocker, and L. B. Hawkinson, A Real-Time Expert System for Process Control, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
61. E. A. Scarl, J. R. Jamieson, and C. I. Delaune, "Process monitoring and fault location at the Kennedy Space Center," SIGART Newslett., 93, 38 (1985).
62. Quantum Chemistry Program Exchange (QCPE), Dept. of Chemistry, Indiana University, Bloomington, IN 47405; 812-335-4784.
63. S. A. Borman, "Scientific software," Anal. Chem., 57(9), 983A (1985).
64. J. F. Ogilvie, "Applications of computer algebra in physical chemistry," Comput. Chem., 6, 169-172 (1982).
65. S. F. Boys, B. G. Cook, C. M. Reeves, and I. Shavitt, Nature, 178, 1207-1209 (1956).
66. C. S. Johnson, "Computer algebra in chemistry," J. Chem. Inf. Comput. Sci., 23, 151-157 (1983).
67. R. Pavelle (ed.), Applications of Computer Algebra, Kluwer, Boston, MA, 1985.
68. MACSYMA Reference Manual, MIT Mathlab Group, Cambridge, MA, 1977.
69. MACSYMA Primer, MIT Mathlab Group, Cambridge, MA, 1982.
70. A. C. Hearn (ed.), REDUCE User's Manual, Version 3.0, Rand Publication CP78(4/83), The Rand Corp., Santa Monica, CA, 1983.
71. K. O. Geddes, G. H. Gonnet, and B. W. Char, MAPLE User's Manual, 2nd ed., University of Waterloo, Waterloo, Ontario, Canada.
72. C. A. Cole, S.
Wolfram, et al., SMP Handbook, Caltech, Pasadena, CA, 1981.
73. G. Williams, "MuMath-79 Symbolic Math System," BYTE, 11, 325-338 (1980); and D. Stoutemyer, "A Preview of the Next IBM-PC Version of MuMath," in G. Goos and J. Hartmanis (eds.), Eurocal '85, Springer-Verlag, New York, 1985.
74. R. Cornelius, D. Cabrol, and C. Cachet, Applying the Techniques of Artificial Intelligence to Chemical Education, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
75. B. Hohne and R. Houghton, An Expert System for the Formulation of Agricultural Chemicals, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
76. J. C. Bellows, "An artificial intelligence chemistry diagnostic system," Proc. 45th Int. Water Conf. Eng. Soc., Western PA, pp. 15-25, 1984.
77. J. C. Bellows, Chemistry Diagnostic System for Steam Power Plants, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
78. D. Garfinkel, L. Garfinkel, V. W. Soo, and C. A. Kulikowski, Interpretation and Design of Chemically Based Experiments with Expert Systems, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
79. H. L. Keith and J. D. Stuart, "A Rule Induction Program for Quality Assurance-Quality Control and Selection of Protective Materials," Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
80. P. Friedland, MOLGEN: Applications of Symbolic Computation and Artificial Intelligence to Molecular Biology, in M. Keenberg (ed.), Proceedings of the Battelle Conference on Genetic Engineering, Vol. 5, Battelle Seminar Studies Program, Seattle, WA, pp. 171-182, 1981.
81. F. E. Cohen, R. M. Abarbanel, I. D. Kuntz, and R. J. Fletterick, "Secondary structure assignment for alpha/beta proteins by a combinatorial approach," Biochem. 22, 4894 (1983).
82. I. D. Kuntz, private communication, Univ. of Cal., San Francisco, 1985.
83. W. Wipke and M. A. Hahn, Analogy and Intelligence in Model Building, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
84. R. Banares-Alcantara, A. W. Westerberg, and M. D. Rychener, "Development of an expert system for physical property predictions," Comput. Chem. Eng., 9(2), 127-142 (1985).
85. K. Brooks, Chem. Week, 38-39 (Sept. 10, 1986).

B. Hohne and T. Pierce
Rohm and Haas Co.
CHESS 4.5

Chess 4.5 is a chess program (see Computer chess methods) that uses a method called "iterative deepening" to determine its next move. It is a brute-force method that does exhaustive search, first to the second level, then again from scratch to the third level, and so on, continuing this iteration until a fixed time limit is reached. The newer version is known as Chess 4.7 (see D. Slate and L. Atkin, "Chess 4.5: The Northwestern University Chess Program," in P. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 82-118, 1977).

J. Rosenberg
SUNY at Buffalo
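The iterative-deepening control loop just described can be sketched as follows. This is an illustrative Python outline under simplifying assumptions (a toy game tree given as callbacks, plain negamax with no alpha-beta pruning, transposition tables, or the other refinements Chess 4.5 actually employs); all names here are invented for the example.

```python
import time

def iterative_deepening(root, children, evaluate, time_limit=1.0, max_depth=64):
    """Run a complete depth-limited search to depth 1, then redo it to
    depth 2, 3, ..., keeping the best move from the deepest pass that
    finished before the time limit.  `evaluate` scores a position from
    the perspective of the player to move there."""

    def negamax(node, depth):
        kids = children(node)
        if depth == 0 or not kids:
            return evaluate(node)
        # Value for the player to move: best of the negated child values.
        return max(-negamax(k, depth - 1) for k in kids)

    deadline = time.monotonic() + time_limit
    best_move = None
    for depth in range(1, max_depth + 1):
        moves = children(root)
        if not moves:
            break
        # A complete pass at this depth; each deeper pass redoes the
        # shallower work, which is the essence of iterative deepening.
        best_move = max(moves, key=lambda m: -negamax(m, depth - 1))
        if time.monotonic() >= deadline:   # time budget spent: stop deepening
            break
    return best_move
```

Although each pass repeats the previous one's work, the repeated shallow searches are cheap relative to the deepest pass, and the scheme guarantees a complete best move is always available when the clock runs out.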
CHURCH'S THESIS

Church's thesis is the assertion that any process that is effective or algorithmic in nature defines a mathematical function belonging to a specific well-defined class, known variously as the recursive, the λ-definable, or the Turing computable functions. These terms originated in the 1930s to designate what appeared superficially to be three quite different notions: Gödel's characterization of functions definable by means of recursive definitions of the most general kind (see Recursion), Church and Kleene's notion of functions definable using the λ-operation (subsequently incorporated by John McCarthy into the LISP programming language (see LISP)), and Turing's notion of a function computable by an abstract computing device (see Turing machine). However, it was very soon seen that the three notions define the very same class of functions. Church announced his proposal to identify the class of functions definable by means of an effective process with the class of recursive functions (1) in April 1935 at a professional meeting, a
year after he had first suggested it to his student Kleene. Quite independently, Turing developed his own equivalent version during the spring of 1935. Gödel, who had been skeptical of Church's arguments in favor of his thesis, was fully convinced by Turing's work. Gödel had made use of a more restricted class of functions, later called primitive recursive, in his famous work on undecidability. The fact that there were functions like Ackermann's that were clearly definable by recursive means but were not primitive recursive led Gödel to attempt to characterize recursive definitions in general. Using a suggestion of Jacques Herbrand, Gödel was led to his class of general recursive functions. Gödel went so far as to suggest, in lectures at the Institute for Advanced Study in Princeton in 1934, that if this definition really included all possible recursive definitions, then all functions computable by "finite procedures" would be general recursive, but he was not yet prepared to assert that his definition was really so inclusive. Meanwhile, Church and his students developed the concept of λ-definability as part of an effort to salvage a theory of the λ-operator from an ambitious system of logic developed by Church that had been proved inconsistent by his students Kleene and Rosser. Turing developed his machines in connection with his work on Hilbert's Entscheidungsproblem, the problem of finding an algorithm for testing inferences in first-order logic (qv) for validity: Turing was able to show that no such algorithm could exist, a conclusion that Church also reached. Post, who had developed some of these ideas many years earlier, now proposed a formulation very similar to Turing's. Post's work was independent of Turing, but not of Church.
When these various concepts were proved equivalent to one another, it was clear that something of great importance had been discovered. (For a discussion and analysis of this history as well as the importance of the work of Kleene and of Post, see Ref. 2, which also contains references to the original literature and to other historical accounts.)

Church's Thesis and AI

Church's thesis has made it possible to prove the algorithmic unsolvability of important problems in mathematics, provided an important basic tool in mathematical logic, made available an array of models in theoretical computer science, and provided the basis for an entirely new branch of mathematics. However, quite apart from all this, Church's thesis provides a crucial philosophical foundation for the proposition that AI is possible and that digital computers provide an appropriate instrument for realizing it. Before World War II large-scale computing machines were conceived and built as engines of numerical calculation. After the pioneering work of Church, Gödel, Kleene, Post, and especially Turing, it became clear that the notion of computation includes far more than numerical calculation; indeed, it encompasses everything expressible as an effective process or (as Turing once put it) by a "rule of thumb." This insight is physically embodied in the von Neumann architecture for computers. Thus, the project of producing computer programs that successfully emulate human cognitive functions (see Cognition), which seems evidently preposterous so long as a computing machine is conceived of as merely a device for carrying out numerical calculations, comes into focus as an ultimate goal. Of course, this is precisely the goal of AI research. These same considerations lead to the proposal that computer programs provide an appropriate theoretical model for cognitive functions, which is the principal paradigm held forth by workers in cognitive science (3) (qv).
and specificallyin nonmonotonicreasoning(qv). McCarthy describes circumscription as a "rule of conjecture" as to what objectshave a given property P. A useful example exploited by McCarthy is the familiar "missionaries and cannibals" puzzle: Three missionaries and three cannibals must crossa river using a boat that can hold only two persons; if the cannibals Church'sThesisand Mechanism outnumber the missionaries on either bank of the river, the The belief that AI is possible,in principle, is closelyassoci- missionaries will be eaten. How can the crossing be arranged ated with a mechanist view of the human mind. Such a view safely? Now, there are numerous features of interest in the holds that the properties of "mind" are ultimately to be under- puzzle. The one of concern here is that it is in fact a puzzle, stood on the basis of the behavior of the brain (and other rele- that is, the puzzler is expected to recognize certain implicit vant organs) as a material object obeying the laws of nature. ground rules, such as that the boat doesnot have a leak or any This is opposedto a mentalist position that mental states are other incapacity for transporting people. Moreover, there are irreducible and extramaterial. Church's thesis is related to no additional cannibals or missionaries lurking in the backthese matters in several ways. Turing's version of Church's ground, who may upset otherwise sound plans, even though it thesis (called Turing's thesis in Ref. 3) identifies effectiveness was not specifically stated that there are only three cannibals with mechanical computability. Thus, Church's thesis implies and three missionaries.It is as if there is an implicit assump(as emphasizedin Ref. 4) that mechanism is incapable of being tion that if something is not mentioned in the puzzle,then it is refuted by effective means. 
That is, a mentalist who wishes to claim that some particular human mental activity is incapable of being duplicated on a purely mechanical basis had best be very sure that this activity is noneffective. On the other hand, evidence of the extensiveness of mechanical computability, for example, Turing's construction of a "universal" machine as well as the equivalence of the various precise explications of effectiveness, tends to refute a mentalist critique based on the alleged limitations of the purely mechanical. Even the main negative consequence of Church's thesis, the existence of algorithmically unsolvable problems, serves to help refute mentalism. The mentalist has traditionally ridiculed the claims of mechanists by contrasting the varied, unpredictable, and complex behavior of human beings with the rigid and simple behavior of clockwork automata. However, the fact that there are problems concerning Turing machines for which no effective solutions can exist shows that computing mechanisms as well as people can exhibit a behavioral repertoire of great complexity and unpredictability.

BIBLIOGRAPHY

1. A. Church, "An unsolvable problem of elementary number theory," Am. J. Math. 58, 345 (1936).
2. M. Davis, "Why Gödel didn't have Church's thesis," Inf. Contrl. 54, 3-24 (1982).
3. Z. Pylyshyn, Computation and Cognition: Toward a Foundation for Cognitive Science, MIT Press, Cambridge, MA, 1984.
4. J. C. Webb, Mechanism, Mentalism, and Metamathematics, D. Reidel, Dordrecht, The Netherlands, 1980.

General References

A. Church, The Calculi of Lambda-Conversion, Princeton University Press, Princeton, NJ, 1941.
M. Davis, The Undecidable, Raven Press, New York, 1965.

M. DAVIS
New York University

CIRCUMSCRIPTION

Circumscription is a technique devised by McCarthy (1) for formalizing certain notions in commonsense reasoning (qv). A typical example is the convention in puzzle solving that objects not stated to exist are not to be considered, an idea sometimes referred to as a closed-world assumption. It corresponds to minimizing the number of objects having certain properties. In effect, one is considering conjectures that for certain properties P, an object x does not have P unless it is required to do so. Moreover, this sort of minimizing assumption appears to be very useful even in nonpuzzle situations. Circumscription provides one way to make this rather vague idea precise.

The Formalism of Circumscription

Circumscription involves the use of an axiom schema in a first-order language (see Logic), intended to express the idea that certain formulas (wffs) have the smallest possible extensions consistent with certain given axioms. To illustrate, if B is a belief system (qv) including world knowledge W and specific domain knowledge (qv) A[P] concerning a predicate P, then it may be desired to consider that P is to be minimized, in the sense that as few entities x as possible have property P as is consistent with A[P]. The world knowledge W together with A[P] and the circumscriptive schema are used to derive conclusions in standard first-order logic, which then may be added to B (hopefully consistently and appropriately). It is this notion of consistency with a part of the belief system itself that causes conceptual as well as computational problems in nonmonotonic reasoning, essentially problems of self-reference. McCarthy has found a very ingenious way of finessing such self-reference in the context of minimization, allowing a mechanical means of establishing the effect of consistency tests in certain cases.

As suggested above, given a predicate symbol P and a formula A[P] containing P, the minimization of P by A[P] can be thought of as saying that the P objects consist of certain ones as needed to satisfy A[P] and no more, in the sense that any tentative set of P objects (such as those given by a wff Z(x) such that A[Z] holds) already includes all P objects. Circumscription expresses this by means of a schema, or set of wffs, denoted here by A[P]/P, as follows:

A[P]/P = {[A[Z] ∧ (x)(Z(x) → P(x))] → (y)(P(y) → Z(y)) | Z is a wff}

(Here A[Z] results from A[P] by replacing every occurrence of P by Z.)
A key example, a variation on one emphasized by McCarthy (1), is the following: let A[P] be a ≠ b ∧ (P(a) ∨ P(b)). Let Z1(x) be x = a and Z2(x) be x = b. Then from P(a) ∨ P(b) one gets that either Z1 or Z2 will serve for circumscription. That is, either P(a) holds, so that A[Z1] is true and Z1(x) → P(x), and hence circumscription using Z1 for P yields P(x) → Z1(x); or P(b) holds, so that A[Z2] is true and Z2(x) → P(x), and hence, using Z2 for P, P(x) → Z2(x). Thus, either a is the only P object or b is; indeed, ¬P(a) ∨ ¬P(b) will then be provable from A[P] + A[P]/P. In fact, it follows that there is a unique P object; this, however, should not cause concern, for the intention is to explore the consequences of conjecturing the stated minimization of P.

A Generalization. McCarthy (2) generalizes his original notion of (predicate) circumscription to allow specified predicates other than P to vary as well as P; this decisively extends the range of applicability of circumscription. In the new formulation the schema is replaced by a single second-order formula, but comparison with predicate circumscription is easier when a schema or set A[P1, . . . , Pn]/E is retained, in the following form:

{[A[Z1, . . . , Zn] ∧ (x)(E[Z1, . . . , Zn] → E)] → (x)(E → E[Z1, . . . , Zn]) | Z1, . . . , Zn are wffs}

where E = E(P1, . . . , Pn) is a formula in which P1, . . . , Pn may appear, and E[Z1, . . . , Zn] is obtained from E by substituting Zi for each Pi. Here the intuitive idea is to minimize (the extension of) the formula E by allowing variations in (the extensions of) P1, . . . , Pn. The new second-order version of circumscription is called formula circumscription; the weakened version retaining a schema but allowing variable predicates is called variable circumscription.

As McCarthy has observed, it is the presence of the predicate variables P1, . . . , Pn that gives variable circumscription its power, and not the fact that E may be a formula. Indeed, forming an extension by definitions of A[P] by adding the new axiom (x)(P0x ↔ Ex), where P0 is a new predicate letter, one can simply circumscribe P0 with P1, . . . , Pn as variables in the extension of A[P]. That is, one can just as well take E to be a single predicate letter P0, since any formula that one may wish to minimize can be made equivalent to such a P0 by means of an appropriate axiom included in A[P] itself. In the sequel then, E is the predicate letter P0, and P stands for P0, P1, . . . , Pn; that is, E plays the role of P0 above unless context dictates otherwise. Then the schema A[P]/P is as above except that the predicate variables P0, . . . , Pn appear rather than simply P1, . . . , Pn, and the wffs Z0, . . . , Zn as well, where P0 is E[P0, . . . , Pn] and Z0 substitutes for E[Z0, . . . , Zn]. To be precise, A[P]/P is the set of wffs

{[A[Z0, . . . , Zn] ∧ (x)(Z0x → P0x)] → (y)(P0y → Z0y) | Z0, . . . , Zn are wffs}

The theory obtained from A[P] by adjoining the set A[P]/P as new axioms is abbreviated with the notation A[P]* whenever the P can be understood from context. That is, A[P]* = A[P] + A[P]/P.

An example using variable circumscription is the following "life and death" problem: let A[D, L] be the axiom

(x)(Dx ↔ ¬Lx) ∧ La ∧ Db ∧ Kc ∧ (a ≠ b ∧ a ≠ c ∧ b ≠ c)

which is intended to have the interpretation that dead things (D) are those that are not living (L), and a is living, b is dead, and c is a kangaroo (K). The circumscription of D then corresponds to the notion that as few things as possible are to be considered dead. However, using mere predicate circumscription, that is, A[D]* rather than A[D, L]*, D could not be "squeezed" down by means of an appropriate Z predicate, since L, being unchanged, would force D to be its unchanging complement. Thus, A[D]* would not have either ¬Dc or Lc as theorems. On the other hand, A[D, L]* does have ¬Dc, and hence Lc, as theorems. This can be seen by circumscribing with the two predicates x = b (for Z0) and x ≠ b (for Z1): A[Z0, Z1] is just (x)(Z0x ↔ ¬Z1x) ∧ Z1a ∧ Z0b ∧ Kc ∧ (a ≠ b ∧ a ≠ c ∧ b ≠ c), which is true, and also (x)(Z0x → Dx), so that by the schema one has (x)(Dx → Z0x). In particular, ¬Z0c → ¬Dc, and so on.

Of course, formula circumscription can accomplish all that variable circumscription does, and even more, as is shown below. Etherington, Mercer, and Reiter (3) establish several theorems characterizing the above kind of limitation of predicate circumscription, thereby bolstering the significance of variable and formula circumscription.

The Theoretical Basis for Circumscription

Minimal Models. Aside from giving examples, it is desirable to show in precise terms in what sense the circumscriptive schema A[P]/P does in fact minimize. For this purpose, McCarthy (1) proposed the concept of minimal model in the context of predicate circumscription. Etherington (4) has redefined minimal model in a manner appropriate to McCarthy's new (formula) version of circumscription, which is presented here in slightly modified form. Let M and N be models of A[P] = A[P0, P1, . . . , Pn] with the same domains and the same interpretations of all constant, function, and predicate symbols except possibly P0, P1, . . . , Pn. Here M is a proper P-submodel of N if the extension of P0 in M is a proper subset of that in N. Then N is a P-minimal model of A[P0, . . . , Pn] if N is a model of A[P0, . . . , Pn] and no model M of A[P0, . . . , Pn] is a proper P-submodel of N. [By model is meant a normal model, i.e., a model in which equality is interpreted as identity. This incidentally shows the pointlessness of choosing P0 to be the equality predicate, for then two distinct elements necessarily cannot be identical, and so all (normal) models are minimal for equality. Etherington, Mercer, and Reiter (3) have studied this and related points.]

As an example, consider again McCarthy's axiom A[P]: a ≠ b ∧ (Pa ∨ Pb). Here P0 is just P. It is easily seen that the P-minimal models are precisely those with domain {a, b, c1, c2, . . .}, where the number of ci may be none or any other cardinality, in which P holds of a alone or of b alone. In particular, M1 = {Pa} and M2 = {Pb} are two such models. But {Pa, Pb}, although it is a model of A[P], is not minimal.

The clearly desirable situation would be to have a definition of model appropriate to the proof theory of the circumscriptive schema, that is, affording a completeness result of the form: B is a consequence of A[P] by circumscription, that is, a theorem of A[P]/P, if B holds in all P-minimal models of A[P]. That this does not hold in general, as is discussed below, indicates that at present there are unclear areas in the foundational status of circumscription.
Soundness. First, however, is stated a positive result, variants of which have been given in Davis (5) (for what is often called domain circumscription), McCarthy (1) (for predicate circumscription), and Minker and Perlis (6) (for protected circumscription) and extended by Etherington (4) to formula circumscription.

Soundness Theorem. For any formula B, A[P]* ⊢ B implies A[P] ⊨P B, where P is a vector of predicate symbols P0, P1, . . . , Pn and the P-double-turnstile ⊨P means that the consequent holds in any P-minimal model of the antecedent.

Again, the example above illustrates this. Since A[P]* ⊢ ¬Pa ∨ ¬Pb, it follows that ¬Pa ∨ ¬Pb holds in the models M1 and M2. Of course, one sees directly that this is the case.

Negative Completeness Results. Unfortunately, in general the converse, which would provide a full completeness (qv) theorem, does not hold, as shown by Davis (5). Let A[N] be Peano arithmetic [with the postulates N(0), (x)(N(x) → N(x + 1)), etc.]. Then the N-minimal models contain N-extensions isomorphic to the natural numbers, so that the formulas B relativized to N that are true in these models are precisely those that are true in arithmetic. But no recursive first-order theory, including one of the form A[N]* = A[N] + A[N]/N, has as its theorems precisely those sentences true of the natural numbers, nor even has them as its N-relativized theorems. [Etherington, Mercer, and Reiter (3) noticed that for certain other arithmetical theories A[P] considered in Ref. 5, A[P]* may be inconsistent even though A[P] is consistent. Specifically, A[P] may fail to have minimal models. They also showed that certain theories, namely the universal ones, do not suffer this drawback.]

Kueker (7) has found the following simpler illustration: Let I[P] be the theory Pa, Px ↔ Psx, a ≠ sx, sx = sy → x = y. Then models of I[P] are of two types, distinguished by a certain first-order sentence; any minimal model is isomorphic to the natural numbers N and satisfies it. Kueker has shown that this sentence is not a theorem of I[P]* = I[P] + I[P]/P. It is worth noting, however, that the full second-order formula version of circumscription does entail Kueker's sentence, showing it to be more powerful than variable circumscription; to see this, it is sufficient to use the second-order formula (Q)[Qa ∧ (y)(Qy → Qsy) → Qx] to replace P in the second-order circumscription axiom. [This trick will not work, of course, in Davis's example (5), since second-order arithmetic is as prey to undecidability problems as is first-order arithmetic.]

Positive Completeness Results. Nevertheless, certain partial converses do hold, which have rather broad application. First, some terminology based on Doyle (8): A[P] is disjunctively P-defining if it has theorems of the form

(x)(Pi x ↔ Wi1 x) ∨ · · · ∨ (x)(Pi x ↔ Wini x)

for each i = 0, . . . , n, where the Wij's do not involve P0, . . . , Pn.

Perlis and Minker (9) exploit this concept in the following partial completeness result: If A[P]* is disjunctively P-defining, then A[P]* ⊢ B whenever A[P] ⊨P B. They also show that if A[P] has only finite models, A[P]* ⊢ B whenever A[P] ⊨P B.

Efficiency

As with many commonsense reasoning techniques, circumscription naturally presents itself as a candidate for a reasoning mechanism that could in principle be used in an intelligent robot (see Robotics), for instance, in conjunction with a theorem prover (see Theorem proving). However, the fact that a schema, or infinite set of axioms, is involved presents practical difficulties, especially in the necessary choice of which instance(s) of the schema to use. That is, efficiency questions arise.

In this regard, Lifschitz (10) has shown the significance of a subclass of theories apposite to circumscription: the separable theories. Separable theories A[P] are those that are formed, using conjunctions and disjunctions, from formulas containing no positive occurrences of P0 and formulas of the form (x)(E(x) → P0(x)), where E is a predicate that does not contain P0. [These appear related to the disjunctively P-defining theories and may afford fruitful terrain for further investigation.] Such theories turn out to afford expression by means of a single first-order wff replacing the second-order circumscription axiom.

Applications and Related Work

McCarthy (2) gives applications of circumscription to various problems in commonsense reasoning. Paramount among these is his use of a predicate ab for abnormal aspects of entities. He shows how to represent reasoning to the effect that, for example, typically birds can fly. The idea is to minimize (as a conjectural assumption) the objects that are abnormal with respect to any given aspect, for instance, birds that are abnormal with respect to flying (such as penguins or ostriches). This allows the expression of default reasoning to be given a uniform treatment, in which the predicate ab is circumscribed while other predicates, as desired, may be considered variable.
For instance, letting ab(B, F, x) stand for "x is an abnormal bird with respect to flying," then from the following axioms

Bird(x) ∧ ¬ab(B, F, x) → Flies(x)
Ostrich(x) → ab(B, F, x)
Ostrich(x) → Bird(x)
Bird(Tweety)

one can prove by formula circumscription that Tweety can fly (and consequently that Tweety is not an ostrich). Here it is sufficient to use the null predicate (e.g., x ≠ x) for both ab(B, F, x) and Ostrich(x), and x = Tweety for both Bird(x) and Flies(x).

Grosof (11) presents a translation scheme from Reiter's (12) default logic into circumscription in an effort to unify and clarify these two approaches to nonmonotonic inference. Reiter (13) shows that for certain special cases circumscription achieves the effect of another formalism known as predicate completion (14). Papalaskaris and Bundy (15) have applied circumscriptive reasoning to issues in natural-language processing (see Natural-language understanding). In particular, they examine contextual cues that provide guidelines for appropriate predicates to circumscribe in formulating answers to questions.
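The effect of this circumscription can be checked by brute force on a one-element domain. The following Python sketch (an illustration only, not from the article) enumerates all interpretations of Bird, Ostrich, ab, and Flies over the domain {Tweety}, keeps those satisfying the four axioms, and retains the ones whose ab-extension is smallest while the other predicates are allowed to vary:

```python
from itertools import product

DOMAIN = ['tweety']

def satisfies(bird, ostrich, ab, flies):
    """The four axioms from the text, over the finite domain."""
    if 'tweety' not in bird:                               # Bird(Tweety)
        return False
    for x in DOMAIN:
        if x in bird and x not in ab and x not in flies:
            return False                                   # Bird(x) & ~ab(x) -> Flies(x)
        if x in ostrich and x not in ab:
            return False                                   # Ostrich(x) -> ab(x)
        if x in ostrich and x not in bird:
            return False                                   # Ostrich(x) -> Bird(x)
    return True

exts = [frozenset(), frozenset(DOMAIN)]
models = [(b, o, a, f)
          for b, o, a, f in product(exts, repeat=4)
          if satisfies(b, o, a, f)]

# Minimize ab, letting the other predicates vary. On a one-element
# domain, comparing extension sizes coincides with subset-minimality.
smallest = min(len(a) for _, _, a, _ in models)
minimal = [m for m in models if len(m[2]) == smallest]

assert all('tweety' in f for _, _, _, f in minimal)        # Tweety flies
assert all('tweety' not in o for _, o, _, _ in minimal)    # and is not an ostrich
```

In every ab-minimal model the ab-extension is empty, so Flies(Tweety) holds and Ostrich(Tweety) fails, matching the conclusion drawn in the text.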
BIBLIOGRAPHY

1. J. McCarthy, "Circumscription-A form of non-monotonic reasoning," Artif. Intell. 13, 27-39 (1980).
2. J. McCarthy, "Applications of circumscription to formalizing common sense knowledge," Workshop on Nonmonotonic Reasoning, New Paltz, NY, sponsored by AAAI, October 17-19, 1984.
3. D. Etherington, R. Mercer, and R. Reiter, "On the adequacy of predicate circumscription for closed-world reasoning," J. Comput. Intell. 1, 11-15 (1985).
4. D. Etherington, personal communication, Comp. Sci. Dept., Univ. of British Columbia, Canada, 1984.
5. M. Davis, "The mathematics of non-monotonic reasoning," Artif. Intell. 13, 73-80 (1980).
6. J. Minker and D. Perlis, "Protected circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
7. D. Kueker, "Another failure of completeness for circumscription," Week on Logic and Artificial Intelligence, University of Maryland, October 22-26, 1984.
8. J. Doyle, "Circumscription and implicit definability," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
9. D. Perlis and J. Minker, "Completeness results for circumscription," Artif. Intell. 28, 29-42 (1986).
10. V. Lifschitz, "Some results on circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
11. B. Grosof, "Default reasoning on circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
12. R. Reiter, "A logic for default reasoning," Artif. Intell. 13, 81-132 (1980).
13. R. Reiter, "Circumscription implies predicate completion (sometimes)," Proc. Natl. Conf. on Art. Intell., Pittsburgh, PA, 1982.
14. K. Clark, "Negation as failure," in H. Gallaire and J. Minker (eds.), Logic and Data Bases, Plenum, New York, 1978.
15. M. A. Papalaskaris and A. Bundy, "Topics for circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.

D. PERLIS
University of Maryland

CLUSTERING

Clustering is usually viewed as a process of grouping physical or abstract objects into classes of similar objects. According to this view, in order to cluster objects, one needs to define a measure of similarity between the objects and then apply it to determine classes. Classes are defined as collections of objects whose intraclass similarity is high and interclass similarity is low. Because the notion of similarity between objects is fundamental to this view, clustering methods based on it can be called similarity-based methods. Many such methods have been developed in numerical taxonomy, a field developed by social and natural scientists, and in cluster analysis, a subfield of pattern recognition (qv). Various similarity measures and clustering algorithms utilizing them are presented below (see also Concept learning; Region growing).

Another view recently developed in AI postulates that objects should be grouped together not just because they are similar according to a given measure but because as a group they represent a certain conceptual class. This view, called conceptual clustering, states that clustering depends on the goals of classification and the concepts available to the clustering system for characterizing collections of entities. For example, if the goal is to partition a configuration of points into simple visual groupings, one may partition them into those that form a T-shape, an L-shape, and so on, even though the density distributions and distances between the points may suggest different groupings. A procedure that uses only similarities (or distances) between the points and is unaware of these simple shape types clearly can only accidentally create clusterings corresponding to these concepts. To create such clusterings, these descriptive concepts must be known to the system. Another example of conceptual clustering is the grouping of visible stars into named constellations. Conceptual clustering is contrasted with the classical view in the next section and described in more detail in the section Conceptual Clustering.

Clustering is the basis for building hierarchical classification schemes. For example, by first partitioning the original set of entities and then repeatedly applying a clustering algorithm to the classes generated at the previous step, one can obtain a hierarchical classification of the entities (a divisive strategy). A classification schema is obtained by determining the general characteristics of the classes generated. Building classification schemes and using them to classify objects is a widely practiced intellectual process in science as well as in ordinary life. Understanding this process and the mechanisms of clustering underlying it is therefore an important domain of research in AI and other areas. This process can be viewed as a cousin of the "divide and conquer" strategy widely used in problem solving (qv). It is also related to the task of decomposing any large-scale engineering system into smaller subsystems in order to simplify its design and implementation.

The Classical View versus the Conceptual Clustering View

In the classical approach to clustering mentioned above, clusters are determined solely on the basis of a predefined measure of similarity. To define such a measure, a data analyst determines attributes that are perceived as relevant for characterizing objects under consideration. Vectors of values of these attributes for individual objects serve as descriptions of these objects. Considering attributes as dimensions of a multidimensional description space, each object description corresponds to a point in the space. The similarity between objects can thus be measured as a reciprocal function of the distance between the points in the description space.

Let V_A and V_B denote the attribute vectors representing objects A and B, respectively. The distance of object A to object B is defined as a numerical function of the attribute vectors of A and B and is written as d(V_A, V_B). For example, assuming that the vector descriptions of objects A and B are V_A = (x_1(A), x_2(A), . . . , x_n(A)) and V_B = (x_1(B), x_2(B), . . . , x_n(B)), respectively, where x_1, x_2, . . . , x_n are selected object attributes, a simple measure of distance is:

d(V_A, V_B) = Σ (i = 1 to n) |x_i(A) − x_i(B)|
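This distance is the city-block (Manhattan) metric. As a minimal Python sketch (an illustration only, not code from the article):

```python
def manhattan_distance(va, vb):
    """d(V_A, V_B) = sum over i of |x_i(A) - x_i(B)| (city-block metric)."""
    assert len(va) == len(vb), "objects must be described by the same attributes"
    return sum(abs(xa - xb) for xa, xb in zip(va, vb))

# Two objects described by three numeric attributes:
va = (1, 5, 2)
vb = (4, 5, 0)
print(manhattan_distance(va, vb))  # → 5
```

Other choices (e.g., Euclidean distance) differ only in how the per-attribute differences are combined.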
Because distance is a function of only the attributes of two compared objects, similarity-based clustering can be performed relatively easily and without a need for knowledge about its purpose. The similarity-based approach has produced a number of efficient clustering algorithms, which have been useful in many classification-building applications. The classical approach suffers, however, from some significant limitations. The results of clustering are clusters plus information about numerical similarities between objects and object classes. No descriptions or explanations of the generated clusters are supplied. The problem of cluster interpretation is simply left to the data analyst. Data analysts, however, are typically interested not only in clusters but also in their explanation or characterization.

To overcome this, one may postscript the similarity-based clustering process with an intelligent interpretation that tries to learn the conceptual significance of each cluster through the use of AI techniques. Such a process, however, is not easy. In fact, it may be even more difficult than that of generating the clusters themselves. This is because it requires inducing category descriptions from examples, which is a complex inferential task. Even if one ignores this difficulty, this process may not produce the desired results. Clusters generated solely on the basis of some predefined numerical measure of similarity may in principle lack simple conceptual explanations. One reason for this is that a similarity measure typically considers all attributes with equal importance and thus makes no distinction between those that are more relevant and those that are less relevant or irrelevant. Consequently, if there is coincidental agreement between the values of a sufficient number of irrelevant attributes, objects that are different in a conceptual sense may be classified as similar.
Even if one assigns some a priori "weights" to attributes, this will not change the situation very much, because the classical approach has no mechanisms for selecting and evaluating attributes in the process of generating clusters. Neither is there any mechanism for automatically constructing new attributes that may be more adequate for clustering than those initially provided.

Another reason for the difficulty of the postclustering interpretation is that in order to generate clusters that correspond to simple concepts, one has to take into consideration concepts useful for characterizing clusters as a whole in the process of clustering and not after clustering. The following example illustrates this point. Consider the problem of clustering the points in Figure 1.

Figure 1. How would you cluster these points?

Typically, a person looking at this figure would say that it is a letter S intersecting with a letter M. One should observe that points A and B, which are closer to each other than to any other points, are classified into conceptually different clusters. The reason seems to be that people are equipped with concepts such as letter shapes, straight lines, and so on to help them recognize certain concepts in the figure. Thus, clustering in this case is not based on local closeness of points but on global concepts characterizing collections of points together. A conceptual clustering program would solve this problem by matching the descriptions of the letter shapes (contained in its memory as background knowledge) against the given collection of points. The best match would be obtained for the shapes "S" and "M."

One may add that, in general, classical techniques do not seem to be much concerned with the ways humans cluster objects. They do not take into consideration any Gestalt concepts or linguistic constructs people use in describing object collections. Observations of how people cluster objects suggest that they search for one or more attributes (out of many potential attributes) that are most relevant to the goal of clustering and on that basis cluster the objects. Objects are put into the same cluster if they score similarly on these attributes. A description of the objects in the same cluster can therefore be expressed as a single statement or a conjunction of statements, each specifying one common property (attribute value) of the objects in the cluster. The above remark does not mean, however, that individual statements could not include a disjunction of values of the same attribute (the so-called internal disjunction). For example, a cluster may be characterized as "a set of large boxes, made of cardboard, and colored either blue or yellow." Different clusters are expected to have descriptions with different values of the relevant attributes.
Conceptual clustering has been introduced as a way to overcome the above-mentioned limitations of classical methods. Its basic premise is that objects should be arranged into classes that represent simple concepts and are useful from the viewpoint of the goal of clustering. Thus, objects in the same cluster do not necessarily have to be similar in some mathematically defined sense but must as a group represent the same concept. In order to cluster objects into conceptual categories, the notion of similarity must be replaced by a more general notion of conceptual cohesiveness (1) (see also Learning, machine). The conceptual cohesiveness (CC) between two objects A and B depends on the attributes of these objects, the attributes of nearby objects, and the set of concepts available for describing object configurations. Thus, it is a function CC(V_A, V_B, E, C), where V_A and V_B are vectors of attribute values for A and B, respectively, E denotes objects in the environment of A and B, and C is the set of available concepts. The conceptual cohesiveness is therefore a four-argument function, in contrast to a two-argument distance or similarity function.

In conceptual clustering there is a constant duality between category descriptions and cluster membership. Specifically, the result of conceptual clustering is not only a set of clusters (a classification of the initially given objects) but also a set of concepts characterizing the obtained clusters (a classification scheme).

One may say that from the viewpoint of AI, the similarity-based approach represents the so-called weak method, that is, a general method that uses little problem domain knowledge. Such a method can be called domain-general knowledge-poor. In contrast, the conceptual clustering approach, which depends on the background concepts and clustering goals, can be called domain-generic knowledge-modular. It requires an interchangeable module of knowledge defined for the problem at hand. A goal-dependency network (GDN) (27) may be used to indicate which attributes are relevant to which goals of classification. Various algorithms for classical methods and conceptual clustering methods are presented below.

A Classification of Clustering Problems

From the viewpoint of applications, it is useful to classify clustering problems on the basis of the dimensionality of the objects to be clustered. Three classes of problems can be distinguished:
or proximity measures and developing clustering techniques utilizing them. A large number of such measures and comesponding clustering methods have been developed to date. Comprehensivesurveys can be found in Sokal and Sneath (8), Cormark (9), Anderberg (10), Gower (11), and Diday and Simon $2). A summary of various distance measures is described in Ref. 13. Clustering techniques can themselvesbe clustered in many interesting ways. One classification partitions the techniques on the basis of the type of control used in building the clusters. The categoriesof clustering techniques accordingto this classification are agglomerative, divisive, and direct.
Agglomerative Techniques.Agglomerative techniques are often used in numerical taxonomy. These techniques form 1. One-dimensionalclustering (quantization of uariables).For clusters by progressive fusion, that is, by recursively joining continuous variables or discrete variables with ranges of separateentities and small groups together to form larger and values that are significantly larger than necessary for a larger gfoupings. Eventually a single universal group is given problem, one wants to reduce the number of distinct formed and the processhalts, leaving a record of the merges values of the variables by identifying equivalenceclassesof that took place. The history of merges is often displayed in the values. Clusters of values of individual variables are then form of a dendrogram (seeFig. 2c) that shows,by the position treated as single units. For example, in image processing of the horizontal location of the merge, the between-group the scannersusually distinguish between a large number of similarities. As the groups encompassmore and more entities, gray levels, but only a few levels may be neededfor solving the between-group similarity scoresdecrease. a given problem (see Image understanding). Rosenfeld (2) By adopting a threshold of minimum similarity, the aghas shown that clustering methods can be used for making glomeration process can be halted before all entities are such a reduction. Nubuyaki (3) proposeda clustering algo- merged into a single group. Conversely, the complete dendrorithm for this purpose in which the clusters have minimal gram may be "cut" apart across some similarity boundary. sums of squares of intracluster distances.Clustering tech- This yields a number of clusters, each containing those entiniques have also been used to analyze LANDSAT im- ties that were merged at a similarity score above the given ages(4). threshold. 2. 
Two-dimensional clustering (segmentation).This type of During the agglomerative clustering processit is necessary clustering occursmost often in image processing,where one to calculate the similarities between groups of entities. There searchesfor segmentsof an image in which all picture ele- are three standard ways to compute between-group similariments share some common properties. For example, they ties (measured as the reciprocal of distances). Supposetwo may have a similar gray level or similar texture. Coleman groups are identified as X and Y. The single-linkage methods (5) defined region segmentation as a problem of clustering calculate between-groupdistance between one entity in group (which he calls nonsupervised learning) and used the k- X and another entity in group Y. The complete-linkegemethmeans algorithm of MacQueen (6). Haralick and Shapiro ods use the maximum distance between one entity in group X (7) have used clustering to anatyze object shapes. and another entity in group Y. The auerage-linkagemethods 3. Multidimensional clustering. In multidimensional cluster- use the average of the distances between all possiblepairs of ing objects are partitioned into clusters in a description entities with one taken from group X and the other from space spanned by many attributes characterizing the ob- group Y. jects. As mentioned earlier, the basis for clustering is typically a similarity measure. Traditional clustering techDivisiveTechniques.Divisive techniques form a classificaniques may assumedifferent geometric distributions of the tion by progressive subdivision, that is, by repeatedly breakpoints in the space by the use of different normalization, ing the initial set into smaller and smaller clusters until only transformation, and statistical treatments of the attrisingle entities exist in each cluster. The result is a hierarchy of butes. The next sectiongives more details on the similarityclusters. 
The divisive technique of Edwards and Cavalli-Sforza based methods. In conceptual clustering the conceptof de(14) examines all 2w - 1 partitions of N objectsand selectsthe is not however, here the space scription spaceis also useful; fixed but may change as new attributes are generated by one that gives the minimum intracluster sum of the squared background knowledge heuristics. In addition, the method interobject distances. The computational cost of the method is equipped with a set of conceptsthat can be used to char- limits its use to casesinvolving the clustering of only a few objects. actertze object configurations.
ClassicalMethodsof Clustering The thrust of research in cluster analysis and numerical taxonomy has been toward determining various object similarity
Direct Techniques.The direct techniquesneither merge entities into clusters nor break large clusters into smaller ones. A direct technique is given the number (usually denoted k) of clusters to form and proceedsto find a partitioning of the enti-
106
CLUSTERING
1. MP (microprocessor). Type: structured. Domain: 13 values: 8080A, 6502, Z80, 1802, 6502C, 6502A, 68000, 6800, 6805, 6809, 8048, Z8000, HP (Hewlett-Packard Co. proprietary).

2. RAM memory size. Type: linear. Domain: 4 values: 16,000 bytes; 32,000 bytes; 48,000 bytes; 64,000 bytes.

3. ROM memory size. Type: linear. Domain: 7 values: 1000 bytes; 4000 bytes; 8000 bytes; 10,000 bytes; 11,000-16,000 bytes; 26,000 bytes; 80,000 bytes.

4. Display type. Type: structured. Domain: 4 values: Terminal, B/W TV, Color TV, Built-in.

5. Keys on keyboard. Type: linear. Domain: 5 values: 52 keys; 53-56; 57-63; 64-73; 92.

[Tree diagrams not reproduced: the structured domain for the variable "MP" groups 8080A, Z80, and 8048 under one node and 6502, 6502A, and 6502C under another; the structured domain for the variable "Display type" covers External terminal, Color TV, B/W TV, and Built-in.]

Figure 2. (a) Variables used to describe microcomputers. (b) The structure of domains of variables "MP" and "Display type." (c) A dendrogram generated by NUMTAX with descriptions generated by Aq. (d) A conceptual clustering of microcomputers.
ties into k clusters that optimizes some measure of the goodness of the clusters. Two early direct clustering techniques are k-means, developed by MacQueen (6), and the center adjustment method, developed by Meisel (15). A generalization of the k-means and center adjustment techniques called the dynamic clustering method has been developed by Diday (16). Another classification of clustering methods separates the monothetic techniques from the polythetic ones. A monothetic clustering algorithm divides the set of objects into clusters that differ in the value of one attribute. For example, such a technique might form one cluster in which attribute Xi has the value 1 and another cluster in which attribute Xi has the value 0. A polythetic clustering technique forms clusters in which the values of several attributes differ for different classes. Traditional clustering relies on measures of similarity and
the requisite need to "fold" the attribute values together to measure object-to-object similarities. When this occurs in a multidimensional space, the question of attribute weighting comes up, and there is much controversy over what weighting scheme is best for various purposes. Weights on attributes have to be given a priori by the researcher. Problems with such an approach are that it is usually difficult to define such weights and that some attributes may be dependent on other attributes. For example, attributes B and C may be important only if attribute A has the value 1. A similarity metric uses some static weights for attributes A, B, and C. The attributes B and C are weighted too high when attribute A takes the value 0 (since they should receive zero weight in that case), and they may be weighted too low when attribute A takes the value 1.
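The k-means procedure of MacQueen mentioned above can be sketched in a few lines. This simplified batch version (recompute all centers after each full assignment pass) is illustrative rather than MacQueen's exact incremental updating rule:

```python
# Minimal sketch of k-means, a direct technique: assign each point to its
# nearest center, recompute the centers as cluster means, and repeat.
def kmeans(points, centers, iterations=20):
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # index of the center with the smallest squared distance to p
            nearest = min(range(len(centers)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # each new center is the coordinate-wise mean of its cluster
        centers = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

The number of clusters k is fixed by the choice of initial centers, which matches the definition of a direct technique: the partition is optimized for a given k rather than built up or broken down hierarchically.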
[Dendrogram not reproduced: the microcomputers (VIC 20, HP 85, Ohio Sci., Sorcerer, Horizon, Zenith H89, and others) are joined at similarity levels on a scale from 0.05 to 1.0.]

For the two-cluster solution (obtained by cutting the dendrogram at the dashed line marked by k = 2) the cluster descriptions are:

[RAM = 16K..48K] v [Keys <= 63]
[RAM = 64K][Keys > 63]

[Conceptual clustering diagram not reproduced: the first-level classes are [MP = 8080x], [MP = 6502x], and [MP = HP], subdivided by selectors on Display, ROM, and Keys; the leaves are the individual microcomputers (Sorcerer, Horizon, TRS-80 II, Zenith H89, TRS-80 III, VIC 20, Apple II, Atari 800, Challenger, Ohio Sci. II, HP 85).]

A description of the class a1: [MP = 8080x] & [Display /= Built-in] & [Keys = 53..63]

Figure 2. (Continued)
Conceptual Clustering

As described above, conceptual clustering arranges objects into clusters corresponding to certain conceptual classes, for example, classes characterized by conjunctive concepts (i.e., concepts defined by a simple conjunction of properties). The basic theory and an algorithm for conceptual clustering have been developed by Michalski (17). Implementation and experimentation with the algorithm has been performed by Michalski and Stepp (1,18) and Stepp (19) and has produced the programs CLUSTER/2 and CLUSTER/S. Other programs that work differently but provide conceptual clustering features include DISCON (20), RUMMAGE (21), and GLAUBER (22). From the viewpoint of AI, clustering is a form of learning from observation (or learning without a teacher). It is a process that generates classes (conceptually defined categories) in order to partition a given set of observations. It differs from concept learning (qv) in that the latter creates descriptions of teacher-provided classes by generalizing from the examples of the classes. Below, one method for conceptual clustering is briefly outlined. The method is based on the idea that conceptual cluster-
ing can be conducted by a series of conceptual discriminations similar to those used in learning concepts from examples. The method uses the extended predicate calculus proposed by Michalski (17). Such a language is used to describe objects, classes of objects, and general and problem-specific background knowledge. The method employs a general-purpose criterion for measuring the quality of generated candidate classifications. Finding classifications that score high on the quality criterion is the most general goal of the method. Additional problem-specific goals may be supplied by the user or inferred by the system from a general goal dependency network. Goal dependency is important to reduce the space of hypothetical classifications the method investigates. Creating a classification is a difficult problem because there are usually many potential solutions with no clearly correct or incorrect answers. The decision about which classification to choose can be based on some perceived set of goals as described by Medin, Wattenmaker, and Michalski (23), a goal-oriented, statistic-based utility function as described by Rendell (24), or some other measure of the quality of the classification. One way to measure classification quality is to define various elementary, easy-to-measure criteria specifying desirable properties of a classification, and to assemble them into one
general criterion. Each elementary criterion measures a certain aspect of the generated classifications. Examples of elementary criteria are the relevance of descriptors used in the class descriptions to the general goal, the fit between the classification and the objects, the simplicity of the class descriptions, the number of attributes that singly discriminate among all classes, and the number of attributes necessary to classify the objects into the proposed classes.

Building a meaningful classification relies on finding good classifying attributes. The method presented below uses background knowledge in the search for such attributes. Background knowledge rules enable the system to perform a chain of inferences to derive values for new descriptors for inclusion in object descriptions. The new descriptors are tested by applying the classification quality criterion to the groupings formed by them.

Concept Formation by Repeated Discrimination. This section explains how a problem of concept formation (here, building a classification) can be solved via a sequence of controlled steps of concept acquisition (learning concepts from examples). Given a set of unclassified objects, k seed objects are selected randomly and treated as representatives of k hypothetical classes. The algorithm then generates descriptions of each seed that are maximally general, form a good match with a subset of the objects given, and do not cover any other seed. These descriptions are then used to determine the most representative object in each newly formed class (where the newly formed class is defined as the set of objects satisfying the generated class description). The k representative objects are then used as new seeds for the next iteration. The process stops either when consecutive iterations converge to some stable solution or when a specific number of iterations pass without improving the classification (from the viewpoint of the quality criterion).
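The outer loop of this seed-based iteration can be sketched as follows. The real algorithm grows maximally general conjunctive descriptions around each seed; a simple nearest-seed assignment is substituted for that step here, purely for illustration, and the stopping test checks only for a stable set of seeds:

```python
# Schematic of repeated discrimination: pick k seeds, form classes around
# them, promote each class's most representative object to be the next
# seed, and stop when the seeds no longer change.
import random

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def form_classes(objects, seeds):
    classes = [[] for _ in seeds]
    for o in objects:
        classes[min(range(len(seeds)),
                    key=lambda i: distance(o, seeds[i]))].append(o)
    return classes

def most_representative(cls):
    # the object with the smallest total distance to its classmates
    return min(cls, key=lambda o: sum(distance(o, other) for other in cls))

def cluster_by_repeated_discrimination(objects, k, max_iterations=10, rng=random):
    seeds = rng.sample(objects, k)
    for _ in range(max_iterations):
        classes = form_classes(objects, seeds)
        new_seeds = [most_representative(c) for c in classes if c]
        if set(new_seeds) == set(seeds):   # converged to a stable solution
            break
        seeds = new_seeds
    return form_classes(objects, seeds)

objects = [(0, 0), (0, 1), (9, 9), (9, 8)]
classes = cluster_by_repeated_discrimination(objects, k=2)
print(sorted(len(c) for c in classes))  # [2, 2]
```

Even from a poor initial seed pair (both seeds in the same tight group), the promotion of representative objects pulls one seed toward the other group within a few iterations on data such as this.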
This approach requires that the number of classes is specified in advance. Since the best number of classes to form is usually unknown, two techniques are used: varying the number of classes and composing the classes hierarchically. For most purposes, it is desired that the classification formed be simple and easy to understand. With this in mind, the number of classes that stem from any node of the classification hierarchy can be assumed to be in some modest range such as from 2 to 7. With this small range, it is computationally feasible to repeat the whole clustering process for every number in the range. The solution that optimizes the score on the classification quality criterion (with appropriate adjustment for the effect of the number of classes on the score) indicates the best number of classes to form at this level of the hierarchy. The above method of repeated discrimination for performing clustering has been implemented in the program CLUSTER/2 for a subset of extended predicate calculus (see Logic, predicate) involving only attributes (zero-argument functions). Besides its relative computational simplicity, this approach has other advantages stemming from use of quantifier-free descriptions (for both objects and classes). It should be noted that classifications normally have the property that they can unambiguously classify any object into its corresponding class. To have this property, the class descriptions must be mutually disjoint.
For conjunctive descriptions involving relations on attribute-value pairs, the disjointness property is easy to test and easy to maintain. For the more complex problems that require object representations involving quantified variables, predicates on these variables, and function-value relationships over quantified variables, the test for mutual disjointness of descriptions is much more complex. To cope with this difficulty, the problem of clustering of structured objects is decomposed into two steps. The first step finds an optimized characteristic description of the entire collection of objects and then uses it to generate a quantifier-free description of each object. The second step processes the quantifier-free object descriptions with the CLUSTER/2 algorithm to form optimized classifications. These two processes are combined in the program CLUSTER/S.
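Conjunctive, quantifier-free class descriptions of this kind can be matched and tested for mutual disjointness directly. A minimal sketch, with hypothetical attribute names and selectors:

```python
# A conjunctive description is modeled as a mapping from attribute name to
# the set of admissible values (one selector per attribute).
def matches(obj, description):
    """True if obj satisfies every [attribute = allowed-values] selector."""
    return all(obj[attr] in allowed for attr, allowed in description.items())

def disjoint(d1, d2):
    """Two conjunctive descriptions are disjoint if they share an attribute
    whose admissible value sets do not intersect."""
    return any(not (d1[a] & d2[a]) for a in d1.keys() & d2.keys())

class_a = {"mp": {"8080A", "Z80"}, "display": {"Terminal", "B/W TV", "Color TV"}}
class_b = {"mp": {"6502", "6502A", "6502C"}}

micro = {"mp": "Z80", "display": "Terminal", "keys": 57}
print(matches(micro, class_a))   # True
print(matches(micro, class_b))   # False
print(disjoint(class_a, class_b))  # True
```

For attribute-value selectors the disjointness check is a simple set-intersection test per shared attribute, which is why the quantifier-free case is described above as easy; no such local test exists once quantified variables are involved.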
Example 1: Microcomputers. The problem is to develop a meaningful classification of popular microcomputers. Each microcomputer is described in terms of the variables shown in Figure 2a. Variables "MP" and "Display type" are structured, i.e., their value set forms a hierarchy (Fig. 2b). Two programs were applied to solve this problem: NUMTAX, which implements several techniques of numerical taxonomy, and CLUSTER/2, which implements conjunctive conceptual clustering. A representative dendrogram produced by NUMTAX is shown in Figure 2c. The dashed lines indicate where the dendrogram is cut apart to form two clusters (k = 2). Accompanying the dendrogram is a logical description of the clusters. These descriptions were produced by an inductive learning program that accepts as input a collection of groups (clusters) of objects and generates the simplest discriminant description of each group. For example, the first cluster is described as

[RAM = 16K..48K] v [Keys <= 63]

This description suggests that the cluster is composed of two kinds of computers, one that has [RAM = 16K..48K] and the other that has [Keys <= 63]. The presence of disjunction raises the question of why these computers are in the same cluster. The program CLUSTER/2 was given the same data and was told to use a classification quality criterion that maximizes the fit between the clustering and the objects in the cluster and then maximizes the simplicity of category descriptions. The clustering obtained is shown in Figure 2d. The first-level clustering is done on the basis of type of microprocessor.

Example 2: Trains. Consider a problem of classifying structured objects, for example, the problem of finding a classification of the trains shown in Figure 3a. The trains are structured objects, each consisting of a sequence of cars of different shapes and sizes. The individual cars carry a variable number of items of different shapes.
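Structured objects such as these trains can be represented as nested attribute structures and flattened into quantifier-free attributes (such as the number of cars) before clustering. A minimal sketch, with entirely hypothetical train contents:

```python
# Hypothetical nested representation of structured train objects: each
# train is a list of cars, each car an attribute dictionary. Deriving a
# simple attribute such as the car count flattens the structure so that a
# quantifier-free clustering algorithm can operate on it.
trains = {
    "A": [{"shape": "engine", "items": 0},
          {"shape": "opentop", "items": 1},
          {"shape": "closedtop", "items": 2}],
    "B": [{"shape": "engine", "items": 0},
          {"shape": "opentop", "items": 3}],
}

def car_count(train):
    return len(train)

# group the trains by the derived attribute
groups = {}
for name, cars in trains.items():
    groups.setdefault(car_count(cars), []).append(name)
print(groups)  # {3: ['A'], 2: ['B']}
```

The derived attribute plays the role of a quantifier-free descriptor: once every train carries a "number of cars" value, the structural representation is no longer needed for grouping.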
Human classifications of the trains shown in Figure 3a have been investigated by Medin, Wattenmaker, and Michalski (23). The 10 trains were placed on separate index cards so they could be arranged into groups by the subjects in the experiment. The experiment was completed by 31 subjects, who formed a total of 93 classifications of the trains. The most popular classification (17 repetitions) involved the number of cars in the trains. The three classes formed were "trains con-
[Figure 3 not reproduced. Panel (a) shows the 10 trains to be classified. Panel (b) shows the most frequent human classification: Class 1, "Train contains two cars"; Class 2, "Train contains three cars"; Class 3, "Train contains four cars." Panel (c) divides the trains into "These trains are carrying toxic chemicals" and "These trains are not carrying toxic chemicals."]

Figure 3. (a) Trains to be classified. (b) The most frequent human classification of trains. (c) Conceptual clustering of trains carrying toxic chemicals.
taining two cars," "trains containing three cars," and "trains containing four cars." This classification is shown in Figure 3b. This problem is an example of a class of problems for which the implicit classification goal is to generate classes that are conceptually simple and based on easy-to-determine visual attributes. When people are asked to build such classifications, they typically form classes with disjoint descriptions, as in the above-mentioned study by Medin. For this reason methods that produce disjoint descriptions are of prime interest. The problem of classifying trains represents a general category of classification problems in which one wants to organize and classify observations that require structural descriptions, for example, classifying physical or chemical structures, analyzing genetic sequences, building taxonomies of plants or animals, characterizing visual scenes, or splitting a sequence of temporal events into episodes with simple meanings. One problem of concern here is to develop a general method that, when applied to a collection of structured objects, such as trains, could potentially generate the conjunctive concepts occurring in human classifications or invent new concepts having similar appeal. An extension of the trains problem illustrates the use of a goal dependency network and problem-specific background knowledge. Suppose that the knowledge base includes an inference rule that can identify trains carrying toxic chemicals and that the general goal "survive" has a subordinate goal "monitor dangerous shipments." This background knowledge can be used to help build a classification. In the illustrations of the trains a toxic chemical container is identified as a single sphere (circle) riding in an open-top car. A background-knowledge rule supplied to the program is
[contains(train,car)][car-shape(car) = opentop][cargo-shape(car) = circle][items-carried(car) = 1] <=> [has-toxic-chemicals(train)]

In the above rule, equivalence is used to indicate that the negation of the condition part is sufficient to assert the negative of the consequence part. After this rule is applied, all trains will have descriptions containing either the toxic chemical predicate or its negation. The characteristic description generated by the program will now contain the additional predicate "has-toxic-chemicals(train)" (or its negation). By recognizing that this predicate is important to the goal "survival" through use of a GDN, the program produced the classification shown in Figure 3c.

Concept Formation by Finding Classifying Attributes. This section describes an alternative approach for building classifications. This approach searches for one or more classifying attributes whose value sets can be split into ranges that define individual clusters. The important aspect of this approach is that the classifying attributes can be derived through a goal-directed chain of inferences from the initial attributes. The classifying attributes sought are the ones that lead to classes of objects that are best according to the classification goal and the given classification quality criterion. The "promise" of a descriptor to serve as a classifying attribute is determined by relating it to the goals or derived subgoals of the problem and by considering how many other descriptors it implies. For example, if the goal of the classification is "finding food," the attribute "edibility" might be a good classifying attribute. The second way of determining the promise of an attribute
can be illustrated by the problem of classifying birds. The question of whether "color" is a more important classifying attribute than "is-waterbird" is answered in favor of "is-waterbird" because the latter leads to more implied attributes than does the attribute "color" in a given GDN (e.g., "is-waterbird" implies can swim, has webbed feet, eats fish, and so on), as described by Medin, Wattenmaker, and Michalski (23). There are two fundamental processes that operate alternately to generate the classification. The first process searches for the classifying attribute whose value set can be partitioned to form classes such that the produced classification scores best according to the classification quality criterion. The second process generates new descriptors by a chain of inferences using background knowledge rules. Descriptors that can be inferred are ordered by relevancy to the goals of the classification. The search process can be performed in two ways. When the number of classes to form (k) is known in advance, the process searches for attributes having k or more different values in the descriptions of the objects to be classified. These values are called the observed values of the attribute. Attributes with a number of observed values smaller than k are not considered. For attributes with observed value sets larger than k, the choice of the mapping of value subsets to classes depends on the resulting quality criterion score for the classification produced and the type of the value set. When the number of classes to form is not known, the above technique is performed for several different values of k. The best number of classes, k, is indicated by the classification that best satisfies the quality criterion and goals. The generate process constructs new attributes from combinations of existing attributes.
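The search process just described can be sketched as follows. The quality measure used here (within-class uniformity of the remaining attributes) is only a stand-in for the method's actual classification quality criterion, and the animal attributes are hypothetical:

```python
# Sketch of classifying-attribute search: among candidate attributes with
# exactly k observed values, score the partition each one induces and keep
# the best. (Attributes with more than k observed values would need their
# values merged into k subsets; that step is omitted here for simplicity.)
def observed_values(objects, attr):
    return {obj[attr] for obj in objects}

def partition_by(objects, attr):
    classes = {}
    for obj in objects:
        classes.setdefault(obj[attr], []).append(obj)
    return list(classes.values())

def quality(classes, attrs):
    # fraction of (class, attribute) pairs on which the class is uniform
    checks = [(cls, a) for cls in classes for a in attrs]
    uniform = sum(len({o[a] for o in cls}) == 1 for cls, a in checks)
    return uniform / len(checks)

def best_classifying_attribute(objects, attrs, k):
    candidates = [a for a in attrs if len(observed_values(objects, a)) == k]
    return max(candidates,
               key=lambda a: quality(partition_by(objects, a),
                                     [b for b in attrs if b != a]))

animals = [
    {"is-waterbird": "yes", "feet": "webbed", "diet": "fish"},
    {"is-waterbird": "yes", "feet": "webbed", "diet": "fish"},
    {"is-waterbird": "no",  "feet": "clawed", "diet": "seeds"},
    {"is-waterbird": "no",  "feet": "clawed", "diet": "insects"},
]
attrs = ["is-waterbird", "feet", "diet"]
print(best_classifying_attribute(animals, attrs, k=2))
```

A fuller treatment would alternate this search with the generation of new derived attributes and use the goal dependency network, as the text describes, to rank candidates by how many other descriptors they imply.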
Various heuristics of attribute construction are used to guide the process. For example, two attributes that have linearly ordered value sets can be combined using arithmetic operators. When the attributes have numerical values (as opposed to symbolic values such as small, medium, and large), a trend analysis can be used to suggest appropriate arithmetic operators, as in the BACON system by Langley and his associates (25). Predicates can be combined by logical operators to form new attributes through background knowledge rules. For example, a rule that says an animal is a reptile if it is cold-blooded and lays eggs can be written as

[cold-blooded(o1)][offspring-birth(o1) = egg] => [animal-type(o1) = reptile]

The application of this rule to the given animal descriptions yields the new attribute "animal-type" with the specified value "reptile." Using this rule and similar ones, one might classify some animals into reptiles, mammals, and birds even though the type of each animal is not stated in the original data.

Summary

Clustering objects or abstract entities into meaningful categories is an important form of learning (qv) from observation. This entry has described a classical, "similarity-based" approach and the more recent conceptual clustering approach to this problem. The fundamental notion is conceptual cohesiveness, which groups together objects that correspond to certain concepts rather than objects that are similar according to a mathematical similarity function.
BIBLIOGRAPHY

1. R. S. Michalski and R. E. Stepp, "Learning from Observation: Conceptual Clustering," in R. S. Michalski, J. Carbonell, and T. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 331-363, 1983.
2. A. Rosenfeld, "Some Recent Developments in Texture Analysis," Proceedings of the Conference on Pattern Recognition and Image Processing, Chicago, 1979.
3. N. Otsu, "Discriminant and Least Squares Threshold Selection," Proceedings of the Fourth International Conference on Pattern Recognition, Kyoto, Japan, p. 592, 1978.
4. P. H. Swain, "Image and Data Analysis in Remote Sensing," in R. M. Haralick and J. C. Simon (eds.), Issues in Digital Image Processing, Sijthoff and Noordhoff, Amsterdam, 1980.
5. G. B. Coleman, Scene Segmentation by Clustering, University of Southern California Image Processing Institute, Report USCIPI, 1977.
6. J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proc. 5th Berkeley Symp. Math. Stat. Prob., 281, 1967.
7. R. M. Haralick and L. Shapiro, "Decomposition of Polygonal Shapes by Clustering," Proceedings of the IEEE Conference on Pattern Recognition and Image Processing, Troy, NY, p. 183, 1977.
8. R. R. Sokal and P. H. Sneath, Principles of Numerical Taxonomy, W. H. Freeman, San Francisco, 1963.
9. R. M. Cormack, "A review of classification," J. Roy. Stat. Soc., Series A, 134, 321 (1971).
10. M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.
11. J. C. Gower, "A comparison of some methods of cluster analysis," Biometrics 23, 623-637 (1967).
12. E. Diday and J. C. Simon, "Clustering analysis," in Communication and Cybernetics, Springer-Verlag, New York, 1976.
13. R. S. Michalski, R. E. Stepp, and E. Diday, "A Recent Advance in Data Analysis: Clustering Objects into Classes Characterized by Conjunctive Concepts," in L. N. Kanal and A. Rosenfeld (eds.), Progress in Pattern Recognition, Vol. 1, North-Holland, Amsterdam, 1981.
14. A. W. F. Edwards and L. L. Cavalli-Sforza, "A method for cluster analysis," Biometrics 21, 362-375 (1965).
15. W. Meisel, Computer Oriented Approaches to Pattern Recognition, Academic Press, New York, 1972.
16. E. Diday, "Problems of clustering and recent advances," Eleventh Congress of Statistics, Oslo, Norway, 1978.
17. R. S. Michalski, "Knowledge acquisition through conceptual clustering: A theoretical framework and an algorithm for partitioning data into conjunctive concepts," J. Pol. Anal. Inform. Sys. 4, 219-244 (1980).
18. R. S. Michalski and R. E. Stepp, "Automated construction of classifications: Conceptual clustering versus numerical taxonomy," IEEE Trans. Pattern Anal. Machine Intell. PAMI-5(4), 396-410 (July 1983).
19. R. E. Stepp, Conjunctive Conceptual Clustering: A Methodology and Experimentation, Ph.D. Thesis, Department of Computer Science, University of Illinois, Urbana, IL, 1984.
20. P. Langley and S. Sage, "Conceptual Clustering as Discrimination Learning," Proceedings of the Fifth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, London, Ontario, pp. 95-98, 1984.
21. D. Fisher, A Hierarchical Conceptual Clustering Algorithm, Technical Report, Department of Information and Computer Science, University of California, Irvine, 1984.
22. P. Langley, J. Zytkow, H. Simon, and G. Bradshaw, "The Search for Regularity: Four Aspects of Scientific Discovery," in R. S. Michalski, J. Carbonell, and T. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. II, Morgan Kaufmann, pp. 425-469, 1986.
23. D. L. Medin, W. S. Wattenmaker, and R. S. Michalski, "Constraints in inductive learning: An experimental study comparing human and machine performance," ISG Report 86-1, UIUCDCS-F-86-952, University of Illinois, 1986.
24. L. A. Rendell, "Toward a unified approach for conceptual knowledge acquisition," AI Mag. 4, 19-27 (Winter 1983).
25. P. Langley, G. L. Bradshaw, and H. A. Simon, "Rediscovering chemistry with the BACON system," in R. S. Michalski, J. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, pp. 307-329, 1983.
26. D. Fisher and P. Langley, "Approaches to Conceptual Clustering," Proceedings of the Ninth International Joint Conference on AI, Los Angeles, CA, pp. 691-697, August 1985.
27. R. E. Stepp and R. S. Michalski, "Conceptual Clustering: Inventing Goal-Oriented Classifications of Structured Objects," in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. II, Morgan Kaufmann, pp. 331-363, 1986.

R. S. Michalski and R. E. Stepp
University of Illinois
COGNITION. See Reasoning.
COGNITIVE MODELING

A cognitive simulation model is a computer simulation of mental or cognitive processes. Such a model is normally constructed by cognitive psychologists, who are members of the branch of experimental psychology that is concerned with the scientific and empirical study of human behavior, with an emphasis on understanding the internal mental mechanisms that underlie behavior (see Cognitive psychology). The purposes of cognitive modeling are to express a theory of mental mechanisms in precise and rigorous terms, to demonstrate the sufficiency of a set of theoretical concepts, and to provide an explanation for observed human behavior. Because cognitive models use many techniques and ideas from AI, they are similar to AI programs. But the goals of cognitive modeling and AI tend to be substantially different (see Ref. 1). Briefly put, the goal of AI is to build intelligent machines, whereas the goal of cognitive modeling is to build models of human mental mechanisms. These activities are very similar, but they differ mainly in the criteria for success. Again briefly put, the quality of a piece of AI work is measured in terms of how well the machine is able to perform the task. In a cognitive modeling effort the question is not only whether the computer program is able to perform the task but also the extent to which it behaves like a human performing the same task and whether the mechanisms involved are plausible theoretical explanations for human mental processes. Notice that in AI terms these mechanisms may be inefficient or unnecessarily complex for the task. This entry touches on the contribution of cognitive modeling to AI. It is not a commonly accepted idea, but cognitive modeling work is relevant to AI in that some of the mechanisms in cognitive models are applicable to AI problems.
Purposes of Cognitive Modeling

The rationale for cognitive modeling is best seen in terms of the history of theoretical development in cognitive psychology. Except for the temporary aberration of behaviorism, the goal of experimental psychology over the last century has always been to construct an adequate theory of the mental processes that underlie behavior. An adequate theory of the human mind would explain the observed behavioral data in terms of plausible internal mechanisms. The traditional mode for describing such mechanisms has been in the form of verbal statements. As the ideas get more complex, such verbal theories become difficult to handle. Thus, there is a need to express psychological theory precisely and to demonstrate that theoretical concepts are actually sufficient to explain the behavior and to derive testable predictions about data in a rigorous fashion. The idea of rigorous theoretical models in experimental psychology is a fairly old idea; an excellent early example is the work of Hull during the 1940s, who constructed one of the first large-scale mathematical theories of behavior. During the fifties and sixties mathematical models of psychological processes were developed. These models represented perceptual and learning situations as stochastic processes, which were very successful in accounting quantitatively for many details of human behavior. See Ref. 2 for a summary of these approaches. This combination of verbal and mathematical theory has produced what might be termed the "standard" theory of cognition, which is based on a decomposition of the human mind into major components. These consist of structures such as short-term memory and long-term memory and processes such as recognition, memory storage, and memory retrieval, which process and manipulate the information stored in the structures. This theory is the basic framework for most current cognitive models.
As interest in cognitive psychology moved from simple learning (qv) and perception (see Vision, early) to complex behavior such as reasoning (qv) and reading comprehension (see Natural-language understanding), the mathematical models seemed to be inadequate because they characterized behavior in terms of a small number of continuous mathematical variables; it seemed that complex qualitative, or symbolic, systems were needed instead, especially in order to represent knowledge (see Representation, knowledge). In addition, many researchers came to feel that a psychological theory or model should describe the processes going on in the mind rather than simply providing a characterization of the statistical properties of the behavior (3). Thus, computer programs, in which these complex entities can be represented directly, became the ideal mode for expressing theory (4). Perhaps the most important event in symbolic cognitive modeling was the adoption of semantic networks (qv) from AI. For cognitive psychologists the significance of the semantic network representation was that it provided a representation of knowledge in a form that tied into the classical concept of association very well (see Ref. 5 for a comprehensive review of this topic). Semantic networks were so appealing theoretically that AI quickly became of intense interest to cognitive psychologists, and cognitive simulation models were the best way to incorporate AI concepts into cognitive theory. Currently, there seems to be a consensus that cognitive simulation models best represent the core theoretical concepts in cogni-
tive psychology. However, it is important to note that despite the recognized importance of cognitive simulation models and the AI concepts that underlie them, relatively few cognitive psychologists actually construct and make use of simulation models (see Refs. 6 and 7 for further discussion).

Evaluation of Cognitive Models

Theoretical Quality. Since cognitive psychology is an empirical science that is attempting to construct explanatory theory, the quality of a cognitive model depends both on its ability to mimic observed behavior and on the quality of the model as a piece of theory (6,7). Most of the extant cognitive modeling work has been done with the primary theoretical goals of demonstrating that a theory is sufficient to produce the behavior and of stating the theory rigorously. Beyond these concerns, the architectural integrity of the model is critical. Does the model make consistent use of a set of explicit theoretical mechanisms that comprise a cognitive architecture, or does it appear to contain ad hoc, arbitrary mechanisms? If the architecture has been maintained, it will be relatively clear how the model works; a theory is of little value if it cannot be understood by the scientists in the field. Thus, there is a great premium on the model having a basically simple and consistently maintained architecture.

Empirical Quality. One criterion for empirical quality is apparent realism, which is the criterion that most AI projects attempt to meet; the system must be able to produce apparently realistic behavior. That is, most natural-language-processing systems are designed so that they appear to do the correct thing with the input. It is not necessary to evaluate such systems on a systematic scientific basis because the established usage of language is adequate to characterize whether the model is reasonably correct. But, more recently, simulation models have been used to account for experimental data in great detail.
Thus, it is desirable for a model to go beyond the apparently realistic stage and to account for data in a detailed way, preferably in a predictive rather than in an after-the-fact manner. In many cases the time characteristics of the model and of human behavior are compared; some measure of processing time or effort in the model should correspond to processing time on the part of humans.

The Nature of Behavioral Data. There are some characteristics of behavioral data that are probably not obvious to those not familiar with cognitive psychology. First, contrary to intuition, and perhaps common sense, introspection (observation of one's own thought processes) is neither a reliable nor a complete source of information about mental processes (see Ref. 8 for a history of this subject). The basic problems are that such observations are highly idiosyncratic, easily distorted by subjective bias on the part of the observer, and, more importantly, most of the major mental processes, especially those of interest to AI, go on below the level of conscious awareness. The popular "think-out-loud" protocol data are not strictly introspective data, but they suffer from related problems. Thus, modern cognitive psychology is based on behavioral, rather than introspective, data. Second, behavior is highly variable and subject to the influence of many factors. This means that it is essential that behavioral data be obtained by the use of careful experimental methods and appropriate statistical analysis of the results. To outsiders, this meticulousness may be hard to understand, but it is very easy to collect data that are worthless and misleading because of improper attention to such considerations. Third, human behavior is strongly determined by the task that the person is trying to do, meaning that the task should be carefully characterized, and inference from data to the internal processes must be qualified by the task. Thus, the accuracy of a cognitive model is determined by how well it fits properly collected data on behavior in a suitable task, not by how well it agrees with the modeler's subjective impressions concerning mental processes. Finally, and most important, it is normally necessary to constrain a person's behavior in order to study it conveniently. This means that much more is known about certain aspects of mental processes than others. For example, perceptual processes are perhaps the best understood because the experimenter has great control over the stimulus and can require the subject to produce very simple responses based only on observable properties of the stimulus. In more complex behavior such as problem solving (qv), the behavior of a person becomes less determined by specific features of the stimulus and more by the person's internal knowledge and processes, such as his/her representation of the task. Normally, data from more complex tasks are much less reliable statistically and much harder to interpret. Thus, perhaps the most interesting processes, such as reasoning and problem solving, are the hardest to work with in terms of both data collection and the construction and evaluation of simulation models.
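As a concrete, purely illustrative example of the empirical-quality criterion discussed above (that some measure of processing effort in the model should correspond to human processing time), a modeler might regress observed reaction times on the model's predicted effort and examine the fit. The sketch below is not from the article; the data, the variable names, and the idea of counting model "cycles" are all hypothetical.

```python
# Illustrative only: hypothetical model "cycle" counts per trial and
# hypothetical mean human reaction times (ms); neither is from the article.

def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical data: model processing effort vs. observed mean RT.
cycles = [3, 5, 8, 12, 15]
rt_ms = [450, 520, 610, 740, 830]

a, b, r2 = linear_fit(cycles, rt_ms)
print(f"RT = {a:.1f} + {b:.1f} * cycles  (R^2 = {r2:.3f})")
```

A high R-squared with a plausible slope is, of course, only one piece of evidence; the point of the section above is that such fits are meaningful only when the data were properly collected from a well-characterized task.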
Survey of Cognitive Models

This survey is limited to those modeling efforts in which modeling human behavior was a direct goal, as opposed to "pure" AI projects. Note, however, that Bower and Hilgard (2) use several AI projects directly as psychological models because these are the most complete and explicit statements available of certain theoretical mechanisms. Given below is a brief description of a variety of simulation models, grouped by the cognitive processes under investigation.

Basic Approaches. There are three basic approaches that have been used in cognitive models. In the first, basically a numeric simulation approach, representations of some sort are activated in specified ways over time, and the representations interact in terms of their activation. This is a very old concept in psychology; precursors of it can be found in James (9) and Hebb (10), and it has great appeal because of its neurological flavor. This modeling work focuses mostly on the mathematical specifics of the time course of activation and how the representations interact. The second basic approach involves the manipulation of symbolic structures that represent knowledge, essentially the same approach as current "mainstream" AI. Many of the cognitive models that are described here take this form. The third approach is a hybrid of the activation and symbolic approaches. That is, which knowledge structures are paid attention to and manipulated is determined by activation that typically spreads from one piece of knowledge to another. Quillian's (11) use of spreading activation is one of the original systems of this type.

Perception. Perceptual processes have usually been characterized as low-level processes in which activation mechanisms are of primary importance. For example, the McClelland and Rumelhart (12,13) model recognizes four-letter words using a network of representations of letter features, letters, and words, which activate and inhibit each other. The network reaches a stable state in which the representation of the presented word is the most activated. Interestingly, there has not been much follow-up of the classic blocks-world work in AI (14), in which perception is seen as a matter of matching schemas for known objects against perceptual input. Although these concepts are central to current cognitive theory (15,2), there has been little or no attempt to construct and evaluate simulation models of perceptual processes in this domain. Perhaps the best simulation of higher-order perception is the Simon and Gilmartin (16) model of chess expertise. This system learns to recognize patterns of pieces on the chessboard by building a discrimination net.

Learning. Some of the earliest cognitive models dealt with learning processes. The first was the classic EPAM (qv) model of Simon and Feigenbaum (17), which constructed a discrimination network in order to perform simple learning. The presented stimulus was sorted by the net to find the response; if the response was incorrect, the net would be modified to produce a new path to the correct response. EPAM is an example of how a model of a psychological process contributed to the development of an important AI technique, the discrimination net. Hintzman (18) later built a more elaborate version, SAL, in which additional mechanisms were added to account for a variety of experimentally observed phenomena of interference and forgetting in simple learning situations. This early work on learning was not followed up for quite a few years; instead most simulation efforts focused on models of performance rather than learning. More recently, work by Anderson has focused on various learning processes. The LAS system (19) learned the grammar for a language by constructing an augmented transition network (ATN) in response to pairs of semantic representations and input sentences (see Grammar, augmented-transition network). In the ACT model (20,21) a distinction is made between procedural knowledge, represented as production rules, and declarative knowledge, usually represented as propositions in a semantic network. The production rules examine and act on the semantic network. Only semantic representations that are active, as a result of spreading activation, can trigger the production rules. Considerable attention is paid to the mechanisms by which new procedural knowledge, such as a skill, is learned; new production rules are acquired and refined through practice. This approach has been applied to the learning of geometry (22) and learning programming in LISP (qv) (23). Kieras and Bovair (24) and Kieras and Polson (25,26) have applied a similar, but greatly simplified, analysis to account for the learning of skills in interacting with equipment. Thus, the representation of learning as the acquisition and refinement of production rules appears to be a powerful and comprehensive approach.

Memory Organization and Processes. Most simulation models of memory have dealt with long-term memory, which is the repository of general knowledge. This concern with knowledge representation makes long-term memory a fruitful area for application of AI concepts. An early paper by Frijda (27) presents the basic idea that knowledge can be represented in terms of labeled associations between concepts. Frijda is perhaps the earliest to point out the extreme amount and complexity of human knowledge when expressed in these terms. He suggested that the complexity of human thought, and its idiosyncrasies between individuals, could probably be accounted for in terms of the differences in knowledge rather than differences in basic cognitive processes that use the knowledge. This is a precursor of the current emphasis on knowledge-based systems in both AI and cognitive modeling. The work by Quillian (11) and Collins and Quillian (28,29) introduced the idea of semantic networks to cognitive psychology. This knowledge representation was widely accepted because it put the classic concept of association into a form adequate to represent knowledge. The Collins and Quillian work led to the idea of cognitive economy, in which inheritance relations are used to reduce the amount of stored information, and the basic mechanism of spreading activation is used to explain how knowledge can be retrieved in terms of its relevance to currently active knowledge (21,30). Important early models of semantic memory (see Memory, semantic) were Rumelhart, Lindsay, and Norman's (31) LNR and Kintsch's (32) model, both of which were based on case grammar (see Grammar, case) representations, and Anderson and Bower's (5) influential HAM model, which used a representation similar to predicate logic (see Logic, predicate). The contrast between systems like LNR and HAM shows how different representation systems can be developed that are apparently adequate to represent human knowledge but have substantial notational differences and cannot be distinguished from each other empirically [i.e., the problem of nonidentifiability (4,21)]. The case grammar form of representation has become very popular both in cognitive modeling and in AI. However, the Anderson and Bower HAM model was probably more influential, simply because they made a special effort to try to bring their model in line with data on human performance. Also, Anderson's work has been more concerned with explicitly stated architectures for cognitive processes, which makes the theoretical status of the models more clear (21).

Language Comprehension. Comparable to the large amount of work in AI on natural-language processing has been the considerable progress on cognitive models of how humans understand language, usually in the context of reading comprehension. Kintsch and van Dijk (33) developed a model for how people acquire and recall information from text, which has become one of the most important theoretical representations of comprehension and memory processes. The model begins with a representation of the propositional content of the input text and selects which propositions are to be retained in the system's limited short-term memory as it goes from one sentence to the next, using simple heuristics that are based primarily on how the propositions are connected to each other. According to a basic principle of human learning, propositions that reside in short-term memory longer are more likely to be transferred to long-term memory and thus recalled better. This model can account for what is remembered from a text in a variety of reading and memory situations. A model by Kieras (34) used an ATN parser in conjunction with a semantic network knowledge representation and spreading-activation memory-search mechanism and was able to account in considerable detail for the time required to read sentences in simple passages under different task conditions. In another model Kieras (35) showed how certain higher-level comprehension processes could be represented using production rules to perform inferences on a propositional representation of the text content. The model was able to recognize or extract generalizations from simple passages in a manner similar to that used by human readers. Perhaps the single most comprehensive simulation model of comprehension is that of Thibadeau, Just, and Carpenter (36), again using a combination of production rules, propositional representation, and activation mechanisms. This model captures the highly parallel and interactive processing that apparently goes on in reading, all the way from syntactic analysis to the application of general knowledge. It was able to account for extremely detailed timing data from eye movement recordings of humans reading technical passages.

Problem Solving and Reasoning. According to a classic paper by Newell (1), this area is the most important one for AI, but it is one of the most difficult topics in cognitive psychology, as pointed out above. The best-known work in this field is the GPS model by Newell and Simon (37,38), which introduced the idea of means-ends analysis. It is very influential as a model for the methods humans use to solve problems, as well as being one of the first representatives of what is now termed "weak methods" in problem solving. Another example of early work on problem solving is that of Simon and Kotovsky (39), which was also one of the earliest cognitive simulation models. This was a model of how series completion problems, which often appear on IQ tests, could be solved by recognizing the patterns of repetition and succession. The model was able to account for which problems would be the easiest and most difficult for people. Anderson, Greeno, Kline, and Neves (40) represented many of the processes involved in solving elementary proof problems in geometry with a system involving both semantic structures and production rules.
The system would acquire and apply schemas representing proof approaches. Hayes-Roth and Hayes-Roth (41) constructed an influential model of planning that represented how people would select a route in an errand-performing task. This model was based on a blackboard knowledge-source architecture.

Contribution to AI
One way in which cognitive modeling work can contribute to AI is in the development of specific concepts and techniques. Several approaches, such as discrimination nets and probably the idea of rule-based systems (qv), apparently developed as cognitive models at the same time as, if not prior to, their adoption as pure AI techniques. For example, the standard approach used in expert systems (qv) probably developed from the basic characterization of human expertise as the ability to recognize patterns, which could then be represented as a set of production rules. Tracing out the exact lines of descent of these ideas is beyond the scope of this entry, but it certainly appears that historically cognitive modeling efforts have made important contributions to AI.

One prime candidate for a new contribution is the general approach currently used in cognitive modeling. Since cognitive models are developed with a specific theoretical position in mind, they normally propose an explicit cognitive architecture. This consists of a relatively small set of basic data types and processes, out of which are constructed all the more complex knowledge representations and processes that the model uses to represent human mental mechanisms. Thus, the overall goal of cognitive modeling is to arrive at a comprehensive architecture that is adequate for cognition, rather than simply constructing a multitude of unrelated special-purpose systems. Some of the specific cognitive architectures resulting from cognitive modeling might become directly applicable, but by adopting this architectural approach, future work in AI could probably become more focused theoretically.

BIBLIOGRAPHY

1. A. Newell, Remarks on the Relationship Between Artificial Intelligence and Cognitive Psychology, in R. Banerji and M. D. Mesarovic (eds.), Theoretical Approaches to Non-Numerical Problem Solving, Springer-Verlag, New York, pp. 363-400, 1970.
2. G. H. Bower and E. R. Hilgard, Theories of Learning, 5th ed., Prentice-Hall, Englewood Cliffs, NJ, 1981.
3. L. W. Gregg and H. A. Simon, "Process models and stochastic theories of simple concept formation," J. Math. Psychol. 4, 246-276 (1967).
4. D. E. Kieras, Knowledge Representations in Cognitive Psychology, in L. Cobb and R. M. Thrall (eds.), Mathematical Frontiers of the Social and Policy Sciences, AAAS Selected Symposium 54, Westview, Boulder, CO, pp. 5-36, 1981.
5. J. R. Anderson and G. H. Bower, Human Associative Memory, Winston, Washington, DC, 1973.
6. D. E. Kieras, A Simulation Model for the Comprehension of Technical Prose, in G. H. Bower (ed.), The Psychology of Learning and Motivation, Vol. 17, Academic Press, New York, pp. 39-80, 1983.
7. D. E. Kieras, A Method for Comparing a Simulation Model to Reading Time Data, in D. Kieras and M. Just (eds.), New Methods in Reading Comprehension Research, Erlbaum, Hillsdale, NJ, pp. 299-325, 1984.
8. G. Humphrey, Thinking: An Introduction to Its Experimental Psychology, Wiley, New York, 1963.
9. W. James, The Principles of Psychology, Henry Holt & Co., New York, 1890.
10. D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
11. M. R. Quillian, Semantic Memory, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 227-270, 1968.
12. J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception: Part 1. An account of basic findings," Psychol. Rev. 88, 375-407 (1981).
13. D. E. Rumelhart and J. L. McClelland, "An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model," Psychol. Rev. 89, 60-94 (1982).
14. P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
15. S. E. Palmer, Visual Perception and World Knowledge: Notes on a Model of Sensory-Cognitive Interaction, in D. A. Norman and D. E. Rumelhart (eds.), Explorations in Cognition, W. H. Freeman, San Francisco, pp. 279-307, 1975.
16. H. A. Simon and K. Gilmartin, "A simulation of memory for chess positions," Cog. Psychol. 5, 29-46 (1973).
17. H. A. Simon and E. A. Feigenbaum, "An information-processing theory of some effects of similarity, familiarization, and meaningfulness in verbal learning," J. Verb. Learn. Verb. Behav. 3, 385-396 (1964).
18. D. L. Hintzman, "Explorations with a discrimination net model for paired-associate learning," J. Math. Psychol. 5, 123-162 (1968).
19. J. R. Anderson, Computer Simulation of a Language-Acquisition System, in R. L. Solso (ed.), Information Processing and Cognition: The Loyola Symposium, Lawrence Erlbaum, Hillsdale, NJ, pp. 295-349, 1975.
20. J. R. Anderson, Language, Memory, and Thought, Lawrence Erlbaum, Hillsdale, NJ, 1976.
21. J. R. Anderson, The Architecture of Cognition, Harvard University Press, Cambridge, MA, 1983.
22. J. R. Anderson, Acquisition of Proof Skills in Geometry, in J. G. Carbonell, R. Michalski, and T. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, San Francisco, CA, pp. 191-219, 1982.
23. J. R. Anderson, R. Farrell, and R. Sauers, Learning to Plan in LISP, Technical Report ONR-82-2, Carnegie-Mellon University, Pittsburgh, PA, 1982.
24. D. E. Kieras and S. Bovair, "The acquisition of procedures from text: A production-system analysis of transfer of training," J. Mem. Lang. 25, 507-524 (1986).
25. D. E. Kieras and P. G. Polson, "An approach to the formal analysis of user complexity," Int. J. Man-Mach. Stud. 22, 365-394 (1985).
26. P. G. Polson and D. E. Kieras, A Quantitative Model of the Learning and Performance of Text Editing Knowledge, in L. Borman and B. Curtis (eds.), Human Factors in Computing Systems Proceedings, Special Issue of SIGCHI Bulletin, San Francisco, CA, pp. 207-212, 1985.
27. N. H. Frijda, "Simulation of human long-term memory," Psychol. Bull. 77, 1-31 (1972).
28. A. M. Collins and M. R. Quillian, "Retrieval time from semantic memory," J. Verb. Learn. Verb. Behav. 8, 240-247 (1969).
29. A. M. Collins and M. R. Quillian, How to Make a Language User, in E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, pp. 309-351, 1972.
30. A. M. Collins and E. F. Loftus, "A spreading-activation theory of semantic processing," Psychol. Rev. 82, 407-428 (1975).
31. D. E. Rumelhart, P. H. Lindsay, and D. A. Norman, A Process Model for Long-Term Memory, in E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, pp. 197-246, 1972.
32. W. Kintsch, The Representation of Meaning in Memory, Lawrence Erlbaum, Hillsdale, NJ, 1974.
33. W. Kintsch and T. A. van Dijk, "Toward a model of text comprehension and production," Psychol. Rev. 85, 363-394 (1978).
34. D. E. Kieras, "Component processes in the comprehension of simple prose," J. Verb. Learn. Verb. Behav. 20, 1-23 (1981).
35. D. E. Kieras, "A model of reader strategy for abstracting main ideas from simple technical prose," Text 2, 47-82 (1982).
36. R. Thibadeau, M. A. Just, and P. A. Carpenter, "A model of the time course and content of reading," Cog. Sci. 6, 157-203 (1982).
37. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
38. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, 1969.
39. H. A. Simon and K. Kotovsky, "Human acquisition of concepts for sequential patterns," Psychol. Rev. 70, 534-546 (1963).
40. J. R. Anderson, J. G. Greeno, P. J. Kline, and D. M. Neves, Acquisition of Problem-Solving Skill, in J. R. Anderson (ed.), Cognitive Skills and Their Acquisition, Erlbaum, Hillsdale, NJ, 1981.
41. B. Hayes-Roth and F. Hayes-Roth, "A cognitive model of planning," Cog. Sci. 3, 275-310 (1979).

D. KIERAS
University of Michigan

COGNITIVE PSYCHOLOGY

The term artificial intelligence evokes a contrast with the "natural" intelligence of higher organisms, most notably human beings. Some would argue that AI, as defined by successful AI programs, is likely to prove qualitatively different from the natural variety. Another view, however, is that AI should be directed toward imitation of the cognitive capabilities of humans. The latter view suggests that AI should be closely linked to cognitive psychology, the field that investigates how people acquire knowledge, remember it, and put it to use to make decisions and solve problems.

History and Scope. Cognitive psychology, also sometimes called information-processing psychology, is currently the leading area of human experimental psychology. The origins of the field can be traced to nineteenth-century psychologists such as James (1) and the German Gestalt psychologists such as Duncker and Wertheimer (2,3). For much of the twentieth century up until about 1960, however, American psychology was dominated by behaviorist theories that eschewed any reference to unobservable mental processes. The modern revival of cognitive psychology was fostered in part by developments in other disciplines, most notably linguistics and computer science. In linguistics Chomsky's theory of generative grammar coupled with his scathing critique of behaviorist accounts of language use provided the impetus for cognitive approaches to language in the new field of psycholinguistics (4,5). In computer science the digital computer became a striking example of an information-processing system in which observable input-output relations clearly depended on complex but well-specified intervening computational steps, discrediting the behaviorist claim that only observable stimulus-response relations were respectable objects of scientific scrutiny. The use of the computer as a model for theories of human intelligence rose to prominence in a seminal book by Miller, Galanter, and Pribram (6), a work that set the stage for a book by Neisser (7) that gave the field of cognitive psychology its modern identity. The landmark computational account of human problem solving by Newell and Simon (8) established a firm link between the view that AI should strive to imitate human cognition and the view that computer simulations afford testable theoretical models of cognitive processes. (For a thorough survey of the origins of cognitive psychology see Ref. 9, as well as the historical review in Ref. 8.) In recent years work in cognitive psychology has become increasingly integrated with work in AI, neuropsychology, linguistics, and philosophy within the emerging field of cognitive science.

Human cognition is a complex and highly interactive system that does not lend itself to tidy compartmentalization; however, it is useful to divide cognitive psychology into five subareas. These are perception, attention, memory, thinking, and language, each of which is discussed below. Like any scientific discipline, the scope of cognitive psychology is delineated not only by its subject matter but also by the methods it employs. A variety of research methods are commonly used, including measurement of reaction time to perform simple tasks, patterns of eye movements, distributions of types of [...]; [...] has been to attempt to decompose cognitive processes into components and to estimate the temporal relations among them (12-14).

Cognitive Psychology and AI. Cognitive psychology and AI have been closely intertwined since the inception of each. AI has provided cognitive psychology with both a methodological tool and theoretical formalisms. Given the highly interactive nature of human cognition, computer simulation is often a useful tool for deriving predictions from a complex model. At the theoretical level, cognitive psychology has adapted numerous concepts that were developed in computer science in general and AI in particular [e.g., content-addressable memory (see Associative memory), semantic networks (qv), and blackboard models (qv)]. Early work in cognitive psychology yielded theoretical concepts that anticipated some that are now being explored within AI. For example, Bartlett (15) introduced the concept of a schema, a knowledge structure that actively generates expectations based on regularities abstracted from past experience. Such AI concepts as frames (qv) and scripts (qv) are variants of the schema concept (16-18). Tolman's (19) work on mental maps and the representation of expectancies was a precursor to current conceptions of mental models (20). More generally, empirical and theoretical work in cognitive psychology has yielded a clearer understanding of some general principles of human information processing that can help direct development of AI systems modeled after human cognition. In particular, as is elaborated below, human intelligence appears to be based on multiple representational codes for knowledge (e.g., visuospatial as well as linguistic), on a great deal of parallel processing of information, and on inference patterns that depend on similarity and associative links more than on strictly deductive logic (see Inference, logic).
These properties of human information processing seem to be inextricably linked to powerful learning (qv) mechanisms, ranging from elementary detection of covariations among properties of the environment to exploitation of analogies between knowledge acquired in different domains (21). These learning mechanisms allow humans to avoid the "brittleness" of typical AI expert systems (qv), which generally lack humanlike flexibility in adapting themselves to changes in their initial domain of application.

Theoretical approaches to cognition increasingly tend to link cognitive psychology and AI. This is particularly evident in the case of the two major types of formalisms in which cognitive models are currently being developed, namely, production systems (see Rule-based systems) and connectionist neural networks (see Connectionism). Systems based on production rules were first introduced into cognitive psychology as models of human problem solving (8); later developments by Anderson and others (22-25) extended versions of production systems, sometimes coupled with semantic networks, to serve as models of other cognitive processes. Recent work has begun to exploit the modularity of rule-based systems to provide accounts of learning in terms of generation of new rules. Connectionist models (26-28) represent a current resurgence of interest in modeling cognitive processes at a relatively microscopic level of analysis analogous to neural units, as in earlier psychological theories such as that of Hebb (29). Whereas production systems were first proposed as models of higher-level thought processes and then pressed "downward" to attempt to account for more elementary processes, connectionist models were first applied to basic perceptual processes and are currently being pressed "upward" to attempt to account for phenomena that seem more conceptual. Connectionist models place much greater emphasis on parallel processing than production system models tend to do. It is noteworthy, however, that production system models in psychology, unlike their AI counterparts, often assume parallel changes in the degree of activation of knowledge in memory. The phenomena of human cognition seem to impose some form of parallelism on psychological theories. A theoretical frontier in cognitive psychology is likely to center on attempts at integrating ideas derived from rule-based systems with those derived from neural modeling.

Major Areas of Research

The survey of active research areas in cognitive psychology presented below is of necessity selective and incomplete. In addition, much more could be said about the interconnections between the various areas of research. More extensive and integrative reviews can be found in recent textbooks (30,31).

Perception. The earliest stages in perception, such as extraction of information directly from the retinal image, are usually considered outside the scope of cognitive psychology (although early perception is an important topic in experimental psychology and is clearly relevant to AI). Cognitive work on perception is concerned with the construction of meaningful patterns from elementary components, with vision (qv) receiving by far the most attention. Since the classical research of Gestalt psychologists such as Wertheimer (32), a basic concern has been with the principles that govern the construction of relatively constant interpretations of perceptual inputs despite wide variations in the input itself. For example, a square is perceived as such even though it may be tilted in various directions, partially occluded, or composed of broken rather than solid lines.
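The constancy example above (a square recognized despite being tilted or rescaled) can be illustrated as a test of properties that are invariant under such transformations. The sketch below is hypothetical and not from the article: a toy "square detector" that checks two invariants, equal side lengths and perpendicular corners, and therefore still accepts a rotated or uniformly scaled square.

```python
# Hypothetical illustration: recognizing a square from its invariant
# properties rather than from its raw retinal coordinates.
import math

def is_square(pts, tol=1e-6):
    """True if the four corner points (in order) form a square."""
    sides = [(pts[(i + 1) % 4][0] - pts[i][0],
              pts[(i + 1) % 4][1] - pts[i][1]) for i in range(4)]
    lengths = [math.hypot(dx, dy) for dx, dy in sides]
    # Invariant 1: all four sides have the same length.
    if max(lengths) - min(lengths) > tol:
        return False
    # Invariant 2: consecutive sides are perpendicular (dot product ~ 0).
    return all(abs(sides[i][0] * sides[(i + 1) % 4][0] +
                   sides[i][1] * sides[(i + 1) % 4][1]) <= tol
               for i in range(4))

def rotate(pts, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in pts]

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
tilted = rotate(unit_square, math.radians(30))   # still a square
```

The invariants survive rotation and uniform scaling, which is why the tilted square still passes while a rectangle with unequal sides does not.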
An important theoretical position associated with Gibson (33) is that perception depends on the detection of invariant properties of the distal stimulus (i.e., the object in the environment), which either remain constant or change systematically as the proximal stimulus (i.e., the retinal image) undergoes a wide range of variations. The relationship between the Gibsonian position and AI work in vision is discussed in Ref. 34. Recent research has made considerable progress in addressing the longstanding and basic issue of identifying the elementary features the human visual system detects and uses to construct visual patterns. Treisman and Gelade (35) used a selective-attention task (see below) to identify a level of visual processing in which the color, form, and location of an object appear to be represented as separate features not yet integrated into a unified representation of an object. These features are detected in parallel across the entire visual field, so that the time to detect a target embedded in an array is independent of the number of elements in the array if and only if the target can be consistently discriminated from all distractors by considering a single feature (a phenomenon referred to as "pop-out"). In contrast, discrimination must be based on a slower serial process when features (see Feature extraction) must be combined to identify the target. Other work has used
COGNITIVE PSYCHOLOGY
similar techniques to identify some of the elementary features that compose visual forms (36,37). Rock (38) provides a lucid introduction to the topic of perception.

Attention. The core issue in theories of attention concerns information reduction. Because humans are constantly faced with an immense amount of information as the result of both perception and memory retrieval and are limited in their capacity to process it, they must be selective in their analysis of inputs. The basic idea that humans can be viewed as limited-capacity information-processing systems was first proposed by Broadbent (39) and became a cornerstone of cognitive psychology. This cornerstone, however, has been the focus of controversy since it was first erected. At issue is the degree and locus of parallelism in information processing. Broadbent proposed that inputs are "filtered" early in perceptual processing and that only a selected few are processed at higher levels (e.g., at the level of meaning). Soon afterward, however, evidence accrued that people occasionally respond to the meaning of highly familiar inputs (i.e., their names) even when the inputs are unattended, suggesting that unattended inputs are attenuated rather than filtered entirely (40). These early-selection models, which emphasized limits on perceptual processing, were subsequently challenged by late-selection models (41), according to which all inputs are processed to the level of meaning, with selection occurring only among responses to the inputs. Late-selection models imply a greater degree of parallel processing than do early-selection models. The concept of "automaticity" has been invoked to explain why humans can perform some tasks in parallel whereas others demand serial processing (42,43). The general notion is that particular types of experience result in a decrease in the capacity required to perform tasks so that multiple tasks can be performed concurrently without interference.
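The parallel-versus-serial contrast running through this discussion (feature "pop-out" versus conjunction search in the Treisman and Gelade task described above) can be caricatured in a few lines of code. This is only an illustrative sketch, not their experimental procedure: the function and its timing constants (`base_ms`, `per_item_ms`) are invented values chosen to show the qualitative pattern of results.

```python
def search_time(display_size, conjunction, per_item_ms=50, base_ms=400):
    """Toy model of visual search latencies.

    Feature search ("pop-out"): the target differs from every distractor
    on a single feature, so detection time is roughly flat across display
    sizes.  Conjunction search: the target is defined only by a combination
    of features, so items are scanned serially and time grows with display
    size.  All timing constants are made-up illustrative values.
    """
    if conjunction:
        # Serial, self-terminating scan: on average half the items are checked.
        return base_ms + per_item_ms * (display_size + 1) / 2
    return base_ms  # parallel detection: independent of display size

display_sizes = (4, 8, 16, 32)
feature_rts = [search_time(n, conjunction=False) for n in display_sizes]
conj_rts = [search_time(n, conjunction=True) for n in display_sizes]
# feature_rts is flat; conj_rts increases with display size
```

Plotting latency against display size for the two conditions reproduces the signature result: a flat line for feature search and a positively sloped line for conjunction search.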
Development of automaticity is sometimes theoretically associated with a reduction in control, so that the person is unable to avoid making an overlearned automatic response to an input (e.g., accessing the meaning of a familiar word). An important form of automatic responding is revealed by the tendency for processing of an input rapidly to "prime" related inputs (e.g., words of similar meaning) so that subsequent processing of related inputs is facilitated (44). Neely (45) provided evidence of rapid automatic facilitation and slower conscious inhibition of the processing of inputs. The relationship between selectivity and automaticity remains controversial. The various theoretical properties of automaticity do not always co-occur, and putative evidence for capacity-free processing beyond a stage of early perceptual selection has been challenged (46). For recent analyses of issues in attention see Refs. 47 and 48.

Memory. Research on memory is concerned with the processes by which information is stored, retained over some time interval, and subsequently retrieved. Memory is intimately related to perception and attention since memory is often the incidental by-product of attentive perceptual processing. Learning roughly corresponds to the storage phase of memory; however, except for purely rote memory (if such a thing exists), learning typically implies some degree of generalization or integration of new information with old. A story, for example, is remembered as a hierarchical structure that reflects schematic knowledge about similar episodes (49,50) (see Story analysis). Learning extends to the acquisition of knowledge more general than specific perceptual inputs, as when a child acquires a general notion of what "dog" means from experience with particular exemplars. Reviews of the extensive literature on human memory can be found in Refs. 30, 31, 51, and 52.

Early theories of memory in cognitive psychology proposed a fundamental distinction between short-term and long-term memory stores (53,54). The short-term store was viewed as a bottleneck that limited the rate at which information can be transferred into permanent long-term storage. This view has since been modified, partly owing to the influence of criticisms launched by Craik and Lockhart (55). Current theories tend to view human memory as an essentially unitary system in which the short-term store (often called "active" or "working" memory) corresponds to the portion of the system currently in a highly active state. It is widely acknowledged that incoming perceptual inputs quickly make contact with representations in long-term memory in a parallel fashion. The most important limit on the eventual retrievability of information is the time required to associate the input with other information in memory that will afford potential retrieval cues. The nature of the stored representation of an input, referred to as a memory "trace," is currently a matter of controversy. Many theories represent the trace as a localized node or set of nodes in a semantic network. An alternative view favored by connectionist models is that memory representations are distributed, with a trace corresponding to a pattern of activity across neural units that tends to be reinstated upon re-presentation of the same or a similar input. It remains unclear whether the localized versus distributed views of the memory trace can be reconciled.
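The distributed view of the memory trace can be illustrated with a minimal Hebbian autoassociative network of the general kind discussed in the connectionist literature (e.g., Refs. 26 and 27). Everything here (the pattern size, the number of traces, the settling rule) is a simplified sketch invented for illustration, not a model proposed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "memory traces" stored as +/-1 activity patterns over 64 units.
patterns = rng.choice([-1, 1], size=(2, 64))

# Distributed storage: superimpose all traces in a single weight matrix
# via a Hebbian rule; no single unit or node stands for either trace.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

def recall(cue, steps=5):
    """Let unit activity settle; a cue similar to a stored trace
    tends to reinstate the full stored pattern of activity."""
    state = cue.astype(float)
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1.0
    return state.astype(int)

# Re-present a degraded version of the first trace (8 of 64 units flipped).
noisy = patterns[0].copy()
flipped = rng.choice(64, size=8, replace=False)
noisy[flipped] *= -1

restored = recall(noisy)
```

With so few traces stored relative to the number of units, the degraded cue settles back onto the original pattern; superimposing many more traces in the same matrix degrades recall, one rough analogue of interference in memory.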
Theories of memory must accommodate evidence that memory retrieval sometimes resembles automatic activation of a trace by a retrieval cue and sometimes resembles a slow search process much like conscious problem solving. Another controversial issue involves evidence suggesting that memory traces can be formed in qualitatively different codes. The focus of debate has centered on mental imagery, a memory code that preserves the spatial and visual properties of perceptual inputs (see Analog representation). Shepard and his colleagues (56) demonstrated that when people are asked to judge whether two visual forms are the same despite a difference in orientation, the time to make the decision increases linearly with the difference in orientation, as if people "mentally rotated" one of the objects to place it into correspondence with the other. Kosslyn (57) proposed that images can be constructed in an inner "space" analogous to a display screen attached to a computer and that the results of spatial transformations can be "read off" of the imaginal representation. Although the psychological and philosophical implications of mental imagery are still debated (58), the existence of perceptlike memory traces is supported by a large body of converging evidence.

Human memory stores not only representations of specific experiences but also representations of categories of experience. A great deal of research in cognitive psychology, particularly that of Rosch and her colleagues (59), indicates that natural human categories tend to be organized around clear prototypical exemplars but have relatively ill-defined boundaries. Recent work on categorization has centered on the mechanisms by which categories are induced from experience with exemplars and the form in which categories are represented in memory (60,61). The localized versus distributed debate is particularly prominent in discussions of categorization, as various theories suggest that categories are represented by sets of separate traces of category exemplars, a distributed representation formed by superimposition of such traces in a network of neural units, or more localized category nodes formed by inductive mechanisms such as generalization.
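The competing representational claims about categories can be made concrete with a toy comparison of an exemplar-based categorizer (summed similarity to separately stored traces, in the spirit of the models reviewed in Refs. 60 and 61) and a localized prototype categorizer. The feature vectors, category names, and both classifiers below are invented purely for illustration.

```python
# Items are binary feature vectors; similarity is raw feature overlap.

def overlap(a, b):
    """Number of feature values two items share."""
    return sum(x == y for x, y in zip(a, b))

# Hypothetical stored exemplars for two categories.
birds = [(1, 1, 1, 0), (1, 1, 0, 0), (1, 0, 1, 0)]
fish = [(0, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1)]

def classify_exemplar(item):
    """Exemplar view: sum similarity to every stored trace of each category."""
    s_birds = sum(overlap(item, e) for e in birds)
    s_fish = sum(overlap(item, e) for e in fish)
    return "bird" if s_birds > s_fish else "fish"

def prototype(exemplars):
    """Localized view: collapse the exemplars into one modal pattern."""
    return tuple(int(sum(col) * 2 >= len(exemplars)) for col in zip(*exemplars))

def classify_prototype(item):
    """Compare the item only against each category's single prototype node."""
    b, f = prototype(birds), prototype(fish)
    return "bird" if overlap(item, b) > overlap(item, f) else "fish"

probe = (1, 1, 1, 0)  # a typical "bird" under both schemes
```

The two schemes agree on typical items such as this probe but can diverge on items near the ill-defined category boundary, which is one reason the representational debate is difficult to settle empirically.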
Thinking. Thinking involves the active transformation of existing knowledge to create new knowledge that can be used to achieve a goal. The topic can be loosely divided into reasoning (qv) (drawing inferences from current knowledge or beliefs), decision making (the evaluation of alternatives and choice among them) (see Decision theory), and problem solving (qv) (methods for attempting to achieve goals). These topics are closely intertwined and reflect different emphases and experimental paradigms rather than strong conceptual distinctions.

Given the obvious power of human intellect, it is rather paradoxical that much of the work on thinking has served to reveal ways in which human reason departs from the normative standards set forth by such disciplines as statistics and logic. The research of Kahneman and Tversky and others (62-64) indicates that intuitive decision making is often based on easily used but fallible heuristics. These heuristics are closely tied to basic memory processes, such as the ease of retrieving information from memory (the availability heuristic) and the similarity of an instance to a category prototype (the representativeness heuristic). (For a theoretical analysis of similarity judgments see Ref. 65.) Similarly, work on human deductive reasoning reveals major departures from the normative standards of formal logic (66,67). Although humans may base some inferences on an abstract "natural logic" (68), everyday reasoning often seems to be based on rules induced and applied in the context of broad classes of pragmatically important tasks, such as understanding social regulations or causal relations among events (69) (see Reasoning, Causal). The human "inference engine" (see Inference) appears to be very different from the kind embodied in some logic-based AI reasoning programs, and may well be "better than normative" in the range of problems humans encounter most frequently in everyday life.

Human problem solving is also closely tied to basic properties of the memory system. A major area of current research involves the transition from novice- to expert-level problem-solving skill in domains such as physics (70,71) (see Physics, naive). Expertise appears to reflect the reorganization of schemas representing categories of problems and the acquisition of specialized methods for dealing with the categories of problems encountered in the domain. The ability to generalize problem-solving methods so they can be applied to new problems and the ability to solve novel problems by analogy to known situations in other domains (72,73) distinguishes human problem solving from the performance of typical AI expert systems.

Language. The study of language-its acquisition, production, and comprehension-has been a distinct area within cognitive psychology, with a close relationship to work in developmental psychology on language acquisition. Psycholinguistics was initially devoted to tests of Chomsky's theory of transformational grammar (qv) as a performance model (see Linguistics, competence and performance) and was heavily influenced by his nativist position regarding language acquisition (qv). Transformational grammar failed as a performance model of actual language use (74), and strongly nativist accounts of language are now regarded as suspect. Explorations of the relationship between language and other cognitive processes, such as memory and learning, have led to greater integration of psycholinguistic theories with models of other aspects of cognition, as illustrated in Refs. 24, 25, 27, 49, and 50. (For reviews of research in psycholinguistics see Refs. 75 and 76.) Initial lexical access of word meanings (at least for familiar meanings) appears to be extremely rapid and initially quite independent of contextual constraints (77), consistent with other evidence of parallelism in basic recognition processes. At a global level language comprehension appears to reflect parallel analyses of speech sounds (see Speech understanding) (or, in the case of reading, visual features of words (see Character recognition)), syntactic and semantic constraints (see Grammar articles; Semantics), and the pragmatic cues to meaning provided by conversational contexts, integrated to make serial decisions about the interpretation of the incoming speech stream (see Discourse understanding). Blackboard models (qv) of the sort implemented in the Hearsay system for speech recognition (78) constitute plausible descriptions of the general nature of human language comprehension.

Future Prospects

The links between the aims, methods, and theories in AI and cognitive psychology are likely to bring the two fields yet closer together over the next decade. It is increasingly the case that cognitive psychologists demand of their theories the kind of sufficiency test provided by computer simulation. To meet this standard, they will either adapt current AI concepts to theories of cognition or build new theories of cognition and new AI concepts. For their part, AI researchers have reason to remain aware of advances in cognitive psychology. Human beings, despite their cognitive shortcomings, remain by far the most general and flexible of all known intelligent systems. As long as this is so, a major strategy for AI will be the construction of programs that more closely imitate natural intelligence.

BIBLIOGRAPHY

1. W. James, The Principles of Psychology, Dover, New York (originally published 1890).
2. K. Duncker, "On problem solving," Psychol. Monogr., 58(270) (1945).
3. M. Wertheimer, Productive Thinking, Harper & Row, New York, 1959.
4. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
5. N. Chomsky, "Review of B. F. Skinner's Verbal Behavior," Language, 35, 26-58 (1959).
6. G. A. Miller, E. Galanter, and K. H. Pribram, Plans and the Structure of Behavior, Holt, Rinehart and Winston, New York, 1960.
7. U. Neisser, Cognitive Psychology, Prentice-Hall, Englewood Cliffs, NJ, 1967.
8. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
9. J. L. Lachman, R. Lachman, and E. C. Butterfield, Cognitive Psychology and Information Processing: An Introduction, Erlbaum, Hillsdale, NJ, 1979.
10. C. R. Puff (ed.), Handbook of Research Methods in Human Memory and Cognition, Academic Press, New York, 1982.
11. K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data, MIT Press, Cambridge, MA, 1984.
12. S. Sternberg, "High-speed scanning in human memory," Science, 153, 652-654 (1966).
13. M. I. Posner, Chronometric Explorations of Mind, Erlbaum, Hillsdale, NJ, 1982.
14. J. L. McClelland, "On the time relations of mental processes: An examination of systems of processes in cascade," Psychol. Rev., 86, 287-330 (1979).
15. F. C. Bartlett, Remembering, Cambridge University Press, Cambridge, UK, 1932.
16. M. L. Minsky, A Framework for Representing Knowledge, in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
17. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures, Erlbaum, Hillsdale, NJ, 1977.
18. D. E. Rumelhart, Schemata: The Building Blocks of Cognition, in R. Spiro, B. Bruce, and W. Brewer (eds.), Theoretical Issues in Reading Comprehension, Erlbaum, Hillsdale, NJ, 1980.
19. E. C. Tolman, "Cognitive maps in rats and men," Psychol. Rev., 55, 189-208 (1948).
20. D. Gentner and A. L. Stevens (eds.), Mental Models, Erlbaum, Hillsdale, NJ, 1983.
21. J. H. Holland, K. J. Holyoak, R. E. Nisbett, and P. R. Thagard, Induction: Processes of Inference, Learning, and Discovery, MIT Press, Cambridge, MA, 1986.
22. J. R. Anderson and G. H. Bower, Human Associative Memory, Winston, Washington, DC, 1973.
23. J. R. Anderson, Language, Memory, and Thought, Erlbaum, Hillsdale, NJ, 1976.
24. J. R. Anderson, The Architecture of Cognition, Harvard University Press, Cambridge, MA, 1983.
25. R. Thibadeau, M. A. Just, and P. A. Carpenter, "A model of the time course of reading," Cogn. Sci., 6, 157-203 (1982).
26. G. E. Hinton and J. A. Anderson, Parallel Models of Associative Memory, Erlbaum, Hillsdale, NJ, 1981.
27. D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, 1986.
28. Cogn. Sci., 9(1), 1985-issue devoted to "Connectionism."
29. D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
30. A. L. Glass and K. J. Holyoak, Cognition, 2nd ed., Random House, New York, 1986.
31. J. R. Anderson, Cognitive Psychology and Its Implications, 2nd ed., Freeman, San Francisco, CA, 1985.
32. M. Wertheimer, Principles of Perceptual Organization, in D. C. Beardsley and M. Wertheimer (eds.), Readings in Perception, Van Nostrand, New York, 1958 (abridged translation of M. Wertheimer, originally published 1923).
33. J. J. Gibson, The Senses Considered as Perceptual Systems, Houghton Mifflin, Boston, MA, 1966.
34. D. J. McArthur, "Computer vision and perceptual psychology," Psychol. Bull., 92, 283-309 (1982).
35. A. M. Treisman and G. Gelade, "A feature-integration theory of attention," Cogn. Psychol., 12, 97-136 (1980).
36. B. Julesz, Figure and Ground Perception in Briefly Presented Isodipole Textures, in M. Kubovy and J. R. Pomerantz (eds.), Perceptual Organization, Erlbaum, Hillsdale, NJ, 1981.
37. J. R. Pomerantz, Perceptual Organization in Information Processing, in M. Kubovy and J. R. Pomerantz (eds.), Perceptual Organization, Erlbaum, Hillsdale, NJ, 1981.
38. I. Rock, Perception, Sci. Am. Libr., W. H. Freeman, New York, 1984.
39. D. E. Broadbent, Perception and Communication, Pergamon Press, London, 1958.
40. A. M. Treisman, "Contextual cues in selective listening," Quart. J. Exper. Psychol., 12, 242-248 (1960).
41. J. A. Deutsch and D. Deutsch, "Attention: Some theoretical considerations," Psychol. Rev., 70, 80-90 (1963).
42. R. M. Shiffrin and W. Schneider, "Controlled and automatic human information processing. II. Perceptual learning, automatic attending, and a general theory," Psychol. Rev., 84, 127-190 (1977).
43. M. I. Posner and C. R. R. Snyder, Attention and Cognitive Control, in R. Solso (ed.), Information Processing and Cognition: The Loyola Symposium, Erlbaum, Hillsdale, NJ, 1975.
44. D. E. Meyer and R. W. Schvaneveldt, "Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations," J. Exper. Psychol., 90, 227-234 (1971).
45. J. H. Neely, "Semantic priming and retrieval from lexical memory: Role of inhibitionless spreading activation and limited capacity attention," J. Exper. Psychol.: Gen., 106, 226-254 (1977).
46. P. W. Cheng, "Restructuring versus automaticity: Alternative accounts of skill acquisition," Psychol. Rev., 92, 414-423 (1985).
47. D. E. Broadbent, "Task combination and selective intake of information," Acta Psychol., 50, 253-290 (1982).
48. D. Kahneman and A. M. Treisman, Changing Views of Automaticity, in R. Parasuraman, R. Davies, and J. Beatty (eds.), Varieties of Attention, Academic Press, New York, pp. 29-61, 1984.
49. W. Kintsch and T. A. van Dijk, "Toward a model of text comprehension and production," Psychol. Rev., 85, 363-394 (1978).
50. D. E. Rumelhart, Understanding and Summarizing Brief Stories, in D. LaBerge and S. J. Samuels (eds.), Basic Processes in Reading: Perception and Comprehension, Erlbaum, Hillsdale, NJ, 1977.
51. A. D. Baddeley, The Psychology of Memory, Basic Books, New York, 1976.
52. R. G. Crowder, Principles of Learning and Memory, Erlbaum, Hillsdale, NJ, 1976.
53. N. C. Waugh and D. A. Norman, "Primary memory," Psychol. Rev., 72, 89-104 (1965).
54. R. C. Atkinson and R. M. Shiffrin, Human Memory: A Proposed System and Its Control Processes, in K. W. Spence and J. T. Spence (eds.), The Psychology of Learning and Motivation, Vol. 2, Academic Press, New York, 1968.
55. F. I. M. Craik and R. S. Lockhart, "Levels of processing: A framework for memory research," J. Verbl. Learn. Verbl. Behav., 11, 671-684 (1972).
56. R. N. Shepard and L. A. Cooper, Mental Images and Their Transformations, MIT Press, Cambridge, MA, 1982.
57. S. M. Kosslyn, Image and Mind, Harvard University Press, Cambridge, MA, 1980.
58. N. Block (ed.), Imagery, MIT Press, Cambridge, MA, 1981.
59. E. Rosch, Principles of Categorization, in E. Rosch and B. B. Lloyd (eds.), Cognition and Categorization, Erlbaum, Hillsdale, NJ, 1978.
60. E. E. Smith and D. L. Medin, Categories and Concepts, Harvard University Press, Cambridge, MA, 1981.
61. D. L. Medin and E. E. Smith, "Concepts and concept formation," Ann. Rev. Psychol., 35, 113-138 (1984).
62. D. Kahneman, P. Slovic, and A. Tversky (eds.), Judgment Under Uncertainty: Heuristics and Biases, Cambridge University Press, Cambridge, MA, 1982.
63. A. Tversky and D. Kahneman, "Extensional versus intuitive judgment: The conjunction fallacy in probability judgment," Psychol. Rev., 90, 293-315 (1983).
64. R. E. Nisbett and L. Ross, Human Inference: Strategies and Shortcomings of Social Judgment, Prentice-Hall, Englewood Cliffs, NJ, 1980.
65. A. Tversky, "Features of similarity," Psychol. Rev., 84, 327-352 (1977).
66. P. N. Johnson-Laird and P. C. Wason (eds.), Thinking, Cambridge University Press, Cambridge, MA, 1978.
67. P. N. Johnson-Laird, Mental Models, Harvard University Press, Cambridge, MA, 1983.
68. M. D. S. Braine, B. J. Reiser, and B. Rumain, Some Empirical Justification for a Theory of Natural Propositional Logic, in G. H. Bower (ed.), The Psychology of Learning and Motivation, Vol. 18, Academic Press, New York, pp. 313-371, 1984.
69. P. W. Cheng and K. J. Holyoak, "Pragmatic reasoning schemas," Cogn. Psychol., 17, 391-416 (1985).
70. M. T. H. Chi, P. J. Feltovich, and R. Glaser, "Categorization and representation of physics problems by experts and novices," Cogn. Sci., 5, 121-152 (1981).
71. J. H. Larkin, J. McDermott, D. P. Simon, and H. A. Simon, "Expert and novice performance in solving physics problems," Science, 208, 1335-1342 (1980).
72. D. Gentner and D. R. Gentner, Flowing Waters or Teeming Crowds: Mental Models of Electricity, in D. Gentner and A. L. Stevens (eds.), Mental Models, Erlbaum, Hillsdale, NJ, 1983.
73. M. L. Gick and K. J. Holyoak, "Schema induction and analogical transfer," Cogn. Psychol., 15, 1-38 (1983).
74. J. A. Fodor, T. G. Bever, and M. F. Garrett, The Psychology of Language, McGraw-Hill, New York, 1974.
75. H. H. Clark and E. V. Clark, Psychology and Language, Harcourt Brace Jovanovich, New York, 1977.
76. D. J. Foss and D. T. Hakes, Psycholinguistics, Prentice-Hall, Englewood Cliffs, NJ, 1978.
77. M. K. Tanenhaus, J. M. Leiman, and M. S. Seidenberg, "Evidence for multiple stages in the processing of ambiguous words in syntactic contexts," J. Verbl. Learn. Verbl. Behav., 18, 427-440 (1979).
78. L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy, "The Hearsay-II speech-understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv., 12, 213-253 (1980).

K. Holyoak
UCLA
COGNITIVE SCIENCE

Relation to Other Fields

Cognitive Science is an emerging field of study whose boundaries are far from being well defined. A report prepared for the Alfred P. Sloan Foundation (a portion of which is reproduced as an appendix to Ref. 1) defines it as "the study of the principles by which intelligent entities interact with their environments" and notes that "by its very nature this study transcends disciplinary boundaries." In particular, the distinctions among cognitive psychology (qv), AI (qv), and cognitive science are extremely blurred in practice. This blurring is additionally exacerbated by the fact that research that clearly qualifies as cognitive science is being done in academic departments (as well as government and industrial research laboratories) whose titles identify them with disciplines as diverse as psychology, computer science, linguistics, anthropology, philosophy, education, mathematics, engineering, physiology, and neuroscience, among others. An informal survey of cognitive science publications showed that papers in cognitive science journals cited other papers in a very wide range of fields (1).

Cognitive Science is also extremely closely related to AI. When the editors of the journal Artificial Intelligence decided to help their readership keep up with some of the literature in closely related disciplines by publishing regular "Correspondent's Reports" on work in these fields, they selected the areas of philosophy and logic, robotics, software engineering, natural language, cognitive psychology, and vision. Of these, all but perhaps software engineering and parts of robotics would be considered core areas of cognitive science research. Indeed, it has been argued (e.g., Refs. 2-4) that AI and cognitive science may be nothing more than two paths to the same end-understanding the nature of intelligent action in whatever physical form it may occur.
The difference between them, according to this view, consists mainly in research style: AI takes the "high road" of asking how instances of intelligence can be realized (i.e., how they are possible) within the constraints of known computational mechanisms or how they might be attainable by the design of new mechanisms (i.e., new computational architectures), whereas cognitive science places greater emphasis on the question of how instances of intelligence are in fact realized within one particular architecture-the one constituted by the human mind. Because of this difference in orientation, many experimentally oriented cognitive scientists tend to place a somewhat greater premium on empirical fit, on testing processes against psychological data to determine not only whether the two are input-output equivalent but also whether they are strongly equivalent, that is, whether in both cases the behavior is produced by the same information-processing means. The notion of strong equivalence is central to much cognitive science, though it is not often discussed explicitly. According to one interpretation (6), two processes can be strongly equivalent only if they produce the same behavior using the same computational process (or algorithm) and the same symbolic representations, something that is possible only if the two systems have functionally identical computational architectures (i.e., the same primitive operations, the same resource constraints, and the same symbolic notation).

Despite this difference in principle between cognitive science and AI, differences in practice are minimal. Indeed, it has even been argued (4) that a convergence of the two approaches may be inevitable inasmuch as both adhere to a notion of intelligence that is inherently anthropocentric or human relative, at least at the present time. Of course, the two fields diverge considerably in their applied side.
A great deal (though by no means all) of applied cognitive science deals with such problems as designing better human-machine interfaces (see Human-computer interaction), better pedagogical methods (see Educational applications), better communications techniques, better aids for the
handicapped (see Prostheses), or better methodologies for discovering such useful things as what experts know (see Knowledge acquisition) or why children fail to read or do mathematics. What identifies these as cognitive science rather than simply applied psychology investigations is the fact that they take a fundamentally computational view of the nature of the cognitive process involved; they view cognitive process as consisting of the execution of symbol manipulation procedures. Although it is clear that the fruits of such pursuits are relevant to what people in AI do, the work itself frequently requires different skills and proceeds using different methodologies than are typically (though, again, not always) found in AI laboratories. In contrast to this approach, applied AI places heavy emphasis on finding a practical match between available computational techniques and applications crying out for solution. As in all engineering or applied technology pursuits, it must find suboptimal solutions to practical problems and proceed by incremental refinement. In terms of what has been referred to as the power-generality trade-off (7), applied AI must perforce settle for the power end of the dimension. But none of this need be true, and indeed generally is not true, of basic research in either AI or cognitive science, where the overlap is great enough that many are tempted to view AI as the more theoretical and more formal end of the spectrum of cognitive science research.

This still leaves the question: "What is cognitive science?" If it is simply the attempt to understand mental activity (or, as in the earlier quote from the Sloan report, to understand how intelligent entities interact with their environments), how is it different from psychology, especially from that branch of psychology that studies thinking, perception, memory, language, and so on, that is, cognitive psychology (qv)?
Many people believe that cognitive science represents a new paradigm for understanding cognition, a paradigm that clearly owes much to developments in computer science. Yet one would like a better characterization than this, for if it is a new paradigm, it would be useful to know how it differs from other paradigms and on what assumptions it stands. One would like to know this both in the abstract (i.e., What are some distinguishing principles of cognitive science?) and in terms of concrete examples of how the new science is practiced and what it is seen as accomplishing, or at least trying to accomplish.

Many attempts at a statement of what cognitive science is have been made. One of the earliest was an unpublished report prepared by a committee (under the editorship of George Miller) for the Alfred P. Sloan Foundation, from which the earlier quote was taken. This report characterizes what is special about cognitive science and what runs through all the diverse work that falls under its scope by defining its research objective as being "to discover the representational and computational capacities of the mind and their structural and functional representation in the brain." Although extremely general, this represents a fair statement. In addition to this early statement, the journal Cognitive Science, the official organ of the Cognitive Science Society, has published a number of articles that attempt to characterize the field, beginning with its initial editorial, and including a number of papers first presented at the inaugural conference of the Cognitive Science Society in 1979 (these were published in several issues of the journal, beginning with volume 4, 1980). An attempt at a systematic argument that cognitive science is not just a marriage of convenience but a genuine field of study is contained in Ref. 6.
In what follows are provided examples of the kinds of problems that cognitive scientists are interested in pursuing and the approaches that they take, pointers to literature that gives further details of such examples, and a brief statement of why some people believe that cognitive science is not just a collection of research problems that in one way or another are concerned with reasoning but a genuine scientific domain of inquiry. The reader must be cautioned, however, that this review is not without personal bias. It is primarily an attempt to characterize cognitive science rather than to catalog some of its current research directions (which are likely to change radically in the next few years in any case). Moreover, a view of what is constitutive of cognitive science is presented which the author believes to be correct and borne out by the classical work in the field (and defended at some length in Ref. 6), yet one that nonetheless flies in the face of claims being made by some people who are legitimate researchers in cognitive science. This view concerns the symbol-processing nature of cognition (what is referred to below as the representational metapostulate). Although this is not the proper forum for a debate on such issues, the author believes that the notion of symbolic representation is so very central to cognitive science and continues to be the central theoretical assumption underlying virtually all work in the field that it is appropriate to lay it out explicitly, even in the highly sketchy form presented here.

Some Examples of Cognitive Science Research Problems

Language. The study of the human capacity for language is one of the oldest areas of research in cognitive science. It is also one that has changed dramatically in the past two decades, partly under the influence of formal linguistics and partly because of attempts to develop computer systems for understanding natural language (see Natural-language understanding).
It thus provides a prime example of cross-disciplinary cognitive science research, albeit one that continues to be steeped in controversy. In recent years this study has also encompassed work by philosophers, as researchers become more concerned with issues of semantics and pragmatics, with problems of meaning and discourse that had occupied philosophers long before these problems arose in AI. It also brought in the work of clinical neuroscience, which investigated the taxonomy of language deficits caused by trauma and disease. This work has led to computational models of language performance. At the present time a number of alternative models of syntactic analysis (see Parsing) have been published, and psycholinguistic research provides provocative evidence that parsing proceeds with only minimal input from the rest of the cognitive system, as is also the case, by the way, in most computational language understanding systems [a notable exception is the work of Schank and his colleagues (8)]. Experimental studies have also shown clearly that the lexical lookup (see Morphology) phase of grammatical analysis retrieves many homographs or homonyms of ambiguous words (9,10), thus empirically validating one computational proposal.

Vision. The idea, popular in the 1950s, that perception consists of hypothesis testing was challenged first by people working on computational vision (qv) (e.g., Refs. 11 and 12), who argued that it would be highly wasteful not to extract as much information as possible from the initial image before bringing cognitive processes to bear. Models, such as those developed by Marr and his colleagues (12), showed that a considerable amount of processing could be done in a data-driven manner
COGNITIVE SCIENCE
(see Processing, bottom up and top down). These ideas were then validated by psychophysical investigations as well as by findings from neuroscience (e.g., concerning the existence of separate spatial frequency channels, motion detectors (see Motion analysis), sensitivity to maxima in intensity derivatives, etc.). Some of this cross-fertilization is nicely illustrated by the papers in Ref. 13. Although this work is described in some detail elsewhere in this encyclopedia, it is in fact an excellent example of cognitive science research that falls at the more computational end of the spectrum. The relevance of both the vision and the psycholinguistics work to the understanding of mind is discussed in an insightful and provocative way in Ref. 14.

Expertise and Qualitative Reasoning. The study of expert systems (qv) (or, as it is sometimes called, knowledge engineering) both inspires and benefits from experimental investigations of how experts in such areas as physics, mathematics, electronics, medicine, or chess differ from their inexperienced counterparts. Findings concerning how experts structure their knowledge and how this structure differs from that of less experienced performers are an interesting chapter in recent cognitive science. These investigations also relate to studies in both psychology and AI of how people reason by building qualitative mental models (e.g., Refs. 15-17).

Models of Human Performance in Various Tasks. In this category one finds computational models of human performance on arithmetic (18), tasks involving interacting with text editors (19), typing and other skills (20), and reasoning with spatial problems (21). Closely related to this work is the general study of cognitive skill, its acquisition and its nature (22). Understanding cognitive skill requires distinguishing cognitive capacities from performance differences that arise from differences in knowledge or habit, a difference that parallels the distinction between functional architecture and computational procedures. The importance of this distinction to understanding the nature of cognitive processes (and of strong equivalence) is discussed in Refs. 6 and 23.

Learning. The area of learning was one of the most thoroughly investigated during the last half century of psychology, with very little progress on what people call learning in everyday life. The work was guided by preconceived ideas about the underlying mechanism (namely, association) rather than by a careful analysis of the types of learning and the types of mechanisms capable of meeting the sufficiency condition that is central to cognitive science. More recent work on language learning by cognitive scientists has shown that the acquisition of syntax from the kind of evidence generally available to the child would not be possible without severe constraints on both the structure of the languages that can be learned and the mechanisms that could learn such languages. In particular, it is necessary that the range of grammars that the organism could consider as possible hypotheses be extremely limited (see Ref. 24). The same may also be true of concept acquisition (25). More recent work on learning within AI has also provided new ways to look at some forms of learning in humans (26) (see also Learning, machine).

Conclusion: Some Characteristics of Cognitive Science

Cognitive science is not the only form in which the search for an understanding of mind is proceeding. What characterizes this particular class of approaches is an allegiance to the network of ideas that might roughly be summarized as follows (1):

1. The approach is formalist in spirit: that is, it attempts to formulate its theories in terms of symbolic mechanisms of the sort that have grown out of symbolic logic (qv), although the apparatus of formal logic itself very rarely appears in cognitive science theories.

2. The "level of analysis," or the level at which the explanations or theories are cast, is functional, and theories are described in terms of their information flow. What this means in particular is that this approach factors out questions such as how biological material carries out the function and how biochemical and biophysical laws operate to produce the required information-processing function. This factorization is analogous to the separation of electrical engineering considerations from programming considerations in computer science. This does not mean that questions of biological realization are treated as any less important, only that they represent a distinct and to a large extent independent area of study. According to this view, neuroscience contributes an understanding of how such computational processes as are uncovered by empirical observations of human capacities are realized by biological mechanisms. Not everyone agrees that cognition can be studied independently of its neurophysiological instantiation. There is, for example, an approach, sometimes called connectionist (see Connectionism), which attempts to build models of cognition that are guided more closely by ideas from neuroscience than by symbol-processing ideas from current computer science. Some examples of such models can be found in Ref. 27 and the special issue of Cognitive Science devoted to this approach [9(1), (1985)]. Although this approach is extremely promising from the perspective of modeling the functional architecture of the mind, there is considerable doubt that it can displace rule-governed symbolic processes entirely, as some have claimed (see Ref. 6).

3. In addition to factoring apart questions of capacities from questions of biological realization, the approach is also characterized by the techniques it uses in formulating its theories and in exploring the entailments of its assumptions. The most widely used (though not universal) technique is that of computer implementation. Thus, an important methodological goal of cognitive science is to specify symbol-processing mechanisms that can actually exhibit aspects of the behavior being modeled. Adherence to such a "sufficiency" criterion makes this approach in many respects like a design discipline rather than a natural science, at least insofar as the latter typically attempts to uncover a small set of fundamental axioms or laws. Its concern with synthesis makes it, to use Simon's phrase (28), one of the "sciences of the artificial," along with AI.

4. The approach tends to emphasize a strategy sometimes referred to as top-down analysis, in which a premium is given to understanding how the general cognitive skill in question is possible (consonant with the constraint of mechanism), in contrast with the task of accounting for empirical particulars. This difference in style contrasts with the traditional approach in experimental psychology, which emphasizes the observational fit of models. The contrast is examined in Refs. 4, 29, and 30.

5. This commitment to the informational level also places the enterprise in contrast to the phenomenological approach in
which the existential notions of significance, meaningfulness, and experiential content are given a central role in the analysis, and with behaviorism, which attempts to analyze behavior without appeal to internal representational states. For a discussion of these issues, see Refs. 6 and 31-34.

The above general characteristics of cognitive science are also shared to various degrees by other scientific disciplines. The formalist or symbolic mechanistic character (1) is deeply entrenched in contemporary linguistics (especially generative grammar), decision theory, and even in parts of anthropology (e.g., Levi-Strauss). The functionalist perspective (2) is now quite general in psychology and philosophy of mind as well as in engineering, where it is referred to as the black-box approach. Both 1 and 2 are fundamental to computer science as well as to any science that concerns itself with notions such as the flow of information or the distribution of control. These ideas have thus affected everything from engineering to management science and even political science (e.g., as exemplified in Ref. 35). Criteria 3 and 4 are not quite so prevalent as the first two. For example, the desire to synthesize aspects of the phenomena being modeled as part of the attempt to understand it is not widespread in the social sciences outside of the areas of cognitive psychology and management science [especially the branch of the latter called industrial dynamics (36)], nor is it yet very common in biology [see, however, Marr's critique of theories in neurophysiology that fail to characterize the constructive computational aspect of biological function (37)]. Even modern linguistics, which is in many ways a prototypical cognitive science, places little emphasis on the human capacity to actually generate samples of performance (see, however, an example of the contrary trend in Refs. 38 and 39).

The Representational Metapostulate. Although, as suggested earlier, there are a number of theoretical and methodological characteristics that pervade a variety of approaches to the understanding of intelligence and human cognition, there is one overriding theme that more than any other appears to characterize the field of cognitive science. There are a number of ways of expressing this theme, for example, as the attempt to view intelligent behavior as consisting of the processing of information or as the attempt to view intelligence as the outcome of rule-governed activity (see Rule-based systems). These characterizations express the same underlying idea. Computation, information processing, and rule-governed behavior all depend on the existence of physically instantiated codes or symbols that refer to or represent things and properties extrinsic to the behaving system. In all these cases the behavior of the systems in question (be they minds, computers, or social systems) is explained not in terms of intrinsic properties of the system itself but in terms of rules and processes that operate on representations of extrinsic things. Cognition, in other words, is explained in terms of regularities holding over semantically interpreted symbolic representations, just as the behavior of a computer evaluating a mathematical function is explained in terms of its having representations of mathematical expressions (e.g., numerals) and in terms of the mathematical properties of the numbers these expressions represent. This is also analogous to explaining economic activity not in terms of the categories of natural science (e.g., speaking of the physicochemical properties of money and goods) but in terms of the conventional symbolic or meaning value of these objects (e.g., that they are taken to represent such abstractions as legal tender). Though in both economics and cognitive science the meaning-bearing objects (or the instantiations of the symbols) are physical, it is only by referring to their symbolic or referential character that we can explain the observed regularities in the resulting behavior.

There has been some misunderstanding of the significance of the assumption that cognition is explained in terms of regularities. For example, some people have suggested that this is no different from any other science since all scientific theories deal with representations (e.g., mathematical symbols that designate certain objects or properties). Hence simulations involving such theories (e.g., simulations of planetary motions) are sometimes thought to be no different in principle from simulations of cognition. But the difference in the two types of simulation is in fact fundamental because in the case of cognition, the claim is that the organism being modeled, as well as the theorist, actually manipulates physical tokens of the symbols. Such a claim clearly has no parallel in physics unless the physicist is being modeled!

This representation thesis, sometimes referred to in philosophy as the "representational theory of mind" (32) and in cognitive science as the "physical-symbol system" hypothesis (40,41), is one of the foundational cornerstones of the discipline of cognitive science and is one of the features that links it in a fundamental way to AI. The philosophical and intellectual underpinnings of these two fields are now so closely linked that the distinction between them remains mostly at the pragmatic level, resting on such things as how big an actual role computer programs play and how technical are the immediate applications of the research. Some people expect that as cognitive scientists become better trained in computer science, and as AI begins to tackle the harder problem of what makes general intelligence possible, the distinction between the fields will fade. Similarly, the philosophy of mind is being influenced more and more by developments in AI and might be expected to play a more central role in clarifying the difficult conceptual issues that face both empirical and theoretical studies of intelligence.

BIBLIOGRAPHY

1. Z. Pylyshyn, Information Science: Its Roots and Relations as Viewed from the Perspective of Cognitive Science, in F. Machlup and U. Mansfield (eds.), The Study of Information: Interdisciplinary Messages, Wiley, New York, 1983, pp. 63-80.
2. A. Newell, Remarks on the Relationship between Artificial Intelligence and Cognitive Psychology, in R. Banerji and M. D. Mesarovic (eds.), Theoretical Approaches to Non-Numerical Problem Solving, Springer-Verlag, New York, 1970.
3. A. Newell, Artificial Intelligence and the Concept of Mind, in R. C. Schank and K. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, 1973.
4. Z. Pylyshyn, "Validating computational models: A critique of Anderson's indeterminacy of representation claim," Psychol. Rev., 86(4), 383-394 (1979).
5. Z. Pylyshyn, Complexity and the Study of Human and Machine Intelligence, in J. Haugeland (ed.), Mind Design, MIT Press, Cambridge, MA, 1980.
6. Z. Pylyshyn, Computation and Cognition: Toward a Foundation for Cognitive Science, MIT Press, Cambridge, MA, 1984.
7. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic, New York, 1969.
8. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals and Understanding, Erlbaum, Hillsdale, NJ, 1977.
9. D. Swinney, "Lexical access during sentence comprehension: (Re)consideration of context effects," J. Verb. Learn. Verb. Behav., 18, 645-660 (1979).
10. M. S. Seidenberg and M. K. Tanenhaus, Modularity and Lexical Access, in I. Gopnik and M. Gopnik (eds.), From Models to Modules: Studies in Cognitive Science, Ablex Press, Norwood, NJ, 1985.
11. S. Zucker, A. Rosenfeld, and L. Davis, General Purpose Models: Expectations about the Unexpected, Proc. of the Fourth IJCAI, Tbilisi, Georgia, September 3-8, 1975, The Artificial Intelligence Laboratory, Publications Department, Cambridge, MA, pp. 716-721, 1975.
12. D. Marr, Vision, W. H. Freeman, San Francisco, 1982.
13. M. Brady (ed.), Artificial Intelligence: An International Journal (Special Volume on Computer Vision), 17(1-3), North-Holland, Amsterdam, Aug. 1981.
14. J. A. Fodor, The Modularity of Mind: An Essay on Faculty Psychology, MIT Press, a Bradford Book, Cambridge, MA, 1983.
15. J. R. Hobbs and R. C. Moore, Formal Theories of the Commonsense World, Ablex, Norwood, NJ, 1984.
16. D. Gentner and A. L. Stevens, Mental Models, Erlbaum, Hillsdale, NJ, 1982.
17. P. N. Johnson-Laird, Mental Models, Harvard University Press, Cambridge, MA, 1983.
18. J. S. Brown and K. Van Lehn, "Repair theory: A generative theory of bugs in procedural skills," Cogn. Sci., 4, 379-426 (1980).
19. S. K. Card, T. P. Moran, and A. Newell, The Psychology of Human-Computer Interaction, Erlbaum, Hillsdale, NJ, 1983.
20. W. E. Cooper, Cognitive Aspects of Skilled Typewriting, Springer-Verlag, New York, 1983.
21. S. M. Kosslyn, Image and Mind, Harvard University Press, Cambridge, MA, 1980.
22. J. R. Anderson, "Acquisition of cognitive skill," Psychol. Rev., 89, 369-406 (1982).
23. Z. Pylyshyn, "The imagery debate: Analogue media versus tacit knowledge," Psychol. Rev., 88, 16-45 (1981).
24. K. Wexler and P. Culicover, Formal Principles of Language Acquisition, MIT Press, Cambridge, MA, 1980.
25. W. Demopoulos and A. Marras, Language Learning and Concept Acquisition: Foundational Issues, Ablex, Norwood, NJ, 1985.
26. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine Learning: An Artificial Intelligence Approach, Vol. 2, Tioga Press, Palo Alto, CA, 1986.
27. J. A. Anderson and G. E. Hinton, Models of Information Processing in the Brain, in G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Erlbaum, Hillsdale, NJ, pp. 9-48, 1981.
28. H. A. Simon, The Sciences of the Artificial, Compton Lectures, MIT Press, Cambridge, MA, 1969.
29. A. Newell, Remarks on the Relationship between Artificial Intelligence and Cognitive Psychology, in R. Banerji and M. D. Mesarovic (eds.), Theoretical Approaches to Non-Numerical Problem Solving, Springer-Verlag, New York, 1970.
30. A. Sloman, The Computer Revolution in Philosophy: Philosophy, Science, and the Models of Mind, Humanities Press, New York, 1978.
31. R. Cummins, The Nature of Psychological Explanation, MIT Press, a Bradford Book, Cambridge, MA, 1983.
32. J. Fodor, Representations, MIT Press, a Bradford Book, Cambridge, MA, 1981.
33. J. Haugeland (ed.), Mind Design, MIT Press, a Bradford Book, Cambridge, MA, 1981.
34. D. Dennett, Brainstorms, MIT Press, a Bradford Book, Cambridge, MA, 1979.
35. K. Deutsch, "The nerves of government," Gen. Syst. Yearbk., 8, 125-176 (1963).
36. J. W. Forrester, World Dynamics, Wright-Allen, Cambridge, MA, 1971.
37. D. C. Marr, "Approaches to biological information processing," Science, 190, 875-876 (1975) (book review).
38. M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1979.
39. R. C. Berwick and A. S. Weinberg, The Grammatical Basis of Linguistic Performance: Language Use and Acquisition, MIT Press, Cambridge, MA, 1984.
40. A. Newell, "Physical symbol systems," Cognitive Science, 4, 135-183 (1980).
41. A. Newell, "The knowledge level," Artif. Intell., 18, 87-127 (1982).

Z. W. Pylyshyn
University of Western Ontario

COLOR VISION

Color enriches one's everyday visual experience. Comparing a color image with a monochrome (black-and-white) image, the color picture seems to be alive with detail because of all the
additional information in the image. In computer vision (qv) researchers have attempted to harness this additional information. The simplest method for using color is by associating colors with objects, for example, "trees are green" and "the sky is blue." If specific object colors are not known, an image can still be chopped into meaningful pieces by finding regions of uniform color. Recent theories have been proposed to analyze color in terms of physical properties of objects, and efforts are beginning to model the processing of color information in the human visual system.

Color and Color Imaging

Color arises from the spectral properties of light. Figure 1 shows the spectrum of electromagnetic energy. Wavelengths of energy are customarily denoted by λ, and the unit of measure is the nanometer (nm). Visible light lies within a range of approximately 380-760 nm, running the gamut from blue at the low end of the visible spectrum, through green, yellow, and orange, to red at the high end. To the right of the visible spectrum is the near-infrared (near-ir) portion of the spectrum, which is also frequently used in computer vision. Visible light is normally a mixture of energy at many wavelengths and is characterized by the spectral power distribution (SPD), which tells how much energy is present at each wavelength. An SPD is usually denoted by S(λ) (1). At each pixel (point) in an image the SPD of the incident light determines the pixel value.

Monochrome imaging is simpler than color imaging. An imaging sensor (see Sensors) is sensitive to the different wavelengths of light to varying degrees, as expressed by the spectral responsivity s(λ) of the sensor (4). Typical spectral responsivities of the two principal types of sensor, vidicon tubes and silicon CCD chips, are shown in Figure 2. The output pixel value p at any point in the image is defined, for a calibrated camera, by

p = p0 ∫ S(λ)s(λ) dλ
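In discrete form this integral is just a weighted sum over sampled wavelengths. The sketch below is illustrative only; the equal-energy SPD and the triangular responsivity curve are hypothetical stand-ins, not measured sensor data:

```python
# Approximate p = p0 * Int S(lambda) s(lambda) d(lambda) by a Riemann sum.
# The SPD and responsivity samples are hypothetical, taken every 10 nm.

def pixel_value(spd, responsivity, d_lambda, p0=1.0):
    """Discrete approximation of the calibrated monochrome pixel value."""
    return p0 * sum(S * s for S, s in zip(spd, responsivity)) * d_lambda

wavelengths = [400 + 10 * i for i in range(31)]             # 400-700 nm
spd = [1.0] * 31                                            # equal-energy SPD
resp = [1.0 - abs(wl - 550) / 300.0 for wl in wavelengths]  # peak at 550 nm

p = pixel_value(spd, resp, d_lambda=10.0)                   # approximately 230
```

With measured S(λ) and s(λ) tables, and a filter transmittance folded into the responsivity samples, the same sum gives the filtered color values discussed below.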
Figure 1. (a) Spectrum of electromagnetic radiation, from gamma and X rays through ultraviolet, visible, and infrared wavelengths to radio frequencies (from Ref. 2). (b) Magnification of the visible portion of the spectrum, 400-700 nm (from Ref. 3): C = cyan, G = green, Y = yellow, and O = orange.
where p0 is a scaling factor. For a vidicon, s(λ) is approximately equal to V(λ), the spectral luminous efficiency of the human eye, so a monochrome image from a vidicon looks similar to the brightness seen by a person. However, CCD sensors are much more sensitive in the near-ir region of the spectrum; for this reason, CCD cameras are frequently fitted with ir cutoff filters or filters to match the responsivity to V(λ).

Figure 2. Spectral responsivities of vidicon and CCD chips (from Refs. 5 and 6).

Color imaging is more complex, assigning a color to each SPD instead of a single number. However, the set of all perceivable colors, as determined by color-matching experiments, is only a three-dimensional space. Humans cannot distinguish all different SPDs from each other but only those SPDs that correspond to different colors in color space. Since this is a many-to-one correspondence, there can be many SPDs that have the same color; such SPDs are said to be metameric.

The axes of color space, called primary colors, can be chosen arbitrarily. A convenient set, universally used for color measurement, is the X-Y-Z set of colors adopted by the CIE (International Commission on Illumination); each distinct point in X-Y-Z space corresponds to a unique color perception. A psychophysical color C is defined by

C = (X, Y, Z),  X = ∫ S(λ)x̄(λ) dλ,  Y = ∫ S(λ)ȳ(λ) dλ,  Z = ∫ S(λ)z̄(λ) dλ

The functions x̄(λ), ȳ(λ), and z̄(λ) are the CIE tristimulus functions that define the primaries X, Y, and Z (7). Since ȳ(λ) = V(λ), Y corresponds to luminance (the brightness of a color as seen by the human eye); the remaining coordinates X and Z determine chromaticity (the aspects of color independent of brightness).

Color imaging for computer vision follows the paradigm of color television, in which colors are measured using color filters. For any filter, the output pixel value p is determined by

p = p0 ∫ S(λ)τ(λ)s(λ) dλ

where τ(λ), the transmittance of the filter, is the fraction of light the filter allows to pass through at each wavelength. To yield color values that uniquely correspond to color perceptions, three filters must be used that span the X-Y-Z space. The filters used for color television are red, green, and blue to maximize the gamut of measurable color values. For a particular sensor s(λ), the filter transmittances τr(λ), τg(λ), and τb(λ) of the red, green, and blue filters determine tristimulus functions r̄(λ) = τr(λ)s(λ), ḡ(λ) = τg(λ)s(λ), and b̄(λ) = τb(λ)s(λ). For standard color television, these functions should obey (8)

[r̄(λ)]   [ 0.582  -0.164  -0.089 ] [x̄(λ)]
[ḡ(λ)] = [-0.301   0.611  -0.0087] [ȳ(λ)]
[b̄(λ)]   [ 0.0128 -0.0274  0.0862] [z̄(λ)]

Color pixel values P are then determined by

P = (R, G, B),  R = r0 ∫ S(λ)r̄(λ) dλ,  G = g0 ∫ S(λ)ḡ(λ) dλ,  B = b0 ∫ S(λ)b̄(λ) dλ

where r0, g0, and b0 are scaling factors. "White" in a color TV image corresponds to CIE Standard Illuminant C (9). A broadcast color camera contains three separate sensors, each with a color filter, and beam-splitting optics to direct the image simultaneously to all three sensors; for research in computer vision, a single monochrome camera is normally used with a filter wheel to rotate each filter in turn into position (Fig. 3). The recent development of three-color CCD sensor chips promises more compact and inexpensive color cameras for the near future.

Figure 3. Color camera arrangements: (A) typical broadcast color television camera, with beam-splitting mirrors (m), R, G, B color filters (f), and sensor tubes or chips (s); (B) typical color computer vision camera, a monochrome camera with a filter wheel carrying R, G, and B filters; (C) single-chip color CCD sensor with an R, G, or B filter on each pixel.

In computer vision, unfortunately, other measurement factors are usually uncontrolled, including sensor spectral responsivity, nonlinear response to intensity, and gain within each color band. This results in a lack of correspondence between color computer images and NTSC color TV standards. The usual color filters for computer vision are Kodak Wratten filters #25 (R), #47B (G), and #58 (B) (10,11). Infrared filters [τ(λ) highest in the near ir] are also used in remote sensing. In this entry color refers to colors measured in a computer vision system using these or similar filters.

Color Spaces and Transformations

In computer vision color pixel values usually contain R, G, and B values, each measured in 6 or 8 bits. The set of image colors is thus a cube called the color space (Fig. 4). Intensity, measured by I = (R + G + B)/3, is the main diagonal of this cube from black (0, 0, 0) to white (max, max, max). Researchers in computer vision have frequently used R-G-B coordinates but have also explored transformations to other coordinate systems that have useful properties. One such system is the CIE X-Y-Z system, a linear transform of R-G-B as defined
above. This system was proposed because of its use as an international standard. However, it is psychophysical rather than psychological and hence does not capture the subjective attributes of color perceptions. These latter are incorporated into such systems as the Munsell color order system (2). The Munsell system defines three color attributes (value, hue, and chroma) that correspond roughly to the more familiar brightness, hue (color name, e.g., blue, purple), and saturation (relative amount of pure hue as opposed to gray). In computer vision, normalized colors r-g-b are first defined by r = R/I, g = G/I, and b = B/I; saturation S and hue H are then defined (12) by

S = 1 - min(r, g, b)

x = arccos[(2r - g - b) / (2[(r - g)² + (r - b)(g - b)]^(1/2))]

if b ≤ g, then H = x; otherwise H = 2π - x. (See Ref. 13 for a fast algorithm to compute H.) In color space I corresponds to distance along the intensity axis; on a plane of constant intensity S and H form a polar coordinate system, with S measuring distance from the center (gray) point and H measuring angle from pure red (Fig. 5). It is easily seen that
the H-S-I system is analogous to the Munsell system for describing human color perceptions, but numerically the two are quite distinct.

Figure 4. The R-G-B color space.

Figure 5. Hue, saturation, and intensity: the color space cube and a plane of constant intensity, on which S and H form polar coordinates.

Even the Munsell system, however, has drawbacks. Most work with color in computer vision has been based on the notion of color differences expressed as Euclidean distance in a color space, but Euclidean distance in Munsell coordinates does not correspond well to subjective perceived color difference magnitudes. Spaces with that property have been proposed, however, and are generically called uniform color spaces. The earliest, called U-V-W, has occasionally been proposed for computer vision; it has been replaced in the color science community by a newer system called CIELUV (14). In the CIELUV system L*, u*, and v* are defined for each color as follows (quantities denoted by subscript n refer to the incident illumination color):

L* = 116(Y/Yn)^(1/3) - 16
u* = 13L*(u - un)
v* = 13L*(v - vn)

where

u = 4X/(X + 15Y + 3Z)    v = 9Y/(X + 15Y + 3Z)

Color differences ΔE are then expressed by

ΔE = [(ΔL*)² + (Δu*)² + (Δv*)²]^(1/2)

However, it seems unlikely that even the use of CIELUV coordinates will solve the fundamental problems of color image segmentation and analysis. Other color spaces frequently used for computer vision include the NTSC television broadcasting encoding system Y-I-Q, defined (15,10) by

Y = 0.299R + 0.587G + 0.114B
I = 0.596R - 0.274G - 0.322B
Q = 0.211R - 0.523G + 0.312B

Y is the same as the CIE Y coordinate; I and Q measure chromaticity using parameters optimized for the acuity of the human eye. Other researchers have used opponent colors (black/white, blue/yellow, and red/green) or even the normalized colors r-g-b themselves.

Kender has pointed out some serious problems in using certain color features (13). All features defined by a division (including H, S, r, g, b, U, V, W, etc.) contain singularities and tend to be unevenly distributed in an image because of the small integer nature of the values used in their computation. This creates problems for algorithms based on histograms, clustering, and edges. Kender recommends a small number of bits per output pixel value for such features, randomizing (adding a random real number from 0 to 1 to each pixel color coordinate), and avoiding these features entirely near their singularities (usually when I is low). It is better to use linear transformations instead, such as Y-I-Q; such linear transformations should themselves be scaled so that the maximum value in each row of the transformation matrix is 1.0. Ohta (16) derived color statistics for over 100 image regions (using Ohlander's algorithm, described below) and statistically analyzed the distributions. He found that the feature that most frequently captured the greatest information was intensity, I. Ohta proposed the use of three features I1-I2-I3 that represent a simple linear transformation from R-G-B and capture the color information he observed very well: I1 = (R + G + B)/3; I2 = R - B; I3 = (2G - R - B)/2. Ohta found these features to perform at least as well as any other set of features (X-Y-Z, R-G-B, Y-I-Q, U-V-W, I-r-g, H-S-I) in his system. However, as he noted, "usefulness of a color feature is greatly influenced by the structure of the color scenes to be [analyzed]."
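The feature definitions in this section can be gathered into one sketch. The S and H formulas follow the definitions given earlier, the Y-I-Q weights are the standard NTSC coefficients, and I1-I2-I3 are Ohta's features; the function name and the dictionary packaging are illustrative conveniences, not part of any cited system:

```python
import math

def color_features(R, G, B):
    """Features of one pixel: I, S, H (radians), NTSC Y-I-Q, Ohta I1-I2-I3.

    Assumes a nonblack pixel (I > 0); at the gray singularity, where hue is
    undefined (one of the trouble spots Kender notes), H is set to 0.
    """
    I = (R + G + B) / 3.0
    r, g, b = R / I, G / I, B / I        # normalized colors, r + g + b = 3
    S = 1.0 - min(r, g, b)
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:                       # gray pixel: hue undefined
        H = 0.0
    else:
        c = max(-1.0, min(1.0, (2 * r - g - b) / (2 * den)))
        x = math.acos(c)
        H = x if b <= g else 2 * math.pi - x
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    I_ntsc = 0.596 * R - 0.274 * G - 0.322 * B
    Q = 0.211 * R - 0.523 * G + 0.312 * B
    return {"I": I, "S": S, "H": H, "Y": Y, "I_ntsc": I_ntsc, "Q": Q,
            "I1": I, "I2": R - B, "I3": (2 * G - R - B) / 2.0}

f = color_features(200, 50, 50)   # a reddish pixel: S = 0.5, H = 0
```

The clamp before `acos` guards against floating-point values just outside [-1, 1], a practical concern the closed-form hue definition glosses over.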
(qv) can be used instead. Haralick and many others (22) have applied this standard pattern recognition (qv) technique to color image segmentation. A histogram is first created by the color values at all pixels; it tells, for each point in color space, Color as a StatisticalQuantity how many pixels exhibit that color (Fig. 6). Typically, the Most research in color computer vision has been for image colors tend to form clusters in the histogroh, one for each segmentation, breaking an image into pieces that have uni- object in the image. By manual or automatic analysis of the form properties. In this work color is usually regarded as a histogr&ffi, the shapeof each cluster is found. Then, each pixel random variable to be analyzed statistically but without re- in the image is assignedto the cluster that is closestto the gard for the specific physical processesthat give rise to color pixel color in color space.Clustering differs from spectral sigand color variation. The earliest and most obvioustechniqueis nature analysis in that the clusters are found by analysis of called spectral signature analysis, in which prior knowledge the specific image under consideration rather than by prior about characteristic object colors is used to classify pixels. consideration of the expecteddata. In someclustering systems Spectral signature analysis has been used extensively in re- clusters are restricted to be rectangular boxes or ellipsoids mote sensing (satellite and aerial photograph interpretation) (23-25). Features that describetexture have been used along and biomedical image analysis; it has been applied occasion- with color to create a "feature space" with additional dimenally in robotics (qv) research.For example,Noguchi (17) clas- sions (26). All clustering techniques suffer from the problem sifies pixels in biomedical images of cells. 
He measuresthe that adjacent clusters frequently overlap in color space,caustypical colors of background, cytoplasm, and nucleus in ad- ing incorrect pixel labeling. In conjunction with clustering, a vance; then, for each image, each pixel is individually classi- technique called relaxation is sometimesusedto improve pixel fied into one of these categories. Whichever category has a Iabeling. In relaxation pixel labels are assignedby an iterative characteristic color closest to that pixel's color is assigned method.Each pixel has a probability of belongingto eachclusas the pixel label. The distance metric used in this work ter, and in each iteration step those probabilities are modified. is Euclidean distance in R-G-B space, that is, AP A probability is increasedor decreasedaccordittgto a weighted . A similar technique is used to label combination of two factors: the color resemblanceof the pixel image regions as specificobjectsusing absolute color compari- to the cluster center and the probabilities that the neighboring sons in production systems (18,19).Frequently, nonstandard pixels belong to that same cluster (27,28). color filters or features are used that optimize discriminability A refinement on clustering is region splitting, in which the for the specifictask at hand (20,2L). image is broken into successivelysmaller pieces until each If specificobjectcolors are not known in advance,clustering piece has a uniform color (and presumably representsa single (A)
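Histogram formation, cluster detection, and nearest-cluster pixel labeling can be sketched together as follows. This is an illustrative sketch, not a published algorithm: the bin count, the crude "k most-populated bins as cluster centers" rule, and the test image are all assumptions.

```python
import numpy as np

def segment_by_clustering(image, n_bins=8, k=3):
    """Cluster pixel colors via a color-space histogram, then label each
    pixel with its nearest cluster center (Euclidean distance in R-G-B).
    Assumes n_bins divides 256; real systems fit boxes or ellipsoids
    to the clusters rather than taking single bins."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)

    # Build the color histogram: count pixels falling in each (R, G, B) bin.
    bin_width = 256 // n_bins
    bins = (pixels // bin_width).astype(int)
    hist = np.zeros((n_bins,) * 3, dtype=int)
    for b in bins:
        hist[tuple(b)] += 1

    # Crude cluster detection: take the k most-populated bins as centers.
    flat = np.argsort(hist, axis=None)[::-1][:k]
    centers = np.array(np.unravel_index(flat, hist.shape)).T
    centers = (centers + 0.5) * bin_width      # bin centers in R-G-B space

    # Assign each pixel to the closest cluster center in color space.
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1).reshape(h, w)
```

On an image whose pixels fall into a few tight color clusters, the returned label map separates the objects; overlapping clusters would mislabel pixels, which is exactly the weakness noted above.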
[Figure 6. Clustering in color space. (A) Input image containing (s) sky, (b) building, and (g) grass. (B) Histogram in color space, with axes labeled black, red, green, blue, yellow, and magenta; g, b, and s are pixels in grass, building, and sky; clusters are outlined.]
A refinement on clustering is region splitting, in which the image is broken into successively smaller pieces until each piece has a uniform color (and presumably represents a single object or surface). Ohlander used color to perform this splitting operation (29). His method begins by computing a variety of color features for each pixel (R, G, B, H, S, I, Y, I, and Q) before the actual segmentation begins. Then, the splitting step is applied (the initial region is the entire image). In the splitting step a histogram is created for each color feature (RGBHSIYIQ) within the region. Each object tends to produce a peak in the histogram, so the clustering task becomes simply to find a single prominent peak in one of the histograms (Fig. 7). Each histogram is examined independently, and the feature whose histogram exhibits the most prominent peak is selected for use. The image is then thresholded according to the boundaries of the peak to isolate the pixels that contribute to the selected peak. This splits the original region into smaller regions. The splitting step is then recursively applied to each region, stopping when a region has all flat histograms (i.e., it has uniform color). A variation of this method isolates several peaks in each step (30). Some of the general problems associated with histogram analysis in this algorithm are discussed in Ref. 31.
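The recursive splitting step might be sketched as follows. Two simplifying assumptions are made here: only the R, G, and B features are used (Ohlander used nine features), and the "most prominent peak" test is replaced by a widest-gap test on each feature's populated values.

```python
import numpy as np

def split_region(image, mask, labels, next_label=None, min_sep=32):
    """Recursively split the pixels in `mask` by thresholding the color
    feature whose value histogram shows the widest gap (a stand-in for
    peak selection). A region with all flat histograms gets a label."""
    if next_label is None:
        next_label = [1]                     # mutable counter for labels
    pix = image[mask]                        # region pixels, shape (n, 3)
    best = None
    for c in range(3):                       # examine each feature's histogram
        vals = np.unique(pix[:, c])
        if len(vals) < 2:
            continue                         # flat histogram for this feature
        gaps = np.diff(vals)
        i = gaps.argmax()
        if gaps[i] >= min_sep and (best is None or gaps[i] > best[0]):
            best = (gaps[i], c, (vals[i] + vals[i + 1]) / 2)
    if best is None:                         # all histograms flat: uniform region
        labels[mask] = next_label[0]
        next_label[0] += 1
        return
    _, c, thresh = best                      # threshold splits the region in two
    split_region(image, mask & (image[:, :, c] <= thresh), labels, next_label)
    split_region(image, mask & (image[:, :, c] > thresh), labels, next_label)
```

Starting from a mask covering the whole image, the recursion bottoms out when every remaining region is uniform in all features, mirroring the stopping rule described above.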
Region growing (qv) is a different segmentation technique in which small regions (initially individual pixels) are merged together to form larger and larger regions according to color similarity. This method merges the pair of adjacent regions with the greatest similarity of colors, according to a statistical measure. It may take into account both the average color and color variance within each region. Merging continues until no two adjacent regions are sufficiently similar. Variations include simple color difference measures (32-34), the use of color features other than R-G-B (32,35,36), the use of semantic information about object positions and relations (37,12), and assigning a globally optimal set of pixel labels after examining all pixels (38). Nagao determined the acceptable color difference limits by finding a valley in local color difference histograms (36).

Edge detection (qv) techniques have also been examined for color images. Nevatia produced a method for color edge detection in which edges are first detected in each color feature separately (39,40). The computed edge directions at each pixel are then averaged to determine a hypothesized edge direction at that pixel.
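The per-feature detection and direction averaging might be sketched as below. This is a simplification: simple per-channel image gradients stand in for Nevatia's actual edge operators, and the per-channel directions are averaged as unit vectors.

```python
import numpy as np

def hypothesized_edge_directions(image):
    """Estimate a hypothesized edge direction per pixel by computing a
    gradient in each color feature separately, then averaging the
    per-channel gradient directions as unit vectors."""
    vx = np.zeros(image.shape[:2])
    vy = np.zeros(image.shape[:2])
    for c in range(3):                       # edge detection per color feature
        gy, gx = np.gradient(image[:, :, c].astype(float))
        mag = np.hypot(gx, gy)
        nz = mag > 0
        vx[nz] += gx[nz] / mag[nz]           # accumulate unit direction vectors
        vy[nz] += gy[nz] / mag[nz]
    return np.arctan2(vy, vx)                # averaged direction in radians
```

For a vertical luminance edge shared by all three channels, the averaged direction points horizontally across the edge, as expected.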
[Figure 7. Region splitting by histogram analysis. (A) Input image containing (s) sky, (b) building, and (g) grass. (B) Red, green, and blue histograms of the region, each ranging from 0 to max; (g), (b), (s) indicate pixels of grass, building, and sky. (C) After thresholding, region outlines are indicated by thick lines; each region (Region #1, Region #2) will now be split into smaller regions.]
Using that constraint, each color feature is reexamined to see if there is sufficient evidence to confirm the presence of an edge in the hypothesized direction. If confirmation is found in each color feature, the edge is considered to be present. A different analysis of color at image edges is Kanade's method for color edge profile analysis (41). In this method, when the geometric arrangement of regions suggests that two regions are part of the same surface, the colors along their corresponding edges are matched to confirm or deny this hypothesis.

In summary, every major image segmentation method has been adapted for color, and some, like relaxation segmentation and region splitting, are almost always performed on color images. In the near future there will be applications of color for spectral signature analysis in robotics applications and research in the use of color for matching tasks such as stereo image analysis.
Color as a Physical or Perceptual Quantity
Although most of the work in color computer vision has viewed color as a random variable to be used for image segmentation, there has been some progress in viewing color as a physical variable instead. In this work knowledge about how color is created is used to analyze a color picture and compute some important three-dimensional facts about the objects being viewed. The most successful such research has used heuristic rules (see Heuristics; Rule-based systems), usually embedded in production systems, to label regions as shadows, highlights, and so on, using knowledge about color behavior. Nagao uses IR/R, the ratio of infrared to red at each pixel, to detect vegetation in aerial photographs (36). This is useful since chlorophyll typically has a high IR reflectance but low red reflectance. Unfortunately, the blue rooftops in Nagao's photographs also exhibit high values of IR/R; thus he uses an absolute B threshold to discriminate such roofs from vegetation. He also detects shadows by comparing I to a threshold; if the intensity is low, a region is assumed to be a shadow. Several others have detected shadows by adding the requirement that there be an adjacent region (presumably an illuminated part of the same surface) with higher intensity but similar chromaticity (hue and saturation) (23,30). Similarly, Ohlander labels a region as a highlight if there is an adjacent region with lower intensity and similar chromaticity (42). Sloan, analyzing outdoor scenes, noted that distant objects appear somewhat bluish (35). All these heuristics are qualitative in nature and based on some simplifying assumptions about the images being viewed.

Some theories have also been proposed for quantitative analysis of color. A theoretical analysis of highlights and object color reflection has been presented to provide a way to remove the highlights from large portions of an image (43). Highlight color and object colors can be characterized by vectors in color space, and each pixel on a surface has a color that is a linear combination of these. The colors on a single surface thus form a parallelogram in color space, and by analyzing a histogram of such colors, the parallelogram can be found. Then, by noting each pixel's color location within this parallelogram, the relative amounts of highlight and object color can be determined at that pixel. The method is proposed for such materials as paint, plastic, and paper. Rubin has presented a method for using color to distinguish material changes from artifacts such as shadows and highlights (44). The spectral power distributions of nearby pixels may only have a crossover point (wavelength at which the sign of the difference changes) if the material being viewed at the two pixels is different. By looking for sign changes in the color components of adjacent pixels, material boundaries can be found. Both of these quantitative methods depend on idealized assumptions about reflection and imaging, but this type of research appears quite promising and will be important.

Yet another view of color is to interpret it as a perceptual variable in human vision. Still in its infancy within the computer vision community, this work involves the explicit modeling of color processing within the human visual system. Researchers are currently studying the color constancy phenomenon that allows humans to see object colors the same regardless of illumination (45) and the possible function of retinal and brain cells sensitive to specific color patterns or orientations (46,47). This research is still in early stages but is arousing great interest, and as the understanding of human vision increases, so will the sophistication of modeling of human color vision.

The reader interested in color measurement is referred to Ref. 2, a very readable discussion, and the more technical discussion of color standards in Ref. 14. Useful reference handbooks are Ref. 7, which contains many tables and formulas, and Ref. 1, which presents formal definitions of terms and units of measure. A discussion of the physiology of human color vision is found in Ref. 48; Ref. 49 is older but also contains an excellent discussion of color perception. Surveys of color in computer vision are Refs. 50-54; however, these tend to survey human color vision more deeply than computer color vision.

BIBLIOGRAPHY

1. International Commission on Illumination, International Lighting Vocabulary, 3rd ed., CIE 17 (E-1.1.), CIE, Paris, 1970.
2. D. B. Judd and G. Wyszecki, Color in Business, Science and Industry, Wiley, New York, 1975.
3. S. J. Williamson and H. Z. Cummins, Light and Color in Nature and Art, Wiley, New York, 1983.
4. F. Grum and R. J. Becherer, Optical Radiation Measurements, Vol. 1, Radiometry, Academic Press, New York, 1979.
5. Hamamatsu Corp., Vidicons, Catalog SC-5-3, Hamamatsu Corp., Middlesex, NJ, 1983.
6. Hamamatsu Corp., Silicon Photocells, Catalog SC-3-6, Hamamatsu Corp., Middlesex, NJ, 1983.
7. G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd ed., Wiley, New York, 1982.
8. J. Wentworth, Color Television Engineering, McGraw-Hill, New York, 1955.
9. R. W. G. Hunt, The Reproduction of Colour, Wiley, New York, 1967.
10. M. D. Levine, Region Analysis Using a Pyramid Data Structure, in S. Tanimoto and A. Klinger (eds.), Structured Computer Vision, Academic Press, New York, pp. 57-100, 1980.
11. T. Ito, "Color picture processing by computer," Proc. of the Fourth IJCAI, Tbilisi, Georgia, 635-642 (1975).
12. J. M. Tenenbaum, T. D. Garvey, S. Weyl, and H. C. Wolf, An Interactive Facility for Scene Analysis Research, TN 87, SRI International, Menlo Park, CA, January 1974.
13. J. R. Kender, Instabilities in Color Transformations, in PRIP-77, IEEE Computer Society, Troy, NY, pp. 266-274, June 1977.
14. F. Grum and C. J. Bartleson (eds.), Optical Radiation Measurements, Vol. 2, Colorimetry, Academic Press, New York, 1980.
15. L. E. DeMarsh, Color Reproduction in Color Television, in Proceedings of the Inter-Society Color Council 1971 Conference on Optimum Reproduction of Color, Williamsburg, VA, January 1971, pp. 69-97.
16. Y. Ohta, T. Kanade, and T. Sakai, "Color information for region segmentation," Comput. Graph. Image Proc., 13, 222-241 (1980).
17. Y. Noguchi, Y. Tonjin, and T. Sugishita, "A method for segmenting a clump of cells into cellular characteristic parts using multispectral information," IJCPR-4, pp. 872-874, Kyoto, 1978.
18. T. D. Garvey, "An experiment with a system for locating objects in multisensory images," IJCPR-3, pp. 567-575, IEEE, Coronado, CA, 1976.
19. Y. Ohta, A Region-Oriented Image-Analysis System by Computer, Ph.D. Thesis, Kyoto University, Kyoto, Japan, March 1980.
20. K. Akita and H. Kuga, "Towards understanding color ocular fundus images," Proc. of the Sixth IJCAI, Tokyo, Japan, pp. 7-12, 1979.
21. J. Engvall et al., "Development of a mathematical model to analyze color and density as discriminant features for pulmonary squamous epithelial cells," Pattern Recog., 13(1), 37-47 (1981).
22. R. M. Haralick and G. L. Kelly, "Pattern recognition with measurement space and spatial clustering for multiple images," Proc. IEEE, 57, 654-665 (April 1969).
23. M. Ali, W. N. Martin, and J. K. Aggarwal, "Color-based computer analysis of aerial photographs," Comput. Graph. Image Proc., 9, 282-293 (1979).
24. D. M. Connah and C. A. Fishbourne, The Use of Colour Information in Industrial Scene Analysis, in Proc. 1st Intl. Conf. on Robot Vision and Sensory Controls, Stratford-upon-Avon, UK, April 1981, pp. 340-347.
25. A. Sarabi and J. K. Aggarwal, "Segmentation of chromatic images," Pattern Recog., 13(6), 417-427 (1981).
26. G. B. Coleman and H. C. Andrews, "Image segmentation by clustering," Proc. IEEE, 67(5), 773-785 (May 1979).
27. A. Rosenfeld, Some Recent Results Using Relaxation-Like Processes, in L. S. Baumann (ed.), ARPA IU Workshop, May 1978, pp. 100-101.
28. P. A. Nagin, A. R. Hanson, and E. M. Riseman, "Studies in global and local histogram-guided relaxation algorithms," IEEE Trans. Pattern Anal. Machine Intell., 4(3), 263-276 (May 1982).
29. R. Ohlander, K. Price, and D. R. Reddy, "Picture segmentation using a recursive region splitting method," Comput. Graph. Image Proc., 8, 313-333 (1978).
30. B. Schachter, L. S. Davis, and A. Rosenfeld, Scene Segmentation by Cluster Detection in Color Space, TR 424, U. Maryland Computer Science Center, 1975.
31. S. A. Shafer and T. Kanade, Recursive Region Segmentation by Analysis of Histograms, in Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, IEEE, Paris, France, May 1982, pp. 1166-1171.
32. M. Yachida and S. Tsuji, "Application of color information to visual perception," Pattern Recog., 3, 307-323 (1971).
33. R. Bajcsy, "Computer identification of visual surfaces," Comput. Graph. Image Proc., 2, 118-130 (1973).
34. M. D. Levine and S. I. Shaheen, A Modular Computer Vision System for Picture Segmentation and Interpretation, Part I, in PRIP-79, IEEE Computer Society, Chicago, IL, August 1979, pp. 523-533.
35. K. Sloan, World Model Driven Recognition of Natural Scenes, Ph.D. Thesis, University of Pennsylvania Moore School of Electrical Engineering, June 1977.
36. M. Nagao, T. Matsuyama, and Y. Ikeda, "Region extraction and shape analysis in aerial photographs," Comput. Graph. Image Proc., 10, 195-223 (1979).
37. Y. Yakimovsky and J. A. Feldman, "A Semantics-Based Decision Theory Region Analyzer," in Proc. of the Third IJCAI, Stanford, CA, pp. 580-588, 1973.
38. S. Rubin, The ARGOS Image Understanding System, Ph.D. Thesis, Carnegie-Mellon University Computer Science Department, 1978.
39. R. Nevatia, A Color Edge Detector, in IJCPR-3, Coronado, CA, pp. 829-832, 1976.
40. R. Nevatia, "A color edge detector and its use in scene segmentation," IEEE Trans. Sys. Man Cybern., SMC-7(11), 820-826 (November 1977).
41. T. Kanade, "Recovery of the three-dimensional shape of an object from a single view," Artif. Intell., 17, 409-460 (1981).
42. R. Ohlander, Analysis of Natural Scenes, Ph.D. Thesis, Carnegie-Mellon University Computer Science Department, 1975.
43. S. A. Shafer, "Using Color to Separate Reflection Components," Color Research and Application, 10(4), Winter 1985, pp. 210-218.
44. J. M. Rubin and W. A. Richards, "Color vision and image intensities: When are changes material?," Biol. Cybern., 45, 215-226 (1982).
45. B. A. Wandell and L. T. Maloney, Computational Methods for Color Identification, presented at 1984 Annual Meeting of the Optical Society of America, San Diego, CA.
46. J. M. Rubin and W. A. Richards, Color Vision: Representing Material Categories, AIM 764, MIT AI Lab, Cambridge, MA, 1984.
47. R. Gershon, Empirical Results With a Model of Color Vision, in CVPR-85, IEEE Computer Society, San Francisco, CA, pp. 302-305, June 1985.
48. C. J. Bartleson and F. Grum, Optical Radiation Measurements, Vol. 5, Visual Measurements, Academic Press, New York, 1984.
49. Committee on Colorimetry, The Science of Color, Optical Society of America, Washington, DC, 1963. (Originally published in 1953 by Thomas Y. Crowell.)
50. D. Taenzer, Physiology and Psychology of Color Vision-A Review, AIM 369, MIT AI Lab, Cambridge, MA, 1976.
51. A. Nazif, A Survey of Color, Boundary Information, and Texture as Features for Low-level Image Processing, TR 78-7R, McGill U. Elec. Engrg. Dept., Montreal, 1978.
52. C. M. Brown, Color Vision and Computer Vision, TR 108, U. Rochester Computer Science Dept., Rochester, NY, 1982.
53. R. Gershon, Survey on Color: Aspects of Perception and Computation, RCBV-TR 84-4, U. Toronto Computer Science Dept., Toronto, Ontario, 1984.
54. S. A. Shafer, Optical Phenomena in Computer Vision, in Proceedings CSCSI-84, Canadian Society for Computational Studies of Intelligence, London, Ontario, May 1984.

S. A. SHAFER and T. KANADE
Carnegie-Mellon University

COMMON LISP. See Lisp.
COMPETENCE, LINGUISTIC. See Linguistics, competence and performance.
COMPLETENESS

Completeness is a property of deductive systems, or theories, as are consistency and soundness; these terms inherit their meaning from logic (qv). Informally, a complete theory is one strong enough to allow proof of any statement that ideally one
would want to prove, a consistent theory is a theory free of formal contradiction, and a sound theory is, in some sense, a "correct" theory, that is, only true statements are provable. Reference to completeness within AI has been primarily within the automated theorem-proving (qv) subarea. Although its exact importance for AI has been controversial, it is generally agreed that completeness is less important for an AI system than soundness and consistency (and even consistency may have to be given up in larger systems), but that completeness can be an important property when basic deductive systems are considered. The term completeness has also been used in the context of knowledge representations (qv) in AI to indicate that the notation can represent every entity within the intended domain.

Proof Procedures

To be precise, a first-order theory is complete if and only if (iff) for each closed formula A of the language of the theory either A or -A is provable (1). (A first-order theory is consistent iff for each formula A at most one of A and -A is provable. A theory is sound iff its theorems are a subset of the intended set of theorems. This last definition involves interpretations of the theory; it is dependent on the notion of "intended" and so is less used in logic than the first two definitions.) Completeness is also defined for proof procedures and is the meaning usually intended in the automated theorem-proving field. A proof procedure for a logic (a theory) is complete iff it is capable of generating a proof for every valid (true) formula of the logic (theory). Implicit in this definition is the acknowledgment that normal forms for formulas may be used, and only the normal forms may be provable; indeed, one may view the logic as restricted to these normal forms, whereupon the definition is literal.
For refutation logics and their associated proof procedures, such as resolution (qv) and its various refutation procedures, the definition is the obvious variant: The refutation procedure is complete iff the procedure is capable of generating a refutation of every unsatisfiable formula (2,3). (A refutation procedure is consistent iff for no formula A are both A and -A refutable by the procedure. The refutation procedure is sound iff every formula refuted by the procedure is unsatisfiable.)

Concern with completeness entered the AI community via interest in automated theorem proving (ATP) in the late 1950s. Abraham Robinson (4) was the first logician on record as proposing that a complete proof procedure be used for ATP (in 1954), and Prawitz, Prawitz, and Voghera (5) in 1957 implemented a proof procedure for the first-order predicate calculus closely related to Beth's (14) semantic tableaux method. Several other logicians responded to the heuristic approach to proving propositional theorems taken by Newell, Shaw, and Simon (6) by showing that complete proof procedures could perform as well as heuristic procedures (actually much better at that time) and could handle a much wider scope of problems. In particular, the procedure of Gilmore (7) that used the so-called Herbrand theorem (better named the Skolem-Herbrand-Gödel theorem) was improved by Davis and Putnam (8) and then J. A. Robinson (9), who proposed the resolution procedure (see Resolution). Each dramatically improved an aspect of the previous procedure while maintaining completeness, thus being able to claim that efficiency was not gained at the expense of generality (at least as a first-order approximation).
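At the propositional level, the completeness of resolution as a refutation procedure can be illustrated with a small saturation-based prover. This sketch is illustrative only; it ignores first-order unification and every efficiency strategy.

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses. Clauses are frozensets of literals;
    a literal is a nonzero int, negative for a negated atom."""
    out = []
    for lit in c1:
        if -lit in c2:
            out.append((c1 - {lit}) | (c2 - {-lit}))
    return out

def refutable(clauses):
    """Saturate the clause set under resolution. Propositional resolution
    is refutation complete, so the empty clause is derived iff the
    clause set is unsatisfiable; saturation terminates because only
    finitely many clauses exist over the given atoms."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True          # empty clause: refutation found
                new.add(frozenset(r))
        if new <= clauses:
            return False                 # saturated without the empty clause
        clauses |= new
```

For example, the set {P, P → Q, ¬Q}, written in clause form as [{1}, {-1, 2}, {-2}], is unsatisfiable and is refuted, while the satisfiable set [{1}, {2}] saturates without yielding the empty clause.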
(For a detailed view of this first period in ATP history see the opening articles in Ref. 10.) However, the difficulties encountered in proving deeper theorems using resolution techniques, in spite of a sizable repertoire of resolution "strategies," led several AI researchers to alternate methods. In particular, both Nevins (11) and Bledsoe (12) developed incomplete theorem provers that proved some theorems not previously proved by resolution provers and illuminated techniques that promised further gains (13). These provers and a general reaction to an apparent overemphasis on completeness moved the AI community to an "anticompleteness" attitude, which is gradually decreasing in intensity as researchers gain a feeling for contexts in which completeness makes sense. To oversimplify, completeness is useful when dealing with basic mechanisms for deduction since experience shows that otherwise some very simple deductions may be omitted. On the other hand, control structures (qv) are often designed with little regard for completeness because of the desire for anything that works well in reasonable domains; moreover, resource limitations encountered in practice mean that completeness often cannot be realized anyway.
BIBLIOGRAPHY

1. J. Shoenfield, Mathematical Logic, Series in Logic, Addison-Wesley, Reading, MA, 1967.
2. C. L. Chang and R. T. C. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York, 1973.
3. D. W. Loveland, Automated Theorem Proving: A Logical Basis, Fundamental Studies in Computer Science Series, North-Holland, Amsterdam, 1978.
4. A. Robinson, "Proving theorems, as done by man, machine and logician," Summaries of Talks Presented at the Summer Institute for Symbolic Logic, 1957, 2nd ed., Institute for Defense Analysis, 1960.
5. D. Prawitz, H. Prawitz, and N. Voghera, "A mechanical proof procedure and its implementation in an electronic computer," J. Assoc. Comput. Machin., 7, 102-128 (1960).
6. A. Newell, J. C. Shaw, and H. Simon, "Empirical explorations with the logic theory machine," Proc. West. Joint Comput. Conf., 218-239 (1957).
7. P. C. Gilmore, "A proof method for quantification theory: Its justification and realization," IBM J. Res. Devel., 28-35 (January 1960).
8. M. Davis and H. Putnam, "A computing procedure for quantification theory," J. Assoc. Comput. Machin., 7, 201-215 (1960).
9. J. A. Robinson, "A machine oriented logic based on the resolution principle," J. Assoc. Comput. Machin., 12, 23-41 (1965).
10. J. Siekmann and G. Wrightson (eds.), Automation of Reasoning, Vol. 1, Classical Papers of Computational Logic 1957-1966, Symbolic Computation Series, Springer-Verlag, Berlin, 1983.
11. A. J. Nevins, "A human oriented logic for automatic theorem proving," J. Assoc. Comput. Machin., 21, 606-621 (1974).
12. W. W. Bledsoe, "Splitting and reduction heuristics in automatic theorem proving," Artif. Intell., 55-77 (1971).
13. D. W. Loveland, Automated Theorem Proving: A Quarter-Century Review, in Automated Theorem Proving: After 25 Years, Vol. 29, Contemporary Mathematics Series, American Mathematical Society, Providence, 1984.
14. E. W. Beth, "A topological proof of the theorem of Löwenheim-Skolem-Gödel," Koninkl. Nederl. Akademie van Wetenschappen, Amsterdam, Proceedings, Series A, 54, No. 5, and Indagationes Math., 13(5), 436-444 (1951).

D. W. LOVELAND
Duke University
COMPUTATIONAL LINGUISTICS

Research in computational linguistics (CL) is concerned with the application of a computational paradigm to the scientific study of human language and the engineering of systems that process or analyze written or spoken language. The term natural-language processing (NLP) is also frequently used, especially with regard to the engineering side of the discipline. As an historical note, the term computational linguistics included the study of formal languages and artificial computer languages (e.g., ALGOL), as well as natural languages, until the middle 1960s, but this entry concerns CL as it is presently conceived.

Theoretical issues in CL concern syntax, semantics, discourse, language generation, language acquisition, and other areas, whereas areas for applied work in CL have included automatic programming, computer-aided instruction, database interface, machine translation, office automation, speech understanding, and other areas. Historically, much CL research has been done by researchers whose language interests overlap with interests in such related disciplines as AI, cognitive science, computer science and engineering, information science, linguistics, philosophy, psychology, and the speech sciences. The middle 1970s, however, witnessed an increase in hybrid efforts, so that present efforts in CL typically draw from and contribute to work in one or more of these cognate areas.

This entry serves primarily as an overview of the primary topics in CL. It begins with a historical introduction to the field, followed by brief remarks on some of the more important theoretical problems, and concludes with pointers to the literature. Since space has permitted only a general statement of the goals of a theory or implementation, with occasional examples of either I/O behavior or internal representation formalisms, conclusions cannot be drawn from this entry alone concerning the capabilities of the work to be described. More detailed information is available in the separate entries related to the topics considered here.
Early Work (1950-1965)

Most CL work prior to 1960 concerned machine translation, as defined below, but the advent of transformational grammar and the emergence of paradigms for information retrieval also played an important role in the formation of a CL community. Following is a discussion of the essential work in these three areas.

Machine Translation. Many of the first attempts at using computers to process natural language concerned the problem of translating text from one natural language into another. Although actual computer programs seeking to solve this task were not written until the early 1950s, the idea of mechanical translation can be traced to conversations as early as 1946 between Warren Weaver and A. D. Booth. The initial impetus came in 1949, when Weaver wrote and privately circulated a paper titled "Translation" (1). This paper, along with a detailed account of initial work in machine translation, can be found in Locke and Booth (2).

Most early work on machine translation, also known as automatic translation, mechanical translation, or simply MT, was conducted in the United States and the USSR, where the
political and military interests in natural-language translation were especially strong. There were also two British projects and some work done in Italy, Israel, and elsewhere. Typically, efforts at machine translation, which predated the important work in linguistics and computer science on syntax, grammars, and languages, were based on word-by-word translation schemes. In particular, no attempt was made to "parse" sentences (i.e., determine their syntactic structure) and, at least as significantly, no attempt was made to actually "understand" the material to be translated. A characterization of the basic approach of word-for-word processing can be found in Ref. 3. As an example of what had been achieved by about 1960, the first sentence of a 1956 Russian article yielded the output "'razviti' electronics (allowed permitted) (considerably significantly considerable significant important) to (perfect improve) (method way) 'fiz' (measurement metering sounding dimension) (speed velocity rate ration) (light luminosity shine luminous)," where parentheses indicate uncertainty on the part of the system and where razviti and fiz were unknown and thus untranslated (fiz derives from a proper name). From this output, a human posteditor produced "Development of electronics permitted considerably to improve method Fizeau of measurement of speed of light," which may be compared with the fully human translation "The development of electronics has brought about a considerable improvement of Fizeau's method of measuring the velocity of light." This example is discussed in detail in Oettinger (3).
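The word-by-word scheme is easy to illustrate. The tiny lexicon below is hypothetical, with multiple candidate translations kept in parentheses, mirroring the uncertainty markers in the 1960 output quoted above; no parsing or understanding is attempted.

```python
# Toy lexicon (illustrative only): each entry maps a source word to
# one or more candidate English translations.
LEXICON = {
    "skorost": ["speed", "velocity", "rate"],
    "sveta": ["light", "luminosity"],
    "izmereniya": ["measurement", "metering"],
}

def word_for_word(text):
    """Word-by-word translation: unknown words pass through quoted and
    untranslated, and ambiguous words keep all candidates in
    parentheses for a human posteditor to resolve."""
    out = []
    for w in text.split():
        senses = LEXICON.get(w)
        if senses is None:
            out.append(f"'{w}'")                  # unknown, left as-is
        elif len(senses) == 1:
            out.append(senses[0])
        else:
            out.append("(" + " ".join(senses) + ")")
    return " ".join(out)
```

For example, `word_for_word("fiz skorost sveta")` yields `'fiz' (speed velocity rate) (light luminosity)`, the same shape of output a posteditor of the era had to clean up.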
Concerning the distinction between fully automated as opposed to machine-assisted human translation, even Bar-Hillel, an outspoken detractor of much MT work, observed that "word-by-word Russian-to-English translation of scientific texts, if pushed to its limits, is known to enable an English reader who knows the respective field to understand, in general, at least the gist of the original text, though of course with an effort that is considerably larger than that required for reading a regular high quality translation" (4). Nevertheless, researchers and government funding agencies continued to anticipate systems that would provide "fully automatic high-quality translation" (FAHQT). It was with respect to this more ambitious goal that the Automatic Language Processing Advisory Committee (ALPAC) was formed in April of 1964 "to advise the Department of Defense, the Central Intelligence Agency, and the National Science Foundation on research and development in the general field of mechanical translation of foreign languages." In essence, the committee found that "there has been no machine translation of general scientific text, and none is in immediate prospect" (5). They further observed that, in some cases, "the postedited translation took slightly longer to do and was more expensive than conventional human translation" and also noted that "unedited machine output from scientific text is decipherable for the most part, but it is sometimes misleading and sometimes wrong (as is postedited output to a lesser extent)." Although the ALPAC committee had presumably intended its report to effect "useful changes in the support of research," their findings resulted in the virtual elimination of federal funding for work in MT. As a consequence, very little work was done, and few papers published, for roughly a decade. Since the middle 1970s, however, a number of projects have been spawned or reactivated.
The entry on machine translation provides technical details and also discusses more recent work in the area.
Transformational Grammar. In 1957 an event occurred that not only revolutionized the world of linguistics but left a lasting impression on philosophy, psychology, and other areas. That event was the publication of a short monograph by Noam Chomsky entitled Syntactic Structures (6) that explored the implications of automata theory for natural language. In it, Chomsky first argued that the sentences of a natural language cannot be meaningfully generated by a finite-state machine or by any context-free grammar, or at least that "any grammar that can be constructed . . . will be extremely complex, ad hoc, and 'unrevealing'" (7). He then proposed a theory of what he called transformational grammar (TG) and began to work out its details. At the most abstract level the theory of TG involves specifying a set of "kernel" sentences of a language; an assortment of "transformations," such as verb tensing and passive voice; and an ordering in which transformations are to be carried out. For example, to avoid producing a sentence such as "John are liked by the students," the passive transformation must apply to the kernel sentence "The students liked John" before the rule for subject-verb agreement. The entry on transformational grammar provides details of the theory.

With the publication of Syntactic Structures, Chomsky had argued for, if not established, the efficacy of a transformational component, but he recognized that TG would have to be "formulated properly in . . . terms that must be developed in a full-scale theory of transformations." As a suggestive first step, his appendix provided a sample grammar for a very small subset of English that included 12 content words and fairly elaborate auxiliary verb structures.
The period from 1957 to 1965 was one of intense activity by Chomsky and several students, culminating in 1965 with the publication of Aspects of the Theory of Syntax (8) and its far-reaching theory of deep structure, which relates to an internal sentence-independent representation of (the meaning of) the sentence. Although TG has had an uneven impact on CL, centered mostly around matters of syntax, its influence on early work in CL is evidenced through bibliographic references and, more substantively, by concepts and borrowed terminology that appeared in the CL literature of the 1960s. In the long term the hypothesis of TG most significant for work in CL is that an understanding of the syntax, or structure, of natural-language sentences can be arrived at on a solely grammatical basis, without considering the real-world properties (e.g., meanings) of the terms being discussed. This notion, sometimes known as the "autonomy of syntax," continues to provide a useful, if regrettable, division in categorizing current work in CL, as the debate continues as to what interactions are desirable, or necessary, between the structural (syntactic) and interpretive (semantic, pragmatic) components of a theory or implementation.
Information Retrieval. It is fairly well known that the emergence of the modern digital computer occurred during the 1940s and that the problems first solved by these computers were numerical in nature and often military in origin. During the 1950s computers were increasingly called upon to provide access to large volumes of nonnumeric data for such purposes as database retrieval and on-line bibliographic search. Most systems of the 1950s and early 1960s were directed toward bibliographic search and other library services, and these efforts coalesced into a field that became known as "information retrieval" (IR), which is "concerned with the structure, analysis, organization, storage, searching, and retrieval of information" and has grown to include procedures for "dictionary construction and dictionary look-up, statistical and syntactic language analysis methods, information search and matching procedures, automatic information dissemination systems, and methods for user interaction with the mechanized systems" (9). Although little association remained between CL and IR by the middle 1960s, early work in IR did overlap that being done by the early workers in CL. The evolution of work in IR is chronicled in Refs. 9-12.

Broadening Interests (1960-1970)

In contrast to the 1950s, during which time CL researchers concentrated primarily on machine translation, the 1960s witnessed the application of CL techniques to database retrieval, problem solving, and other areas. For the most part, these early NL systems provided quite limited forms of interaction and were often based on techniques specifically tailored for a single domain of discourse. Nevertheless, the work represented interesting and important, if tenuous, first steps at seeking computational solutions to problems of human language processing. In addition, Raphael notes that these programs "contain the seeds, or at least surfaced the issues, that led to many of today's major computer science concepts: semantic net representations, data abstraction, pattern matching, object oriented programming, syntax-driven natural language analysis, logic programming, and so on" (13).

One important aspect of CL implementations of the 1960s, largely without counterpart in CL work of the 1950s, was that the "processing" to be done required programs to understand their inputs to some nontrivial degree. For example, although Bobrow recognized that "we are far from writing a program that can understand all, or even a very large segment, of English" (14), he claimed that "a computer understands a subset of English if it accepts input sentences which are members of this subset and answers questions based on information contained in the input" (15). This issue was not without controversy, however, as suggested by Giuliano's complaint that an "arbitrary heuristic procedure . . . which is used in several programmed computer systems" (16) does not "become a principle through its use." To this argument, Simmons (17) responded that "theory often lags far behind model building and sometimes derives therefrom" and further maintained that the early systems represented "truly scientific approaches to the study of language" (18). The following discussion seeks to convey a sense of the problems addressed by NL applications in the 1960s. They are grouped in terms of question-answering, problem-solving, consultation, and miscellaneous systems.

Question-Answering Systems. One of the first fully implemented data retrieval systems was BASEBALL (qv), "a computer program that answers questions posed in ordinary English about data in its store" (19). This system was designed to interact with a primitive database, stored as attribute-value pairs, that contained information about the month, day, place, teams, and scores for American League baseball games. An example input is "What teams won 10 games in July?" Another early program was SAD SAM, designed to "parse sentences written in Basic English and make inferences about kinship relations" (20). This system comprised two modules,
COMPUTATIONALLINGUISTICS
one for parsing (the syntactic appraiser and diagramer, SAD) and one for semantic analysis (the semantic analyzing machine, SAM). The basic operation of the semantics module involved searching a previously constructed parse tree for words denoting kinship relationships in order to construct a family tree, which was stored as a linked structure. The SIR system had the goal of "developing a computer [program] . . . having certain cognitive abilities and exhibiting some humanlike conversational behavior" (21). The system was similar to SAD SAM in allowing a user to input new information, then ask questions about it. However, SIR emphasized relations such as set-subset, part-whole, and ownership, as suggested by the following: "Every boy is a person. A finger is part of a hand. Each person has two hands. John is a boy. Every hand has 5 fingers. How many fingers does John have?" The DEACON system, which was designed to answer questions about "a simulated Army environment" (22), represents an important precursor of the database front ends of the 1970s. Its internal "ring"-like data structures could be dynamically updated, thus enabling users to supply new information ("The 425th will leave Ft. Lewis at 21950!") as well as ask questions ("Is the 638th scheduled to arrive at Ft. Lewis before the 425th leaves Ft. Lewis?"). In reflecting upon their experiences with DEACON, the authors noted that "perhaps the most significant new feature needed is the ability to define vocabulary terms in English, using previously defined terms" (23). This realization led directly to the REL system and its successors. The REL system (Rapidly Extensible Language) represented the logical continuation to the work with DEACON, and its primary goals were "to facilitate the implementation and subsequent user extension and modification of highly idiosyncratic language/database packages" (24).
An example customization is

def: power coefficient: high speed memory size / add time

From a theoretical standpoint, REL was based on the notion that an English language subset could be treated as a formal language "when the subject matter which it talks about is limited to material whose interrelationships are specifiable in a limited number of precisely structured categories" (25). The first sizable application of REL was to an anthropological database at Caltech of over 100,000 items. As indicated below, work on the REL project continued well into the 1970s, until the system, now quite advanced over its early prototypes, was renamed ASK. Another early database interface, CONVERSE, was designed as an "on-line system for describing, updating, and interrogating data bases of diverse content and structure through the use of ordinary English sentences" (26). It was intended to strike "a reasonable compromise between the difficulties of allowing completely free use of ordinary English and the restrictions inherent in existing artificial languages for data base description and querying" (26). An example input is "Which Pan Am flights that are economy class depart for O'Hare from the city of Los Angeles?" In addition to question-answering capabilities, the system included facilities for English-like data definitions and English-like means of populating the database.
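The kind of set-subset and part-whole chaining that SIR performed on facts like "Every boy is a person" and "Every hand has 5 fingers" can be sketched as follows. This is a hypothetical reconstruction, not Raphael's implementation; the relation tables and function names are invented for illustration.

```python
# A sketch (not Raphael's actual SIR code) of set-subset and part-whole
# chaining over facts of the kind SIR accepted.

isa = {"John": "boy", "boy": "person"}                 # set membership / subset
has = {"person": ("hand", 2), "hand": ("finger", 5)}   # part-whole with counts

def kinds(entity):
    # Walk the isa chain: John -> boy -> person.
    while entity in isa:
        entity = isa[entity]
        yield entity

def count_parts(entity, part):
    # Multiply counts along the part-whole chain, inheriting facts
    # stated about any superclass of the entity.
    for kind in (entity, *kinds(entity)):
        if kind in has:
            whole_part, n = has[kind]
            if whole_part == part:
                return n
            sub = count_parts(whole_part, part)
            if sub is not None:
                return n * sub
    return None

print(count_parts("John", "finger"))  # 10: 2 hands x 5 fingers each
```

The answer to "How many fingers does John have?" falls out of two chained inferences: John is a boy and hence a person (set-subset), and a person's two hands each have five fingers (part-whole).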
Problem Solving. The STUDENT (qv) system was designed as "a computer program that could communicate with people in a natural language within some restricted problem domain" (14). It sought to solve high-school-level algebra word problems stated in what the author considered a "comfortable but restricted subset of English" by constructing an appropriate set of linear equations to be solved. As an example of STUDENT's capabilities, a sample problem is: The price of a radio is $69.70. If this price is 15% less than the marked price, find the marked price. Having been given equations such as "distance equals speed times time" and by employing a number of clever pattern-matching techniques (e.g., "years older than" is converted to "plus"), STUDENT could answer some very complex problems. Its designer suggested that the system "could be made to understand most of the algebra story problems that appear in first-year high-school textbooks" but noted that "the problems it cannot handle are those having excessive verbiage or implied information about the world not expressible in a single sentence" (27). In building upon Bobrow's work with STUDENT, Charniak set out to develop a program to solve calculus word problems (28). This led to the implementation of the CARPS system, which was restricted to freshman-level problems related to rates, as Bobrow had in fact suggested in his paper. An example input processed by CARPS follows. Water is flowing into a conical filter at the rate of 15.0 cubic inches per second. If the radius of the base of the filter is 5.0 inches and the altitude is 10.0 inches, find the rate at which the water level is rising when the volume is 100.0 cubic inches. (29). In reflecting on his work, Charniak noted that, despite its advances over STUDENT, "a very powerful calculus word problem solver will require a good deal of 'common sense' knowledge" (30). In advancing this claim, Charniak foreshadowed an interest on the part of the AI community in the 1970s in story understanding.

Consultation. One of the most widely publicized AI programs of the 1960s, the ELIZA (qv) system, was designed "for the study of natural language communication between man and machine" (31). In essence, ELIZA was intended "to respond roughly as would certain psychotherapists (Rogerians)" since "the psychiatric interview is one of the few examples of categorized dyadic natural language communication in which one of the participating pair is free to assume the pose of knowing almost nothing of the real world" (31). An excerpt from a "typical" conversation with ELIZA follows.

"I need some help, that much seems certain."
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
"Perhaps I could learn to get along with my mother."
TELL ME MORE ABOUT YOUR FAMILY
"My mother takes care of me."
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
"My father."
YOUR FATHER (31)

As can be inferred after considering a dozen or so responses from the system, ELIZA sought to match its current input against one of a known set of patterns. It then selected one of
possibly several transformations associated with that pattern. Actually, patterns were associated with a keyword, and the algorithm considered various priorities in choosing among candidate matches. The idea of maintaining a "script" of data separate from the algorithms of the program itself was not without precedent, but ELIZA carried this out more fully than had previous systems. In addition to its technical contributions and the excitement it caused, the system convinced at least its designer that "the whole issue of the credibility (to humans) of machine output demands investigation" (31). This thought led Weizenbaum to his widely publicized social criticisms of AI research (32). An interesting and also famous follow-up of ELIZA, in which the program played the role of the patient rather than the analyst, is reported in Colby et al. (33).

Miscellaneous. Within the tradition of information retrieval established in the 1950s, but with greater attention to syntax and other linguistic issues, Protosynthex sought to accept natural English questions and search a large text to discover the most acceptable sentence, paragraph, or article as an answer (17). The system was applied to portions of Compton's Encyclopedia, and an example of a question posed to the system is "what animals live longer than men?" The project continued for several years and evolved into "a general purpose language processor . . . based on a psychological model of cognitive structure that is grounded in linguistic and logical theory" (34). A few systems were designed to produce English output, as described by Simmons (17). One system, NAMER, was designed to generate natural-language sentences from line drawings displayed on a matrix (35). It produced sentences such as "the dog is beside and to the right of the boy."
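The keyword-and-pattern scheme described above for ELIZA can be sketched in a few lines. The patterns, ranks, and response templates below are invented stand-ins for Weizenbaum's script; real ELIZA also performed pronoun reversal ("my" to "your") and kept a stack of pending keywords, both omitted here.

```python
import re

# A minimal sketch of ELIZA's keyword mechanism: match the input against
# ranked patterns, then fill a transformation template with the matched
# fragment. Patterns and ranks here are illustrative, not Weizenbaum's.

SCRIPT = [
    # (rank, input pattern, response template)
    (2, r".*\bmy (mother|father)\b.*", "TELL ME MORE ABOUT YOUR FAMILY"),
    (1, r".*\bI need (?P<what>.*)",
        "WHAT WOULD IT MEAN TO YOU IF YOU GOT {what}"),
    (0, r".*", "PLEASE GO ON"),  # content-free fallback
]

def respond(sentence):
    # Among all matching keywords, prefer the highest-ranked one.
    best_rank, best_reply = -1, None
    for rank, pattern, template in SCRIPT:
        m = re.match(pattern, sentence, re.IGNORECASE)
        if m and rank > best_rank:
            best_rank = rank
            best_reply = template.format(**m.groupdict())
    return best_reply

print(respond("I need some help"))
# WHAT WOULD IT MEAN TO YOU IF YOU GOT some help
```

The fallback pattern at rank 0 is what let ELIZA "say" something even when no keyword matched, a trick that contributed heavily to the illusion of understanding noted above.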
Another system, the Picture Language Machine (36), would be given a picture and a sentence as input and, after translating both the picture and the English statement into a common intermediate logical language, would determine whether the statement about the picture was true. An example input is "all circles are black circles."

Formalism Developments (1965-1970)

In addition to system-building activities, a number of formalisms were developed during the 1960s, especially in the latter half of the decade, relating to linguistic, psychological, and other aspects of natural languages. Based on experiences with previous attempts to construct natural-language-processing systems, and upon developments in linguistics and various areas of AI, these formalisms provided more sophisticated ways of representing the results of a partial or complete analysis of inputs to an NL system. A few of the more important of these formalisms are summarized here, namely, augmented transition networks (see Grammar, augmented transition networks), case grammar (qv), conceptual dependency (qv), procedural semantics (qv), and semantic networks (qv). Further details appear in the individual entries. By extending the expressiveness of the transition network models described by Thorne et al. (37) and Bobrow and Fraser (38), which were themselves based on the basic finite-state machine model stemming from work in formal language theory, Woods developed an augmented transition network (ATN) model for the syntactic analysis of natural-language sentences
(39). One of the primary advantages of the ATN model over its predecessors rested in its "hold-register" facility, which allowed information to be passed around in a parse tree under construction. This enabled the handling of deeply nested structures and other syntactic complexities. The hold-register facility derives, at least in spirit, from the desire to construct the "deep structure" corresponding to a sentence under analysis, a concept deriving from work in transformational grammar. The theory of case grammar, as proposed by Fillmore (40), expands on the view that "the sentence in its basic structure consists of a verb and one or more noun phrases, each associated with the verb in a particular case relationship." For instance, Fillmore observes that understanding the sentence "The hammer broke the window" involves recognizing that the noun hammer acts differently from John in "John broke the window." Specifically, it is an instrument ("the inanimate force or object causally involved in the action") rather than an agent ("instigator of the action identified by the verb"). Fillmore's original theory included these and six additional case roles. One important aspect of case grammar theory is its distinction between "surface" roles (e.g., subject) and "deep" cases (e.g., agent or instrument). Bruce (41) provides a survey of ways in which the notion of case grammar was taken up by computationalists in the 1970s. Having adopted a view that language-processing systems should not produce a syntactic analysis of an input divorced from its meaning, Schank proposed a conceptual dependency (CD) model of language and exhibited its operation in the context of an implemented parsing system (42). Deriving loosely from ideas to be found in Hays (43), Kay (44), and Lamb (45), CD is based on a small number of "conceptual categories," including picture producers (PPs), PP assisters (PAs), actions or abstract nouns (ACTs), and ACT assisters (AAs).
Developments in the original theory, including more sophisticated conceptual categories such as mental information transfer (MTRANS) and ingestion (INGEST), are outlined in Schank (46). In addition to its central role in the development of the MARGIE system, discussed below, CD contributed to philosophical discussions concerning the role of "primitives" in theories of meaning. In seeking to develop "a uniform framework for performing the semantic interpretation of English sentences" (47), Woods devised a framework that he termed "procedural semantics" that acted as an intermediate representation between a language analyzer, e.g., a question-answering (qv) system, and a back-end database retrieval component. In essence, the idea behind procedural semantics is to define, given a particular database, a collection of "semantic primitives" that comprise a set of predicates, functions, and commands. This strategy was first demonstrated in the context of a hypothetical question-answering system for an airlines reservation system and was soon to be used in building the LUNAR system, as described below. Motivated by work in linguistics and psychology and attempting to formulate "a reasonable view of how semantic information is organized within a person's memory" (48), Quillian proposed a memory model that has come to be known as a semantic network. Although precursors of semantic networks are to be found in the use of property lists by designers of early NL systems, Quillian provided a theoretical and more formal treatment. In essence, a semantic network consists of a set of "nodes," typically representing objects or concepts, and various "arcs" connecting them that are typically labeled to indicate a relation between nodes. Quillian's initial use of his network structures involved their role in making inferences and finding analogies. Semantic networks have been important not only because of the many systems that incorporate them but also in their contribution to the development in the middle 1970s of various theories of knowledge representation. The evolution of semantic network structures, together with a discussion of applications based on them, is traced by Simmons (49), Findler (50), and Sowa (51).
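The node-and-labeled-arc structure described above, together with the superclass-chaining inference that later systems built on it, can be sketched as follows. The nodes and relation names are illustrative, not Quillian's actual memory model.

```python
# A sketch of a semantic network: nodes connected by labeled arcs, with a
# simple inheritance inference that walks "isa" arcs when a property is
# not stored directly on a node. Node and arc names are invented.

arcs = {
    ("canary", "isa"): "bird",
    ("bird", "isa"): "animal",
    ("bird", "can"): "fly",
    ("canary", "color"): "yellow",
}

def lookup(node, relation):
    # Follow the arc at this node, or inherit along the "isa" chain.
    while node is not None:
        if (node, relation) in arcs:
            return arcs[(node, relation)]
        node = arcs.get((node, "isa"))
    return None

print(lookup("canary", "can"))    # fly (inherited from bird)
print(lookup("canary", "color"))  # yellow (stored directly on canary)
```

Storing "can fly" once on the bird node and letting canary inherit it reflects the economy-of-storage argument that made semantic networks attractive both as a psychological model and as a knowledge representation.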
A Turning Point (ca. 1970)

In the aftermath of disappointing results from work in machine translation (qv) in particular and the difficulty of constructing sophisticated natural-language-processing systems in general, two natural-language projects in the early 1970s captured a degree of attention that served to boost the confidence of AI researchers regarding the prospects for broadly based, well grounded NL systems. These projects, which are discussed in turn, were the SHRDLU (qv) system of Winograd (52) and the LUNAR (qv) system described in Woods et al. (53) and Woods (54).

SHRDLU. Winograd's SHRDLU system provided a natural-language interface (qv) to a simulated robot arm in a domain of blocks on a table. The system could handle imperatives such as "Pick up a big red block," questions such as "What does the box contain?" and declaratives such as "The blue pyramid is mine." Since SHRDLU maintained information about its actions, it could also be asked questions such as "Why did you pick up the green pyramid?" to which the system might respond "to clean off the red cube." The primary design principle of SHRDLU was that syntax, semantics, and reasoning about the blocks world should be combined in understanding natural-language input. The main coordinator of the system was a module (effectively a parser) consisting of a few large programs written in a special programming language called PROGRAMMAR, which was embedded in LISP. These programs corresponded to the basic structures of English (clauses, noun groups, prepositional groups, etc.) and embodied a version of the systemic grammar theory of Halliday (55). A semantics module that was similarly organized coordinated with the parser and made calls to a reasoning system programmed in MICROPLANNER (qv), a theorem-proving (qv) language. Procedural representations for most of the knowledge in the system gave SHRDLU a considerable amount of flexibility to integrate semantic and pragmatic tests, to apply heuristic procedures for anaphora resolution, etc. The success of the procedural representations sparked the procedural-declarative controversy (56), which led to the identification of important knowledge representation issues. In the final analysis, many have agreed with Wilks (57) that SHRDLU's power seems to derive in large measure from the constraints of its small, closed domain and that the techniques would fail to scale up to larger domains. Furthermore, the grammatical coverage of SHRDLU was spotty in the sense that "although a large number of syntactic constructions occur at least once in sample sentences appearing in the published dialog, our attempts to combine them into different sentences (involving no new words or concepts) produced few sentences that Winograd felt the system could successfully process" (58). Nevertheless, SHRDLU was an impressive demonstration system that rekindled the hope of truly "natural" language-understanding systems and touched upon many still unsolved research topics.

LUNAR. The task of LUNAR, a system deriving from the work discussed above on procedural semantics, was to provide lunar geologists with a natural-language interface to the Apollo moon rock database. The system had three main components. The first phase formed a syntactic parse using an elaborate ATN grammar and a dictionary of 3500 words. The parser created a deep-structure representation, which was then passed to a rule-driven semantic interpreter. The antecedent of a semantic rule specified a tree fragment to be matched against the deep-structure representation plus semantic conditions on the matched nodes. The right side of a semantic rule was a procedural template for the final, retrieval component. For example, the sentence

What is the average concentration of aluminum in high alkali rocks?

was translated as

(FOR THE X13 / (SEQL (AVERAGE X14 / (SSUNION X15 / (SEQ TYPEAS) : T ; (DATALINE (WHQFILE X15) X15 (NPR* X16 / (QUOTE OVERALL)) (NPR* X17 / (QUOTE AL203)))) : T)) : T ; (PRINTOUT X13)).

The database was a flat file containing 13,000 entries. Run time performance of the system was acceptable; the sentence above was parsed in just under 5 s. In an informal demonstration of the system at the Second Annual Lunar Science Conference held in Houston in January of 1971, 78% of the 111 requests were handled without error. After correcting minor dictionary coding errors, this rate was improved to 90%. In discussing the coverage of the system, Woods considered the syntactic coverage to be "very competent" but noted that "if a [lunar geologist] really sat down to use the system to do some research he would quickly find himself wanting to say things which are beyond the ability of the current system" (54). In summary, the LUNAR system demonstrated that a sizable, important database problem could be handled using the techniques of ATNs and procedural semantics.

A Variety of Application Areas (1970-1984)

Following the technical advances of the 1960s and in the wake of the rather dramatic work of Winograd and Woods, the period from the early 1970s witnessed a variety of applied natural-language projects. Application areas include database interface, computer-aided instruction (qv), office automation (qv), automatic programming (qv), and the processing of scientific text. These areas are discussed in turn.

Database Interface. Typical NL database interfaces operate by translating English or other natural-language inputs into a formal database query language to be run against an existing relational or other database management system. For a number of reasons, this formed the most frequent application area
for applied NL work in the 1970s. First, the growing presence of database systems in business and industry resulted in a rapid increase in the number of potential computer users, many of whom preferred not to have to learn a "formal" computer language. Second, the idea of database query followed logically from the question-answering mode of many NL systems of the previous decade. Third, the NL system designer could, by starting with an existing set of data and by assuming an implemented back-end retrieval module, avoid the need to address low-level representation issues. An early attempt at providing natural-language access to a relational database is the RENDEZVOUS system, which emphasized human factors and concepts deriving from the database world, with less attention to techniques developed in AI and CL. The primary design goal of the system was "to accept queries stated in any English, grammatical or not, rejecting only those that are clearly outside the domain of discourse supported by the data base at hand" (59). To accomplish this, RENDEZVOUS temporarily ignored portions of the input it could not recognize and, to compensate, engaged the user in "clarification dialog" to refine its understanding. A representative initial input to RENDEZVOUS is "I want to find certain projects. Pipes were sent to them in Feb. 1975." To help ensure reliable processing, RENDEZVOUS provided a paraphrase of its current understanding of the user's request. In situations where a phrase was ambiguous to the system, a request for clarification was generated. Although RENDEZVOUS tended to overburden the prospective user with too many seemingly pointless questions (e.g., it thought that "37" in "part 37" might be a quantity on order rather than a part number), it addressed human-factors issues that were taken up by PLANES (qv) and other systems.
A sharp contrast to the previous system in terms of linguistic sophistication is found in TQA, formerly REQUEST (60), whose syntactic processing is based on the principles of transformational grammar. This decision was made "in an attempt to deal with the complexity and diversity that are characteristic of even restricted subsets of natural language" (60). In essence, the parsing involves applying transformations in reverse to reconstruct the deep structure associated with a question to be answered. Details concerning this process, and further motivation for it, are given in Petrick (61). Initial applications included Fortune 500 data and a database on White Plains land usage. During the latter field study, a set of "operating statistics" was collected (62). A system intermediate between the previous two in terms of linguistic sophistication is LADDER (63), whose syntactic processing was based on a "semantic grammar" designed around the object types of the domain at hand, such as ship and port, rather than linguistically motivated lexical and syntactic categories, such as noun and verb (see Semantic grammar). The system provided an ability "for naive users to input English statements at run time that extend and personalize the language accepted by the system" (64). The specific set of "tools" seeking to "facilitate the rapid construction of natural language interfaces" (64) was called LIFER (qv). The system contained facilities for the user to add synonyms and define paraphrases; mechanisms to handle ellipsis (incomplete inputs) were also provided. For example, after asking "What is the salary of Johnson?" the user could type "position and date hired" and the system would answer the question "What are the position and date hired of Johnson?" The development of
LADDER helped to reassure the CL community that "genuinely useful natural language interfaces can be created and that the creation process takes considerably less effort than might be expected" (64). Hershman et al. (65) describe an experiment in which LADDER was used for a simulated Navy search-and-rescue operation. Other interesting and important NL database systems were constructed, but space permits only brief descriptions. The TORUS system (66) represented an early attempt at formulating an "integrated methodology" for designing NL database front ends. It employed a semantic network representation, and the prototype was developed around a database of graduate student files. As described in Thompson and Thompson (67), work continued on the REL system mentioned earlier and eventually gave rise to the desk-top systems POL and ASK (68,69). In addition to manifesting refinements and extensions over earlier work, ASK was extended to allow for French as well as English inputs. Results of an experimental study with the REL-ASK family appear in Thompson (70). EUFID (71,72) was also designed to be database independent and, like LADDER, provided a "personal synonym" facility designed to be "forgiving of spelling and grammar errors" (71). Several systems seeking to address the issue of cooperative response were designed. For instance, the construction of CO-OP (73) was based on the belief that "NL systems must be designed to respond appropriately when they can detect a misconception on the part of the user." For example, the probable presumption of a user who asks "How many students got a grade of F in CIS 500 in spring 1977?" is that the course was in fact given at the time in question.
If this were not the case, CO-OP would so inform the user, rather than simply give the literal but misleading answer "nobody." The PLANES system (74) was based on the notion that an effective NL system "must be able to help guide and train the user to frame requests in a form that the system can understand." According to the designers, the work derives in spirit from Codd's work (59) with RENDEZVOUS. For instance, PLANES incorporated novel techniques of "concept case frames" for generating dialogue to flesh out an incomplete understanding of the user's request and "context registers" for handling pronouns and other anaphora. JETS, a successor to PLANES, responded to some "interesting questions about the conceptual completeness of question-answering systems" (75) that arose during experiments with the earlier system (76). A number of high-quality European systems were developed, each manifesting some interest in domain independence. The USL system (77) was designed in Heidelberg as a domain-independent, German-language database front end. It incorporated a "revised version" of a parser built by Martin Kay, and "the method of [semantic] interpretation used in the REL system . . . was taken as a point of departure" (78). Additional grammars were constructed to enable USL to answer questions posed in English, Dutch, and Spanish as well as German. A user study with the USL system was reported by Krause (79). PHLIQA was built in Eindhoven to answer English questions about information stored in a CODASYL database (80). It operated on hypothetical data concerning computers installed at European establishments and was intentionally structured to include "some features that cause difficulties . . . and which are also found in 'real' databases" (81). Attention was paid to isolating the parts of the system that depended on the chosen data base, and an effort was made
"to derive the parts dependent on a data base in a systematic way from the structure of the data base and its subject domain" (81). HAM-ANS (82) was developed in Hamburg as "a robust and cooperative system" to enable natural-language access in German to database and other software services. The ROBOT system (83,84), rather than emphasizing linguistically complex (e.g., deeply nested) English structures, represented a database front end with concern for report generation facilities, as suggested by the representative input "Print a report of direct commission, net loss incurred, and change in INBR for region B, sorted by net loss." The system was interesting in its use of the database as a world model in which to carry out disambiguations, but its primary importance is that it led in the late 1970s to a product for mainframe database query, INTELLECT (qv), and the concomitant founding of Artificial Intelligence Corporation. More recent systems for database and information retrieval, which represent an important direction but for which space does not exist and for which a historical perspective is not yet possible, are mentioned below in the section on current trends. Also, there has been a small amount of attention to providing natural-language facilities for database update, as opposed to query. Examples of this work are found in Salveter (85) and Davidson and Kaplan (86).

Computer-Aided Instruction. The first attempt at incorporating AI and CL techniques into an integrated system for computer-aided instruction (CAI) was the SCHOLAR (qv) program (87). By representing the information to be learned in semantic network structures, this system was designed to be independent of the actual "lesson" at hand, which for the prototype consisted of information about South American geography. In particular, "no specific pieces of text, questions, with their predicted answers, errors, or anticipated branching form part of this data structure" (87).
An example interaction with SCHOLAR follows.

THE LANGUAGE IN BRAZIL IS FRENCH. RIGHT OR WRONG?
"Wrong"
YOU ARE CORRECT. THE LANGUAGE IN BRAZIL IS PORTUGUESE.

As indicated, the system included a nontrivial English-generation component. More significant, however, was the provision for "mixed-initiative" dialogues, where the user could ask questions of the system at any time rather than supply an answer for a pending question. For example, if the system had just asked "What is Brazil," the student might delay answering the question by asking the system to "Tell me something about Peru." In addition, the system allowed its user to switch to a question-answering mode, thus acting as a question answerer for the data at hand. Historically, however, SCHOLAR was most important in establishing the feasibility of mixed-initiative NL systems. In continuing the work begun with the initial SCHOLAR prototype, Collins et al. (88) extended the system's basic matching mechanisms to enable it to operate in situations in which requested information is missing. The resulting system would sometimes make inferences and sometimes inform the questioner that the requested information was not available.
In the latter case, related or partial information might be given. As an example, the system would respond to the question "Is Bolivia a capital?" by answering "No, Bolivia is a country, not a capital." In formulating this response, SCHOLAR has used its knowledge that country and city are mutually exclusive object types and only a city can be a capital. Another extension to the work of SCHOLAR is represented by the SOPHIE (qv) system, which sought to encourage "a wider range of student initiatives" by presenting "a 'reactive' environment in which the student learns by trying out his ideas rather than by instruction" (89). To explore this idea, SOPHIE was designed as an electronics lab troubleshooting system that would "mimic the roles of a human teacher" (89). To use the system, a student would receive a schematic diagram of a power supply and be told the control settings of a simulated faulty circuit. An example interaction follows.

"What is the output?"
THE OUTPUT VOLTAGE IS 11.7 VOLTS.
"What is it in a working circuit?"
IN A WORKING CIRCUIT THE OUTPUT VOLTAGE IS 19.9 VOLTS.
"What is the voltage between R8 and Q1?"
THE VOLTAGE AT N1 IS 47.6 VOLTS.

When the student finally settled on a diagnosis of the problem and recommended a course of action, such as "replace Q6," SOPHIE would seize the initiative and begin asking questions of the student. In addition, SOPHIE could alter the parameters of its simulated circuit, thus providing "what-if" capabilities. Despite the fact that it was carefully designed for a troubleshooting application by the use of a "semantic" grammar and thus lacked the domain independence of SCHOLAR, SOPHIE contained a variety of non-NL capabilities that themselves proved to be interesting and important. Subsequent to the efforts reported above, Weischedel et al. (90) constructed a system to aid students in learning first-year German.
The designers were interested, among other things, in enabling computers to deal with ungrammatical sentences, and in their chosen setting it was mandatory for the system to respond meaningfully to inputs that were linguistically flawed as well as to those that were factually incorrect. An example of such a response follows (the system's question translates as "Where did Miss Moreau learn German?").

WO HAT FRAULEIN MOREAU DEUTSCH GELERNT?
"Sie hat es gelernt in der Schule."
ERROR: PAST PARTICIPLE MUST BE AT END OF CLAUSE. A CORRECT ANSWER WOULD HAVE BEEN: SIE HAT DEUTSCH IN DER SCHULE GELERNT.

In addition to detecting incorrect grammar in the context of an otherwise acceptable response, the system was able to recognize when an input was incorrect or, more subtly, correct but not fully responsive to the question. As suggested above, the tutoring program dealt with reading comprehension, and the prototype was applied to several "lessons," each consisting of a
paragraph. Concerning generality, the designers pointed out that "the texts that appear in foreign language textbooks very rapidly surpass the ability of artificial intelligence systems" (90). They also observed that "there does not seem to be any way to tune the system to particular types of errors," which means that an instructor would have to construct each lesson by hand, unlike for the semantic network approach adopted for SCHOLAR (which carefully avoided storing textual information). Perhaps the most significant outcome of the project was to demonstrate ways in which "ill-formedness" can extend to morphological, semantic, and pragmatic problems as well as syntactic ones. The ILIAD system (91) was conceived as a way of helping instruct people having a language-delaying handicap (e.g., deafness) or who are learning English as a second language. It included a powerful English generator based on the transformational grammar model.

Office Automation. The SCHED system was based on techniques and formalisms developed for the automatic programming system NLPQ described below and represented an initial study of "the feasibility of developing systems which accomplish typical office tasks by means of human-like communication with the user" (92). Although the long-range goal of SCHED was to provide an on-line system to review and update one's own desk calendar and those of fellow office workers, the implemented system was restricted to information pertaining to a single user. An example input for SCHED is

Schedule a meeting, Wed, my office, 2 to 2:30, with my manager and his manager, about 'a demo'.

to which the system would respond by stating in English its understanding of the input. Subject to user verification, the system would issue an appropriate command to a resident calendar management system.
In situations where a user input failed to supply all necessary information, SCHED was able to ask for specific information, thus providing for mixed-initiative conversations reminiscent of the previously mentioned work in CAI. The GUS system, similar in spirit to SCHED, though quite different in its methods, was "intended to engage a sympathetic and highly cooperative human in an English dialog, directed towards a specific goal within a very restricted domain of discourse" (93). In particular, GUS played the role of a travel agent able to assist a user in making a round trip from a city in California. Although its implementation was apparently less robust than SCHED's, its designers suggested that "the system is interesting because of the phenomena of natural dialog that it attempts to model and because of its principles of program organization" (93). The VIPS system, which seeks to "allow a user to display objects of interest on a computer terminal and manipulate them via typed or spoken English imperative sentences" (94), is unusual in that it incorporates a hardware voice recognizer into an NLP. Initial applications have been to the numerical domain of its predecessor (the automatic programming system NLC) and to text editing, where objects may be referenced either in English or by use of a touch-sensitive display screen. ARGOT represents a "long-term" research project seeking to "partake in an extended English dialogue on some reasonably well specified range of topics" (95). The initial task domain for ARGOT was that of a computer-center operator.
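The mixed-initiative behavior of SCHED and GUS can be pictured as frame-based slot filling: required slots of an event frame are filled from the user's request, and the system takes the initiative to ask about whatever is still missing. The sketch below is a hypothetical illustration only; the slot names and prompts are invented here and are not taken from either system.

```python
# Hypothetical sketch of SCHED/GUS-style mixed-initiative slot filling.
# Slot names and prompts are invented for illustration.

REQUIRED_SLOTS = ["day", "time", "place", "attendees", "topic"]
PROMPTS = {
    "day": "On what day should the meeting be scheduled?",
    "time": "At what time?",
    "place": "Where will it be held?",
    "attendees": "Who should attend?",
    "topic": "What is the meeting about?",
}

def missing_slots(frame):
    """Slots the user's request has not yet filled."""
    return [s for s in REQUIRED_SLOTS if s not in frame]

def next_question(frame):
    """Mixed initiative: the system's next clarifying question, or None."""
    missing = missing_slots(frame)
    return PROMPTS[missing[0]] if missing else None

# Partial understanding of "Schedule a meeting, Wed, my office, 2 to 2:30":
frame = {"day": "Wednesday", "place": "my office", "time": "2:00-2:30"}
# The system would now ask about the attendees, then the topic.
```

Once `next_question` returns `None`, the verified frame could be handed to a calendar back end, mirroring SCHED's final step of issuing a command to a resident calendar management system.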
The UC system is designed as "an intelligent natural-language interface that allows naive users to communicate with the UNIX(TM) operating system in ordinary English" (96). It answers questions such as "How do I print the file fetch.l on the line printer?" Finally, some research has applied CL techniques to the analysis, as opposed to processing, of written texts. One such system is the Writer's Workbench (97), which, upon scanning the draft of a document, flags words, phrases, and sentence types that have been felt to negatively affect the readability of the text. For example, the previous sentence could be improved by eliminating the passive voice ("have been felt") and unsplitting the infinitive ("to negatively affect"). Another system, CRITIQUE, formerly EPISTLE (qv) (98), similarly performs stylistic analyses but also diagnoses several classes of grammatical errors. For example,

Your statement of deficiencies have not been completed. (should be has)

Automatic Programming. The term automatic programming (qv) refers to the effort to relieve a computer user from the details needed to get the machine to perform a desired task. In certain ways attempts at a natural-language solution to this problem are similar to the work on problem-solving systems (STUDENT and CARPS) described earlier, although no specific techniques from these systems appear to have been employed. A survey of early NL work in automatic programming appears in Heidorn (99). One early system, described by Gelb (100), was designed to answer simple probability questions such as "What is the probability of getting two or more heads or exactly three tails when four coins are tossed once?" Another early system is NLPQ (101), which produced simulation programs in GPSS for a class of queueing problems. An example of an initial specification of a user's problem follows:

When a vehicle arrives at a station, it leaves there immediately if the length of the line at a pump in the station is not less than 2.
Seventy-five percent of the vehicles are cars and a fourth are trucks. There is just one pump. A simulation run time of 8 h is desired.

At this point the system would take control and begin asking questions to clarify and extend what it had been told. For example, it would begin by asking "How often do the vehicles arrive at the station," and the user might answer "The arrivals of vehicles are normally distributed with a mean of 8 minutes." Having acquired all requisite information, NLPQ would first produce an English statement of its understanding of the entire problem and then a complete program for the task. As reported in Green (102), the PSI system included modules for English input and output, along with many other components, in a large-scale automatic programming system. An excerpt from a session with the system follows.

"[The program] first inputs a concept. Then it repeatedly accepts an input scene from the user, classifies it based on whether or not it fits the concept, and outputs this classification to the user."
WHAT IS THE STRUCTURE OF THE CONCEPT?
"A concept is a collection of relations. Each relation is just a string."
WHAT IS THE EXIT TEST OF THE LOOP . . .
Due to the complexities involved in program synthesis, PSI was tested only on problems related to sorting, graph reachability, and determining prime numbers. Unlike the preceding work, the NLC system (103) chose to regard an English input as a statement in a very-high-level program. The system acted as an interpreter, providing an interactive matrix-domain programming environment with visual feedback as each English command was processed. An example input follows.

"Subtract six from the next to last positive number in the row that was multiplied by 5."

In addition to simple imperatives, NLC provided for loops ("repeat"), conditionals ("if . . ."), and procedure definitions ("define a way to . . ."). An experimental study of programming with the system is described in Biermann et al. (94), and an application of the system for college sophomore-level linear algebra instruction is discussed in Geist et al. (104).

Scientific Text Processing. Based on many years of work developing a comprehensive grammar for English (105), a group of researchers constructed a system intended "to allow the health care worker to create [a] medical report in the most natural way, in medical English, using whatever syntax is appropriate to the information" (106,107). After gathering reports in English and converting them to a textual database form, the system could be interrogated as though it were a conventional database system, again using English for inputs. Some examples of the types of inputs gathered in a clinical setting are the following:
X-rays taken 3-22-65 reveal no evidence of metastatic disease.

Chest X-ray on 8-12-69 showed no metastatic disease.

3-2-65 chest film shows clouding along left thorax and pleural thickening.

and an example question is "Did every patient have a chest X-ray in 1975?" A distinctive feature of the system is its method of creating so-called information format structures, which are similar to structured database records but capture the information initially supplied in textual form. Defining an information format for a particular application involves, first, isolating word classes by syntactic properties, e.g., the verbs reveal and show are alike in taking X-ray as a subject, although X-ray and film are alike in taking show as a verb, and, second, defining the columns of the table from the word classes so that any input sentence will have a paraphrase like that shown above (108). The system, which is unusual in containing aspects of both database and information retrieval, was subsequently adapted to the domain of navy messages, as described in Marsh and Friedman (109).

Current Trends

Domain-Independent Implementations. Several domain-independent database systems have already been mentioned, but the intensity of effort at enhancing the transportability of systems for this and other application areas should be noted. In particular, a number of projects are seeking either to allow users themselves to carry out a customization or to have the system adapt itself automatically to a user or a domain of discourse. Representative examples of this work include Haas and Hendrix (110), Hendrix and Lewis (111), Mark (112), Thompson and Thompson (68,69), Wilczynski (113), Warren and Pereira (114), Bates (115), Ginsparg (116), Grosz (117), Ballard et al. (118), and Grishman et al. (119). Also, several papers deriving from a recent workshop on transportability (120) have appeared, including Damerau (121), Hafner and Godden (122), Marsh and Friedman (109), Slocum and Justus (123), and Thompson and Thompson (124).

The Reemergence of Machine Translation. As indicated earlier, the ALPAC report of 1966 nearly eliminated U.S. government funding of projects in machine translation (MT). Naturally, this caused a marked decline in the amount of work being done in the area and in the number of papers published. Nevertheless, due in part to progress in AI and other areas of CL, a gradual resurgence of interest in MT has occurred over the past decade, and the field gives evidence of becoming well populated once again. A bibliography of about 480 publications since 1973 related to MT can be found in Slocum (125), along with summary papers on several of the major full-scale translation systems in existence. Papers from a recent conference on MT (126) are also available.

The Commercialization of NLP. As indicated above, Harris's database front-end, ROBOT, became the proprietary software of Artificial Intelligence Corporation in the late 1970s. Under the name INTELLECT, this system was for several years virtually the only natural-language product on the market. In the early 1980s, however, several well-known NL researchers, including Gary Hendrix and Roger Schank, formed or became associated with start-up ventures. In recent years products from these and other companies have been appearing for database and other applications [e.g., a developing expert system interface is discussed in Lehnert and Schwartz (127)]. More recently researchers at Carnegie-Mellon University and other academic institutions have formed companies; Texas Instruments has produced a menu-based natural-language-like interface; at least one project at BBN Laboratories is slated for commercial release; and other corporate flirtations are occurring. In addition to database query, machine translation systems are also being sold, and prospects exist for additional application areas. All of these activities, as well as an overview of ongoing research into the theoretical and applied side of CL, are reported in Johnson (128).

Theoretical Issues

Having thus far conducted a chronological review of projects within CL that relate more or less directly to specific applications, this section provides a brief overview of some of the major theoretical topics associated with CL. The discussion relates specific systems to the theoretical issues, but it primarily emphasizes theoretical techniques and formalisms that have contributed to the classification of research in CL. The topics are parsing and grammatical formalisms, semantics, discourse understanding, text generation, cognitive modeling, language acquisition, and speech understanding. Further information is available in separate articles.

Parsing and Grammatical Formalisms. Parsing (qv) issues have been of central interest in CL since its inception, when CL included the study of formal languages and programming languages, as well as natural languages. As the term is used
here, parsing refers to the process of assigning structural descriptions to an input string. Classically, parsers have used various forms of phrase structure grammars and have assigned phrase structure markers to produce derivation trees. Parsers with access to semantic and pragmatic knowledge, however, may build semantic descriptions directly without explicitly creating derivation trees.

Direction of Analysis. Parsing strategies are often classified as top-down (or goal-driven) if they begin with the start symbol and backward chain from the consequents of rules to their antecedents. Recursive-descent parsers, the PROLOG execution procedure for definite-clause grammars (DCGs) (129), and the usual execution procedure for augmented transition network (ATN) grammars (39) all use top-down approaches (see Processing, bottom-up and top-down). Bottom-up (or data-driven) techniques proceed in a forward direction from the terminal symbols (words) in the grammar toward the start symbol. Left-corner (including shift-reduce) parsers (130), word-based parsers (131) and their descendants (132), chart parsers (44,133-136), and deterministic parsers (137,138) are primarily bottom-up. Parsers can also be classified according to how they analyze the input string: from left to right, from right to left, or from arbitrary positions in the middle outward. The left-to-right ordering is simple and natural and lends itself to easy bookkeeping. It is also of theoretical interest for parsers that attempt to model aspects of cognitive processes, such as attention focusing, that are dependent on temporal ordering. Middle-out, bottom-up parsers have been used particularly in speech systems (139,140), where the parser can use its analysis in regions of greatest certainty to help in noisy or unintelligible regions, which would cause trouble for a rigid left-to-right parser.
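The top-down, backward-chaining strategy can be made concrete in a few lines. The sketch below is a minimal backtracking recursive parser over a toy context-free grammar (the grammar and function names are invented for illustration): it starts from the start symbol, expands rule consequents left to right, and backtracks across alternatives by enumerating every position each constituent can reach.

```python
# Minimal top-down backtracking recognizer for a toy context-free grammar.
# Grammar and names are illustrative only.

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["sees"], ["sleeps"]],
}

def parse(symbol, words, pos):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:                   # terminal symbol: match a word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:                 # try each rule (backtracking)
        positions = [pos]
        for part in rhs:                        # expand the rule left to right
            positions = [q for p in positions for q in parse(part, words, p)]
        yield from positions

def accepts(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)

# accepts("the dog sees the cat") -> True
# accepts("the dog the")          -> False
```

A bottom-up (e.g., shift-reduce) parser would instead start from the words and work toward "S"; a chart parser would additionally memoize the `(symbol, pos)` results so that no sub-derivation is recomputed.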
Parsing techniques bear a close relationship to grammatical formalisms, although a particular grammar or class of grammars can sometimes be parsed in a variety of ways. ATN grammars, for example, have been parsed by both top-down/left-right and bottom-up/middle-out methods. Another technique for matching grammar and parser is to preprocess a grammar into an equivalent grammar suitable for a particular parsing method.

Search and Nondeterminism. Controlling the search effort for a parse and handling nondeterminism are major problems for parsers. Many actual parsers use a blend of top-down and bottom-up techniques. As a simple example of this, almost every recursive-descent (top-down) parser uses some kind of (bottom-up) scanner to identify the tokens in the input. Part-of-speech classifications in a lexicon are a form of bottom-up information. Another method for improving top-down parsing techniques is the use of a precomputed left-branching reachability matrix that can be used to decide whether the next input symbol can appear in the leftmost branch of a derivation tree headed by a particular nonterminal. Word expert parsing (qv) (132) uses an idiosyncratic combination of top-down expectations and bottom-up processing. The three principal methods for dealing with nondeterminism are backtracking, parallelism, and transforming the grammar so that a deterministic algorithm (perhaps using bounded look-ahead) can efficiently parse it. Backtracking parsers pursue one alternative at a choice point and return to select another alternative on failure of the first one. Forcing failure after a successful parse can cause the backtracking
parser to find additional parses. The standard execution procedure of PROLOG provides such a facility directly for logic grammars such as DCGs, which can be represented as PROLOG programs. Backtracking techniques are especially popular and natural for context-free grammar formalisms. Context-free grammars, which have a single consequent nonterminal, lend themselves to backward-chaining execution methods that work nicely in conjunction with backtracking. Parallel parsing methods keep track of multiple derivations at each point in the processing. The derivations can be developed concurrently using sequential algorithms and machines, or truly in parallel if multiple processing resources are available. Interest in parallel approaches has increased as parallel hardware is becoming available; it has also been buoyed by the resurgence of connectionist and neural network research (141-143). Chart parsing (qv) algorithms use a particularly efficient way of recording which derivations have already been found to cover substrings of the input string. By splitting the computation at choice points, backtracking methods can also be parallelized. Another alternative, explored by Marcus (137), is factoring the rules in such a way that limited look-ahead is sufficient to resolve most of the nondeterminism. Although not all of English, for example, can be treated deterministically in this manner, the parser interestingly fails on many of the same "garden path" sentences that cause people trouble. Cases of lexical and structural ambiguity that the parser cannot resolve are left for other modules. Word-based parsing systems generally attempt to incorporate enough knowledge to determine a unique interpretation. When ambiguity cannot be resolved without look-ahead, two possibilities present themselves. One technique is to let a later constituent complete the interpretation; for example, verbs can be responsible for assigning the role of the subject noun phrase.
Another solution is to spawn demons that check for the appearance of disambiguating items. For example, sense-specific demons might check for the presence of particular particles to detect multiword verbs.

Grammatical Formalisms. Many different grammatical formalisms have been used by natural-language-processing systems. One of the earliest systems, the Harvard Syntactic Analyzer (144), recognized context-free grammars. Transformational grammar (TG) theories had a direct influence on the Mitre (145) and Petrick (146) parsers and an indirect influence on many others. The UCLA grammar combined TG with case grammar theory (147). In simplest terms, transformational grammars specify a set of (usually context-free) base phrase structure rules, a set of structure mapping rules, and various conditions, filters, or principles that generated structures must satisfy. Since TG is stated as a generative theory, parsers must try to guess which transformations must have been applied by effectively inverting the rules. This has proved to be quite difficult in practice. One of the most comprehensive computer grammars of English has been developed at the Linguistic String Project (105). The grammar consists of a set of 180 BNF phrase structure rules, 180 restriction rules that check feature conditions, string-transformation rules, and ellipsis rules. Additional sublanguage categorizations are added to the lexicon together with domain-specific restriction rules to increase parsing efficiency.

ATN grammars (39) have been very influential on computational approaches to language processing. They augment recursive transition-network grammars, which recognize context-free languages, with actions and tests that give them the recognition power of Turing machines. With suitable self-restraint, however, one can produce disciplined, well-structured grammars in many different grammar-writing styles. The LUNAR grammar (53) was a quite detailed, large grammar (see Grammar, ATN). Other augmented phrase structure formalisms include the DIAMOND grammars at SRI [e.g., DIAGRAM (148)] and the APSG (augmented phrase structure grammar) formalism used in the CRITIQUE system developed at IBM (149). The systemic grammar theory of Halliday (55) has been incorporated in many NLP systems, notably in Winograd's SHRDLU system (52) and in the large NIGEL grammar (150). More recent work in linguistics has revived interest in nontransformational theories of phrase structure grammars, particularly context-free grammars. These theories hold that notational augmentations to phrase structure grammars can express such difficult, "transformational" phenomena as movement nontransformationally and, furthermore, in most cases, that there exist equivalent context-free grammars. From a parsing perspective it will be most useful if the augmentations can be processed on the fly with little overhead above that required for context-free parsing. The augmentations include metarules, complex features, and principles of feature instantiation (151). The major theoretical frameworks include generalized phrase structure grammar (GPSG) (152), tree adjoining grammar (TAG) (153), head grammar (HG) (154), lexical functional grammar (LFG) (155), and functional unification grammar (FUG) (156).

Providing for Ungrammaticality. It has been observed that "while significant progress has been made in the processing of correct text, a practical system must be able to handle many forms of ill-formedness gracefully" (157).
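The "complex features" mentioned above are typically combined by unification, the core operation of frameworks such as FUG. The sketch below is a minimal, flat version of that operation (feature names are invented for illustration; real unification grammars operate over recursively nested feature structures and handle variables and reentrancy):

```python
# Minimal sketch of feature-structure unification, the operation underlying
# complex-feature formalisms such as FUG and GPSG. Feature names are
# illustrative; real systems unify recursively nested feature structures.

def unify(f1, f2):
    """Merge two flat feature structures; return None on a feature clash."""
    result = dict(f1)
    for feature, value in f2.items():
        if feature in result and result[feature] != value:
            return None                  # clash, e.g., singular vs. plural
        result[feature] = value
    return result

# Subject-verb agreement as unification: "the dog barks" succeeds, while
# *"the dog bark" fails on the number feature.
subject = {"cat": "NP", "num": "sing", "person": 3}
verb_sing = {"num": "sing", "person": 3}
verb_plur = {"num": "plur"}
# unify(subject, verb_sing) -> merged structure
# unify(subject, verb_plur) -> None (agreement failure)
```

Because unification is monotonic and order independent, agreement constraints of this kind can be checked on the fly during parsing with little overhead beyond context-free recognition, which is the attraction noted above.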
When the ill-formedness in question is syntactic in origin and when the expected deviations can be grouped into a manageable number of classes, it is possible to prepare for "errors" by explicitly including extra rules in the system grammar so that a predictably deviant input is in fact treated as though it were grammatical. Due to the possible ambiguities that this practice introduces, and to confront general situations where either the full range of errors cannot be predicted or the intended meaning cannot be recovered, more sophisticated mechanisms are called for. Attacks on the problem of ungrammaticality are represented by work described by Weischedel and Black (158), Hayes and Mouradian (159), Kwasny and Sondheimer (160), Jensen et al. (161), Weischedel and Sondheimer (162), Granger (163), and Fink and Biermann (164). It is also worth noting that, for some applications, grammatical errors are part of the problem being addressed rather than a regrettable accident. Examples include the German-language CAI system and the text-critiquing systems mentioned earlier. In addition, most AI work in speech understanding is fundamentally concerned with the rampant and perhaps inherent uncertainties associated with speech-recognition devices. These uncertainties actually make error conditions the rule rather than the exception.

Semantics. Semantics concerns the study of meaning. In the context of CL, this most often relates to problems of finding
and representing the meaning of natural-language expressions. The previous discussion has already touched on several approaches to semantics, including conceptual dependency, procedural semantics, and semantic networks. Some others are preference semantics and other decompositional systems, Montague semantics, and situation semantics. Details on each of these can be found in the entry on semantics. Among the more significant questions to be asked of an approach to semantics, at least insofar as its relevance to CL is concerned, are what sorts of noncompositionality the system involves and what role, if any, is played by primitives. In essence, the idea behind compositional semantics is to determine the meaning of an entire unit under analysis (phrase, sentence, text) in a systematic (ideally simple) way from the meanings of its parts. This approach has obvious advantages in terms of being tractable for incorporation into an automated scheme for language understanding. The idea behind primitives (qv) (in its strong sense) is to determine a finite set of terms that by themselves can express the meaning of any word and, by implication, the meaning of any utterance. Of the semantic schemes discussed earlier, conceptual dependency (qv) adheres to this goal, whereas procedural semantics (qv) does not. Many interesting debates on these and other issues of semantics have occurred, as discussed by Jackendoff (165). In building entire NL systems, many designers have attempted to separate syntax from semantics by performing syntactic analysis first and then converting the resulting structure (produced by the parser) to a meaning representation (see Natural-language understanding). In other cases, however, the two processes have been much more tightly integrated. Whereas problems in CL related to syntax have largely involved issues also addressed by the field of conventional (as opposed to computational) linguistics, problems of semantics have typically concerned work in philosophy.
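Compositionality can be illustrated with a toy model-theoretic fragment: word meanings are entities and functions, and the meaning of a sentence is obtained by function application. Everything below (the lexicon, the tiny "world," and the function names) is invented for illustration; a Montague-style treatment would use typed lambda terms over a formally specified model.

```python
# Toy illustration of compositional semantics: the meaning of a sentence is
# computed by applying the meaning of the predicate (a function) to the
# meaning of the subject (an entity). Lexicon and world are invented here.

COUNTRIES = {"brazil", "peru"}               # a tiny "world"

LEXICON = {
    "Brazil": "brazil",                      # proper names denote entities
    "Peru": "peru",
    "Sucre": "sucre",
    "is a country": lambda e: e in COUNTRIES,  # predicates denote functions
}

def meaning(subject, predicate):
    """Compose word meanings by function application."""
    return LEXICON[predicate](LEXICON[subject])

# meaning("Brazil", "is a country") -> True
# meaning("Sucre", "is a country")  -> False
```

The appeal noted above is visible even at this scale: adding a new name or predicate to the lexicon immediately extends the set of sentences whose meanings can be computed, with no change to the composition rule itself.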
In the context of AI, important work related to natural-language semantics is to be found in the area of knowledge representation (qv).

Discourse Understanding. Discourse understanding includes natural-language-processing phenomena that span individual sentences in multisentence texts or dialogue. The work in discourse understanding acknowledges that the syntactic and semantic representations of sentences in discourse contexts relate both explicitly (e.g., by clue words such as now, but, anyway) and implicitly (e.g., by world knowledge) to the representations of other sentences in the discourse. As an example of how an amazing amount of complexity can enter into even simple interchanges, consider the following brief dialogue:

Q: Can you tell me where John is?
A: Oh, he was hungry for one of Joe's pizzas. He'll be back soon.

The petitioner's use of a yes-no question is an example of an indirect speech act. In an indirect speech act one illocutionary act is performed indirectly by way of performing another (166-168). The yes-no question is interpreted as a form of politeness instead of a more direct utterance such as "Where is John?" Grice (169) noticed that conversational participants follow cooperative principles that he subcategorized as quantity (be informative), quality (be truthful), relation (be relevant), and manner (be brief). The response in the example above meets Gricean notions of appropriateness, but it, too, is indirect in communicating both where John is and why he is there. To infer John's location from his state of hunger and desire requires a plan and goal analysis from pragmatic (extralinguistic) knowledge. The final part of the answer responds to an inferred petitioner's goal of being copresent with John by suggesting that he will be back soon. In cooperative conversations Grice noted that speakers caused listeners to make certain inferences, which he termed conversational implicatures. Hirschberg (170) has studied a class of implicatures called scalar implicatures. In the sentence "Some people left early," for example, the hearer may reasonably conclude that "not all people left early." A cooperative response occasionally requires that faulty presuppositions in the question be corrected. For a database query such as "How many juniors failed CS 200?" an answer of "none" is misleading if there were no juniors enrolled. The CO-OP system performed this type of presupposition checking (73). Another type of cooperative response involves informing the user of discontinuities (171). In a flight reservation database one might want to know of any flights leaving before noon. It might be helpful to suggest one at 12:05 P.M. or one the next or previous day if none are otherwise available. A natural idea for discourse understanding was to extend some of the concepts of grammars and schemas from sentence parsing to discourse. Conversation-related work includes the Susie Software system (172,173) and discourse ATN grammars (174). In story understanding Rumelhart (175) and Correira (176) developed the idea of story grammars.
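The presupposition check behind the "How many juniors failed CS 200?" example can be sketched in a few lines. This is a hypothetical illustration only; the toy data and function names are invented here, and CO-OP itself worked quite differently in detail, analyzing the query's structure against a relational database.

```python
# Sketch of CO-OP-style presupposition checking for a count query.
# Toy data and names are invented for illustration.

ENROLLED = {"CS 200": ["pat", "kim"]}   # students enrolled, by course
JUNIORS = []                            # no juniors in this toy database
FAILED = {"CS 200": []}

def answer_count_query(group, course):
    """Answer 'How many <group> failed <course>?' cooperatively."""
    candidates = [s for s in ENROLLED.get(course, []) if s in group]
    if not candidates:
        # The question presupposes that members of the group took the
        # course; correct the faulty presupposition instead of saying "0".
        return f"None of them took {course}."
    failed = [s for s in candidates if s in FAILED.get(course, [])]
    return str(len(failed))

# answer_count_query(JUNIORS, "CS 200")        -> "None of them took CS 200."
# answer_count_query(["pat", "kim"], "CS 200") -> "0"
```

The key move is separating the two ways a count can be zero: a literal "0" is cooperative only when the presupposed set is nonempty; otherwise the presupposition itself is reported.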
Many of the language-understanding systems of Schank and his students use knowledge structures [such as scripts, plans, memory organization packets (MOPs), and thematic abstraction units (TAUs)] to guide discourse-understanding processes. Dialog-games (177) were an attempt to use a goal-centered theory for dialogues. Litman (178) has also integrated work on planning and discourse.

Focus is an important technical notion in discourse work that relates to the shifts in attention during comprehension. Focus influences many aspects of language understanding, including choice of topic, syntactic ordering, and anaphoric reference. Grosz (179) did pioneering work on global focus, i.e., how attention shifts over a set of discourse utterances. Immediate focus represents how attention shifts over two consecutive sentences. Sidner (180) used focus to disambiguate definite anaphora by tracking three things: the immediate focus of the sentence, a potential focus list created from discourse entities in the sentence, and the past immediate foci in a focus stack.

The resolution of anaphora is an important problem within discourse understanding. Early techniques principally used a simple history list of discourse entities combined with a heuristic method for selecting them (often a variation of the most recently encountered entity satisfying the reference). The simple techniques are inadequate, largely because they fail to account for focus effects and because discourse referents do not have to be explicitly mentioned (e.g., the referent of he in the sentence "I got stopped yesterday for speeding, but he didn't give me a ticket"). Besides Sidner's technique described above, there are several other notable approaches (see Ref. 181 for a more detailed account). Other methods include concept activatedness (182), task-oriented dialogue techniques (179), logical representations (183), and discourse cohesion (184,185).

Text Generation. Text generation is the process of translating internal representations into surface forms. The forms of internal representation have included deep structure, semantic networks, conceptual dependency graphs, and deduction trees. The strategic component of a generation system chooses what to say: the message to be conveyed, including any propositional attitudes. The tactical component determines how to say it.

The earliest systems generated sentences at random to test grammars (186,187). Later AI efforts used generation techniques as a part of paraphrase systems, which parsed input strings into meaning representations and then generated back out into surface representations. Klein (188) used dependency grammars that generated a semantic dependency tree and a standard phrase structure derivation tree. Dependency trees from multiple sentences were related by nominal coreference links. A generation grammar matched portions of the dependency trees. Simmons and Slocum (189) produced sentences from a semantic network using an ATN modified for generation. Eventually, a parser was added to fully automate the paraphrase process (49). Similarly, Heidorn (190) reported an algorithm based on an augmented phrase structure grammar for producing English noun phrases to identify nodes in a semantic network. Goldman (191) used a discrimination net for conceptual dependency graphs. The net tested the primitive action types and roles to select an appropriate surface verb. This generator was later used as a part of the MARGIE system
(192).

The generation technique in SHRDLU (52) is an example of the template-based approach that has predominated in generation techniques. The program used several types of patterned responses, including completely canned phrases such as "OK," parameterized phrases such as "Sorry, I don't know the word ____," and more complex parameterizations that involved substitution of determiners, discourse phrases, and dictionary definitions. Small programs were responsible for formatting the descriptions of objects and events. For example, the definition for the event PUTON was

(APPEND (VBFIX (QUOTE PUT)) OBJ1 (QUOTE (ON)) OBJ2)

A heuristic pronominal substitution mechanism improved the quality of the responses, allowing for the generation of noun phrases as complex as "the large green one that supports the pyramid."

Although most generation systems of the 1970s used techniques similar to SHRDLU, two different, important generation programs appeared in 1974. Davey's PROTEUS program (193) described tic-tac-toe games. The program had a rich understanding of the tactics of the game and could provide natural summarizations at an appropriate, high level. An example is "I threatened you by taking the middle of the edge opposite that and adjacent to the one I had just taken but you blocked it and threatened me." The ERMA program (194) embodied a cognitive model of human generation that mimicked the real-time false starts and patching of utterances. The model was developed by studying transcripts of psychoanalysis sessions to determine a patient's reasoning patterns. As an example,
the program generated "you know for some reason I just thought about the bill and payment" as a gentle way of beginning to argue that "you shouldn't give me a bill."

Interest in generation work has revived in the 1980s with a number of new research projects. Mann et al. (195) provide a survey of text generation projects. Some of the major projects are as follows. The transformational grammar generation system of Bates and Ingria (91), a very syntactically powerful generator, was used in a CAI application. McDonald's generator, MUMBLE (196), models spoken language and concentrates on the fluency and coverage of the tactical component. The KDS system (197) used a "fragment-and-compose" paradigm in which the knowledge structure is divided into small propositional units, which are then composed into large textual units. Mann and Mattheissen (150) used a systemic grammar (NIGEL) for the tactical component in a text generation system. In describing a system for generating stock market reports, Kukich (198) proposed a "knowledge-intensive" approach to sentence generation; similarly specialized techniques form the basis of the generator for the previously mentioned UC project (199). The KAMP system (200) views generation as a planning problem of proving what to say. In her TEXT system, McKeown (201) adapted ideas of text schemas and focus from discourse-understanding research to the task of answer generation in a natural-language database system.
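The template-based style exemplified by SHRDLU's PUTON definition can be mimicked in a few lines of Python. This is a loose sketch, not SHRDLU's actual machinery: vbfix and puton below are stand-ins for the Lisp functions, and the real system conjugated verbs and substituted determiners heuristically.

```python
def vbfix(verb):
    # Stand-in for SHRDLU's VBFIX, which fixed up verb tense and
    # agreement; here the verb is simply returned unchanged.
    return verb

def puton(obj1, obj2):
    # Rough analog of (APPEND (VBFIX (QUOTE PUT)) OBJ1 (QUOTE (ON)) OBJ2):
    # splice the fixed-up verb, the two object descriptions, and the
    # literal word "on" into one surface string.
    return " ".join([vbfix("put"), obj1, "on", obj2])

print(puton("the red block", "the green cube"))
# -> put the red block on the green cube
```

The template itself carries the syntax; all of the apparent fluency comes from the small programs that describe the objects filling the slots.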
Cognitive Modeling. In the late 1960s at Stanford, Roger Schank, while working on a parser for an automated psychiatrist project with Kenneth Colby, developed a meaning representation known as conceptual dependency (CD). Having been exposed to machine translation as a graduate student, Schank was convinced that more of the underlying meaning of sentences needed to be represented. In particular, certain inferences were included in the CD graphs. The basic scheme was centered on approximately a dozen primitive action concepts. The translation of "X hit Y," for example, was approximately "X propelled some Z from X to Y which resulted in the state of Y and Z being in physical contact." The first fairly complete system, MARGIE, included a parser (conceptual analyzer), an inferencer, and a text generation system (192).

In an interesting early retrospective of the CD paradigm, Schank offered this perspective on the situation that he faced in the late 1960s (202):

Thus, my point was that Chomsky was wrong in claiming that we should not be attempting to build a point by point model of a speaker-hearer. Such a model was precisely what I felt should be tackled. Linguists viewed this as performance and thus uninteresting. I took my case to psychologists and found them equally uninterested. Psychologists interested in language were mostly psycholinguists, and psycholinguists for the most part bought the assumptions of transformational grammar (although it seemed very odd to me that given the competence/performance distinction, psychologists should be on the side of competence).

Schank's emphasis on semantic representations was supported by others [notably the work on preference semantics by Wilks (203)] but has been slow to make a large impact on practical systems. Perhaps the slow acceptance was a result of methodology [the "free-form speculation approach to theory building" (202)], of general attitudes inherited from linguistic theory, of the emphasis in CD systems on the I/O behavior of programs instead of formal computational models, and of the difficulty in discovering and representing conceptual knowledge structures.

One problem that plagued the inferencer in MARGIE was how to control the potential inferences that could be made. Later CD-based systems made inferences organized from knowledge sources such as scripts (204,205), plans and goals (206), beliefs (207), episodic memory (qv) (208), and thematic abstraction units (209). Scripts provide prepackaged causal and temporal links for stereotypical situations. For less structured situations the links are created dynamically by a plan and goal analysis. Inferences are also affected by one's beliefs (e.g., conservative/liberal political beliefs) and memory of past events. Although many of the ideas of schematic inference and planning are being incorporated in recent work, the difficulty of identifying and integrating a wide range of semantic and pragmatic representations remains a difficult problem for AI and CL.

Language Acquisition. Computational language acquisition (qv) research subdivides in much the same way that AI research generally does. Some researchers attempt to automate the acquisition of linguistic expertise by any efficacious method; other work is explicitly aimed at cognitive modeling and tries to be faithful to the psycholinguistic data on language acquisition. Most of the language-learning systems are concerned primarily with learning syntactic rules.

New computational approaches to language acquisition have generally followed developments in linguistics or natural-language-processing techniques. The ZBIE system (210) learned foreign language rules from input pairs consisting of a semantic representation and a surface string. For example, the representation (be (on table hat)) was paired with the sentence "The table is on the hat." To the extent that the appropriate syntactic structure of a sentence bears a particular relationship to the semantic structure, the semantic representation can guide in the induction of syntactic rules. Anderson's graph deformation condition (211) is a statement of this principle. Klein's AUTOLING program (212) derived a transformational grammar in cooperation with a linguist informant. The derived grammars contained context-free phrase structure rules and transformations. Harris (213) produced a language-learning system for a simulated robot. The system performed lexicalization, the process of mapping words to concepts, and the induction of a Chomsky normal-form grammar. Berwick (214) investigated learning transformational grammar rules of the sort embodied in a Marcus parser.

Reeker (215) explicitly modeled a child's acquisition of language with a problem-solving theory. The grammar was represented by context-free syntactic rules paired with a semantic representation modeled after conceptual dependency notation. The system received as input an "adult sentence" and its meaning. A heuristic reduction process formed a reduced sentence, which was then compared against a "child sentence" produced from the meaning by the child's current grammar. If a difference in the derived sentences was obtained, the grammar was adjusted. The AMBER system (216) similarly compares input sentences to internally generated sentences to
identify discrepancies. The CHILD system (217) receives an adult sentence and a conceptual dependency representation of visual input. The model builds lexical definitions similar to those of other word-based parsers.

The psychologist John R. Anderson has made many contributions to language acquisition research. His LAS system (211) accepted sentence-scene description pairs and learned an ATN grammar that was used for both recognition and generation. The scene descriptions were encoded in the HAM associative network representation (218). Following this work, he developed a series of cognitive models and learning theories based on a hybrid architecture, called ACT (adaptive control of thought). An elaborate version of the model, ACT* (219), uses a production system to control spreading activation processes in a semantic network. Anderson has studied the learning of production rules for language generation, which is viewed as a problem-solving activity in ACT*.

Speech Understanding. The problem of understanding spoken natural language involves virtually all of the issues discussed above as well as others of its own (see Speech understanding).

Further Reading

In addition to the many references already cited and the discussions and references in related articles, Feigenbaum and Feldman (220) and Minsky (221) contain descriptions of, and Simmons (17,34) discusses, early work in natural-language processing; Rustin (222) and Zampolli (223) consider the status of several question-answering systems of the early to middle 1970s; Kaplan (224) contains brief summaries of several dozen projects underway in the early 1980s; and the brief articles in Johnson and Bachenko (225) give prospects for work in several areas of CL. Tennant (173) provides a fairly broad introduction to natural-language processing and contains technical details and historical remarks, as do the articles in Barr and Feigenbaum (226) and Lehnert and Ringle (227). Grishman (228) provides a general introduction to technical problems in the field; matters of parsing and grammatical formalisms are discussed in King (229), Winograd (230), Sparck Jones and Wilks (231), and Dowty et al. (232); an interesting discussion of cognitive approaches to semantics is Jackendoff (165); Brady and Berwick (233) contains papers on discourse. Schank and Riesbeck (234) and Simmons (235) present the actual mechanisms by which specific processors have been constructed. Harris (236) has written a recent textbook on natural-language processing (see Natural-language understanding).

Many articles have appeared in conference proceedings, including the annual meeting of the ACL, the biennial International Conference on Computational Linguistics (COLING), conferences sponsored by the American Association for Artificial Intelligence (AAAI), the biennial International Joint Conference on AI (IJCAI), a Conference on Applied Natural Language Processing, and two conferences on Theoretical Issues in Natural Language Processing. A primary journal is Computational Linguistics (formerly the American Journal of Computational Linguistics), and other important journals include Artificial Intelligence, the Canadian Journal of Artificial Intelligence, and Cognitive Science.

BIBLIOGRAPHY

1. W. Weaver, in W. Locke and A. Booth (eds.), Machine Translation of Languages, MIT Press, Cambridge, MA, pp. 15-23, 1955.
2. W. Locke and A. Booth (eds.), Machine Translation of Languages, MIT Press, Cambridge, MA, 1955.
3. A. Oettinger, Automatic Language Translation, Harvard University Press, Cambridge, MA, 1960.
4. Y. Bar-Hillel, The Present Status of Automatic Translation of Languages, in F. Alt (ed.), Advances in Computers, Vol. 1, Academic Press, New York, pp. 102-103, 1960.
5. National Research Council, Language and Machines: Computers in Translation and Linguistics, Report by the Automated Language Processing Advisory Committee (ALPAC), National Academy of Sciences, Washington, DC, p. 19, 1966.
6. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
7. Reference 6, p. 34.
8. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
9. G. Salton, Automatic Information Organization and Retrieval, McGraw-Hill, New York, 1968.
10. J. Becker and R. Hayes, Information Storage and Retrieval: Tools, Elements, Theories, Wiley, New York, 1963.
11. D. Hays (ed.), Readings in Automatic Language Processing, American Elsevier, New York, 1966.
12. K. Sparck Jones and M. Kay, Linguistics and Information Science, Academic Press, London, 1973.
13. B. Raphael, Hewlett-Packard, personal communication, July 1983.
14. D. Bobrow, Natural Language Input for a Computer Problem-Solving System, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 133-215, 1968.
15. Reference 14, p. 146.
16. V. Giuliano, "Comments on the article by Simmons," CACM 8(1), 69 (1965).
17. R. Simmons, "Answering English questions by computer: A survey," CACM 8(1), 53 (1965).
18. Reference 17, p. 70.
19. B. Green, A. Wolf, C. Chomsky, and K. Laughery, BASEBALL: An Automatic Question Answerer, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963.
20. R. Lindsay, Inferential Memory as the Basis of Machines which Understand Natural Language, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, p. 221, 1963.
21. B. Raphael, SIR, a Computer Program for Semantic Information Retrieval, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, p. 33, 1968.
22. J. Craig, S. Berezner, C. Homer, and C. Longyear, DEACON: Direct English Access and Control, AFIPS 1966 Fall Joint Computer Conference, p. 366.
23. Reference 22, p. 376.
24. F. Thompson, P. Lockemann, B. Dostert, and R. Deverill, REL: A Rapidly Extensible Language System, ACM National Conference, p. 400, 1969.
25. Reference 24, p. 404.
26. C. Kellogg, A Natural Language Compiler for On-line Data Management, AFIPS 1968 Fall Joint Computer Conference, pp. 473-492.
27. Reference 14, p. 204.
28. E. Charniak, Computer Solution of Calculus Word Problems, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 303-316, 1969.
29. Reference 28, p. 305.
30. Reference 28, p. 309.
31. J. Weizenbaum, "ELIZA: A computer program for the study of natural language communication between man and machine," CACM 9(1), 36-45 (1966).
32. J. Weizenbaum, Computer Power and Human Reason, W. H. Freeman, San Francisco, CA, 1976.
33. K. Colby, S. Weber, and F. Hilf, "Artificial paranoia," Artif. Intell. 2, 1-25 (1971).
34. R. Simmons, "Natural language question answering systems: 1969," CACM 13(1), 15-30 (1970).
35. R. Simmons and D. Londe, NAMER: A Pattern Recognition System for Generating Sentences about Relationships between Line Drawings, Report TM-1798, System Development Corp., Santa Monica, CA, 1964.
36. R. Kirsch, Computer Interpretation of English Text and Picture Patterns, IEEE Trans. Electron. Comput. 13, 363-376 (1964).
37. J. Thorne, P. Bratley, and H. Dewar, The Syntactic Analysis of English by Machine, in D. Michie (ed.), Machine Intelligence, Vol. 3, American Elsevier, New York, pp. 281-299, 1968.
38. D. Bobrow and B. Fraser, An Augmented State Transition Network Analysis Procedure, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 557-567, 1969.
39. W. Woods, "Transition network grammars for natural language analysis," CACM 13, 591-606 (October 1970).
40. C. Fillmore, The Case for Case, in E. Bach and R. Harms (eds.), Universals in Linguistic Theory, Holt, Rinehart and Winston, New York, pp. 1-90, 1968.
41. B. Bruce, "Case systems for natural language," Artif. Intell. 6, 327-360 (1975).
42. R. Schank and L. Tesler, A Conceptual Parser for Natural Language, International Joint Conference on Artificial Intelligence, pp. 569-578, 1969.
43. D. Hays, "Dependency theory: A formalism and some observations," Language 40, 511-524 (1964).
44. M.
Kay, Experiments with a Powerful Parser, Proceedings of the Second International Conference on Computational Linguistics, Grenoble, August 1967.
45. S. Lamb, "The sememic approach to structural semantics," Am. Anthropol. (1964).
46. R. Schank, "Conceptual dependency: A theory of natural language understanding," Cog. Psychol. 3, 552-631 (1972).
47. W. Woods, Procedural Semantics for a Question-Answering System, AFIPS 1968 Fall Joint Computer Conference, pp. 457-471.
48. M. Quillian, Semantic Memory, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 216-270, 1968.
49. R. Simmons, Semantic Networks: Their Computation and Use for Understanding English Sentences, in R. Schank and K. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, CA, pp. 63-113, 1973.
50. N. Findler (ed.), Associative Networks: Representation and Use of Knowledge in Computers, Academic Press, New York, 1979.
51. J. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
52. T. Winograd, Understanding Natural Language, Academic Press, New York, 1972.
53. W. Woods, R. Kaplan, and B. Nash-Webber, The Lunar Sciences
Natural Language Information System: Final Report, Report 2378, Bolt Beranek and Newman, Cambridge, MA, 1972.
54. W. Woods, Lunar Rocks in English: Explorations in Natural Language Question Answering, in A. Zampolli (ed.), Linguistic Structures Processing, North-Holland, Amsterdam, pp. 521-569, 1977.
55. M. Halliday, "Categories of the theory of grammar," Word 17, 241-292 (1961).
56. T. Winograd, Frame Representations and the Declarative-Procedural Controversy, in D. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 185-210, 1975.
57. Y. Wilks, Natural Language Understanding Programs Within the A.I. Paradigm: A Survey and Some Comparisons, in A. Zampolli (ed.), Linguistic Structures Processing, North-Holland, Amsterdam, pp. 341-398, 1977.
58. S. Petrick, On Natural-Language Based Computer Systems, in A. Zampolli (ed.), Linguistic Structures Processing, North-Holland, Amsterdam, pp. 313-340, 1977. Also appears in IBM J. Res. Dev. 20(4), 314-325 (1976).
59. E. Codd, R. Arnold, J. Cadiou, C. Chang, and N. Roussopoulos, Seven Steps to RENDEZVOUS with the Casual User, in J. Klimbie and K. Koffeman (eds.), Data Base Management, North-Holland, pp. 179-200, 1974.
60. W. Plath, "REQUEST: A natural language question-answering system," IBM J. Res. Dev. 20(4), 326-335 (1976).
61. S. Petrick, Transformational Analysis, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, pp. 27-41, 1973.
62. F. Damerau, "Operating statistics for the transformational question answering system," Am. J. Computat. Ling. 7(1), 30-44 (1981).
63. G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3(2), 105-147 (1978).
64. G. Hendrix, Human Engineering for Applied Natural Language Processing, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 183-191, 1977.
65. R. Hershman, R. Kelley, and H.
Miller, User Performance with a Natural Language Query System for Command Control, Technical Report TR 79-7, Navy Personnel Research and Development Center, San Diego, CA, 1979.
66. J. Mylopoulos, A. Borgida, P. Cohen, N. Roussopoulos, J. Tsotsos, and H. Wong, TORUS: A Natural Language Understanding System for Data Management, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 414-421, 1975.
67. F. Thompson and B. Thompson, Practical Natural Language Processing: The REL System as Prototype, in M. Rubinoff and M. Yovits (eds.), Advances in Computers, Vol. 13, Academic Press, New York, pp. 109-168, 1975.
68. F. Thompson and B. Thompson, Shifting to a Higher Gear in a Natural Language System, National Computer Conference, pp. 657-662, 1981.
69. B. Thompson and F. Thompson, Introducing ASK, a Simple Knowledgeable System, Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 17-24, 1983.
70. B. Thompson, Linguistic Analysis of Natural Language Communication with Computers, Proceedings of the Eighth International Conference on Computational Linguistics, Tokyo, pp. 190-201, 1980.
71. M. Templeton, EUFID: A Friendly and Flexible Frontend for Data Management Systems, Proceedings of the Seventeenth Annual Meeting of the ACL, pp. 91-93, 1979.
72. M. Templeton and J. Burger, Problems in Natural-Language Interface to DBMS with Examples from EUFID, Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 3-16, 1983.
73. S. Kaplan, Indirect Responses to Loaded Questions, Theoretical Issues in Natural Language Processing, Vol. 2, pp. 202-209, 1978.
74. D. Waltz, "An English language question answering system for a large relational database," CACM 21(7), 526-539 (1978).
75. T. Finin, B. Goodman, and H. Tennant, JETS: Achieving Completeness through Coverage and Closure, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 275-281, 1979.
76. H. Tennant, Experience with the Evaluation of Natural Language Question Answerers, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 874-876, 1979.
77. H. Lehmann, "Interpretation of natural language in an information system," IBM J. Res. Dev. 22(5), 560-571 (1978).
78. Reference 77, p. 560.
79. J. Krause, Results of a User Study with the User Specialty Language System and Consequences for the Architecture of Natural Language Interfaces, Technical Report 79.04.003, IBM Heidelberg Scientific Center, 1979.
80. W. Bronnenberg, S. Landsbergen, R. Scha, W. Schoenmakers, and E. van Utteren, "PHLIQA-1, a question-answering system for data-base consultation in natural English," Philips Tech. Rev. 38, 229-239, 269-284 (1978-1979).
81. Reference 80, p. 230.
82. W. Hoeppner, T. Christaller, H. Marburger, K. Morik, B. Nebel, M. O'Leary, and W. Wahlster, Beyond Domain-Independence, Proceedings of the Eighth Int. J. Conf. on AI, Karlsruhe, FRG, pp. 588-594, 1983.
83. L. Harris, "User-oriented data base query with the Robot natural language system," Int. J. Man-Mach. Stud. 9, 697-713 (1977).
84. L. Harris, "The ROBOT system: Natural language processing applied to data base query," ACM Natl. Conf., pp. 165-172 (1978).
85. S.
Salveter, Natural Language Database Updates, Proceedings of the Nineteenth Annual Meeting of the ACL, University of Toronto, pp. 67-73, 1982.
86. J. Davidson and S. Kaplan, "Natural language access to data bases: Interpreting update requests," Am. J. Computat. Ling. 9(2), 57-68 (1983).
87. J. Carbonell, "AI in CAI: An artificial intelligence approach to computer-assisted instruction," IEEE Trans. Man-Mach. Sys. 11, 190-202 (1970).
88. A. Collins, E. Warnock, N. Aiello, and R. Miller, Reasoning from Incomplete Knowledge, in D. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 383-415, 1975.
89. J. Brown and R. Burton, Multiple Representations of Knowledge for Tutorial Reasoning, in D. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 312-313, 1975.
90. R. Weischedel, W. Voge, and M. James, "An artificial intelligence approach to language instruction," Artif. Intell. 10, 225-240 (1978).
91. M. Bates and R. Ingria, Controlled Transformational Sentence Generation, Proceedings of the Nineteenth Annual Meeting of the ACL, Stanford University, pp. 153-158, 1981.
92. G. Heidorn, Natural Language Dialogue for Managing an On-line Calendar, Proceedings of the Annual Meeting of the ACM, Washington, DC, pp. 45-52, 1978.
93. D. Bobrow, R. Kaplan, M. Kay, D. Norman, H. Thompson, and
T. Winograd, "GUS: A frame-driven dialog system," Artif. Intell. 8(2), 155-173 (1977).
94. A. Biermann, B. Ballard, and A. Sigmon, "An experimental study of natural language programming," Int. J. Man-Mach. Stud. 18(1), 71-87 (1983).
95. J. Allen, A. Frisch, and D. Litman, ARGOT: The Rochester Dialogue System, Proceedings of the Second National Conference on Artificial Intelligence, Carnegie-Mellon University and University of Pittsburgh, Pittsburgh, PA, pp. 66-70, 1982.
96. R. Wilensky, Talking to UNIX in English: An Overview of UC, Proceedings of the Second Annual Conference on Artificial Intelligence, Pittsburgh, PA, pp. 103-105, 1982.
97. N. MacDonald, L. Frase, P. Gingrich, and S. Keenan, "The Writer's Workbench: Computer aids for text analysis," IEEE Trans. Commun. 30, 105-110 (January 1982).
98. G. Heidorn, K. Jensen, L. Miller, R. Byrd, and M. Chodorow, "The EPISTLE text-critiquing system," IBM Sys. J. 21(3), 305-326 (1982).
99. G. Heidorn, "Automatic programming through natural language dialogue: A survey," IBM J. Res. Dev. 20(4), 302-313 (1976).
100. J. P. Gelb, Experience with a Natural Language Problem-Solving System, Proceedings of the Second International Joint Conference on Artificial Intelligence, London, pp. 455-462, 1971.
101. G. Heidorn, Natural Language Inputs to a Simulation Programming System, Ph.D. Dissertation, Technical Report NPS55HD72101A, Naval Postgraduate School, Monterey, CA, 1972.
102. C. Green, A Summary of the PSI Program Synthesis System, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 380-381, 1977.
103. A. Biermann and B. Ballard, "Toward natural language computation," Am. J. Computat. Ling. 6(2), 71-86 (1980).
104. R. Geist, D. Kraines, and P. Fink, Natural Language Computation in a Linear Algebra Course, Proceedings of the National Educational Computer Conference, pp. 203-208, 1982.
105. N.
Sager, Natural Language Information Processing: A Computer Grammar of English and Its Applications, Addison-Wesley, Reading, MA, 1981.
106. L. Hirschman, R. Grishman, and N. Sager, From Text to Structured Information: Automatic Processing of Medical Reports, Proceedings of the AFIPS National Computer Conference, pp. 267-275, 1976.
107. R. Grishman and L. Hirschman, "Question answering from natural language medical data bases," Artif. Intell. 7, 25-43 (1978).
108. N. Sager, Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base, in M. Yovits (ed.), Advances in Computers, Vol. 17, Academic Press, New York, pp. 89-162, 1978.
109. E. Marsh and C. Friedman, "Transporting the linguistic string project system from a medical to a Navy domain," ACM Trans. Ofc. Inform. Sys. 3(2), 121-140 (1985).
110. N. Haas and G. Hendrix, An Approach to Acquiring and Applying Knowledge, Proceedings of the First National Conference on Artificial Intelligence, Stanford University, Stanford, CA, pp. 235-239, 1980.
111. G. Hendrix and W. Lewis, Transportable Natural-Language Interfaces to Databases, Proceedings of the Nineteenth Annual Meeting of the ACL, Stanford University, pp. 159-165, 1981.
112. W. Mark, Representation and Inference in the Consul System, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, pp. 375-381, 1981.
113. D. Wilczynski, Knowledge Acquisition in the Consul System, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, pp. 135-140, 1981.
114. D. Warren and F. Pereira, "An efficient easily adaptable system
for interpreting natural language queries," Am. J. Computat. Ling. 8(3-4), 110-122 (1982).
115. M. Bates, Information Retrieval Using a Transportable Natural Language Interface, Proceedings of the International ACM SIGIR Conference, Bethesda, MD, pp. 81-86, 1983.
116. J. Ginsparg, A Robust Portable Natural Language Data Base Interface, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 25-30, 1983.
117. B. Grosz, TEAM: A Transportable Natural Language Interface System, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 39-45, 1983.
118. B. Ballard, J. Lusth, and N. Tinkham, "LDC-1: A transportable, knowledge-based natural language processor for office environments," ACM Trans. Ofc. Inf. Sys. 2(1), 1-25 (1984).
119. R. Grishman, N. Nhan, E. Marsh, and L. Hirschman, Automated Determination of Sublanguage Syntactic Usage, Proceedings of the International Conference on Computational Linguistics, Stanford, pp. 96-98, July 1984.
120. B. Ballard (ed.), "Special issue on transportable natural language processing," ACM Trans. Ofc. Inf. Sys. 3(2), 104-230 (1985).
121. F. Damerau, "Problems and some solutions in customization of natural language database front ends," ACM Trans. Ofc. Inf. Sys. 3(2), 165-184 (1985).
122. C. Hafner and K. Godden, "Portability of syntax and semantics in DATALOG," ACM Trans. Ofc. Inf. Sys. 3(2), 141-164 (1985).
123. J. Slocum and C. Justus, "Transportability to other languages," ACM Trans. Ofc. Inf. Sys. 3(2), 204-230 (1985).
124. B. Thompson and F. Thompson, "ASK is transportable in half a dozen ways," ACM Trans. Ofc. Inf. Sys. 3(2), 185-203 (1985).
125. J. Slocum (ed.), "Special issues on machine translation," Computat. Ling. 11(2-4) (1985).
126. S. Nirenburg (ed.), Machine Translation: Theoretical and Methodological Issues, Cambridge University Press, New York, 1987.
127. W. Lehnert and S. Schwartz, EXPLORER: A Natural Language Processing System for Oil Exploration, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 69-72, 1983.
128. T. Johnson, Natural Language Computing: The Commercial Applications, Ovum, London, 1985.
129. F. Pereira and D. H. D. Warren, "Definite clause grammars for language analysis: A survey of the formalism and a comparison with augmented transition networks," Artif. Intell. 13, 231-278 (1980).
130. D. Chester, "A parsing algorithm that extends phrases," Am. J. Computat. Ling. 6(2), 87-96 (1980).
131. C. Riesbeck, Conceptual Analysis, in R. Schank (ed.), Conceptual Information Processing, North-Holland, Amsterdam, pp. 83-156, 1975.
132. S. Small, Parsing and Comprehending with Word Experts (A Theory and Its Realization), in W. Lehnert and M. Ringle (eds.), Strategies for Natural Language Processing, Lawrence Erlbaum, Hillsdale, NJ, 1982.
133. D. Younger, "Recognition and parsing of context-free languages in time n3," Inf. Contr. 10, 189-208 (1967).
134. J. Earley, "An efficient context-free parsing algorithm," CACM 13(2), 94-102 (February 1970).
135. R. Kaplan, A General Syntactic Processor, Algorithmics, New York, 1973.
136. W. Ruzzo, S. Graham, and M. Harrison, "An improved context-free recognizer," ACM Trans. Program. Lang. Sys. 2(3), 415-462 (July 1980).
138. R. Milne, "Resolving lexical ambiguity in a deterministic parser," Computat. Ling. 12(1), 1-12 (1986).
139. W. Woods, "Optimal search strategies for speech understanding control," Artif. Intell. 18, 295-326 (1982).
140. L. Erman, F. Hayes-Roth, V. Lesser, and D. Reddy, "The Hearsay-II speech understanding system," Computing Surveys 12, 213-253 (1980).
141. G. Cottrell, A Model of Lexical Access of Ambiguous Words, Proceedings of the Fourth Conference of the AAAI, Austin, TX, pp. 61-67, August 1984.
142. M. Jones and A. Driscoll, Movement in Active Production Networks, Proceedings of the Twenty-Third Annual Meeting of the Association for Computational Linguistics, pp. 161-166, July 1985.
143. D. Waltz and J. Pollack, "Massively parallel parsing," Cog. Sci. 9(1), 51-74 (1985).
144. S. Kuno, "The predictive analyzer and a path elimination technique," CACM 8, 453-462 (1965).
145. A. Zwicky, J. Friedman, B. Hall, and D. Walker, The MITRE Syntactic Analysis Procedure for Transformational Grammars, IFIPS Proceedings Fall Joint Computer Conference, Spartan, Washington, DC, pp. 317-326, 1965.
146. S. Petrick, A Recognition Procedure for Transformational Grammars, Ph.D. Dissertation, MIT, Cambridge, MA, 1965.
147. R. Stockwell, P. Schachter, and B. Partee, The Major Syntactic Structures of English, Holt, Rinehart and Winston, New York, 1973.
148. J. Robinson, "DIAGRAM: A grammar for dialogues," CACM 25(1), 27-47 (January 1982).
149. G. Heidorn, Augmented Phrase Structure Grammars, in B. Webber and R. Schank (eds.), Theoretical Issues in Natural Language Processing, Cambridge, MA, pp. 1-5, 1975.
150. W. Mann and C. Mattheissen, Nigel: A Systemic Grammar for Text Generation, in Freedle (ed.), Systemic Perspectives on Discourse: Selected Theoretical Papers of the Ninth International Systemic Workshop, Ablex, Norwood, NJ, 1985.
151. M. Kay, Functional Grammar, Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society, pp. 142-158, 1979.
152. G. Gazdar, Phrase Structure Grammar, in P. Jacobson and G. Pullum (eds.), The Nature of Syntactic Representation, D. Reidel, Dordrecht, pp. 131-186, 1982.
153. A. Joshi, How Much Context-Sensitivity Is Required to Provide Reasonable Structural Descriptions: Tree Adjoining Grammars, in D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Processing: Psycholinguistic, Computational and Theoretical Properties, Cambridge University Press, New York, 1984.
154. E. Proudian and C. Pollard, Parsing Head-Driven Phrase Structure Grammar, Proceedings of the Twenty-Third Annual Meeting of the Association for Computational Linguistics, pp. 8-12, July 1985.
155. J. Bresnan and R. Kaplan, Lexical-Functional Grammar: A Formal System for Grammatical Representation, in J. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1982.
156. M. Kay, Functional Unification Grammar: A Formalism for Machine Translation, Proceedings of COLING 84, Menlo Park, pp. 75-78, 1984.
157. J. Allen (ed.), "Special issue on ill-formed input," Am. J. Computat. Ling. 9(3-4), 123-196 (1983).
158. R. Weischedel and J. Black, "Responding intelligently to unparsable inputs," Am. J. Computat. Ling. 6(2), 97-109 (1980).
159. P. Hayes and G. Mouradian, "Flexible parsing," Am. J. Computat. Ling. 7(4), 232-242 (1981).
137. M.
Partee, The Major Syntactic in DATALOG," ACM Trans. Ofc. Inf. Syg 3(2), 141-164 (1985). Structures of English, Holt, Rinehart and Winston, New York, 1973. L23. J. Slocum and C. Justus, "Transportability to other languages," ACM Trans. Oft. Inf.Sys. 3(2), 204-230 (1985). 148. J. Robinson, "DIAGRAM: A grammar for dialogu€s," CACM 25(I), 27 -47 (January L982). L24. B. Thompson and F. Thompson,"ASK is transportable in half a (1985). 185-203 3(2), Inf. Sys. Ofc. Trans. dozenways," ACM I49. G. Heidorn, Augmented Phrase Structure Grammars, in B. Webber and R. Schank (eds.), TheoreticalIssuesin Natural Language L25. J. Slocum (ed.),"special issueson machine translation," CompuProcessing,Cambridg", MA, pp. 1-5, L975. tat. Ling. ll(2-4) (1985). 150. W. Mann and C. Mattheissen, Nigel: a Systemic Grammar for L26. S. Nirenburg, (ed.),Machine Translation: Theoreticaland Meth' Text Generation, in Freedle (ed.),SystemicPerspectiueson Disodological Issues,Cambridge University Press,New York, 1987. course: Selected Theoretical Papers of the Ninth International Language A Natural L27. W. Lehnert and S. Schwartz, EXPLORER: Systemic Workshop,Ablex, Norwood, NJ, 1985. ProcessingSystem for Oil Exploration, Proceedingsof the Con' Monica, Santa Processing, 151. M. Kay, Functional Grammar, Proceedingsof the Fifth, Annual ferenceon Applied Natural Language Meeting of the Berkeley Linguistic Society,pp. L42-158, 1979. CA, pp. 69-72, L983. L52. G. Gazdar, Phrase Structure Grammar, in P. Jacobsonand G. L28. T. Johnson, Natural Language Computing: the Commercial ApPullum (eds.),The Nature of SyntacticRepresentation,D. Reidel, plicatiorzs,Ovum, London, 1985. Dordrecht,pp. 131-186, 1982. grammars for clause "Definite t1g. F. Pereira and D. H. D. Warren, language analysis: A survey of the formalism and a comparison 153. A. Joshi, How Much Context-sensitivity is Required to Provide ReasonableStructural Descriptions: Tree Adjoining Grammars, with Augmented Transition Networks," Artif. Intell. 
13, 23tin D. Dowty, L. Karttunen, and A. Zwtcky (eds.),Natural Lan278 (1980). g uage P rocessing : P sycholing uistic, Computational and T heoreti130. D. Chester, "A parsing algorithm that extends phrases," Am. J. cal Properties, Cambridge University Press, New York, 1984. Computat.Ling. 6(2), 87-96 (1980). 181. C. Riesbeck,ConceptualAnalysis, in R. Schank (ed.), Conceptual I54. E. Proudian and C. Pollard, Parsing Head-Driven Phrase Structure Gramm ar, Proceedngsof the Twenty -Third Annual Meeting I nforrnationP rocessing,North-Holland, Amsterdam, pp. 83- 156, of the Association for Computational Linguistics, pp. 8-12, July L975. 1985. (A L32. S. Small, Parsing and Comprehending with Word Experts Theory and Its Realization),in W. Lehnert and M. Ringle (eds.), 155. J. Bresnan and R. Kaplan, Lexical-FunctionalGrammar: A Formal System for Grammatical Representation,in J. Bresnan (ed.), Strategiesfor Natural Language Processing,Lawrence Erlbaum, The Mental Representationof Grammatical Relatio,rzs,MIT Press, Hillsdale, NJ, L982. Cambridg", MA, L982. 133. D. Younger, "Recognition and parsing of context-freelanguages 156. M. Kay, Functional Unification Grammar: A Formalism for Main time n3," Inf. Ctr., 10, 129-208 (1967). chine Translation, Proceedingsof Coling 84,Menlo Park, pp. 75parsing algorithm," CACM L34. J. Earley, "An efficient context-free 78, L984. l3(2), 94-102 (February 1970). L57 . Allen (ed.;,"Specialissueon ill-formed input," Am. J. CompuJ. 135. R. Kaplan, A General Syntactic Processor,Algorithmics, New tat. Ling. 9(3-4), L23-196 1983. York, 1973. 158. R. Weischedel and J. Black, "Responding intelligently to un136. W. Ruzzo, S. Graham, and M. Harrison, "An improved contextparasableinputs," Am J. Computat.Ling. 6(2),97-109 (1980). free recognizer," ACM Trans. Program. Lang. Sys. 3, 4L5-562 159. P. Hayes and G. Mouradian, "Flexible parsirg," Am. J. Compu(July 1980). tat. Ling. 7(4),232-242 (1981). I37. M. 
Marcus, A Theory of Syntactic Recognition for Natural Lan160. S. Kwasny and N. Sondheimer,"Relaxation techniquesfor parsgua.ge,MIT Press, Cambridge, MA, 1980.
150
COMPUTATIONALLINGUISTICS ing ill-formed input," Am. J. Computat. Ling. 7(2), 99-108 (1981).
161. K. Jensen, G. Heidorn, L. Miller, and Y. Ravin, "Parse fitting and prose fixing: Getting a hold on ill-formedness,"Am . J. Computat. Ling. 9(3-4), I47 -160 (1983). L62. R. Weischedel and N. Sondheimer, "Meta-rules as a basis for processingill-formed output," Am. J. Computat.Ling. 9(3-4), 1 6 1 - 1 7 7( 1 9 8 3 ) . 163. R. Granger, "the NOMAD system: Expectation-baseddetection and correction of errors during understanding of syntactically and semantically ill-formed texti' Am. J. Computat.Ling.9(34), 188-196 (1983). 164. P. Fink and A. Biermann, "Correction of ill-formed input using history-based expectation with applications to speech understanding,"Computat.Ling. 12(1),13-36 (1986).
185. A. Lockman, Contextual ReferenceResolution, Ph.D. Dissertation, Columbia University, May 1978. 186. V. Yngve, Random Generation of English Sentences,Proceedings of the International Conferenceon Machine Translation of Languages and Applied Language Analysis, National Physical Laboratory, Symposium No. 13, Her Majesty's Stationery Office, London,pp. 66-80, L962. 187. J. Friedman, "Directed random generation of sentences,"CACM 12(1),40-46 (1969). 188. S. Klein, "Automatic paraphrasing in essay format," Mechan. Transl. 8(3), 68-83 (1965). 189. R. Simmons and J. Slocum, "Generating English discoursefrom semantic networks," CACM 15(10),891-905 (L972).
190. G. Heidorn, Generating Noun Phrasesto Identify Noun Phrases in a Semantic Network, Proceedings of the Fifth International Joint Conferenceon Artifi,cial Intelligence, Cambridge, Mass., p. 165. R. Jackendoff, Semanticsand Cognition, MIT Press,Cambridge, r43, 1977. MA, 1983. 166. J. Searle,Indirect SpeechActs, in P. Morgan and J. Cole (eds.), 191. N. Goldman, Conceptual Generation. in R. Schank, Conceptual Information Processing, North-Holland, Amsterdam, pp. 289Syntax and Semantics,Vol .3, SpeechActs,Academic Press,New 371 1975. York, pp. 59-82, 1975. lg2. R. Schank, Conceptual Information Processing,with contribuL67. P. Cohen and C. Perrault, "Elements of a plan-basedtheory of tions from N. Goldman, C. Rieger, and C. Riesbeck,Vol. 3 of speechacts," Cog.Sci 3, I77-2L2 (L979). Studies in Computer Science,North-Holland, AmFundamental in utterances," intention Perrault, "Analyzing C. and 168. J. Allen 1975. sterdam, Artif. Intell. 15(3), L43-L78 (1980). Production, Edinburgh University Press, 169. H. Grice, Logic and Conversation, in P. Morgan and J. Cole 193. A. Davey, Discourse Edinburgh, L979. (eds.), Syntax and Semantics, Vol. 3, Speech Acts, Academic Lg4. J. Clippinger, "speaking with many tongues: Some problems in Press,New York, PP.41-58, 1975. modeling speakers of actual discourse," TINLAP-I, 68-73 170. J . Hirschb erg, "Toward a Redefinition of Yes/No Questiors," (1975). Stanford, Proc. of Tenth Int. Conf, on Computational Linguistics, 195. W. Mantr, M. Bates, B. Grosz,D.McDonald, K. McKeown' and CA, pp. 48-51, 1984. W. Swartout, "Text Generation: The state of the art and literaI7I. L. Siklossy,Question-AskingQuestion-Answering,Department ture," JACL 8,2 (1982). of Computer ScienceReport TR-71, University of Texas, Austin, McDonald, Natural Language Generation as a Computational D. 196. t977. TX, An Introduction, in M. Brady and R. Berwick (eds.), Problem: CamPress, MIT Dialog, Processing Framework A Brow1, L7Z. G. 
for Mod,etsof Discotrrse,MIT Press,Cambridgu,MA, Computational bridge, MA, June L977. pp. 209-265,1983. L7g. H. Tennant, Natural LanguageProcessing,Petrocelli, New York, W. Mann and J. Moore, "Computer generation of multiparatg1. 1981. graph English text," Am J. Computat.Ling.7, L7-29 (1981)' t74. R. Reichman, "Extended person-machineinterface," Artif. Intell. 1gg. K. Kukich, Design of a Knowledge-BasedReportGeneratot,Pro22(2), L57-218 (March 1984). ceedings of the Twentieth Annual Meeting of the ACL, CamL7S. D. Rummelhart, Notes on a Schemafor Stories,in D. Bobrow and MA, PP. 145-150, 1983bridge, A. Collins (eds.), Representationand Understanding, Academic, p. Jacobs, "PHRED: a generator for natural language inter1gg. New York, L975. faces,"Computat.Ling. 11(4),2L9-242 (1985)' 176. A. Coreira, "Computing story trees," Am. J. Computat. Ling' 200. D. App elt, Planning English Sentences,Cambridge University 6(3-4), 135-149 (1980). Press,New York, 1985. L77. J. Levin and J. Moore, "Dialog-games:Metacommunications K. McKeowr, Text Generation, Cambridge University Press, 201. structures for natural language interaction," Cog. Sci. 1(4),395New York, 1985. 420 Q977). Schank, Inference in the ConceptualDependencyParadigm: A R. Z0Z. for model recognition 1Tg. D. Litman and J. Allen, "A plan-based personal History, Yale l]niversity, Department of Computer Scisubdialoguesin conversationl' Cog.Sci. 11 (1987). Report 141, Septemberl'978' Research ence, 1Tg. B. Grosz, Focusing and Description in Natural Language Diay. Wilks, "A preferential, pattern-seeking semanticsfor natural 208. (Jnderstanding, UniCambridge logues,inElements of Discourse language inference,"Artif. Intell. 6, 53-74 (L975). versity Press,PP.84-105, 1981. 204. R. schank and R. Abelson, scripts, Plans, Goals, and under180. C. Sidner, "Focusing for interpretation of pronouns," Am. J. standing, Lawrence Erlbaum, Hillsdale, NJ 1977. -231 (1981). Computat.Ling. 7(4), 2I7 2Ob.R. 
Culingford, Script Application: Computer Understanding of 181. G. Hirst, "Discourse-oriented anaphora resolution in natural Newspaper Stories, ResearchReport 116, Yale University, Delanguageunderstanding:A review," Arn. J , Computat.Ling ' 7,2, partment of Computer Science'1978' pp. 85-98. R. Wilensky, Planning and (Jnderstanding, Addison-Wesl"y, 206. Discourse of Ig2. R. Kantor, The Management and Comprehension Readng,MA, 1983. Connection by Pronouns in English, Ph.D. Dissertation, Ohio 207. J. Carbonell, "POLITICS: Automated ideological reasonilg," State UniversitY, 1977. Cog.Sci. 2, 27-51 (1978). Garland, 188. B. Webber,A formal Approach to DiscourseAnaphora, 20g. J. Kolodner, Retrieualand Organizational Strategiesin ConcepNew York, 1978. tual Memory: A Computer Model, Lawrence Erlbaum, Hillsdale, 184. J. Hobbs, "Coherenceand coreference,"Cog. Sci. 3(1), 67-90 NJ, 1984. (1979).
DESIGN COMPUTER.AIDED 209. M. Dyer, In-Depth (Jnderstanding,MIT Press, Cambridge, MA, 1983. 2I0. L. Siklossy,"A language-learningheuristic program," Cog.Psy' chol. 2, 479-495 (1971). 2IL. J. Anderson, "Induction of augmentedtransition networks," Cog. Sci. l(2), t25-157 (April 1977). 2I2. S. Klein, Automatic Inference of Semantic Deep Structure Rules in Generative Semantic Grammars, Technical Report 180, Computer Science Department, University of Wisconsin, Madison, May L973. 2I3. L. Harris, "A system for primitive natural language acquisition," Int. J. Man-Mach. stud.9, 153-206 (L977). 214. R. Berwick, Computational Analogues of Constraints on Grammars: A Model of Syntax Acquisition, Proceedingsof the EighteenthAnnual Meeting of the ACL, Philadelphia, PA, pp. 49-54, 1980. 2L5. L. Reeker, "A problem solving theory of syntax acquisition," J. Struct.Learn.2, 1-10 (1971). 2L6. P. Langl"y, "Language acquisition through error recoveryl' Cog. Brain Theor. 5,2tL-255 (1982). 217. M. Selfridge, Inference and Learning in a Computer Model of the Development of Language Comprehensionin a Young Child, in W. Lehnert and M. Ringle (eds.), Strategies for Natural Language Processing,Lawrence Erlbaum, Hillsdale, NJ, pp. 299326, 1982. 218. J. Andersonand G. Bower,Human AssociatiueMemory,Winston and Sons,Washington, DC, 1973. 2L9. J. Anderson, The Architecture of Cognition, Harvard University PressoCambridg., MA, 1983. 220. E. Feigenbaum and J. Feldman (eds.),Computersand Thought, McGraw-Hill, New York, 1963. 22L. M. Minsky (ed.), Semantic Information Processing,MIT Press, Cambridge,MA, 1968. 222. R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, 1973. 223. A. Zampolli (ed.;, Linguistic Structures Processing,North-Holland, Amsterdam, L975. 224. S. Kaplan (ed.),"Specialsectionon natural languageprocessing" SIGART Newslett. 79, 42-108 (1982). 225. C. Johnson and J. Bachenko (eds.),"Applied computationallinguistics in perspectrve,"Am. J. Computat. Ling. 
8(2), 55-84 (1982). 226. A. Barr and E. Feigenbaum (eds.), The Handbook of Artifi,ciat Intelligence, Vol. 1, William Kaufmann, Los Altos, CA, 1 9 81 227. W. Lehnert and W. Ringle (eds.),Strategiesfor Natural Language Processirg, Lawrence Earlbaum, Hillsdale, NJ, L982. 228. R. Grishman, An Introduction to Computational Lingistics, Cambridge University Press, New York, 1986. 229. M. King (ed.),Parsing Natural Langueg€, Academic Press,London,1983. 230. T. Winograd, Language as a Cognitiue Process,VoI. L, Syntax, Addison-Wesley,Reading,MA, 1983. 23L. K. Sparck Jones and Y. Wilks, Automatic Natural Language Parsing, Ellis Horwood, Chichester,UK, 1985. 232. D. Dowty, L. Karttunen, and A. Zwicky (eds.) Natural Language Parsing, Cambridge University Press, Cambridge, UK, 1985. 233. M. Brady and R. Berwick, Computational Models of Discourse, MIT Press,Cambridge,MA, 1983. 234. R. Schank and C. Riesbeck,Inside Computer Understanding, Lawrence Erlbaum, Hillside, NJ, 1,981. 235. R. Simmons, Computations from the English, Prentice-Hall, EnglewoodCliffs, NJ, 1984.
151
B. Ballard and M. Jones
AT&T Bell Laboratories
COMPUTER-AIDED DESIGN

Computer-aided design (CAD) is the process of utilizing the computer to construct drawings or models of objects or systems (1,2). CAD encompasses the whole range from drafting to design, depending on the purpose of the user. The beginning of CAD may be traced to 1964, when IBM introduced the 2250 graphic terminal together with software allowing users to draw circuits on the face of the tube through menu selection. However, the field remained dormant until the end of the sixties, when Lockheed engineers developed the CADAM system (computer-augmented design and manufacturing) (3), which allowed convenient construction of traditional two-dimensional multiple-view orthographic projections on the face of an IBM 2250 (and later an IBM 3250) console. Even then CAD remained little known and used until the late seventies, when CAD systems grew so fast that their usage in industry is now quite commonplace. CAD systems are often coupled with CAM (computer-aided manufacturing) as CAD/CAM (see Computer-integrated manufacturing). Indeed, one of the main links between CAD and CAM is the possibility of producing a numerically controlled machine program directly from the CAD model.

A CAD system requires the following hardware: a digital computer, including various peripherals such as disks, tapes, printers, and plotters, and one or more graphics terminals equipped with keyboards and light pens, joysticks, mice, or other such devices permitting the user to point to various parts of the screen (see Fig. 1). Although several types of graphics terminals exist, raster-scan systems currently predominate, allowing resolution as high as 1200 x 1000 points with a full spectrum of colors. The software includes basic display programs; programs for storing and processing internal representations of various elements such as points, lines, curves, arcs, splines, notes, surfaces, and volumes; and programs for translating, rotating, scaling, and clipping pictures and removing hidden lines.
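The picture-manipulation programs just listed (translating, rotating, scaling) are conventionally built on homogeneous transformation matrices. The following minimal 2-D sketch is illustrative only, written in modern notation rather than taken from any CAD system of the period; all names are hypothetical.

```python
import math

def translate(tx, ty):
    # 3x3 homogeneous translation matrix
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def scale(sx, sy):
    # scaling about the origin
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def rotate(theta):
    # counterclockwise rotation about the origin, theta in radians
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    # compose two transformations: apply b first, then a
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    # transform a 2-D point given in Cartesian coordinates
    x, y = point
    v = [x, y, 1]
    r = [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]
    return (r[0], r[1])
```

Composing matrices with `matmul` before applying them is what makes it cheap to transform an entire drawing: the combined matrix is computed once and then applied to every point.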
Figure 1. A CAD system.

The principal application areas of CAD include mechanical (40%), electronics (35%), architecture/engineering/construction (15%), and others (10%). CAD systems are available as turnkey systems or software packages. The former include both software and hardware (general and specialized); the latter are collections of programs designed to run on specific commercial hardware. There are currently some 280,000 CAD systems in the U.S. (4). The 1984 CAD market topped $2 billion. It is expected to reach $28.7 billion by 1994 (5).

Early CAD systems were developed as computerized extensions of classical drafting; modern systems are aimed at three-dimensional (3-D) models that may be classified as surface, wire-frame, and solid models (1,2) (see Fig. 2). Surface models are used to define double-curvature "sculptured surfaces" such as may be found in aircraft, automobiles, shoes, etc. These surfaces are described by various analytic techniques defining a surface as a quiltwork of patches (smoothly joined together), each defined as a locally parameterized surface such as a bicubic. The importance and complexity of this field have given rise to a new discipline called computational geometry (6). Wire-frame models are 3-D extensions of the usual engineering drawings, wherein 3-D lines and curves denote the hard edges of an object, i.e., those parts where the tangent to the object surface suffers a discontinuity. Thus, for example, a sphere has no real wire-frame representation, whereas the wire-frame model of a cube consists of 12 line segments. Although wire-frame models are relatively easy to construct and to process, they may be confusing and ambiguous. Thus solid models are taking over the 3-D area. These include, in particular, surface boundary models and CSG (constructive solid geometry) models. In the former case an object is represented by a collection of surfaces separating empty space from object space. In the latter the object is defined by a number of parameterized primitive solids (such as cuboids, spheres, cylinders, cones, etc.) combined by Boolean-like operations including union, intersection, and difference (see Fig. 3). The proliferation of various types of CAD models that normally cannot communicate has recently given rise to several standard proposals, including IGES (Initial Graphics Exchange Specification) (7) and GKS (Graphical Kernel System) (8).
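The CSG scheme just described can be sketched compactly by treating each parameterized primitive as a point-membership test and the Boolean-like operations as combinators over such tests. This is an illustrative sketch, not the representation used by any particular solid modeler; all names are hypothetical.

```python
def sphere(cx, cy, cz, r):
    # primitive solid as a point-membership predicate
    return lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 <= r * r

def cuboid(x0, y0, z0, x1, y1, z1):
    # axis-aligned box between two corner points
    return lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1 and z0 <= p[2] <= z1

# Boolean-like operations combine solids into new solids
def union(a, b):
    return lambda p: a(p) or b(p)

def intersection(a, b):
    return lambda p: a(p) and b(p)

def difference(a, b):
    return lambda p: a(p) and not b(p)
```

A block with a spherical cavity, for example, is simply `difference(cuboid(...), sphere(...))`; the resulting predicate answers whether any query point lies inside the finished part.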
Figure 2. (a) CADAM drawing. (b) Wire-frame model. (c) CSG model.

Figure 3. CSG models constructed from Boolean operations as shown.

Applications of AI

The obvious application of AI to CAD would be to utilize AI in order to facilitate, if not automate, the CAD process. This is, in general, a difficult undertaking because so little is known about the design process, at least with regard to its creative aspects. However, there are surprisingly many situations where a designer repeatedly designs very similar objects, such as electric motors, transformers, wheels, etc. This may be described as parameterized CAD. It may be possible in such cases to develop expert systems (ES) (qv), requiring the knowledge engineer to design the ES to discover (as usual) two types of information: characteristic features of the object and design rules. Thus, for example, the characteristic features of an electric motor might include power, size, electrical wiring, etc. The design rules would then be utilized to assemble partial systems, or generate new ones, into a single final object.

The application of ESs to the design of mechanical objects is in its infancy, and very few examples may be cited. Recently such a parameterized system was implemented for designing multiple-spindle drill heads (9); however, it was not constructed as an ES, although it may well be considered as such. Since a large number of manufacturing concerns effectively do parameterized design, the use of ESs in this area is certain to grow in the near future. It may be noted that the use of group technology (GT) (10) for design, which consists of classifying various objects according to some GT code helping designers to locate previously designed parts, is a step in that direction.

In the area of electronic design (LSI, VLSI) the number of components to be designed and drawn is so large that special languages and systems have been constructed to resolve that problem. The designer can utilize these aids to describe the final operations required, and the system effectively designs and draws the detailed circuits and built-in components according to various built-in rules (11). This may well be considered as a further application of AI in CAD.
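The parameterized-CAD idea discussed above (characteristic features in, design rules out) can be sketched as a small routine that derives geometry from features. The features, rules, and numbers below are invented purely for illustration and are not taken from the drill-head system cited in (9).

```python
def design_drill_head(features):
    """Hypothetical parameterized design: derive a few design
    parameters for a multiple-spindle drill head from its
    characteristic features."""
    spindles = features["num_spindles"]
    spacing = features["hole_spacing_mm"]
    # Design rules (illustrative only, not real engineering data):
    # the mounting plate must span all spindles plus a margin,
    # and higher torque demands a coarser gear module.
    plate_width = spacing * (spindles + 1)
    gear_module = 2.0 if features["torque_nm"] > 50 else 1.5
    return {"plate_width_mm": plate_width, "gear_module": gear_module}
```

An expert-system version would express the same knowledge as explicit if-then rules that a generic inference engine applies, so that a knowledge engineer can add or revise rules without touching the engine.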
BIBLIOGRAPHY

1. Y. Gardan and M. Lucas, Interactive Graphics in CAD, Kogan Page, London, 1984.
2. J. Encarnacao and E. G. Schlechtendahl, Computer-Aided Design, Springer-Verlag, Berlin, 1983.
3. R. E. Notestine, Graphics and Computer-Aided Design in Aerospace, Proceedings of the Forty-second National Computer Conference, New York, 1973, p. 629.
4. The 1984-1985 Directory of Computer Graphics Suppliers, Klein, Sudbury, MA; private communication, Technology and Business Communications, Inc., 1985.
5. CAD/CAM Opportunities and Strategies, Report No. 610, International Resource Development; Manufacturing Engineering 93, 33 (Nov. 1984).
6. D. F. Rogers and J. A. Adams, Mathematical Elements for Computer Graphics, McGraw-Hill, New York, 1976.
7. The latest information concerning IGES may be obtained from the IGES Coordinator, National Bureau of Standards, A353 Metrology Building, Gaithersburg, MD.
8. G. Enderle, K. Kansky, and G. Pfaff, Computer Graphics Programming: GKS-The Graphics Standard, Springer-Verlag, 1983.
9. L. Lichten, "Development of a special application computer-aided design system," Am. Machin. 129, 104 (January 1985).
10. Introduction to Group Technology in Manufacturing and Engineering, Industrial Development Division, Institute of Science and Technology, The University of Michigan, Ann Arbor, MI, 1977.
11. V. Begg, Developing Expert CAD Systems, Kogan Page, London, 1984.

M. A. Melkanoff
UCLA
COMPUTER-AIDED INSTRUCTION, INTELLIGENT
This entry provides an introduction to intelligent computer-assisted instruction (ICAI). ICAI is the application of AI principles to the development of instructional programs. The entry describes past ICAI projects, current developments, and future prospects. Major theoretical and practical issues in ICAI are also described. However, this entry provides but a cursory overview of the field. More detailed discussions of ICAI can be found in Sleeman and Brown (1), Kearsley (2), and O'Shea and Self (3).

What Is ICAI?

Carbonell (4) provides one of the first attempts to define the need for and nature of ICAI. Table 1 lists some of the major characteristics of ICAI programs as described by Carbonell.

Table 1. Major Characteristics of ICAI Programs
Mixed-initiative dialogue
Semantic (knowledge) networks
Student models
Diagnostic error rules
Natural language

Mixed-initiative dialogue is one of the most distinguishing aspects of ICAI programs; it refers to the capability for the student to ask a question and hence participate in a two-way interaction with the program. This contrasts with the typical one-way interaction (program presents a question or problem, student responds) of conventional CAI. The net result of a mixed-initiative dialogue program is a highly interactive instructional session, much like the conversation between a good teacher and a motivated student (i.e., the Socratic method).

A more important characteristic of ICAI programs from a design perspective is that they are constructed as knowledge networks consisting of facts, rules, and their relationships (see Representation, knowledge; Semantic networks). This contrasts with the scriptlike structure of most conventional CAI programs, where all content is organized into screens and branching instructions that define the sequence of instruction. In conventional CAI the author predefines the possible patterns of interaction. In an ICAI program the tutoring rules that an author uses to create these patterns are defined in the knowledge base, and the program generates the instructional sequence in response to student questions or mistakes. In other words, ICAI programs contain two types of knowledge: content knowledge about the subject matter being taught and pedagogical knowledge, that is, knowledge about how to teach the subject.

This leads to two other major characteristics of ICAI programs, namely, student models and error diagnosis rules. In order to determine what instruction to present next (since it is not predefined), it is necessary to have a good idea of what the student knows and has already learned. This is achieved by identifying the aspects of the knowledge network currently understood by the student (see also Belief systems). In order to identify the student's present level of understanding, it is necessary to be able to diagnose any mistakes made by the student in terms of misconceptions, overgeneralizations, missing information, and so on. In addition to a set of general content-independent errors made while learning (see Ref. 5), each subject domain has content-specific errors that must be included in the knowledge network.

Finally, another characteristic of ICAI identified by Carbonell is natural-language interaction. Clearly, the quality of communication between a CAI program and a student is dramatically improved if the program can understand natural-language input (either typed or spoken) (see Natural-language understanding). Furthermore, many of the previous ICAI characteristics, such as mixed-initiative dialogue and error diagnosis, depend heavily on the semantic knowledge associated with natural language.

Despite many new developments since Carbonell's classic paper, the characteristics outlined in Table 1 remain some of the important concepts of ICAI. One significant change is that the understanding of knowledge networks has gone beyond strictly language-based (i.e., semantic) representations. In fact, the importance of natural language has diminished considerably in ICAI (and AI generally). This is because it is possible to implement intelligent systems using structured command languages or menu selection structures (see Menu-based natural language).

One major characteristic of ICAI programs that was not discussed in Ref. 4 is the ability of ICAI programs to learn (i.e., adaptive or self-modifying systems). It is clear that a system that cannot learn from its successes and mistakes cannot be considered fully "intelligent." In the case of ICAI programs this means a system capable of changing its teaching behavior based on how well students seem to learn via one strategy versus another (see Learning).

Although it is not listed in Table 1, the power of all ICAI programs is derived from their capability to draw inferences (see Inference). In fact, this single quality more than any other constitutes the intelligence of AI software.
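The interplay of content knowledge, a student model, and pedagogical knowledge can be sketched in a few lines: a small knowledge network of topics with prerequisite links, an "overlay" student model (the subset of topics mastered so far), and a single tutoring rule that proposes any unmastered topic whose prerequisites are met. The topic names and the rule are hypothetical, chosen only to illustrate the separation of the two kinds of knowledge.

```python
# Content knowledge: a tiny network of topics and their prerequisites.
PREREQS = {
    "variables": [],
    "loops": ["variables"],
    "arrays": ["variables"],
    "sorting": ["loops", "arrays"],
}

def next_topics(student_knows):
    """Pedagogical knowledge applied to an overlay student model:
    student_knows is the set of topics the student has mastered;
    return the topics that are now ready to be taught."""
    known = set(student_knows)
    return sorted(topic for topic, prereqs in PREREQS.items()
                  if topic not in known and all(p in known for p in prereqs))
```

Because the instructional sequence is generated from the network rather than scripted, adding a topic (or correcting a prerequisite link) changes the tutor's behavior without rewriting any lesson sequence by hand.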
In the case of ICAI programs, inferencing takes place when the program attempts to deduce what the student misunderstands and what tutoring rule to apply in order to remove the misunderstanding. If the program uses natural language, a lot of inferencing is required just to understand the input, that is, to disambiguate pronoun references and fill in hidden meanings. One area where a great deal of progress has been made in the past decade is the design and implementation of inferencing mechanisms.

The major trend in the ICAI field in the past five years has been away from mixed-initiative tutoring systems of the kind described by Carbonell toward diagnostic tutors or coaches. A diagnostic tutor compares the student's behavior with that of an expert for the problem domain involved. When discrepancies arise, the student is given advice about better learning or performance strategies. Diagnostic tutors are usually a more appropriate form of ICAI for games, simulations, and problem-solving situations than a mixed-initiative tutor approach.
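A diagnostic tutor of this kind can be sketched by running the student's answers against a library of candidate procedures, the expert's correct one plus known "buggy" variants, and reporting whichever variants reproduce the student's behavior. This toy example is in the spirit of BUGGY-style arithmetic diagnosis but is not code from any of the systems discussed; the bug catalog here contains a single classic subtraction bug.

```python
def correct_sub(a, b):
    # the expert's procedure
    return a - b

def smaller_from_larger(a, b):
    """Buggy procedure: in each column subtract the smaller digit
    from the larger, ignoring borrowing entirely."""
    result, place = 0, 1
    while a or b:
        da, db = a % 10, b % 10
        result += abs(da - db) * place
        a, b, place = a // 10, b // 10, place * 10
    return result

PROCEDURES = {"correct": correct_sub, "smaller-from-larger": smaller_from_larger}

def diagnose(worked_problems):
    """Return the names of all procedures consistent with every
    observed (a, b, student_answer) triple."""
    return [name for name, proc in PROCEDURES.items()
            if all(proc(a, b) == answer for a, b, answer in worked_problems)]
```

Given 52 - 19 = 47 and 81 - 38 = 57, only the smaller-from-larger procedure reproduces the student's answers, so the tutor can explain the specific misconception rather than merely marking the answers wrong.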
Examples of ICAI Programs

Almost all research conducted in ICAI has been in the context of specific programs designed for a particular subject domain. Table 2 lists some of these programs. In general, each program has explored a somewhat different set of issues in cognitive science and AI methodology.

SCHOLAR was the first ICAI program, developed initially by Carbonell, Collins, and colleagues at Bolt, Beranek, and
Table 2. Examples of ICAI Programs

Program          Subject Area                              Format                   References
SCHOLAR          South American geography; NLS text editor Mixed initiative         4,5
WHY              Meteorology                               Mixed initiative         6
SOPHIE           Electronics troubleshooting               Mixed initiative         7
BIP              BASIC programming                         Mixed initiative         8
SPADE            LOGO programming                          Diagnostic tutor         9
FLOW             Programming principles                    Coach                    10
MENO-II/Proust   Pascal programming                        Diagnostic tutor         11,12
WEST             Arithmetic game                           Coach                    13
WUMPUS           Adventure game                            Coach                    14
BUGGY            Elementary arithmetic                     Diagnostic tutor         15
GUIDON           Medical diagnosis                         Mixed initiative         16
STEAMER          Steam plant operation                     Intelligent simulation   17
Newman, Inc. (4,5). Whereas the SCHOLAR system dealt basically with concepts, the WHY system (6) focused on causal reasoning in the domain of meteorology (specifically rainfall). A great deal of attention was paid to identifying the kinds of misconceptions exhibited by students and the corresponding tutoring strategies used by the program.

The SOPHIE system developed by Brown and Burton was a tutor for electronics troubleshooting (7). One of the strengths of SOPHIE was a powerful inferencing capability that allowed the program to test out student ideas for troubleshooting faulty circuits and provide critical feedback or counterexamples when students were wrong. SOPHIE featured a natural-language interface that made conversational dialogues possible (see Fig. 1).

WHAT IS THE OUTPUT?
The output voltage is 11.7 V.
WHAT IS IT IN A WORKING INSTRUMENT?
In a working circuit the output voltage is 19.9 V.
WHAT IS THE VOLTAGE BETWEEN R8 AND Q1?
The voltage at N1 is 47.6 V.
WHAT IS THE BASE EMITTER VOLTAGE OF THE VOLTAGE-LIMITING TRANSISTOR?
The base emitter voltage of Q5 is -0.037 mV.
WHAT ABOUT FOR THE CURRENT-LIMITING TRANSISTOR?
The base emitter voltage of Q6 is 0.349 V.
WHAT SHOULD IT BE?
In a working circuit the base emitter voltage of Q6 is 0.593 V.
REPLACE Q6
I am going to ask you some questions about how Q6 is faulted. Are any junctions shorted?
I DON'T KNOW.
Are any terminals open?
YES.
Which terminal(s) are open?
THE BASE.
Q6 is not faulted in that manner. Q6 is not replaced.
IS IT POSSIBLE THAT Q6 HAS AN OPEN BASE?

Figure 1. Excerpt from SOPHIE (1) (student responses in capitals).

The BASIC Instructional Program (BIP) developed at Stanford was the first of many attempts to develop diagnostic tutors for programming languages. BIP was a curriculum to teach BASIC in which a BASIC interpreter and programming tutor were embedded. The system would pose programming problems and then monitor the student's solution, providing diagnostic feedback as needed. One of the unique aspects of BIP was the capability to identify "gaps" in its curriculum network that needed completion. Other attempts to develop programming tutors include SPADE (9), FLOW (10), MENO-II (11), and Proust (12).

The WEST program developed by Burton and Brown (13) was one of the first attempts to implement a coaching strategy. The idea was to implement a tutor that would unobtrusively monitor a student's progress and provide advice when appropriate. WEST was based on an arithmetic game (How the West Was Won) originally implemented on the PLATO system. The major research problem in this type of ICAI program is to identify when the coach should interrupt the student and what specific advice to provide. WUMPUS (14) was another game used to study coaching.

BUGGY, also developed by Brown and Burton (15), was an attempt to explore sophisticated diagnosis of student errors ("bugs") in the context of simple arithmetic problems. The program consisted of diagnostic rules specific to arithmetic that allowed the program to infer the reasons for mistakes. By applying these diagnostic rules to student responses, BUGGY could deduce exactly what a student was doing wrong. Another interesting aspect of BUGGY was that it could be used to give teachers practice in diagnosing student errors by generating problems with known bugs to be identified.

The GUIDON program developed by Clancey and colleagues at Stanford (16) represents a major milestone in ICAI programs in that it was the first tutor built to work with an already existing expert system (MYCIN). MYCIN is a medical consultation system for diagnosing infectious diseases (see Medical systems). GUIDON teaches the diagnostic rules used in MYCIN. The GUIDON program provided some interesting developments in the structure of mixed-initiative dialogues. First, students could specify what they currently know, what they want to know more about, what they do not understand, and what they want to ignore using a terse keyword command language. Second, GUIDON featured a "don't interrupt" mode in which the student could only respond to questions and not change the topic. Third, the program is able to detect partial understandings by asking probing questions about students' answers or solutions (see Fig. 2).

STEAMER is an intelligent simulation for teaching steam plant operation (17). It represents a synthesis of state-of-the-art techniques in different areas of computing, including
INSTRUCTION, INTELLIGENT COMPUTER-AIDED
simulation and intelligent graphics. Although STEAMER is designed for training purposes, it is not an intelligent tutor in the sense discussed in this entry. It provides an extremely powerful graphics interface that makes it possible for students to explore and learn about a complex system. A major focus of the STEAMER project is to understand how people build mental models of physical systems and use this knowledge to design a good training system.

To summarize this brief review of past ICAI efforts, it seems reasonable to conclude that ICAI programs could be developed for any subject area or training domain. These and other ICAI programs have provided a rich assortment of intricate learning models to explore and have demonstrated the kind of fine-grained interaction possible between student and machine in the context of instruction.

FACTORS
The following factors will be useful to determine the type of the infection:
3a. whether the meningitis is partially treated
3b. whether a lumbar puncture has been performed on Pt538
3c. the duration of the neurological signs
3d. the CSF protein
(*) 3e. whether Pt538 has a rash consistent with herpes zoster
Factors marked with an asterisk have already been stated. Based on what we have already mentioned, you should be able to conclude about: 3b.
USE 3c
The duration of the neurological signs is 7.2 h. Using other information available to you now, the duration of the neurological signs enables you to conclude that the type of the infection is bacterial (…) viral (.2) [RULE 524].
IKNOW 3b
Good. MYCIN has also made a decision.
The site of CULTURE-1 is evidence that a lumbar puncture has been performed on Pt538 [RULE 112].
Back to our discussion of the type of the infection.
Summary of evidence that the type of the infection is bacterial (.98):
29a. The fact that the meningitis is partially treated (.95) is evidence that the type of the infection is bacterial (.19) [RULE 148].
29b. The WBC from the CSF and the percentage of PMNs in the CSF is evidence that the type of the infection is bacterial (.9) [RULE 501].
DETAILS 29a
For which of the following factors would you like to see a review of the evidence? (Select from the following):
1. the infection that requires therapy
2. the fact that the meningitis is partially treated (.95)

Figure 2. Example sequence from GUIDON (14) (student responses in capitals).

Design Issues

There are many issues associated with the design of an ICAI system, including the type of knowledge representation, the diagnostic techniques, the nature of the student model, the user interface, and the instructional control. More fundamentally, the learning model underlying the ICAI program dictates how the program will be constructed and how it will work.

The type of knowledge representation used in an ICAI program is an important determinant of the kind of tutoring possible. Early ICAI programs such as SCHOLAR and SOPHIE used augmented-transition networks (ATNs) for their representational scheme (see Grammar, augmented transition). More recent ICAI programs use an object-oriented or production rule formalism (see Rule-based systems) to represent knowledge. The way the knowledge is represented determines what kind of subject and student understanding the program is capable of. For example, it is important in a programming tutor to be able to represent specific concepts such as variables, iteration, subroutines, comments, and so on as well as more general ones such as causality, syntax, files, and so on. Different types of subject matter may require different knowledge representation structures.

A wide variety of different diagnostic techniques have been used in ICAI programs, although almost all involve either forward or backward reasoning sequences (see Processing, bottom up and top down). Some diagnostic techniques have been fairly simple generate-and-test strategies. For example, in the BUGGY program each bug rule generates an answer for the problem according to its deficiency, and all of the answers are matched against the student's response to detect which mistake the student is making. Other diagnostic techniques involve goal-directed inferencing. In the Proust program the diagnostic routines try to match specific subgoals in a student's program that are generated by a plan for solving the problem.

Most student models have been of the "overlay" type; that is, the current representation of the student's knowledge is represented as a subset of the complete knowledge network. The simplest form of student model is a state vector in which each element of the vector represents the student's current knowledge of a specific concept or skill. The REGIS tutorial developed by Heines and O'Shea (18) is a good example of a state vector model. A more complex type of student model involves building a generative model of each student that is independent of the knowledge network. This kind of model accounts for the fact that the student's understanding of a subject is not merely a subset of the full knowledge domain but a new incarnation.

The user interface can range from natural-language discourse to selection of items from menus or use of keyword commands. Most early ICAI programs used natural language; however, natural language has been de-emphasized in recent years. Most current ICAI programs use menu selections for the user interface. This shift corresponds to a greater interest in the inferencing mechanisms associated with tutoring than with understanding language.

User control is an important attribute of ICAI programs. Many of the original ICAI programs featured mixed-initiative dialogue in which the student could ask a question at any time. More recent programs provide menu options that are always active and available. A major design consideration in the construction of a coach or diagnostic tutor is how to allow the student to control the advice or prescriptive feedback.

Almost all ICAI programs are based on an explicit or implicit theory of learning. In fact, most ICAI programs are designed to test these theories. Consider, for example, the geometry and LISP tutors developed by Anderson, Boyle, and Reiser (19) based on the ACT* learning theory. ACT* consists of a set of assumptions about memory that have been embodied in tutoring programs. These assumptions cover the representation of procedural knowledge via productions, the use of goal structures, and working memory limitations. In some cases the learning theory underlying an ICAI program is general in nature, whereas in other cases it is specific to the subject domain.

Even though a number of ICAI programs have been developed, there has been relatively little published analysis and discussion of the design and development process. Woolf and McDonald (20) discuss the design of Meno-Tutor. This generic tutor incorporates a hierarchical discourse management network consisting of three levels: pedagogy, strategy, and tactics. The article discusses how the different levels interact for a given tutoring sequence. Clancey (21) outlines the design issues involved in building GUIDON and focuses on the distinction between an expert system and a tutor for the same knowledge domain.

Implementation Issues

So far, this discussion of ICAI has been free of pragmatic or "real-world" considerations. There are three major practical issues that need attention before ICAI can enter the mainstream of the education and training world. These issues are the relationship between ICAI and conventional CAI, accessibility of ICAI programs, and performance factors.

ICAI and CAI. Researchers who have developed past ICAI programs have tended to be computer scientists rather than specialists in CAI. To a large extent, these researchers have not been fully cognizant of the state of the art in conventional CAI applications, tending to characterize the field as still being stuck in the drill-and-practice, or frame-oriented, tutorials of the sixties.
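The generate-and-test diagnosis described above can be sketched in a few lines of code. This is a hypothetical illustration rather than BUGGY's actual rule set; the single "drops the carry" bug rule and all function names are invented for the example.

```python
# Hypothetical sketch of a BUGGY-style generate-and-test diagnosis for
# column addition. Each "bug rule" produces the answer a student holding
# that misconception would give; the diagnoser keeps whichever rules
# reproduce every observed response.

def correct_add(a, b):
    return a + b

def drops_carry(a, b):
    # Bug: adds each column independently and never carries.
    result, place = 0, 1
    while a or b:
        result += ((a % 10 + b % 10) % 10) * place
        a, b, place = a // 10, b // 10, place * 10
    return result

BUG_RULES = {
    "correct": correct_add,
    "drops-carry": drops_carry,
}

def diagnose(problems, responses):
    """Return the rules consistent with all of the student's answers."""
    return [name for name, rule in BUG_RULES.items()
            if all(rule(a, b) == r for (a, b), r in zip(problems, responses))]

# A student who answers 57 + 65 = 12 and 28 + 14 = 32 is consistently
# dropping the carry, so only that bug rule survives.
print(diagnose([(57, 65), (28, 14)], [12, 32]))  # -> ['drops-carry']
```

Using several problems rather than one is essential: a single response often matches more than one bug rule, and the intersection over a problem set is what pins down the misconception.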
In fact, many current applications of CAI in education and training involve sophisticated simulations, diagnostic testing, and problem-solving sequences (see Refs. 22-24). On the other hand, CAI specialists generally have backgrounds in training or education and tend to be largely ignorant of ICAI developments. Because they often lack solid grounding in computer science, the AI techniques employed in ICAI are novel to them. Furthermore, because of the performance factors discussed below, they tend to be skeptical of the practical value of ICAI over conventional CAI programs. The problem with this communication gap between ICAI researchers and CAI specialists is that if ICAI is to be put to real use, it is likely to come from the CAI specialists who design and implement systems in actual educational or training settings.

Even though there is a continuity between ICAI and CAI programs, the underlying programming methods are different. The major difference lies in the data and control structures (qv) of ICAI versus conventional CAI. ICAI programs are implemented using symbolic or production-type languages (e.g., LISP, PROLOG) or object-oriented languages (such as Smalltalk). Conventional CAI is implemented using standard sequential control programming languages such as BASIC, Pascal, or C or in authoring languages such as Tutor, Pilot, or Planit with implicit control structures. Data structures in
ICAI programs are some form of declarative or procedural representations (i.e., knowledge networks), whereas data structures in conventional CAI are simple data statements (embedded in the control logic) or data files. Despite this difference in programming structures, however, there can be strong similarities between ICAI and conventional CAI programs. For example, in certain CAI tutorials the answer analysis is very sophisticated and closely resembles the kind of diagnostic capability of some ICAI programs. In fact, the collection of keywords, feedback messages, prompts, and branches for any particular answer constitutes the components that would form a knowledge network, student model, and tutoring rules in an ICAI program. The major difference is that the components are implicit rather than explicit in the programming.

It is not completely obvious how critical the use of AI programming languages is to the creation of ICAI programs. AI programming languages have been developed to make it easy to create programs with the kind of rule-based, context-dependent processing required in intelligent programs. However, it is likely that such routines could be implemented using popular high-level languages.

Accessibility of ICAI. This discussion of the differences in programming structures between ICAI and conventional CAI programs leads to the second major pragmatic consideration associated with the current state of ICAI, namely, accessibility. Because there are relatively few CAI specialists who are familiar with the type of languages used to create ICAI programs (actually, very few computer scientists in general), access to the tools needed to do ICAI is very limited at present. Although the conceptual knowledge of how to construct ICAI programs is in theory distinct from the implementation techniques, in practice the two are very closely intertwined. For example, most developers of ICAI programs do their own programming using an AI language.
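The point about implicit versus explicit tutoring components can be made concrete with a small sketch. The frame data, rule set, and function below are hypothetical and drawn from none of the systems discussed; they simply show the same answer-analysis knowledge held once as branching data (conventional CAI style) and once as production rules consulted by a generic interpreter that also updates an explicit student model (ICAI style).

```python
# Conventional CAI style: tutoring knowledge is implicit in branching
# data tied to one frame (hypothetical frame names).
cai_frame = {
    "question": "What keyword assigns a value to a variable in BASIC?",
    "answers": {"LET": ("Correct!", "frame-12"),
                "DIM": ("DIM declares arrays, not scalars.", "frame-7")},
    "default": ("Try again.", "frame-7"),
}

# ICAI style: the same knowledge held as explicit production rules that
# a generic interpreter matches against the student's response, updating
# an explicit student model as a side effect.
rules = [
    {"if": lambda model, r: r == "LET",
     "then": ("Correct!", {"knows-LET": True})},
    {"if": lambda model, r: r == "DIM",
     "then": ("DIM declares arrays, not scalars.", {"confuses-DIM-LET": True})},
    {"if": lambda model, r: True,            # default rule fires last
     "then": ("Try again.", {})},
]

def respond(student_model, response):
    for rule in rules:
        if rule["if"](student_model, response):
            feedback, updates = rule["then"]
            student_model.update(updates)    # explicit student model
            return feedback

model = {}
print(respond(model, "DIM"))  # -> DIM declares arrays, not scalars.
```

The behavior is identical either way; the difference is that the rule formulation exposes the diagnostic knowledge and the student model to inspection and reuse, which is what the paragraph above means by "explicit."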
Another development needed in order for ICAI programs to become more widespread in the near term is the availability of ICAI authoring tools. The existence of authoring languages and systems has made a considerable difference in the time and cost associated with developing conventional CAI programs (25). Comparable authoring tools need to be developed so that ICAI programs can be created quickly and without the considerable AI knowledge presently required. The emerging generation of software for building expert systems (qv) (e.g., M1, ART, KEE, etc.) may be useful as a basis for such ICAI authoring tools.

Greater availability of ICAI programs is also needed. It is very difficult for designers and programmers to understand the nature of ICAI unless they are able to examine and use such programs. Access to ICAI programs is usually limited to those immediately involved in the development of a program. At present there are no ICAI programs commercially available for personal computers. Once such programs are on the market, examples of ICAI programs will be more accessible.

Performance Factors. Closely related to this issue of accessibility is the third pragmatic consideration: program performance. Like most AI programs, ICAI programs tend to be very computationally complex and require enormous amounts of memory. For this reason, they have usually been developed on high-performance machines. Even then, the response times of
ICAI programs are often very slow and unacceptable for operational environments. The recent emergence of LISP machines (qv), powerful supermicros designed primarily for AI applications, has had a significant impact on the performance problem. LISP machines typically have at least 1 Mbyte of RAM and 40 or more megabytes of disk storage. These machines are too expensive at present to be widely used in schools or training centers. However, as personal computers continue to improve in price-performance ratios, it should be possible to run ICAI programs on machines that are commonly available and relatively inexpensive.

Although the raw power of the computers used for AI applications is an important factor in determining performance, the efficiency of the programming involved is also a consideration. As discussed earlier, past ICAI programs have been developed as research tools, not as operational software. Hence, relatively little attention has been given to making ICAI programs run efficiently. If standard techniques used in data processing were applied to ICAI programs (e.g., code compression, hash tables, etc.) or the programs rewritten in general-purpose languages that run faster, substantial improvement in performance could be achieved.

The three factors just discussed (ICAI vs. CAI, accessibility, and performance) are all current limitations that will dissipate with time. ICAI techniques will be incorporated into conventional CAI programs (and all other software) as more instructional designers and programmers learn such techniques and as suitable computers become widely available.

Future Developments

Probably the most important development needed to increase the use and application of ICAI does not come from the computer field but from the domain of cognitive science (qv). ICAI provides the capability to design highly individualized instruction that maps very closely onto the learning strategies and thinking processes of the student.
Hence, there must be a good theoretical understanding of human learning and cognition for ICAI to work. Even though the topic has been scientifically studied for hundreds of years, there are still no complex models of how people learn and think. In fact, much of the progress that has been made in cognitive science in the past two decades has been made by AI researchers. Attempts to design and implement computer programs that exhibit intelligence have forced a great deal of attention on the characteristics of human intelligence. Consequently, much of the research in ICAI has focused attention on how people learn.

There are a number of developments in other areas of computer technology that are closely related to ICAI or likely to become so. The most obvious example is computer-based speech processing (i.e., synthesis and recognition) capability (see Speech understanding). The ability to talk to the computer and have it respond opens up a new dimension of interactive instruction in terms of the amount and kind of information that can be presented and analyzed via CAI. Another technology of considerable relevance is computer graphics. Historically, ICAI programs have been very text based, making little use of visual presentation modes. Yet all other instructional media (including most conventional CAI) make heavy use of visuals and graphics. The STEAMER program mentioned above is an example of how "intelligent" graphics
can extend the boundaries of ICAI. In addition, all developments in the expert systems areas are relevant to ICAI in terms of new software tools and programming techniques.

There are some very significant sociological implications associated with ICAI. The tradition of didactic, classroom-based teaching has become a very strong cultural fixture in the North American educational system. ICAI presents a whole different tradition: one-on-one tutoring that is largely inquiry driven in nature. Such a radically different tradition is not going to be easily accepted or assimilated in schools or training centers. Although hardware and software technologies can change very quickly, traditions (particularly educational ones) change very slowly. Consequently, ICAI programs will likely need new curricula designed for them.

The dream of ICAI researchers is to provide each student with a computer-based tutor that has all of the qualities of a master teacher. This includes great scope and depth of subject matter expertise, excellent knowledge of teaching techniques, powerful communication skills, and the ability to inspire and motivate students to learn. Clearly, most conventional CAI programs are a long way from this ideal. So far, ICAI programs have primarily focused on the first two qualities of a master teacher, namely, subject matter expertise and teaching techniques. This is manifested in the knowledge networks and tutoring rules, which form the basis of ICAI. Less progress has been made in providing powerful communication skills. This is due to the impoverished nature of the communication interface between people and machines. Good communicators use many modalities to pick up and convey information, including sight, speech qualities, facial and body movements, touch, and so on. If ICAI programs are going to be better communicators, they must possess some analog to listening and oration skills. The least progress has been made in the affective area.
It is true that some educational games and simulations are highly motivating, but their effect on motivation is short term rather than the long-term impact of real inspiration. Similarly, the positive-feedback remarks of the "You're doing terrific!" variety have very little real effect on motivation (if any). In order for a program to truly inspire learning in a student, the program would need to exhibit enthusiasm for the subject matter and for learning. A great deal of teaching is concerned with the transmission of beliefs and values. Can these qualities be built into a computer program? The question of values and belief systems embedded in computer programs is likely to become an important issue in the coming decades as software becomes truly intelligent.

Before having to deal with such profound issues in ICAI and elsewhere, there are many more mundane developments to worry about. This entry has discussed a number of the developments needed in order for ICAI to become more prevalent. More instructional designers and CAI specialists who are familiar with ICAI methodology are needed. Authoring tools that make it much easier and faster to create ICAI programs and widespread availability of affordable personal computers powerful enough to run ICAI programs are also needed. All of these developments are likely to take place within the next five years. Thus, before the end of this decade there will probably be a dramatic improvement in the quality of computer-based instruction. However, for the reasons discussed in this entry, it will be much longer before this improvement is widely implemented in the classroom.
To summarize, a sufficient number of ICAI programs have been created to demonstrate the potential of ICAI. Practical issues that underlie the wider use of ICAI include competition with conventional CAI, accessibility, and performance. In addition, significant advances are needed in the understanding of human learning, and ICAI research will likely make and benefit from major contributions to instructional psychology. Lastly, the success of ICAI depends on the emergence of a new tradition of teaching that is different from the current didactic classroom pedagogy.
BIBLIOGRAPHY
1. D. Sleeman and J. S. Brown, Intelligent Tutoring Systems, Academic Press, New York, 1982.
2. G. Kearsley, Artificial Intelligence and Instruction: Applications and Methods, Addison-Wesley, Reading, MA, 1987.
3. T. O'Shea and J. Self, Learning and Teaching with Computers: Artificial Intelligence in Education, Prentice-Hall, Englewood Cliffs, NJ, 1983.
4. J. R. Carbonell, "AI in CAI: An artificial intelligence approach to computer aided instruction," IEEE Trans. Man-Machine Sys., 11(4), 190-202 (1970).
5. A. Collins, Processes in Acquiring Knowledge, in R. C. Anderson, R. J. Spiro, and W. Montague (eds.), Schooling and the Acquisition of Knowledge, Lawrence Erlbaum, Hillsdale, NJ, 1976.
6. A. Stevens, A. Collins, and S. E. Golden, "Misconceptions in student's understanding," Int. J. Man-Machine Stud., 11, 145-156 (1979).
7. J. S. Brown, R. Burton, and J. deKleer, Pedagogical, Natural Language, and Knowledge Engineering Techniques in SOPHIE I, II, and III, in D. Sleeman and J. Brown (eds.), Intelligent Tutoring Systems, Academic, New York, 1982.
8. A. Barr, M. Beard, and R. C. Atkinson, "A rationale and description of a CAI program to teach the BASIC programming language," Instruc. Sci., 4, 1-31 (1975).
9. M. L. Miller, "A structural planning and debugging environment for elementary programming," Int. J. Man-Machine Stud., 11, 79-95 (1979).
10. D. Gentner, Toward an Intelligent Tutor, in H. F. O'Neil (ed.), Procedures for Instructional Systems Development, Academic Press, New York, 1979.
11. E. Soloway et al., "Meno-II: An AI based programming tutor," J. Comput. Based Instruc., 10(1 and 2), 20-34 (1983).
12. W. L. Johnson and E. Soloway, "Proust," BYTE (April 1985).
13. R. Burton and J. S. Brown, "An investigation of computer coaching for informal learning activities," Int. J. Man-Machine Stud., 11, 5-24 (1979).
14. B. Carr and I. Goldstein, Overlays: A Theory of Modeling for Computer Aided Instruction, AI Memo 406, MIT AI Lab, Cambridge, MA, 1977.
15. J. S. Brown and R. R. Burton, "Diagnostic models for procedural bugs in basic mathematical skills," Cogn. Sci., 2, 155-192 (1978).
16. W. J. Clancey, "GUIDON," J. Comput. Based Instruc., 10(1 and 2), 8-15 (1983).
17. J. D. Hollan, E. L. Hutchins, and L. Weitzman, "STEAMER: An interactive inspectable simulation-based training system," AI Mag. (Summer 1984).
18. J. M. Heines and T. O'Shea, "The design of a rule-based CAI tutorial," Int. J. Man-Machine Stud., 16, 356-371 (1984).
19. J. R. Anderson, C. F. Boyle, and B. J. Reiser, "Intelligent tutoring systems," Science, 228, 456-462 (1985).
20. B. Woolf and D. D. McDonald, "Building a computer tutor: Design issues," IEEE Comput. (September 1984).
21. W. J. Clancey, Methodology for Building an Intelligent Tutoring System, Department of Computer Science Report 81-894, Stanford University, October 1981.
22. A. Bork, Learning with Computers, Digital Press, Bedford, MA, 1981.
23. S. M. Alessi and S. R. Trollip, Computer Based Instruction: Methods and Development, Prentice-Hall, Englewood Cliffs, NJ, 1985.
24. G. Kearsley, Computer Based Training, Addison-Wesley, Reading, MA, 1982.
25. G. Kearsley, "Authoring systems in computer based education," Commun. ACM, 25(7), 429-437 (1982).

G. Kearsley
Park Row Software
COMPUTER CHESS METHODS

Historical Perspective

Of the early chess-playing machines the best known was exhibited by Baron von Kempelen of Vienna in 1769. Like its relations it was a conjurer's box and a grand hoax (1,2). In contrast, about 1890 a Spanish engineer, Torres y Quevedo, designed a true mechanical player for king-and-rook against king end games. A later version of that machine was displayed at the Paris Exhibition of 1914 and now resides in a museum at Madrid's Polytechnic University (2). Despite the success of this electromechanical device, further advances on chess automata did not come until the 1940s. During that decade there was a sudden spurt of activity as several leading engineers and mathematicians, intrigued by the power of computers and fascinated by chess, began to express their ideas on computer chess. Some, like Nemes of Budapest (3) and Zuse (4), tried a hardware approach, but their computer chess works did not find wide acceptance. Others, like the noted computer scientist Turing, found success with a more philosophical tone, stressing the importance of the stored program concept (5). Today, best recognized are the 1965 translation of de Groot's 1946 doctoral dissertation (6) and the much referenced paper on algorithms for playing chess by Shannon (7). Shannon's paper was read and reread by computer chess enthusiasts and provided a basis for most early chess programs. Despite the passage of time, that paper is still worthy of study.

Landmarks in Chess Program Development. The first computer model in the 1950s was a hand simulation (5); programs for subsets of chess followed (8), and the first full working program was reported in 1958 (9). By the mid-1960s there was an international computer-computer match (10) between a program backed by John McCarthy of Stanford [developed by a group of students from MIT (11)] and one from the Institute for Theoretical and Experimental Physics (ITEP) in Moscow (12).
The ITEP group's program (under the guidance of the well-known mathematician Georgi Adelson-Velskiy) won the match, and the scientists involved went on to develop Kaissa, which became the first world computer chess champion in 1974 (13). [Descriptions of these programs can be found in various books (13,14). Interviews with some of the designers have also appeared (15).] Meanwhile there emerged from MIT
another program, MACHACK-6 (qv) (16), which boosted interest in AI. First, MACHACK was demonstrably superior not only to all previous chess programs but also to most casual chess players. Secondly, it contained more sophisticated move-ordering and position evaluation methods. Finally, the program incorporated a memory table to keep track of the values of chess positions that were seen more than once.

In the late sixties, spurred by the early promise of MACHACK, several people began developing chess programs and writing proposals. Most substantial of the proposals was the 29-point plan by Good (17). By and large, experimenters did not make effective use of these works; at least nobody claimed a program based on those designs, partly because it was not clear how some of the ideas could be addressed and partly because some points were too naive. Even so, by 1970 there was enough progress that Newborn was able to convert a suggestion for a public demonstration of chess-playing computers into a competition that attracted eight participants (18). Due mainly to Newborn's careful planning and organization, this event continues today under the title "The ACM North American Computer Chess Championship." In a similar vein, under the auspices of the International Computer Chess Association, a worldwide computer chess competition has evolved. Initial sponsors were the IFIP triennial conferences in Stockholm (1974) and Toronto (1977), and later there were independent backers such as the Linz (Austria) Chamber of Commerce (1980), ACM New York (1983), and, for 1986, the city of Cologne, Federal Republic of Germany.

In the first world championship for computers Kaissa won all its games, including a defeat of the eventual second-place finisher, Chaos. An exhibition match against the 1973 North American Champion, Chess 4.0, was drawn (10). Kaissa was at its peak, backed by a team of outstanding experts on tree-searching methods.
In the second championship (Toronto, 1977), Chess 4.6 finished first, with Duchess (19) and Kaissa tied for second place. Meanwhile both Chess 4.6 and Kaissa had acquired faster computers, a Cyber 176 and an IBM 370/165, respectively. The traditional exhibition match was won by Chess 4.6, indicating that in the interim it had undergone far more development and testing (20). The Third World Championship (Linz, 1980) finished in a tie between Belle and Chaos. In the playoff Belle won convincingly, providing perhaps the best evidence yet that a deeper search more than compensates for an apparent lack of knowledge. In the past this counterintuitive idea had not found ready acceptance in the AI community.

More recently, in the New York 1983 championship another new winner emerged, Cray Blitz (21). More than any other, that program drew on the power of a fast computer, here a Cray X-MP. Originally Blitz was a selective search program in the sense that it could discard some moves from every position based on a local evaluation. Often the time saved was not worth the attendant risks. The availability of a faster computer made it possible to use a purely algorithmic approach and yet retain much of the expensive chess knowledge. Although a mainframe won that event, small machines made their mark and seem to have a great future (22). For instance, Bebe with special-purpose hardware finished second, and even experimental versions of commercial products did well.

Implications. All this leads to the common question: When will a computer be the unassailed expert on chess? This issue
was discussed at length during a "Chess on Nonstandard Architectures" panel discussion at the ACM 1984 National Conference in San Francisco. It is too early to give a definitive answer, and even the experts cannot agree; their responses covered the whole range of possible answers, from "in five years" (Newborn), "about the end of the century" (Scherzer and Hyatt), and "eventually, it is inevitable" (Thompson) to "never, or not until the limits on human skill are known" (Marsland). Even so, there was a sense that production of an artificial Grand Master was possible and that a realistic challenge would occur during the first quarter of the twenty-first century. As added motivation, Edward Fredkin (MIT professor and well-known inventor) has created a special incentive prize for computer chess. The trustee for the Fredkin Prize is Carnegie-Mellon University, and the fund is administered by Hans Berliner. Much like the Kremer prize for man-powered flight, awards are offered in three categories. The smallest prize of $5000 has already been presented to Ken Thompson and Joe Condon, when their Belle program achieved a U.S. Master rating in 1983. The other awards of $10,000 for the first Grand Master program and $100,000 for achieving world champion status remain unclaimed. To sustain interest in this activity, each year a $1500 prize match is played between the currently best computer and a comparably rated human.

One might well ask whether such a problem is worth all this effort, but when one considers some of the emerging uses of computers in important decision-making processes, the answer must be positive. If computers cannot even solve a decision-making problem in an area of perfect knowledge (like chess), how can we be sure that computers make better decisions than humans in other complex domains, especially in domains where the rules are ill-defined or those exhibiting high levels of uncertainty?
COMPUTER CHESS METHODS

Unlike some problems, for chess there are well-established standards against which to measure performance, not only through a rating scale (23) but also using standard tests (24) and relative performance measures (25). The ACM-sponsored competitions have provided 15 years of continuing experimental data about the effective speed of computers and their operating system support. They have also afforded a public testing ground for new algorithms and data structures for speeding the traversal of search trees. These tests have provided growing proof of the increased understanding about chess by computers and the encoding of a wealth of expert knowledge. Another potentially valuable aspect of computer chess is its usefulness in demonstrating the power of man-machine cooperation. One would hope, for instance, that a computer could be a useful adjunct to the decision-making process, providing perhaps a steadying influence and protecting against errors introduced by impulsive shortcuts of the kind people might try in a careless or angry moment. In this and other respects it is easy to understand Michie's belief that computer chess is the "Drosophila melanogaster [fruit fly] of machine intelligence" (26).

Terminology

There are several aspects of computer chess of interest to AI researchers. One area involves the description and encoding of chess knowledge in a form that enables both rapid access and logical deduction in the expert system sense. Another fundamental domain is that of search (qv). Since computer chess programs examine large trees, a depth-first search is commonly used. That is, the first branch to an immediate successor of the current node is recursively expanded until a leaf node (a node without successors) is reached. The remaining branches are then considered as the search process backs up to the root. Other expansion schemes are possible, and the domain is fruitful for testing new search algorithms. Since computer chess is well defined and absolute measures of performance exist, it is a useful test vehicle for measuring algorithm efficiency. In the simplest case the best algorithm is the one that visits fewest nodes when determining the true value of a tree. For a two-person game tree this value, which is a least upper bound on the merit for the side to move, can be found through a minimax search (see Minimax procedure). In chess this so-called minimax value is a combination of both the "MaterialBalance" (i.e., the difference in value of the pieces held by each side) and the "StrategicBalance" (e.g., a composite measure of such things as mobility, square control, pawn formation structure, and king safety). Usually MaterialBalance is dominant.

Minimax Search. For chess the nodes in a two-person game tree represent positions, and the branches correspond to moves. The aim of the search is to find a path from the root to the highest valued terminal node that can be reached under the assumption of best play by both sides. To represent a level in the tree (i.e., a play or half move) the term ply was introduced by Arthur Samuel in his major paper on machine learning (27). How that word was chosen is not clear, perhaps as a contraction of play or maybe by association with forests as in layers of plywood. In either case it was certainly appropriate, and it has been universally accepted. A true minimax search is expensive since every leaf node in the tree must be visited.
For a tree of uniform width W and fixed depth D there are W^D terminal nodes. Some games, like Fox and Geese (28), produce narrow trees (fewer than 10 branches per node) that can often be solved exhaustively. In contrast, chess produces bushy trees (average branching factor about 35 moves). Because of the magnitude of the game tree, it is not possible to search until a mate or stalemate position (a leaf node) is reached, so some maximum depth of search (i.e., a horizon) is specified. Even so, an exhaustive search of all chess game trees involving more than a few moves for each side is impossible. Fortunately, the work can be reduced, since it can be shown that the search of some nodes is unnecessary.

Alpha-Beta Algorithm. As the search of the game tree proceeds, the value of the best terminal node found so far changes. It has been known since 1958 that pruning was possible in a minimax search (29), but according to Knuth and Moore, the ideas go back further, to McCarthy and his group at MIT. The first thorough treatment of the topic appears to be Brudno's 1963 paper (30). The alpha-beta algorithm (see Alpha-beta pruning) employs lower (alpha) and upper (beta) bounds on the expected value of the tree. These bounds may be used to prove that certain moves cannot affect the outcome of the search and hence that they can be pruned or cut off. As part of the early descriptions about how subtrees were pruned, a distinction between deep and shallow cutoffs was made. Some versions of the alpha-beta algorithm used only a single bound (alpha) and repeatedly reset the beta bound to infinity, so that deep cutoffs were not achieved. Knuth and Moore's recursive F2 algorithm (31) corrected that flaw. In Figure 1 Pascal-like pseudocode is used to present the alpha-beta algorithm, AB, in Knuth and Moore's negamax framework.
A Return statement has been introduced as the convention for exiting the function and returning the best subtree value or score. Omitted are details of the game-specific functions Make and Undo (to update the game board), Generate (to find moves), and Evaluate (to assess terminal nodes). In the pseudocode of Figure 1 the max(alpha, merit) operation represents Fishburn's "fail-soft" condition (32) and ensures that the best available value is returned (rather than an alpha-beta bound). This idea is usefully employed in some of the newer refinements to the alpha-beta algorithm.

    FUNCTION AB (p : position; alpha, beta, depth : integer) : integer;
    { p is pointer to the current node     }
    { alpha and beta are window bounds     }
    { depth is the remaining search length }
    { the value of the subtree is returned }
    VAR merit, j, value : integer;
        posn : ARRAY [1..MAXWIDTH] OF position;
    { Note: depth must be positive }
    BEGIN
        IF depth = 0 THEN                  { horizon node, maximum depth? }
            Return(Evaluate(p));
        posn := Generate(p);               { point to successor positions }
        IF empty(posn) THEN                { leaf, no moves? }
            Return(Evaluate(p));
        { find merit of best variation }
        merit := -MAXINT;
        FOR j := 1 TO sizeof(posn) DO BEGIN
            Make(posn[j]);                 { make current move }
            value := -AB (posn[j], -beta, -max(alpha, merit), depth-1);
            IF (value > merit) THEN        { note new best score }
                merit := value;
            Undo(posn[j]);                 { retract current move }
            IF (merit >= beta) THEN        { cutoff? }
                GOTO done;
        END;
    done:
        Return(merit);
    END;

Figure 1. Depth-limited alpha-beta function.
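The negamax formulation of Figure 1 can be transcribed almost line for line into a short executable sketch. The following Python version is illustrative only: the tree encoding (nested lists, with integers as horizon nodes) and the helper name evaluate are assumptions, not part of the original pseudocode.

```python
# Illustrative fail-soft negamax alpha-beta in the shape of Figure 1.
# A "position" is a nested list: an int is a horizon node, a list
# holds successor positions.
INF = 10**9

def evaluate(p):
    # Toy evaluation: a leaf is its own score; anything else scores 0.
    return p if isinstance(p, int) else 0

def AB(p, alpha, beta, depth):
    # Horizon node or leaf (no moves)?
    if depth == 0 or not isinstance(p, list) or not p:
        return evaluate(p)
    merit = -INF
    for succ in p:                       # try each successor position
        # Fail-soft: search below with the tighter of alpha and merit.
        value = -AB(succ, -beta, -max(alpha, merit), depth - 1)
        if value > merit:                # note new best score
            merit = value
        if merit >= beta:                # cutoff
            break
    return merit

# A depth-2 tree whose negamax value is 3 (children's minima: 3, 2, 1).
tree = [[3, 17], [2, 12], [15, 1]]
print(AB(tree, -INF, INF, 2))            # prints 3
```

The max(alpha, merit) in the recursive call is exactly Fishburn's fail-soft condition from the text: the routine may return a value outside the original window, which the aspiration and minimal-window methods below exploit.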
Minimal Game Tree. If the "best" move is examined first at every node, the tree traversed by the alpha-beta algorithm is referred to as the minimal game tree. This minimal tree is of theoretical importance since its size is a measure of a lower bound on the search. For uniform trees of width W branches per node and a search depth of D ply, there are

    W^⌈D/2⌉ + W^⌊D/2⌋ - 1

terminal nodes in the minimal game tree. Although others derived this result, the most direct proof was given by Knuth and Moore (31). Since a terminal node is rarely a leaf, it is often called a horizon node, with D the distance to the horizon (33).

Aspiration Search. An alpha-beta search can be carried out with the initial bounds covering a narrow range, one that spans the expected value of the tree. In chess these bounds might be (MaterialBalance - Pawn, MaterialBalance + Pawn). If the minimax value falls within this range, no additional work is necessary, and the search usually completes in measurably less time. The method was analyzed by Brudno (30), referred to by Berliner (34), and experimented with in Tech (35) but was not consistently successful. A disadvantage is that sometimes the initial bounds do not enclose the minimax value, in which case the search must be repeated with corrected bounds, as the outline of Figure 2 shows. Typically these failures occur only when material is being won or lost, in which case the increased cost of a more thorough search is warranted. Because these re-searches use a semi-infinite window, from time to time people experiment with a "sliding window" of (V, V + PieceValue) instead of (V, +MAXINT). This method is often effective but can lead to excessive re-searching when mate or large material gain/loss is in the offing. After 1974 "iterated aspiration search" came into general use, as follows:

Before each iteration starts, alpha and beta are not set to -infinity and +infinity as one might expect, but to a window only a few pawns wide, centered roughly on the final score [value] from the previous iteration (or previous move in the case of the first iteration). This setting of "high hopes" increases the number of alpha-beta cutoffs. (36)

Even so, although aspiration searching is still popular and has much to commend it, minimal window search seems to be more efficient and requires no assumptions about the choice of aspiration window (37).

Minimal Window Search. Theoretical advances such as Scout (38) and the comparable minimal window search techniques (32,37) were the next products of research. The basic idea behind these methods is that it is cheaper to prove a subtree inferior than to determine its exact value. Even though it has been shown that for bushy trees minimal window techniques provide a significant advantage (37), for random game trees it is known that even these refinements are asymptotically equivalent to the simpler alpha-beta algorithm. Bushy trees are typical for chess, and so many contemporary chess programs use minimal window techniques through the principal variation search (PVS) algorithm. In Figure 3 a Pascal-like pseudocode is used to describe PVS in a negamax framework but with game-specific functions Make and Undo omitted for clarity. Here the original version of PVS has also been improved by using Reinefeld's depth = 2 idea (39), which ensures that re-searches are only done when the remaining depth of search is greater than 2.
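The minimal-tree bound above is easy to evaluate numerically. The short Python sketch below is illustrative (the helper names are assumptions); it contrasts the bound with the W^D horizon nodes of a full minimax search for a chess-like tree.

```python
# Horizon-node counts for a uniform game tree of width W, depth D:
# a full minimax search visits W**D horizon nodes, while a perfectly
# ordered alpha-beta search visits only the minimal game tree's
# W**ceil(D/2) + W**floor(D/2) - 1 of them.
from math import ceil, floor

def full_tree(W, D):
    return W ** D

def minimal_tree(W, D):
    return W ** ceil(D / 2) + W ** floor(D / 2) - 1

# With the article's chess-like width W = 34 and a 5-ply search:
print(full_tree(34, 5))      # 45435424
print(minimal_tree(34, 5))   # 34**3 + 34**2 - 1 = 40459
```

Even with perfect move ordering the search is exponential, but the exponent is roughly halved, which is why good move ordering (killer and history heuristics, transposition tables) matters so much in practice.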
    { Assume V     = estimated value of position p, and }
    {        e     = expected error limit               }
    {        depth = current distance to horizon        }
    {        p     = position being searched            }

    alpha := V - e;                    { lower bound }
    beta  := V + e;                    { upper bound }
    V := AB (p, alpha, beta, depth);
    IF (V >= beta) THEN                { failing high }
        V := AB (p, V, +MAXINT, depth)
    ELSE IF (V <= alpha) THEN          { failing low }
        V := AB (p, -MAXINT, V, depth);

    { A successful search has now been completed }
    { V now holds the current value of the tree  }

Figure 2. Narrow-window aspiration search.
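The outline of Figure 2 amounts to a small wrapper around any fail-soft alpha-beta routine. The Python sketch below is illustrative only: the stand-in AB simply clamps a fixed "true" tree value into the window, so that just the fail-high/fail-low re-search logic is demonstrated.

```python
# Aspiration search as outlined in Figure 2: search first with a
# narrow window (V - e, V + e) around an estimate V, and re-search
# with a semi-infinite window only on failing high or low.
INF = 10**9
TRUE_VALUE = 120                     # pretend minimax value of the tree

def AB(p, alpha, beta, depth):
    # Fail-hard stand-in for a real alpha-beta search: it returns the
    # "true" value clamped into the (alpha, beta) window.
    return min(max(TRUE_VALUE, alpha), beta)

def aspiration(p, V, e, depth):
    value = AB(p, V - e, V + e, depth)
    if value >= V + e:               # failing high
        value = AB(p, value, INF, depth)
    elif value <= V - e:             # failing low
        value = AB(p, -INF, value, depth)
    return value

print(aspiration(None, 100, 50, 4))  # estimate close: one search, 120
print(aspiration(None, 0, 50, 4))    # estimate too low: re-search, 120
```

When the estimate V is close (as it usually is between iterations of an iterated search), the narrow first window produces many extra cutoffs; the occasional re-search is the price paid for guessing wrong.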
Forward Pruning. To reduce the size of the tree that must be traversed and to provide a weak form of selective search, techniques that discard some branches have been tried. For example, tapered N-best search (11,16) considers only the N best moves at each node. Here N usually decreases with increasing depth of the node from the root of the tree. As Slate and Atkin observe, "The major design problem in selective search is the possibility that the lookahead process will exclude a key move at a low level in the game tree" (36). Good examples supporting this point are found elsewhere (40). Other methods, such as marginal forward pruning (41) and the gamma algorithm (18), omit moves whose immediate value is worse than the current best of the values from nodes already searched, since the expectation is that the opponent's move is only going to make things worse. Generally speaking, these forward pruning methods are not reliable and should be avoided. They have no theoretical basis, although it may be possible to develop statistically sound methods that use the probability that the remaining moves are inferior to the best found so far. One version of marginal forward pruning, referred to as razoring (42), is applied near horizon nodes. The expectation in all forward pruning is that the side to move can improve the current value, so it may be futile to continue. Unfortunately, there are cases when the assumption is untrue, for instance, in zugzwang positions. As Birmingham and Kent point out, their Master program "defines zugzwang precisely as a state in which every move available to one player creates a position having a lower value to him (in its own evaluation terms) than the present bound for the position" (42). Marginal pruning may also break down when the side to move has more than one piece en prise (e.g., is forked), and so the decision to stop the search must be applied cautiously.

Despite these disadvantages, there are sound forward pruning methods, and there is every incentive to develop more, since this is one way to reduce the size of the tree traversed, perhaps to less than the minimal game tree. A good prospect is the development of programs that can deduce which branches can be neglected by reasoning about the tree they traverse.
    FUNCTION PVS (p : position; alpha, beta, depth : integer) : integer;
    { p is pointer to the current node     }
    { alpha and beta are window bounds     }
    { depth is the remaining search length }
    { the value of the subtree is returned }
    VAR merit, j, value : integer;
        posn : ARRAY [1..MAXWIDTH] OF position;
    { Note: depth must be positive }
    BEGIN
        IF depth = 0 THEN                  { horizon node, maximum depth? }
            Return(Evaluate(p));
        posn := Generate(p);               { point to successor positions }
        IF empty(posn) THEN                { leaf, no moves? }
            Return(Evaluate(p));
        { principal variation? }
        merit := -PVS (posn[1], -beta, -alpha, depth-1);
        FOR j := 2 TO sizeof(posn) DO BEGIN
            IF (merit >= beta) THEN        { cutoff? }
                GOTO done;
            alpha := max(merit, alpha);    { fail-soft condition }
            { zero-width minimal-window search }
            value := -PVS (posn[j], -alpha-1, -alpha, depth-1);
            IF (value > merit) THEN        { re-search, if "fail-high" }
                IF (alpha < value) AND (value < beta) AND (depth > 2) THEN
                    merit := -PVS (posn[j], -beta, -value, depth-1)
                ELSE merit := value;
        END;
    done:
        Return(merit);
    END;

Figure 3. Minimal window principal variation search.
Move Reordering Mechanisms. For efficiency (traversal of a smaller portion of the tree) the moves at each node should be ordered so that the more plausible ones are searched soonest. Various ordering schemes may be used. For example, "since the refutation of a bad move is often a capture, all captures are considered first in the tree, starting with the highest valued piece captured" (43). Special techniques are used at interior nodes for dynamically reordering moves during a search. In the simplest case, at every level in the tree a record is kept of the moves that have been assessed as being best or good enough to refute a line of play and so cause a cutoff. As Gillogly observed, "If a move is a refutation for one line, it may also refute another line, so it should be considered first if it appears in the legal move list" (43). Referred to as the killer heuristic, a typical implementation maintains only the two most frequently occurring "killers" at each level (36). Recently, a more powerful scheme for reordering moves at an interior node has been introduced. Named the history heuristic, it "maintains a history for every legal move seen in the search tree. For each move, a record of the move's ability to cause a refutation is kept, regardless of the line of play" (44). At an interior node the best move is the one that either yields the highest merit or causes a cutoff. Many implementations are possible, but a pair of tables (each of 64 x 64 entries) is enough to keep a frequency count of how often a particular move (defined as a from-to square combination) is best for each side. The available moves are reordered so that the most successful ones are tried first. An important property of this so-called history table is the sharing of information about the effectiveness of moves throughout the tree rather than only at nodes at the same search level. The idea is that if a move is frequently good enough to cause a cutoff, it will probably be effective whenever it can be played.
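A minimal sketch of the history heuristic described above, in Python. The 64 x 64 from-to table and the frequency count follow the text; the move encoding as (from, to) square indices is an illustrative assumption.

```python
# History heuristic sketch: one 64x64 table of from-to counts (one such
# table per side in a real program), shared across the whole tree and
# used to order moves at every interior node.
history = [[0] * 64 for _ in range(64)]

def record_best(move):
    # Called whenever a move is best or causes a cutoff at a node.
    frm, to = move
    history[frm][to] += 1

def order_moves(moves):
    # Most historically successful moves are tried first.
    return sorted(moves, key=lambda m: history[m[0]][m[1]], reverse=True)

# Suppose the move (12, 28) has refuted two lines and (6, 21) one:
record_best((12, 28))
record_best((12, 28))
record_best((6, 21))
print(order_moves([(6, 21), (8, 16), (12, 28)]))
# prints [(12, 28), (6, 21), (8, 16)]
```

Production programs typically weight the increment by search depth rather than counting each refutation equally; that refinement is omitted here for clarity.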
Quiescence Search. Even the earliest papers on computer chess recognized the importance of evaluating only those positions that are "relatively quiescent" (7) or "dead" (5). These are positions that can be assessed accurately without further search. Typically they have no moves, such as checks, promotions, or complex captures, whose outcome is unpredictable. Not all the moves at horizon nodes are quiescent (i.e., lead immediately to dead positions), so some must be searched further. To limit the size of this so-called quiescence search, only dynamic moves are selected for consideration. These might be as few as the moves that are part of a single complex capture but can expand to include all capturing moves and all responses to check (43). Ideally, passed pawn moves (especially those close to promotion) and selected checks should be included (21,25), but these are often only examined in computationally simple end games. The goal is always to clarify the node so that a more accurate position evaluation is made. Despite the obvious benefits of these ideas, the realm of quiescence search is unclear because no theory for selecting and limiting the participation of moves exists. Present quiescence search methods are attractive because they are simple, but from a chess standpoint they leave much to be desired, especially when it comes to handling forking moves and mate threats. Even though the current approaches are reasonably effective, a more sophisticated method of extending the search or of identifying relevant moves to participate in the selective quiescence search is needed (45). On the other hand, Sargon managed quite well without quiescence search, using direct computation to evaluate the exchange of material (46).

Horizon Effect. An unresolved defect of chess programs is the insertion of delaying moves that cause any inevitable loss of material to occur beyond the program's horizon (maximum search depth) so that the loss is hidden (33). The "horizon effect" (qv) is said to occur when the delaying moves give up additional material to postpone the eventual loss. The effect is less apparent in programs with more knowledgeable quiescence searches (45), but all programs exhibit this phenomenon. There are many illustrations of the difficulty; the example in Figure 4, which is based on a study by Kaindl (45), is clear. Here a program with a simple quiescence search involving only captures would assume that any blocking move saves the queen. Even an eight-ply search (b3-b2, B x b2; c4-c3, B x c3; d5-d4, B x d4; e6-e5, B x e5) would not see the inevitable loss, "thinking" that the queen has been saved at the expense of four pawns! Thus, programs with a poor or inadequate quiescence search suffer more from the horizon effect. The best way to provide automatic extension of nonquiescent positions is still an open question, despite proposals such as bandwidth heuristic search (47).
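A capture-only quiescence search of the simple kind discussed above can be sketched as follows. This Python fragment is illustrative: positions are modeled as (static score, list of capture successors), and the "stand pat" bound is the conventional treatment rather than something spelled out in the article.

```python
# Capture-only quiescence search: at a horizon node, keep searching
# capture moves until a quiet (dead) position is reached, using the
# static score as a fail-soft "stand pat" bound.
INF = 10**9

def evaluate(p):
    return p[0]                          # static score of the node

def capture_moves(p):
    return p[1]                          # only dynamic (capture) moves

def quiesce(p, alpha, beta):
    stand_pat = evaluate(p)
    if stand_pat >= beta:
        return stand_pat                 # quiet score already refutes
    alpha = max(alpha, stand_pat)
    best = stand_pat
    for succ in capture_moves(p):
        value = -quiesce(succ, -beta, -alpha)
        if value > best:
            best = value
        alpha = max(alpha, value)
        if alpha >= beta:
            break
    return best

# Toy model: a position is (static_score, [capture successors]).
quiet = (1, [])                          # a dead position
noisy = (0, [(-3, []), (1, [])])         # two captures available
print(quiesce(quiet, -INF, INF))         # prints 1
print(quiesce(noisy, -INF, INF))         # prints 3
```

Because the side to move may always "stand pat," this search cannot see quiet threats, which is exactly why the delaying (noncapture) moves of the horizon effect slip past it.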
Progressive and Iterative Deepening. The term progressive deepening was used by de Groot (6) to encompass the notion of selectively extending the main continuation of interest. This type of selective expansion is not performed by programs employing the alpha-beta algorithm, except in the sense of increasing the search depth by one for each checking move on the current continuation (path from root to horizon) or by performing a quiescence search from horizon nodes until dead positions are reached.

In the early 1970s several people tried a variety of ways to control the exponential growth of the tree search. A simple fixed-depth search is inflexible, especially if it must be completed within a specified time. Gillogly, author of Tech (43), coined the term iterative deepening to distinguish a full-width search to increasing depths from the progressively more focused search described by de Groot. About the same time Slate and Atkin sought a better time control mechanism and introduced the notion of an iterated search (36) for carrying out a progressively deeper and deeper analysis. For example, an iterated series of one-ply, two-ply, three-ply, and so on, searches is carried out, with each new search first retracing the best path from the previous iteration and then extending the search by one ply. Early experimenters with this scheme were surprised to find that the iterated search often required less time than an equivalent direct search. It is not immediately obvious why iterative deepening is effective; as indeed it is not, unless the search is guided by the entries in a transposition table (or the more specialized refutation table), which holds the best moves from subtrees traversed during the previous iteration. All the early experimental evidence indicated that the overhead cost of the preliminary D - 1 iterations was often recovered through a reduced cost for the D-ply search. Later the efficiency of iterative deepening was quantified to assess various refinements, especially memory table assists (37). Today the terms progressive and iterative deepening are often used synonymously.

Transposition and Refutation Tables. The results (merit, best move, status) of the searches of nodes (subtrees) in the tree can be held in a large hash table (16,36,48). Such a table serves several purposes, but primarily it enables recognition of move transpositions, leading to a subtree that has been seen before and so eliminating the need to search. Thus, successful use of a transposition table is an example of exact forward pruning. Many programs also store their opening book, where different move orders are common, in a way that is compatible with access to the transposition table. Another important purpose of a transposition table is as an implied move reordering mechanism. By trying first the available move in the table, an expensive move generation may be avoided (48).

By far the most popular table access method is the one proposed by Zobrist (49). He observed that a chess position constitutes placement of up to 12 different piece types {K, Q, R, B, N, P, -K, . . . , -P} on to a 64-square board. Thus, a set of 12 x 64 unique integers (plus a few more for en passant and castling privileges), {Ri}, may be used to represent all the possible piece-square combinations. For best results these integers should be at least 32 bits long and be randomly independent of each other. An index of the position may be produced by doing an EXCLUSIVE OR on selected integers as follows:

    Pj = Ra xor Rb xor . . . xor Rx

where Ra, . . . , Rx are the integers associated with the piece placements. Movement of a "man" from the piece square associated with Rs to the piece square associated with Rt yields a new index,

    Pk = (Pj xor Rs) xor Rt
Figure 4. Horizon effect (Black to move).

One advantage of hash tables is the rapid access that is possible, and for further speed and simplicity only a single probe of the table is normally made. More elaborate schemes have been tried, but often the cost of the increased complexity of managing the table swamps any benefits from improved table usage. Table 1 shows the usual fields of each entry in the hash table. Figure 5 contains sample pseudocode showing how the entries Move, Merit, Flag, and Height are used. Not shown are the

Table 1. Typical Transposition Table Entry

    Lock     To ensure the table position is identical to the tree position
    Move     Best move in the position, determined from a previous search
    Merit    Value of subtree, computed previously
    Flag     Indicates whether merit is upper bound, lower bound, or true merit
    Height   Length of subtree upon which merit is based
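The fields of Table 1 map naturally onto a small single-probe table. The following Python sketch is illustrative (table size, names, and the flag encoding are assumptions); the stored key plays the role of the Lock field.

```python
# Single-probe transposition table along the lines of Table 1: each
# entry keeps a lock (the full key), best move, merit, a flag saying
# whether merit is exact or a bound, and the subtree height.
from enum import Enum

class Flag(Enum):
    VALID = 0        # true merit
    LBOUND = 1       # merit is a lower bound
    UBOUND = 2       # merit is an upper bound

TABLE_SIZE = 8192
table = [None] * TABLE_SIZE

def store(key, move, merit, flag, height):
    # Single-probe table: a new entry simply overwrites the old one.
    table[key % TABLE_SIZE] = (key, move, merit, flag, height)

def probe(key):
    entry = table[key % TABLE_SIZE]
    if entry is not None and entry[0] == key:   # lock check
        return entry[1:]
    return None

store(0xABCDEF, move=(12, 28), merit=35, flag=Flag.LBOUND, height=4)
print(probe(0xABCDEF))   # hit: ((12, 28), 35, Flag.LBOUND, 4)
print(probe(0x123456))   # miss: None
```

The lock check matters because many positions share one table slot; the overwrite-on-collision policy is the "simple single probe" the text describes, and richer replacement schemes trade complexity for fewer lost entries.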
    FUNCTION AB (p : position; alpha, beta, depth : integer) : integer;
    VAR value, height, merit : integer;
        j, move : 1..MAXWIDTH;
        flag : (VALID, LBOUND, UBOUND);
        posn : ARRAY [1..MAXWIDTH] OF position;
    BEGIN
        { retrieve merit and best move for the current position }
        Retrieve(p, height, merit, flag, move);
        { height is the effective subtree length  }
        { height < 0  -> position not in table    }
        { height >= 0 -> position in table        }
        IF (height >= depth) THEN BEGIN
            IF (flag = VALID) THEN
                Return(merit);
            IF (flag = LBOUND) THEN
                alpha := max(alpha, merit);
            IF (flag = UBOUND) THEN
                beta := min(beta, merit);
            IF (alpha >= beta) THEN
                Return(merit);
        END;
        { Note: update of the alpha or beta bound     }
        { is not valid in a selective search.         }
        { If merit in table insufficient to end       }
        { search, try best move (from table) first,   }
        { before generating other moves.              }
        IF (depth = 0) THEN                { horizon node? }
            Return(Evaluate(p));
        IF (height >= 0) THEN BEGIN        { first try move from table }
            merit := -AB (posn[move], -beta, -alpha, depth-1);
            IF (merit >= beta) THEN
                GOTO done;
        END
        ELSE merit := -MAXINT;
        posn := Generate(p);               { no cutoff, generate moves }
        IF empty(posn) THEN                { leaf, mate or stalemate? }
            Return(Evaluate(p));
        FOR j := 1 TO sizeof(posn) DO
            IF j <> move THEN BEGIN        { using fail-soft condition }
                value := -AB (posn[j], -beta, -max(alpha, merit), depth-1);
                IF (value > merit) THEN BEGIN
                    merit := value;
                    move := j;
                END;
                IF (merit >= beta) THEN
                    GOTO done;
            END;
    done:
        flag := VALID;
        IF (merit <= alpha) THEN
            flag := UBOUND;
        IF (merit >= beta) THEN
            flag := LBOUND;
        IF (height <= depth) THEN          { update hash table }
            Store(p, depth, merit, flag, move);
        Return(merit);
    END;

Figure 5. Alpha-beta with transposition table.

functions Retrieve and Store, which access and update the transposition table. A transposition table also identifies the preferred move sequences used to guide the next iteration of a progressive deepening search. Only the move is important in this phase since the subtree length is usually less than the remaining search depth. Transposition tables are particularly advantageous to methods like PVS since the initial minimal window search loads the table with useful lines that are used in the event of a re-search. On the other hand, for deeper searches, entries are commonly lost as the table is overwritten, even though the table may contain more than a million entries (50). Under these conditions a small fixed-size transposition table may be overused (overloaded) until it is ineffective as a means of storing the continuations. To overcome this fault, a special table for holding these main continuations (the refutation lines) is also used. The table has W entries containing the D elements of each continuation. For shallow searches (D < 6) a refutation table guides a progressive deepening search just as well as a transposition table. In fact, a refutation table is the preferred choice of commercial systems or users of memory-limited processors. A small triangular workspace [(D x D)/2 entries] is needed to hold the current continuation as it is generated, and these entries in the work space can also be used as a source of killer moves (51).

Summary. The various terms and techniques described have evolved over the years. The superiority of one method over another often depends on how the elements are combined. The utility of iterative deepening, aspiration search, PVS, and transposition and refutation tables is perhaps best summarized by a revised version of an established performance graph (37) (Fig. 6). That graph was made from data gathered by a simple chess program when analyzing the 24 standard positions of the Bratko-Kopec test (24). Analysis of those positions requires the search of trees whose nodes have an average width of W = 34 branches. Thus, it is possible to use the formula for the terminal (horizon) nodes in a uniform minimal game tree as an estimate of the lower bound on the search size (see Fig. 6). For the results presented in Figure 6 the transposition table was fixed at 8000 entries so that the effects of table overloading may be seen. Figure 6 shows that: (a) iterative deepening has negligible cost and so is useful as a time control mechanism; (b) PVS is superior to aspiration search; (c) a refutation table is a space-efficient alternative to a transposition table for guiding both the next iteration and a re-search; (d) odd-ply alpha-beta searches are more efficient than even-ply ones; (e) transposition table size must increase with depth of search; and (f) transposition and/or refutation tables plus the history heuristic are an effective combination, achieving search results close to the minimal game tree for odd-ply search depths.

Strengths and Weaknesses
Anatomy of a Chess Program. A typical chess program contains the following three distinct elements: board description and move generation, tree searching/pruning, and position evaluation. Many people have based their first chess program on Frey and Atkin's instructive Pascal-based model (52). Although several good proposals exist in readily available books (14,20) and articles (53,54), the most efficient way of representing all the tables and data structures necessary to describe a chessboard is not yet known. From these tables the move list for each position can be generated. Sometimes the Generate function produces all the feasible moves at once, which has the advantage that the moves may be sorted to improve the probability of a cutoff. In small-memory computers, on the other hand, the moves are produced one at a time. This saves space and perhaps time whenever an early cutoff occurs. However, since only limited sorting is possible (captures might be generated first), the searching efficiency is generally lower. In the area of searching/pruning methods, variations on the depth-limited alpha-beta algorithm remain the preferred choice. All chess programs fit the following general model. A full-width "exhaustive" search (all moves are considered) is done at the first few ply from the root node. At depths beyond this exhaustive layer some form of selective search is used. Typically, unlikely or unpromising moves are simply dropped from the move list. More sophisticated programs carry out an
Figure 6. Comparison of alpha-beta enhancements (x-axis: search depth in ply).
extensive analysis to select those moves that are to be discarded at an interior node. Even so, this type of forward pruning is known to be error prone and dangerous; it is attractive because of the big reduction in tree size that ensues. Finally, the Evaluate function is invoked at the horizon nodes to assess the merits of the moves. Many of these are captures or other forcing moves that are not "dead," and so a limited quiescence search is carried out to resolve the unknown potential of the move. The evaluation process is the most important part of a chess program because it estimates the values of the subtrees that extend beyond the horizon. Although in the simplest case Evaluate simply counts the material balance, for superior play it is also necessary to measure many positional factors, such as pawn structures. These aspects are still not formalized, but adequate descriptions by computer chess practitioners are available in books (14,36).
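The simplest Evaluate mentioned above, a pure material count, can be sketched in a few lines of Python. The centipawn values are the conventional ones; the function name and piece-string encoding are illustrative assumptions.

```python
# Material balance in centipawns, positive when the side to move is
# ahead. Positional terms (mobility, square control, pawn structure,
# king safety) would be added on top of this in a real Evaluate.
VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}

def material_balance(own_pieces, opp_pieces):
    own = sum(VALUES[p] for p in own_pieces)
    opp = sum(VALUES[p] for p in opp_pieces)
    return own - opp

# Side to move has traded a knight for a rook and kept an extra pawn:
print(material_balance("RPPPPPPP", "NPPPPPP"))   # 1200 - 900 = 300
```

Even this crude score dominates the minimax value in most positions, which is why the article calls MaterialBalance the dominant component of the evaluation.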
Hardware Advances. Improvements in hardware speed have been an important contributor to computer chess performance. These improvements will continue, not only through faster special-purpose processors but also by using many processing elements. Computer chess has consistently been in the forefront of the application of high technology. The 1970s saw the introduction of special-purpose hardware for chess, with Cheops (55). Later, networks of computers were tried; at New York (1983) Ostrich used an eight-processor Data General system (56) and Cray Blitz a dual-processor Cray X-MP (21). Some programs used special-purpose hardware [see, e.g., Belle (57,58) and Bebe, Advance 3.0, and BCP (14)], and there were several experimental commercial systems employing custom VLSI chips. This trend toward the use of custom chips will continue, as evidenced by the success of the latest master-caliber chess program Hitech from Carnegie-Mellon University, based on a new chip for generating moves (59). Although mainframes will continue to be faster for the near future, it is only a matter of time before massive parallelism is applied to computer chess. The problem is a natural demonstration piece for the power of distributed computation since it is processor intensive and the work can be partitioned in many ways. Not only can the game trees be split into similar subtrees but also parallel computation of such components as move generation, position evaluation, and quiescence search is possible.

Software Advances. Many observers attributed the advances in computer chess through the 1970s to better hardware, particularly faster processors. Much evidence supports that point of view, but major improvements also stemmed from a better understanding of quiescence and the horizon effect and a better encoding of chess knowledge. The benefits of aspiration search (43), iterative deepening (36) [especially when used with a refutation table (51)], the killer heuristic (43), and transposition tables (16,36) were also appreciated, and by 1980 all were in general use. One other advance was the simple expedient of "thinking on the opponent's time" (43), which involved selecting a response for the opponent, usually the move predicted by the computer, and searching the position for the next reply. Nothing is lost by this tactic, and when a successful prediction is made, the time saved may be accumulated until it is necessary or possible to do a deeper search. Anticipating the opponent's response has been embraced by all microprocessor-based systems since it increases their effective speed.

Not all advances work out in practice. For example, in a test with Kaissa the method of analogies "reduced the search by a factor of 4 while the time for studying one position was increased by a factor of 1.5" (60). Thus, a dramatic reduction in the positions evaluated occurred, but the total execution time went up, and so the method was not effective. This sophisticated technique has not been tried in other competitive chess programs. The essence of the idea is that captures in chess are often invariant with respect to several minor moves. That is, some minor moves have no influence on the outcome of a specific capture. Thus, the true results of a capture need be computed only once and stored for immediate use in the evaluation of other positions that contain this identical capture! Unfortunately, the relation (sphere of influence) between a move and those pieces involved in a capture is complex, and it can be as much work to determine this relationship as it would be to simply reevaluate the exchange. However, the method is elegant and appealing on many grounds and should be a fruitful area for further research, as a promising variant restricted to pawn moves illustrates (61).

End Game Play. During the 1970s there developed a better understanding of the power of pawns in chess and a general improvement in end game play. Even so, end games remained a weak feature of computer chess. Almost every game illustrated some deficiency through inexact play or conceptual blunders. More commonly, however, the programs were seen to wallow and move pieces aimlessly around the board. A good illustration of such difficulties is a position from a game between Duchess and Chaos (Detroit, 1979) (see Fig. 7), which was analyzed extensively in an appendix to a major reference (20).
After more than 10 hours of play the position in Figure 7 was reached, and since neither side was making progress the game was adjudicated after White's 111th move of Bc6-d5. White had just completed a sequence of 21 reversible moves with only the bishop, and Black had responded correctly by simply moving the king to and fro. Duchess had only the most rudimentary plan for winning end games. Specifically, it knew about avoiding a 50-move-rule draw. Had the game continued,
then within the next 29 moves it would either play an irreversible move like Pf6-f7 or give up the pawn on f6. Another 50-move cycle would then ensue, and perhaps eventually the possibility of winning the pawn on a3 might be found. Even six years later it is doubtful that many programs could handle this situation any better. There is simply nothing much to be learned through search. What is needed here is some higher notion involving goal-seeking plans. All the time a solution must be sought that avoids a draw. This latter aspect is important since in many variations Black can simply offer the sacrifice bishop takes pawn on f6 (B x f6) because if the white king recaptures with K x f6, a stalemate results. Sometimes, however, chess programs are supreme. At Toronto in 1977, in particular, Belle demonstrated a new strategy for defending the lost ending KQ versus KR against chess masters. Although the ending still favors the side with the queen, precise play is required to win within 50 moves, as several chess masters were embarrassed to discover. In speed chess Belle also often dominates masters, as many examples in the literature show (20). Increasingly, chess programs are teaching even experts new tricks and insights. As long ago as 1970 Strohlein built a database to find optimal solutions to several simple three- and four-piece end games (kings plus one or two pieces) (62). Using a Telefunken TR4 (48-bit word, 8-µs operations) he obtained the results summarized in Table 2. Many other early workers on end games built databases of the simplest endings. Their approach was to develop optimal sequences backward from all possible winning positions (mate or reduction to a known subproblem) (63,64). These works have recently been reviewed and put into perspective (65). The biggest contributions to chess theory, however, have been made by Belle (qv) and Ken Thompson (66). They have built databases to solve five-piece end games.
Specifically, KQX versus KQ (where X = Q, R, B, or N), KRX versus KR, and KBB versus KN. This last case may prompt another revision to the 50-move rule since in general KBB versus KN is won (not drawn), and less than 67 moves are needed to mate or safely capture the knight (66). Also completed is a major study of the complex KQP versus KQ ending. Again, often more than 50 maneuvers are required before a pawn can advance (66). For more complex endings involving several pawns, the most exciting new ideas are those on chunking. Based on these ideas, it is claimed that the "world's foremost expert" has been generated for endings where each side has a king and three pawns (67,68).

Memory Tables. Others have pointed out (36,50) that a hash table can also be used to store information about pawn formations. Since there are usually far more moves by pieces than by pawns, the value of the base pawn formation for a position must be recomputed several times. It is a simple matter to build a hash key based on the location of pawns alone and so
Figure 7. Lack of end game plan (White to move).
Table 2. Maximum Moves to Win Simple End Games

    Pieces              Moves    Computation Time
    Queen                10      6.5 min
    Rook                 16      9 min
    Rook vs. Bishop      18      6 h 30 min
    Rook vs. Knight      27      14 h 16 min
    Queen vs. Rook       31      29 h 9 min
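The backward (retrograde) construction behind such tables (63,64) can be sketched on an abstract game graph rather than real chess. In this toy version, assumed for illustration only, every move passes the turn; a position is won for its mover when some successor is lost for the opponent, and lost when every successor is won for the opponent.

```python
# Retrograde analysis sketch: label positions of an abstract game graph
# working backward from the terminal losses, breadth first, so the
# first win found for a position is also the shortest.

from collections import deque

def retrograde(succ, mated):
    """succ[p] -> successor positions of p; `mated` are positions where
    the side to move has already lost.  Returns {p: ("win"|"loss", plies)}
    for the side to move at p; unlabeled positions are draws."""
    preds, out_count = {}, {p: len(ss) for p, ss in succ.items()}
    for p, ss in succ.items():
        for s in ss:
            preds.setdefault(s, []).append(p)
    value = {p: ("loss", 0) for p in mated}
    frontier = deque(mated)
    while frontier:
        q = frontier.popleft()
        kind, d = value[q]
        for p in preds.get(q, []):
            if p in value:
                continue
            if kind == "loss":            # p's mover can move into lost q
                value[p] = ("win", d + 1)
                frontier.append(p)
            else:                         # one more move from p refuted
                out_count[p] -= 1
                if out_count[p] == 0:     # every move from p loses
                    value[p] = ("loss", d + 1)
                    frontier.append(p)
    return value
```

Positions never labeled are drawn, which matches the construction from "mate or reduction to a known subproblem"; a real endgame database replaces the abstract graph with all legal chess positions of the given material.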
store the values of pawn formations in a hash table for immediate retrieval. Hyatt found this table to be effective (21) since otherwise 10-20% of the search time was taken up with evaluation of pawn structures. A high (98-99%) success rate was reported (21). King safety can also be handled similarly (36,50) since the king has few moves and for long periods is not under attack. Transposition and other memory tables come into their own in end games since there are fewer pieces and more reversible moves. Search time reduction by a factor of 5 is common, and in certain types of king and pawn endings it is claimed that experiments with Cray Blitz and Belle have produced trees of more than 30 ply, representing speedups of well over 100-fold. Even in complex middle games, however, significant performance improvement is observed. Thus, use of a transposition table provides an exact form of forward pruning and as such reduces the size of the search space, in end games often to less than the minimal game tree! The power of forward pruning is well illustrated by the following study of "Problem No. 70" (69) (Fig. 8), which was apparently first solved (52) by Chess 4.9 and then by Belle. The only complete computer analysis of this position was provided later (21). As Hyatt puts it, a solution is possible because "the search tree is quite narrow due to the locked pawns" (21). Here Cray Blitz is able to find the correct move of Ka1-b1 at the 18th iteration. The complete line of the best continuation was found at the 33rd iteration after examining four million nodes in about 65 s of Cray-1 time. This was possible because the transposition table had become loaded with the results of draws by repetition, and so the normal exponential growth of the tree was inhibited. Also, at every iteration the transposition table was loaded with losing defenses corresponding to lengthy searches. Thus, the current iteration often yielded results equivalent to a much longer 2(D-1)-ply search.
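The pawn-formation hashing described above can be sketched with Zobrist-style XOR keys (49). The code-table sizes and all names here are illustrative assumptions, not taken from Cray Blitz.

```python
# Pawn-structure hash table sketch: one random 64-bit code per
# (color, square) occupied by a pawn; XORing the codes gives an
# order-independent key that can also be updated incrementally
# (XOR out the old square, XOR in the new) as pawns move.

import random

random.seed(1)                      # reproducible demo codes
PAWN_CODE = {(color, sq): random.getrandbits(64)
             for color in ("white", "black") for sq in range(64)}

def pawn_key(pawns):
    """pawns: list of (color, square) pairs; returns the XOR key."""
    key = 0
    for color, sq in pawns:
        key ^= PAWN_CODE[(color, sq)]
    return key

pawn_table = {}                     # key -> cached pawn-structure score

def pawn_score(pawns, evaluate_pawns):
    """Evaluate a pawn formation once; reuse the stored value after."""
    key = pawn_key(pawns)
    if key not in pawn_table:
        pawn_table[key] = evaluate_pawns(pawns)
    return pawn_table[key]
```

Because pieces move far more often than pawns, most positions reaching the evaluator share a pawn key already in the table, which is why the reported hit rates are so high.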
Thompson refers to this phenomenon as "seeing over the horizon" (66).
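The iteration-by-iteration deepening referred to above (the 18th and 33rd iterations) is commonly driven with an aspiration window, one of the software advances listed earlier. The following sketch assumes `search(depth, alpha, beta)` is any depth-limited alpha-beta routine that keeps its result within the given bounds; the window width is an arbitrary illustrative constant.

```python
# Iterative deepening with an aspiration window: each iteration first
# searches a narrow window around the previous iteration's score, and
# re-searches with full bounds only if the result falls outside it.

INF = float("inf")
WINDOW = 50                         # centipawns either side of the guess

def aspiration_driver(search, max_depth):
    score = search(1, -INF, INF)    # first iteration: full window
    for depth in range(2, max_depth + 1):
        alpha, beta = score - WINDOW, score + WINDOW
        score = search(depth, alpha, beta)
        if score <= alpha or score >= beta:
            # fail low or fail high: the true value lies outside the
            # aspiration window, so re-search with full bounds
            score = search(depth, -INF, INF)
    return score
```

The narrow window makes most iterations cheaper because far more lines fail the (alpha, beta) test early; the occasional re-search is the price paid when the guess is wrong.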
Selective Search. Many software advances came from a better understanding of how the various components in evaluation and search interact. The first step was a move away from selective search by providing a clear separation between the algorithmic component, search, and the heuristic component, chess position evaluation. The essence of the selective approach is to narrow the width of search by forward pruning. Some selection processes removed implausible moves only (70), thus abbreviating the width of search in a variable manner not necessarily dependent on node level in the tree. This technique was only slightly more successful than other forms of forward pruning and required more computation. Even so, it too could not retain sacrificial moves. So the death knell of selective search was its inability to predict the future with a static evaluation function. It was particularly susceptible to the decoy sacrifice and subsequent entrapment of a piece. Interior node evaluation functions that attempted to deal with these problems became too expensive. Even so, in the eyes of some, selective methods remain as a future prospect since "selective search will always loom as a potentially faster road to high level play. That road, however, requires an intellectual breakthrough rather than a simple application of known techniques" (58). The reason for this belief is that chess game trees grow exponentially with depth of search. Ultimately, it will become impossible to obtain the necessary computing power to search deeper within normal time constraints. For this reason most chess programs already incorporate some form of selective search, often as forward pruning. These methods are quite ad hoc since they are not based on a theory of selective search. Although nearly all chess programs have some form of selective search, even if it is no more than the discarding of unlikely moves, at present only two major programs (Awit and Chaos) do not consider all moves at the root node. Despite their occasional successes, these programs can no longer compete in the race for Grand Master status.

Nevertheless, although the main advantage of a program that is exhaustive to some chosen search depth is its tactical strength, it has been shown that the selective approach can also be effective in tactical situations. In particular, Wilkins's Paradise program demonstrated superior performance in "tactically sharp middle game positions" on a standard suite of tests (71). Paradise was designed to illustrate that a selective search program can also find the best continuation when there is material to be gained, through searching but a fraction of the game tree viewed by such programs as Chess 4.4 and Tech. Furthermore, it can do so with greater success than either program or a typical A-class player (71). However, a 9:1 speed handicap was necessary to allow adequate time for the interpretation of the Maclisp program. Paradise's approach is to use an extensive static analysis to produce a small set of plausible winning plans. Once a plan is selected, "it is used until it is exhausted or until the program determines that it is not working." In addition, Paradise can "detect when a plan has been tried earlier along the line of play and avoid searching again if nothing has changed" (71). This is the essence of the method of analogies too. As Wilkins says, the "goal is to build an expert knowledge base and to reason with it to discover plans and verify them within a small tree" (71). Although Paradise is successful in this regard, part of its strength lies in its quiescence search, which is seen to be "inexpensive compared to regular search," despite the fact that this search "investigates not only captures but forks, pins, multimove mating sequences, and other threats" (71).
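The simplest width-narrowing form of forward pruning discussed in this section might look like the following sketch; the full-width depth and the retained width are illustrative parameters, not values from any cited program.

```python
# Forward pruning sketch: near the root every move is searched, but
# beyond a chosen depth the moves are ranked by a static score and
# only the best few are retained for search.

def select_moves(moves, static_score, depth, full_width_depth=2, width=5):
    """Return the subset of `moves` actually searched at this depth."""
    if depth <= full_width_depth:
        return list(moves)              # full-width ("exhaustive") layer
    ranked = sorted(moves, key=static_score, reverse=True)
    return ranked[:width]               # forward-prune the remainder
```

Whatever is discarded here is gone for good, which is exactly why the technique is error prone: a decoy sacrifice scores badly statically and is pruned before its tactical point can be seen.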
Figure 8. Transposition table necessity (White to move).

The efficiency of the program lies in its powerful evaluation so that usually "only one move is investigated at each node, except when a defensive move fails." Pitrat has also written extensively on the subject of finding plans that win material (72), but neither his ideas nor
those in Paradise have been incorporated into the competitive chess programs of the 1980s.

Search and Knowledge Errors. The following game was the climax of the 15th ACM NACCC, in which all the important programs of the day participated. Had Nuchess won its final match against Cray Blitz, there would have been a five-way tie between these two programs and Bebe, Chaos, and Fidelity X. Such a result almost came to pass, but suddenly Nuchess "snatched defeat from the jaws of victory," as chess computers are prone to do. Complete details about the game are not important, but the position shown in Figure 9 was reached. Here, with Rf6 x g6, Nuchess wins another pawn, but in so doing enters a forced sequence that leaves Cray Blitz with an unstoppable pawn on a7, as follows:

45. Rf6 x g6 ?    Rg8 x g6+
46. Kg5 x g6      Nc8 x d6
47. Pc5 x d6

Many explanations can be given for this error, but all have to do with a lack of knowledge about the value of pawns. Perhaps Black's passed pawn was ignored because it was still on its home square, or perhaps Nuchess simply miscalculated and "forgot" that such pawns may initially advance two rows? Another possibility is that White became lost in some deep searches in which its own pawn promotes. Even a good quiescence search might not recognize the danger of a passed pawn, especially one so far from its destination. In either case this example illustrates the need for knowledge of a type that cannot be obtained easily through search but that humans are able to see at a glance (6). The game continued, 47. . . . Pa5, and White was neither able to prevent promotion nor advance its own pawn.

There are many opportunities for contradictory knowledge interactions in chess programs. Sometimes chess folklore provides ground rules that must be applied selectively. Such advice as "a knight on the rim is dim" is usually appropriate, but in special cases placing a knight on the edge of the board is sound, especially if it forms part of an attacking theme and is unassailable. Not enough work has been done to assess the utility of such knowledge and to measure its importance. Recently, Schaeffer completed an interesting doctoral thesis (73) that addressed this issue, a thesis that could have some impact on the way expert systems are tested and built since it demonstrates that there is a correct order to the acquisition of knowledge if the newer knowledge is to build effectively on the old.

Areas of Future Progress. Although most chess programs are now using all the available refinements and tables to reduce the game tree traversal time, only in the ending is it possible to search consistently less than the minimal game tree. Selective search and forward pruning methods are the only real hope for reducing further the magnitude of the search. Before this is possible, it is necessary for the programs to reason about the trees they see and deduce which branches can be ignored. Typically, these will be branches that create permanent weaknesses or are inconsistent with the current themes. The difficulty will be to do this without losing sight of tactical factors.

Improved performance will also come about by using faster computers and through the construction of multiprocessor systems. One early multiprocessor chess program was Ostrich (56,74). Other experimental systems followed, including Parabelle (75) and ParaPhoenix (76). None of these systems, nor the strongest multiprocessor program Cray Blitz (21), consistently achieves more than a five-fold speed-up even when eight processors are used (76). There is no apparent theoretical limit to the parallelism, but the practical restrictions are great and may require some new ideas on partitioning the work as well as more involved scheduling methods.

Another major area of research is the derivation of strategies from databases of chess end games. It is now easy to build expert system databases for the classical end games involving four or five pieces. At present these databases can only supply the optimal move in any position (although a short principal continuation can be provided by way of expert advice). What is needed now is a program to deduce from these databases optimally correct strategies for playing the end game. Here the database could either serve as a teacher of a deductive inference program or as a tester of plans and hypotheses for a general learning program. Perhaps a good test of these methods would be the production of a program that could derive strategies for the well-defined KBB versus KN end game. A solution to this problem would provide a great advance to the whole of AI.
BIBLIOGRAPHY
1. A. G. Bell, The Machine Plays Chess?, Pergamon Press, Oxford, 1978.
2. D. N. L. Levy, Chess and Computers, Batsford, London, 1976.
Figure 9. A costly miscalculation (White's move 45).
3. T. Nemes, "The Chess-Playing Machine," Acta Technica, Hungarian Academy of Sciences, Budapest, 1951, pp. 215-239.
4. K. Zuse, "Chess Programs," in The Plankalkul, Report No. 106, Gesellschaft fur Mathematik und Datenverarbeitung, Bonn, 1976, pp. 201-244 (translation of German original, 1945).
5. A. M. Turing, "Digital Computers Applied to Games," in B. V. Bowden (ed.), Faster Than Thought, Pitman, London, 1953, pp. 286-297.
6. A. D. de Groot, Thought and Choice in Chess, Mouton, The Hague, 1965.
7. C. E. Shannon, "Programming a computer for playing chess," Philos. Mag. 41, 256-275 (1950).
8. J. Kister, P. Stein, S. Ulam, W. Walden, and M. Wells, "Experiments in chess," JACM 4, 174-177 (1957).
9. A. Bernstein, M. de V. Roberts, T. Arbuckle, and M. A. Belsky, A Chess Playing Program for the IBM 704, Western Joint Computer Conference Proceedings, Los Angeles, AIEE, New York, pp. 157-159, 1958.
10. B. Mittman, A Brief History of Computer Chess Tournaments: 1970-1975, in P. Frey (ed.), Chess Skill in Man and Machine, 1st ed., Springer-Verlag, New York, pp. 1-33, 1977.
11. A. Kotok, A Chess Playing Program for the IBM 7090, B.S. Thesis, MIT, AI Project Memo 41, Computation Center, Cambridge, MA, 1962.
12. G. M. Adelson-Velskii, V. L. Arlazarov, A. R. Bitman, A. A. Zhivotovskii, and A. V. Uskov, Programming a Computer to Play Chess, Russian Math. Surveys, Vol. 25, Cleaver-Hume Press, London, pp. 221-262 (1970). (Translation of Proc. 1st Summer School Math. Prog., Vol. 2, 1969, pp. 216-252.)
13. J. E. Hayes and D. N. L. Levy, The World Computer Chess Championship, Edinburgh University Press, Edinburgh, 1976.
14. D. E. Welsh and B. Baczynskyj, Computer Chess II, W. C. Brown, Dubuque, IA, 1985.
15. H. J. van den Herik, Computerschaak, Schaakwereld en Kunstmatige Intelligentie, Ph.D. Thesis, Technische Hogeschool Delft, Academic Service, 's-Gravenhage, The Netherlands, 1983.
16. R. D. Greenblatt, D. E. Eastlake III, and S. D. Crocker, The Greenblatt Chess Program, Fall Joint Computing Conference Proceedings 31, San Francisco, ACM, New York, pp. 801-810, 1967.
17. I. J. Good, A Five-Year Plan for Automatic Chess, in E. Dale and D. Michie (eds.), Machine Intelligence, Vol. 2, Elsevier, New York, pp. 89-118, 1968.
18. M. M. Newborn, Computer Chess, Academic Press, New York, 1975.
19. T. R. Truscott, Techniques Used in Minimax Game-Playing Programs, M.S. Thesis, Duke University, Durham, NC, April 1981.
20. P. W. Frey (ed.), Chess Skill in Man and Machine, 2nd ed., Springer-Verlag, New York, 1983.
21. R. M. Hyatt, A. E. Gower, and H. L. Nelson, Cray Blitz, in D. Beal (ed.), Advances in Computer Chess, Vol.
4, Pergamon Press, Oxford, pp. 8-18, 1985.
22. D. Levy and M. Newborn, More Chess and Computers, 2nd ed., Computer Science Press, Rockville, MD, 1981.
23. A. E. Elo, The Rating of Chessplayers, Past and Present, Arco Publishing, New York, 1978.
24. D. Kopec and I. Bratko, The Bratko-Kopec Experiment: A Comparison of Human and Computer Performance in Chess, in M. Clarke (ed.), Advances in Computer Chess, Vol. 3, Pergamon Press, Oxford, pp. 57-72, 1982.
25. K. Thompson, Computer Chess Strength, in M. Clarke (ed.), Advances in Computer Chess, Vol. 3, Pergamon Press, Oxford, pp. 55-56, 1982.
26. D. Michie, "Chess with computers," Interdisc. Sci. Rev. 5(3), 215-227 (1980).
27. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Dev. 3, 210-229 (1959). [Also in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 71-105.]
28. A. G. Bell, Games Playing with Computers, Allen & Unwin, London, 1972.
29. A. Newell, J. C. Shaw, and H. A. Simon, "Chess playing programs and the problem of complexity," IBM J. Res. Dev. 4(2), 320-335 (1958). [Also in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 39-70.]
30. A. L. Brudno, "Bounds and valuations for abridging the search of estimates," Probl. Cybern. 10, 225-241 (1963). (Translation of Russian original in Problemy Kibernetiki, Vol. 10, May 1963, pp. 141-150.)
31. D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artif. Intell. 6(4), 293-326 (1975).
32. J. P. Fishburn, Analysis of Speedup in Distributed Algorithms, UMI Research Press, Ann Arbor, MI, 1984.
33. H. J. Berliner, Some Necessary Conditions for a Master Chess Program, Proceedings of the Third International Joint Conference on Artificial Intelligence, Stanford, CA, pp. 77-85, 1973.
34. H. J. Berliner, Chess as Problem Solving: The Development of a Tactics Analyzer, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, March 1974.
35. J. J. Gillogly, Performance Analysis of the Technology Chess Program, Technical Report CMU-CS-78-189, Computer Science, Carnegie-Mellon University, Pittsburgh, PA, March 1978.
36. D. J. Slate and L. R. Atkin, CHESS 4.5: The Northwestern University Chess Program, in P. Frey (ed.), Chess Skill in Man and Machine, 1st ed., Springer-Verlag, New York, pp. 82-118, 1977.
37. T. A. Marsland, Relative Efficiency of Alpha-Beta Implementations, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 763-766 (August 1983).
38. J. Pearl, "Asymptotic properties of minimax trees and game searching procedures," Artif. Intell. 14(2), 113-138 (1980).
39. A. Reinefeld, J. Schaeffer, and T. A. Marsland, Information Acquisition in Minimal Window Search, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, pp. 1040-1043 (August 1985).
40. P. W. Frey, An Introduction to Computer Chess, in P. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 54-81, 1977.
41. J. R. Slagle, Artificial Intelligence: The Heuristic Programming Approach, McGraw-Hill, New York, 1971.
42. J. A. Birmingham and P. Kent, Tree-Searching and Tree-Pruning Techniques, in M.
Clarke (ed.), Advances in Computer Chess, Vol. 1, Edinburgh University Press, Edinburgh, pp. 89-107, 1977.
43. J. J. Gillogly, "The technology chess program," Artif. Intell. 3(1-4), 145-163 (1972).
44. J. Schaeffer, "The history heuristic," ICCA J. 6(3), 16-19 (1983).
45. H. Kaindl, Dynamic Control of the Quiescence Search in Computer Chess, in R. Trappl (ed.), Cybernetics and Systems Research, North-Holland, Amsterdam, pp. 973-977, 1982.
46. D. Spracklen and K. Spracklen, An Exchange Evaluator for Computer Chess, Byte, 16-28 (November 1978).
47. L. R. Harris, The Heuristic Search and the Game of Chess, Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, pp. 334-339, 1975.
48. T. A. Marsland and M. Campbell, "Parallel search of strongly ordered game trees," Comput. Surv. 14(4), 533-551 (1982).
49. A. L. Zobrist, A Hashing Method with Applications for Game Playing, Technical Report 88, Computer Sciences Department, University of Wisconsin, Madison, WI, April 1970.
50. H. L. Nelson, "Hash tables in Cray Blitz," ICCA J. 8(1), 3-13 (1985).
51. S. G. Akl and M. M. Newborn, The Principal Continuation and the Killer Heuristic, 1977 ACM Annual Conference Proceedings, Seattle, October 1977, ACM, New York, pp. 466-473, 1977.
52. P. W. Frey and L. R. Atkin, Creating a Chess Player, in B. L. Liffick (ed.), The BYTE Book of Pascal, 2nd ed., BYTE/McGraw-Hill, Peterborough, NH, pp. 107-155, 1979.
53. A. G. Bell, "Algorithm 50: How to program a computer to play legal chess," Comput. J. 13(2), 208-219 (1970).
54. S. M. Cracraft, "Bitmap move generation in chess," ICCA J. 7(3), 146-152 (1984).
55. J. Moussouris, J. Holloway, and R. Greenblatt, CHEOPS: A Chess-Oriented Processing System, in J. Hayes, D. Michie, and L. Michulich (eds.), Machine Intelligence, Vol. 9, Ellis Horwood, Chichester, pp. 351-360, 1979.
56. M. Newborn, A Parallel Search Chess Program, Proceedings of the ACM Annual Conference, Denver, ACM, New York, pp. 272-277, 1985.
57. J. H. Condon and K. Thompson, Belle Chess Hardware, in M. Clarke (ed.), Advances in Computer Chess, Vol. 3, Pergamon Press, Oxford, pp. 45-54, 1982.
58. J. H. Condon and K. Thompson, Belle, in P. Frey (ed.), Chess Skill in Man and Machine, 2nd ed., Springer-Verlag, New York, pp. 201-210, 1983.
59. C. Ebeling and A. Palay, The Design and Implementation of a VLSI Chess Move Generator, Eleventh Annual International Symposium on Computer Architecture, Ann Arbor, MI, June 1984, IEEE, New York, pp. 74-80, 1984.
171
IEEE Trans. Pattern AnaI. Mach. Intell., 7(4), 442-452 (July 1985). 76. T. A. Marsland, M. Olafsson, and J. Schaeffer, Multiprocessor Tree-SearchExperiments, in D. Beal (ed.),Aduancesin Computer Chess,Vol. 4, PergamonPress,Oxford, pp. 37-51 (1985). T. A. MenslnNn Universitv of Alberta
NTEGRATED MANU FACTURING COMPUTER-I
Computer-integrated manufacturing (CIM) is, basically, the technology that embracesthe full range of the unique ability possessedby the digital computer and related computer technolory that greatly enhancesthe capabilities ofthe entire manufacturing process.That ability has three main elements.The first of these is the ability of the computer to provide on-line, variable-program (flexible) automation of manufacturing activities and equipment. The secondis its ability to provide on60. G. M. Adelson-Velsky,V. L. Arlazarov, and M. V. Donskoy, Algoline, moment-by-moment optimization of manufacturing acrithms of Adaptive Search, in J. Hayes, D. Michie and L. Michutivities and operations.With respectto both of these elements, lich (eds.),Machine Intelligence,Vol. 9, Ellis Horwood,Chichester, it should be noted that the computer has the ability to accomU.K., pp. 373-384, 1979. plish such not only with the "hard" componentsof manufactur61. H. Horacek, "Knowledge-basedmove selection and evaluation to ing (e.g., the manufacturing machinery and equipment) but guide the search in chess pawn endings," ICCA J., 6(3), 20-37 also with the "soft" componentsof manufacturing (the infor(1983). mation flow, the handling of databases,etc.). However, as is 62. T. Strohlein, Untersuchungen uber Kombinatorische Speile, Doc- becoming more and more widely recognized,the third element toral Thesis, Technischen Hochschule Munchen, Munich, FRG, of the computer's unique ability is, by far, the most important January 1970. and powerful of the three. This is its ability to integrate all of 63. M. A. Bramer and M. R. B. Clarke, "A model for the representathe various constituents of the entire manufacturing process tion of pattern-knowledge for the endgame in chess,"Int. J. Maninto a system-a system that can, because of the first two Machine Stud., 11, 635-649 (1979). elements discussedabove,be flexibly automated and moment64. I. Bratko and D. 
COMPUTER-INTEGRATED MANUFACTURING

by-moment optimized as a whole. This powerful ability of the computer to function as a systems tool therefore results, in the case of its application to manufacturing, in what is called the CIM system (1).

The CIM system is a closed-loop feedback system in which the prime inputs are product requirements (needs) and product concepts (creativity) and the prime outputs are finished products (fully assembled, inspected, and ready for use). It is comprised of a combination of software and hardware, the elements of which include product design (for production), production planning (programming), production control (feedback, supervisory, and adaptive optimizing), production equipment (including machine tools), and production processes (removal, forming, and consolidative). It is amenable to being realized by application of systems engineering and has the potential of being fully automated by means of versatile automation and of being made fully self-optimizing (adaptively optimizing); the present major resources for accomplishing this are the computer-related technologies.

The general concept of this system is shown in Figure 1. In this characterization of the system five main elements are shown, represented by the five boxes. There is nothing hard and fast about this particular characterization of the elements of the manufacturing system. The important concept to recognize is that all the types of activities, equipment, and processes represented by the terms in the boxes are, and must be, an integral part of any manufacturing system that is to be automated, optimized, and integrated by applying the computer to these tasks if the full benefits of CIM are to be realized.

The second point to note in Figure 1 is that the CIM system is a closed-loop system. In other words, data and information relative to what is happening downstream in the system must be fed back upstream constantly and in real time in order to continuously condition the operations and activities going on there. Without such feedback, on-line, real-time optimization and integrated, coordinated, flexible automation become impossible. Two of the more critical feedback loops (labeled Cost and capabilities and Performance) are included in the figure to illustrate the general nature of the two types of data and information that must be fed back to provide overall flexible automation and real-time optimization. Obviously, all data and information originating within any of the elements of the system must be able to be fed either forward or back to any of the other elements of the system where it is required.

This generic concept of the CIM system provides guidance for the ongoing development and implementation of full computer-automated and computer-optimized manufacturing, collectively called CIM. It should be recognized, however, that as yet full computer-integrated manufacturing has not been realized in practice anywhere in the world. Although at this stage it has been possible to integrate some parts of the system with each other, the technology is not yet sufficiently advanced to accomplish overall closed-loop integration of the total system from conceptual design of the product to its delivery in finished, ready-to-use form. In particular, the greatest difficulty is being experienced in accomplishing closed-loop integration of the engineering design of the product with the remainder of the system.
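The closed-loop conditioning described above can be illustrated with a toy numerical sketch. This is not from the article; the parameter names, readings, and gain value are invented for illustration only.

```python
# Toy sketch of closed-loop feedback: downstream performance data is fed
# back to condition an upstream operating parameter. All names and the
# gain value are invented for illustration.

def closed_loop_step(setpoint, measured, parameter, gain=0.1):
    """Feed downstream performance back to condition an upstream parameter."""
    error = setpoint - measured       # performance data fed back upstream
    return parameter + gain * error   # conditioned operating parameter

feed_rate = 100.0
for measured in (90.0, 95.0, 98.0):   # successive downstream readings
    feed_rate = closed_loop_step(100.0, measured, feed_rate)
# feed_rate is nudged upward to compensate for the measured shortfall
```

The point of the sketch is only the loop structure: each downstream measurement conditions the next upstream setting, which is impossible without the constant real-time feedback the text describes.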
Role of AI in CIM

The fact that full CIM has not yet been realized in practice anywhere in the world is in large part due to the fact that the CIM system is not yet an intelligent system. At this stage AI in the form of expert systems (qv) technology is beginning to be developed and experimentally applied to certain elements of the system. As yet, none of these developments and experimental applications appear to have been fully reduced to practice. However, they are exhibiting considerable promise. Some examples of these will provide a flavor of the developing possibilities.

In the field of computer-aided design (CAD) (qv) for production, Gero (2) has developed methodology for modeling both single objects and assemblies of objects by use of knowledge engineering techniques of first-order predicate logic (qv) implemented via PROLOG (see Logic programming). Even though a restricted domain has been used in his initial work, it makes evident the uniformity and power of the approach.

The field of production planning (see Planning) has seen the greatest activity so far, with most of that directed to process-and-operations planning. Darbyshire and Davies (3) have under development a hybrid expert/algorithmic process-planning system for turned parts called EXCAP. Initially, it combined an AL/X-like (later replaced with a PROLOG-like) expert system with a recursive planning algorithm. This already has shown considerable promise for realization of truly generative process planning. Milačić and Kalajdžić (4) have developed the underlying theory of the utilization of expert systems technology in process planning, considering it to be the functional basis for long-term logical structuring of overall manufacturing process design. Triouleyre (5) has developed an expert systems-type approach to process-and-operations planning for forming and welding operations. It takes into account the structure of the data concerning process, product, and processing, thus enabling decisions to be arrived at readily concerning both the choice of process and the detailed operations required. Further, the rules contained in the knowledge base are found to be an efficient aid in the design of the product to determine its ease of production. Zdeblick and Barkocy (6) are evolving an intelligent module for detailed operations planning for machining operations that, as it matures, will find its way onto the production equipment itself through the medium of intelligent control systems. There it can perform such functions as making tool selections, cut selections, and speed and feed decisions in real time just prior to actual machining of a workpiece. Preiss and Kaplansky (7), by encoding knowledge of the milling process into a computer program using principles of AI, have produced a system that automatically writes part programs to mill successfully 2½-dimension parts on a three-axis numerically controlled milling machine. Finally, Iwata and Sugimura (8) have developed, in prototype form, a
[Figure 1. The CIM system. Five boxes, linked in a closed loop, represent product design for product (CAD); production planning (programming); production control (including feedback, supervisory, adaptive optimizing); production equipment (including machine tools); and production processes (removal, forming, consolidative). Inputs are needs (product requirements) and creativity (product concepts); outputs are finished products (fully assembled, inspected, and ready for use). Feedback loops are labeled Cost and capabilities and Performance (CAM).]
knowledge-based computer-aided process-planning system that determines, from the CAD model of a part, the sequence of machine tools required to produce it. The knowledge base of the system includes a set of rules describing preference relations among the machining processes.

The field of production control has considerably less activity in the application of AI, with much of it devoted either to scheduling or process control. For example, Bourne and Fox (9) have created an AI system, called ISIS, that has been used successfully to schedule jobs at the shop floor level in a factory. Ouchi, Mibuka, Kouzuki, and Taguchi (10) have developed and implemented an integrated AI system for controlling the sequencing and processing in the robotic assembly of color television sets by 11 robots (see Robotics). CAD data from a higher level computer is automatically transformed to control the robots.

The field of production equipment and production processes has also seen relatively small activity in the application of AI, with much of it being devoted to monitoring of machine performance and diagnosis of machine malfunctions.
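The kind of rule base of preference relations just described can be sketched in miniature. The sketch below is entirely hypothetical: the part features, rule conditions, and process names are invented and are not taken from any of the cited systems.

```python
# Hypothetical preference rules for process selection. The features,
# conditions, and process names are invented for illustration only.

RULES = [
    (lambda f: f["shape"] == "rotational",     "turning"),
    (lambda f: f["shape"] == "prismatic",      "milling"),
    (lambda f: f.get("holes", 0) > 0,          "drilling"),
    (lambda f: f.get("finish_um", 10.0) < 1.0, "grinding"),
]

def plan(features):
    """Return an ordered process sequence for one part description."""
    return [process for condition, process in RULES if condition(features)]

part = {"shape": "rotational", "holes": 2, "finish_um": 0.8}
sequence = plan(part)   # ["turning", "drilling", "grinding"]
```

A real generative planner would of course derive the features from the CAD model and resolve conflicts among competing processes; the sketch only shows the declarative rule-to-process structure.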
For example, Bel, Dubois, Farreny, and Prade (11) have investigated the possibilities for application of AI to satisfying the need for efficient and flexible monitoring systems in fully automated manufacturing systems. They find that AI methodology is very well suited not only to creating effective monitoring systems capable of dealing with the imprecise terms in which triggering situations are expressed but also to detection of unpredicted events, the specification of error recovery strategies, and the planning of job input sequences. Bourne and Fox (9) have presented a rule-based architecture, called PDS and written in the Schema Representation Language, for the on-line, real-time diagnosis of malfunctions in machine operations. Diagnosis is based on information acquired from tens to hundreds of sensors, which is analyzed to gracefully account for sensor degradation over time as well as spurious readings.

The total system of manufacturing is also now being analyzed to define the role that AI can be expected to play. Merchant (12) has analyzed existing Delphi-type technological forecasts on the future of manufacturing to determine their implications for utilization of AI in manufacturing systems. He identified three main thrusts for the future. The first of these is expected to be toward the application of AI to accomplish full utilization of the product definition database generated by CAD as the primary source for automatic generation of all the information required throughout the rest of the system of manufacturing. The second main thrust is expected to be toward the application of AI, in conjunction with pattern recognition (qv) techniques, to accomplish full automation of all production activities carried on throughout the system of manufacturing. The third main thrust is expected to be toward application of AI to accomplish overall on-line adaptive optimization of advanced manufacturing systems and their components.
Hatvany (13) has been conducting research on appropriate approaches to the architecture of overall manufacturing systems that are conducive to maximum effectiveness. As a result, he has concluded that during the past 30 years the thinking about complex computer-controlled systems has been conditioned by the concepts of hierarchical structures. However, recent advances in distributed computing power and open system architectures (particularly local-area networks) have opened the way for heterarchic structures. He finds
therefore that, based on the incomplete and nonalgorithmic architecture specification that ensues from this approach, these systems will have to exercise a high degree of local intelligence to cope with unforeseen situations.

However, the greatest promise and potential impact of AI for the overall CIM system relates to the fact that the system of manufacturing (despite the best strivings of the engineering profession to arrive at fully deterministic methodologies) can never be a totally deterministic system. The system must always have interfaces with nondeterministic elements of the real world. These include human beings, who are often far from logical or free of error in their performance, and the economic, social, and political systems of the world, with all their vagaries. Further, as pointed out by Hatvany (14), the system of manufacturing, even within a given manufacturing company, involves such an overwhelming welter of variables, parameters, interactions, activities, flows of material and information, and so on that neither a detailed, explicit algorithm for each solution procedure nor all the facts, mathematical relations, and models in perfect arrangement and complete form for a deterministic (and unique) answer can ever be found. What are required then, as he indicates, for realization of the full potential of CIM are intelligent manufacturing systems capable of solving, within certain limits, unprecedented, unforeseen problems on the basis of even incomplete and imprecise information. The technology of AI must advance considerably in capability to carry out the kinds of inference and even intuition that persons now use to overcome the problems arising from the nondeterministic nature of the overall manufacturing system before that potential can be significantly realized.
As AI technology advances, however, integration of that advancing capability into the CIM system can assure realization of the dramatic improvement of manufacturing productivity and quality that CIM technology can provide.
BIBLIOGRAPHY

1. M. E. Merchant, "The future of batch manufacture," Philos. Trans. Roy. Soc. Lond. A275, 357-372 (1973).
2. J. S. Gero, "Object modelling through knowledge engineering," Proc. CIRP Sem. Manufact. Syst. 14, 54-62 (1985).
3. I. Darbyshire and B. J. Davies, "EXCAP, an expert systems approach to recursive process planning," Proc. CIRP Sem. Manufact. Syst. 14 (1985).
4. V. R. Milačić and M. Kalajdžić, "Logical structure of manufacturing process design: Fundamentals of an expert system for manufacturing process planning," Proc. CIRP Sem. Manufact. Syst. 14 (1985).
5. J. Triouleyre, "Elaboration of expert system knowledge based structure," Proc. CIRP Sem. Manufact. Syst. 14 (1985).
6. W. J. Zdeblick and B. E. Barkocy, "Manufacturing planning evolution with artificial intelligence with applications toward machining operations," Proceedings of the PROLAMAT 6th International Conference, Association Française pour la Cybernétique Économique et Technique, Paris, pp. 99-108, 1985.
7. K. Preiss and E. Kaplansky, "Automated part programming for CNC milling by artificial intelligence techniques," J. Manufact. Syst. 4, 51-63 (1985).
8. K. Iwata and N. Sugimura, "A knowledge based computer aided process planning system for machine parts," Proc. CIRP Sem. Manufact. Syst. 14 (1985).
9. D. A. Bourne and M. S. Fox, "Autonomous manufacturing: Automating the job shop," Comput. Mag. 17(9), 76-88 (1984).
10. T. Ouchi, M. Mibuka, K. Kouzuki, and K. Taguchi, "The intelligent production control system for color TV assembly process," Proc. CIRP Sem. Manufact. Syst. 14 (1985).
11. G. Bel, D. Dubois, H. Farreny, and H. Prade, "Towards the use of fuzzy rule-based systems in the monitoring of manufacturing processes," Proceedings of the PROLAMAT 6th International Conference, AFCET, Paris, pp. 109-119, 1985.
12. M. E. Merchant, "Analysis of existing technological forecasts pertinent to the utilization of artificial intelligence and pattern recognition techniques in manufacturing engineering," Proc. CIRP Sem. Manufact. Syst. 14, 11-16 (1985).
13. J. Hatvany, "Intelligence and cooperation in heterarchic manufacturing systems," Proc. CIRP Sem. Manufact. Syst. 14, 5-10 (1985).
14. J. Hatvany, "The efficient use of deficient information," Ann. CIRP 32, 423-425 (1983).

M. E. Merchant
Metcut Research Associates, Inc.
COMPUTER SYSTEMS

Computer systems is an area of computer science that addresses the integrated functioning of computer components as a single entity. These components include hardware, such as processors, memories, peripherals, and communication networks, and software, including operating systems, compilers, communication protocols, and application programs. This entry discusses computer systems designed specifically for AI applications.

Artificial intelligence programs contain knowledge, consisting of objects of some problem domain, their properties, and relations between them. Further, programs contain operations on the knowledge, for example, pattern matching in semantic nets, resolution in logic systems, and inheritance.

Most computers manufactured today are based on a general-purpose von Neumann architecture. The architecture is general purpose in the sense that it may be programmed to solve a variety of application problems, ranging from scientific to business to AI. The basic von Neumann architecture consists of two major parts, a memory and a processor. The memory contains a program and data operated on by the program. The processor constantly fetches and executes instructions from memory. Instructions generally specify an operation and one or more operands, or data. For example, if the operation is addition, the three operands needed are the memory locations of the two addends and the place to store their sum.

The general-purpose von Neumann computer can execute AI programs by mapping the knowledge in the AI program to its linear memory and simulating AI operations by arithmetic and logic operations. However, this is often costly. The price is complex systems software (compilers, interpreters, the operating system) needed to do the mapping and an execution speed penalty because the operations are simulated. There are several reasons why the von Neumann architecture is poorly suited to AI applications.
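The fetch-execute cycle just described, including a three-operand addition, can be modeled in a few lines. This is an illustrative toy model, not any particular machine.

```python
# Minimal model of the von Neumann fetch-execute cycle: one memory holds
# the data, and each instruction names an operation and three locations.

def run(memory, program):
    for op, a, b, dest in program:          # fetch the next instruction
        if op == "ADD":                     # execute: three-operand addition
            memory[dest] = memory[a] + memory[b]
        elif op == "HALT":
            break
    return memory

mem = {0: 2, 1: 3, 2: 0}
run(mem, [("ADD", 0, 1, 2), ("HALT", 0, 0, 0)])
# mem[2] now holds the sum, 5
```

Note how every step, even this single addition, funnels through memory addresses; that funnel is the bottleneck the rest of the section discusses.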
First, the fastest von Neumann computers are optimized for fast arithmetic on single floating-point numbers or vectors containing lists of floating-point numbers. However, this rarely is important to an AI program, which may spend most of its time manipulating complex data structures, such as lists, graphs, and sets. Further, the content of these data structures may be symbolic, not numeric, requiring rapid comparison and pattern-matching operations.

Another argument against the von Neumann architecture is the need for parallelism in AI. Parallelism is the simultaneous execution of two or more hardware operations, which is faster than performing the operations one after another. Many larger von Neumann computers use a form of parallelism in the processor called "pipelining," in which each instruction is decomposed into several smaller steps. Execution of a sequence of similar instructions may then be overlapped, much as the steps in a factory assembly line are overlapped. This form of parallelism is used in the Symbolics 3600, a LISP-based computer discussed below.

However, it is generally believed that future technological advances in single-processor von Neumann computers are unlikely to produce a computer fast enough to meet the demands of AI applications. For example, Hillis (1) observes that the memory/processor division of the von Neumann architecture was appropriate for computers manufactured using expensive vacuum tubes for the processor and slower, cheaper delay lines or storage tubes for the memory. However, today silicon is used to fabricate both memory and processor. Further, the processor occupies 2-3% of the silicon area, and the memory occupies most of the remainder. Because only one memory location is active at a time, the bulk of the silicon area is idle most of the time. Finally, Hillis argues that future technological advances that increase the density of circuitry in silicon will only increase the mismatch in processor and memory power, making the computer less efficient. This mismatch is labeled the "von Neumann bottleneck." This suggests that parallelism, using multiple processors with smaller memories, can more effectively use the same amount of silicon.
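Hillis's utilization argument can be caricatured numerically. The sketch below is a deliberately crude model under invented assumptions (a fixed per-processor area fraction, idle memory counted as contributing nothing); only the 2-3% processor-area figure comes from the text above.

```python
# Crude caricature of the bottleneck argument: one processor keeps only
# its own small area busy while the large memory sits mostly idle; many
# small processor/memory pairs keep proportionally more silicon busy.
# The model and its per-processor area fraction are invented.

def utilization(n_processors, proc_frac=0.025):
    """Approximate fraction of chip area doing useful work."""
    return min(1.0, n_processors * proc_frac)

# utilization(1)  -> about 0.025 (a conventional single-processor chip)
# utilization(20) -> about 0.5   (many processor/memory pairs)
```

However oversimplified, the trend the sketch exhibits is the one claimed: dividing the same silicon among many processors raises the fraction that is active at any instant.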
The connection machine, which is also discussed below, adopts this philosophy.

A computer system designed specifically for AI programs is referred to as an AI-based computer system. Its potential advantages over a von Neumann computer are simpler systems software and increased performance through hardware organized specifically for AI programs. Yet it is not a general-purpose computer, limiting its use largely to AI. Until the last decade there were no AI-based systems. Artificial intelligence applications were usually written in LISP and run on a von Neumann computer system. Several events have led to the design, and in a few cases the construction, of AI-based computer systems.

Artificial Intelligence Languages. Languages suited to AI developed, such as LISP and logic languages (e.g., PROLOG) (2). These languages, unlike FORTRAN, COBOL, PL/1, and Pascal, did not presuppose a von Neumann architecture and were not based on assigning values to variables in memory. Implementing these languages raises several questions. First, what computer architecture most naturally executes them? Second, how can parallelism enhance their performance?

Knowledge Representation. Knowledge representation refers to the technique by which information or the relation between objects from an application problem domain is represented so that it can be processed by a computer. Several knowledge representation paradigms in AI systems have been developed: semantic networks, first-order logic, and frames (3). Their implementation raises two questions. First, what memory architectures efficiently store these paradigms? Second,
what operations on the knowledge, such as inheritance in semantic networks and resolution in first-order logic, should the hardware support?

Very Large Scale Integration (VLSI) Technology. The emergence of VLSI technology diminished the cost of fabricating computers because one VLSI chip can accommodate an entire processor. Researchers can now experiment with architectures not based on the von Neumann model.

Given the variety of AI languages and knowledge representation paradigms and the short history of VLSI, no standard computer architectures for AI have emerged. Consequently, the bulk of this entry informs through three examples of AI-based computer systems that emerged from the three events cited above. The examples are

1. a variety of ventures in the Fifth-Generation Project (qv), based on logic languages and knowledge representation methods;
2. the connection machine, which can be configured to reflect the knowledge representation; and
3. the Symbolics 3600 LISP machine, an outgrowth of early attempts to apply VLSI to LISP.

AI Versus Conventional Programs

The key issues that arise in the design of an AI-based computer system are summarized in Table 1. This section distinguishes between conventional (e.g., business and scientific) programs and AI programs to motivate these issues. They are also used to unify the discussion of the example architectures.

A conventional program has three components: data, control, and a user interface. In contrast, many AI programs, such as expert systems, consist of three parts: data, knowledge base, and control strategy (4). An AI program also has a fourth component, the user interface. The data represent current information during program execution as well as the declarative knowledge of the problem domain, often as semantic networks, frames, or first-order logic. The knowledge base is a set of "pattern-invoked programs" or operators used to reason with the declarative knowledge.
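As an illustration of declarative knowledge stored as a semantic network, the toy sketch below (the nodes and properties are invented) implements property inheritance, one of the operations the text suggests hardware might support.

```python
# Toy semantic network with property inheritance over is-a links.
# The nodes and properties are invented for illustration.

ISA = {"canary": "bird", "bird": "animal"}       # is-a links
PROPS = {
    "animal": {"alive": True},
    "bird": {"flies": True},
    "canary": {"color": "yellow"},
}

def lookup(node, prop):
    """Walk up the is-a chain until the property is found."""
    while node is not None:
        if prop in PROPS.get(node, {}):
            return PROPS[node][prop]
        node = ISA.get(node)          # inherit from the parent concept
    return None

# lookup("canary", "flies") -> True, inherited from "bird"
```

On a von Neumann machine each step of this walk is a chain of memory fetches; hardware support for inheritance would collapse exactly this pointer-chasing loop.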
The control strategy decides which knowledge base operator to apply when more than one is simultaneously applicable.
Table 1. Summary of Issues in AI-Based Computer Systems

Data component: hardware support for knowledge representation (e.g., semantic nets, frames, and first-order logic).
Knowledge base component: hardware support for operations (e.g., pattern matching, unification, resolution, property inheritance).
Control strategy component: hardware support for parallelism.
Human interface component: meeting real-time performance needs.
All components: hardware support for storage management; hardware support for dynamic data typing; hardware support for generic operations; memory management; processor scheduling; instruction set level.
The data in a single conventional program usually combine many diverse data structures, such as arrays, stacks, and linked lists. However, an AI program represents its data, mostly declarative knowledge, in a single form, usually semantic networks, frames, or first-order logic. The frequent use of these three structures may justify their implementation using special hardware. This raises the issue of providing special hardware for the efficient representation, access, and modification of semantic networks, frames, and first-order logic.

Absent from the control portion of a conventional program are a variety of operations that the knowledge base must perform. For example, it may use pattern matching to determine which pattern-invoked programs or operators to apply. Additionally, each knowledge representation requires a set of operations, such as unification, resolution, and property inheritance. The issue raised by the knowledge base is which operations, complementing the arithmetic and logic operations of conventional computers, an AI-based computer system should provide in hardware.

The control strategy component in an AI system may use one of a variety of techniques, for example, state space search, propagation of constraints, or problem reduction (4). A von Neumann computer executing the control strategy of an AI program must choose a strictly sequential order in which to apply the knowledge base operators. However, an AI-based computer system can improve performance by applying multiple operators in parallel. Because the control strategies are well defined and the benefit of parallelism is great, it is reasonable to design the architecture to support the control strategy parallelism. This raises the issue of organizing a computer system to allow parallel execution of operations.

Several resource management problems arise.
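The pattern matching that determines which operators are applicable can be sketched as follows. This is a toy matcher with invented facts; real systems use full unification, which also matches variables against variables and nested terms.

```python
# Toy matcher for pattern-invoked operators: symbols beginning with "?"
# are variables; facts are tuples in working memory. Invented example.

def match(pattern, fact, bindings):
    """Return extended bindings if pattern matches fact, else None."""
    if len(pattern) != len(fact):
        return None
    env = dict(bindings)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            if env.setdefault(p, f) != f:   # variable already bound?
                return None
        elif p != f:
            return None
    return env

facts = [("on", "a", "b"), ("on", "b", "table")]
applicable = [env for fact in facts
              if (env := match(("on", "?x", "?y"), fact, {})) is not None]
# two instantiations of the operator's pattern are applicable
```

Both instantiations found here are exactly the "more than one applicable operator" situation the control strategy must arbitrate, and they are independent matches that an AI-based architecture could attempt in parallel.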
One is scheduling a large set of applicable operators on a smaller set of processors. A second is minimizing contention of simultaneous operators for the declarative knowledge through multiported memories and replication of the knowledge in multiple memories.

The human interface of a conventional program and an AI program may be equally complex; both may require human-oriented input and output through speech, natural language, and pictures. These require additional hardware components, for example, speech synthesis requires voice digitizing and high-resolution graphics. The major issue raised is the design of the user interface to provide real-time interaction in the form of dialogues.

In addition to these issues, there are additional ones common to all four components: storage management, dynamic data typing, memory management, processor scheduling, and the instruction set level.

Storage Management. LISP and PROLOG manage storage automatically. In contrast, the programmer must manage storage in conventional languages such as PL/1 and Pascal. The issue raised is how the hardware can support automatic storage management, such as garbage collection. One solution is to use a separate processor to collect garbage in parallel with a second processor running a LISP program, as analyzed in Ref. 5.

Dynamic Data Typing. LISP and logic languages automatically keep track of data object types and simplify programming by providing generic operations, which work on any data type. In contrast, most commercial von Neumann computers
provide typed operators in their instruction repertoires. For example, there may be different operators to add floating-point numbers and to add fixed-point numbers. To execute an AI language on a commercial von Neumann computer requires simulation of generic operations by a sequence of instructions in the object code or interpreter. These select the correct instruction to use based on the current type of the operands. They must also update the type of variables when assigned new values. The issue raised is how the memory and instruction sets can be designed to support dynamic typing and generic operations. The Symbolics 3600, described below, provides generic operations in its instruction set.

Memory Management/Processor Scheduling. LISP AI programs generally have large working sets because their memory-referencing pattern is less predictable than that of conventional programs. For example, traversing a list scattered throughout memory requires accessing a few words from many pages. Much of the early implementation of AI programs was on PDP-10 computers, which have limited memory. Unpredictable referencing patterns reduced the effectiveness of virtual-memory management, so that these systems depended on swapping to allow multiprogramming. Since swapping large working sets is time consuming, their schedulers gave each process large time quantums to reduce memory management overhead. This philosophy contrasts with machines executing business or scientific applications. These programs tend to have working sets small enough to be kept in memory while a process waits for its turn to use the processor.

Instruction Set Level. Either the instructions are low level, such as in a reduced-instruction-set computer (RISC), requiring simple hardware and extensive emulation in software to implement flexibly the operations of the programming language, or the instructions are high level, simplifying the software, complicating the hardware, and making the implementation rigid but permitting optimized execution.

Because system software can implement AI programs on a von Neumann architecture, special AI computer systems are not mandatory. However, an AI-based architecture permits optimization, for example, through parallelism, unattainable through software emulation. Given the tremendous processing requirements of the most complex AI applications, some implementation in hardware is needed. To support this, Shapiro (6) cites some performance statistics of existing implementations of PROLOG: 120 logical inferences per second (LIPS) for a Z-80 microprocessor running Microprolog; 1000-3000 LIPS for a C interpreter on a VAX computer; 25,000 LIPS on a large IBM machine; and 30,000 LIPS, the fastest available today, for compiled code on a DEC System 2060. One LIPS requires about 100-1000 instructions per second on conventional architectures. This contrasts with the goals of the Japanese Fifth-Generation Project (described in the next section) of 10^8-10^9 LIPS.

Examples of AI-Based Computer Systems

As noted earlier, three events motivated the development of AI-based computer systems: AI languages, knowledge representation, and VLSI. This section discusses how three examples emerging from these events address the issues of Table 1. The discussion starts with a commercial product and ends with research machines: the Symbolics 3600 LISP machine, the connection machine, and the Fifth-Generation Project. Several other AI-based computer systems exist commercially or in research labs (see Table 2).

Symbolics 3600. The Symbolics 3600 is the most recent LISP machine produced by Symbolics, Inc. (7). The 3600 is an outgrowth of the MIT Laboratory LISP Machine Project started in 1974. Two machines, called CONS, in 1976, and CADR, in 1978, were developed at MIT. Symbolics, Inc. refined the CADR into a commercial product and introduced it in 1981 as the LM-2. Its successor is the 3600. In contrast to the next two examples, the 3600 uses a von Neumann architecture, with extensive hardware support for
Table 2. AI-Based Computer Systems (name: source)

LISP
  Symbolics 3600 (7): Symbolics, Inc., Cambridge, MA
  LAMBDA family: LISP Machines, Inc., Los Angeles, CA
  Scheme-79 (8): Massachusetts Institute of Technology (MIT)
  ALPHA (9): Fujitsu Laboratories, Ltd., Kawasaki, Japan
  EM-3 (10): Electrotechnical Laboratory, Ibaraki, Japan
Logic languages
  PROLOG processor (11-13): SRI International, Menlo Park, CA
  Personal Sequential Inference Machine (14,15): Institute for New Generation Computing Technology (ICOT), Tokyo, Japan
  Parallel Inference Machine (15): University of Tokyo
  PRISM on ZMob (16,17): University of Maryland
Production systems
  DADO (18): Columbia University
  Production system machine (19): Carnegie Mellon University (CMU)
Application structure
  Connection machine (1,20): Thinking Machines Corp., Cambridge, MA
Logic knowledge base
  DELTA (21): ICOT, Tokyo, Japan
  GRACE (22): University of Tokyo
LISP. All software for the 3600 is written in a dialect of LISP called Zetalisp. Even though the LISP code is compiled into machine code, normally no assembly language is available. This discussion of the 36-bit processor is divided into two parts: LISP features and performance features. The first describes four architecture aspects that reflect the LISP language: tagged words for run-time type checking, compact list storage, generic operations, and the instruction set. The second part describes architecture aspects designed to overcome the execution-time inefficiencies associated with a weakly typed language: parallelism, buffered stacks, and the stack-like architecture.

Because the data types of LISP objects cannot usually be determined at compile time, the type of an object is traditionally stored in a descriptor associated with the object and updated during execution. To save storage space, increase execution speed through reduced memory fetches, and simplify the compiled code, each word processed is associated with a tag field. The tag identifies the word as one of 34 types, such as symbols, "cons" cells, or arrays. The memory word formats are shown in Figure 1. The first two bits of the data type (or tag) field identify the word as containing a 32-bit immediate fixed-point or floating-point number or indicate that the next four bits are further data type bits followed by an object address.

The two high-order CDR-code bits are used to compactly store lists. They encode the values "normal," "next," and "nil." If the CDR code is "next," the CDR of the list is the next word in memory. This saves the space for representing its address. The final non-nil element in a list has its CDR code set to "nil" to indicate that its CDR field is nil without using an extra word. Figures 2 and 3 compare this representation to the normal one. The processor hardware is designed to operate efficiently on these lists.

[Figure 2. Normal list representation using pointers.]

The tags also simplify another aspect of the architecture: generic operations. A generic operation works on an operand of any data type. Conventional von Neumann computers, in contrast, require a set of instructions each performing the same operation but for a variety of data types. For example, the 3600 has a single generic-add instruction that works on a variety of operands. Its execution involves simultaneously performing an integer add and checking the data type by hardware. If the operands are not integer, a trap to microcode occurs to perform an addition on less frequently used data types. The hardware also checks for overflow, which generates another trap, and tags the results with the proper data type. The advantages are compact code, because only one instruction is required, and a performance improvement from simultaneous operations.

The instruction set is another aspect that intimately reflects LISP. For example, three classes of instructions are predicates, containing eq, not, fixp, floatp, symbolp, and arrayp; list and symbol instructions, containing car, cdr, and rplaca; and array instructions, containing array-leader and store-array-leader.

Three major performance bottlenecks the 3600 addresses are due to garbage collection, run-time typing, and the von Neumann bottleneck of limited memory bandwidth described above. The first two are addressed through hardware parallelism. The 3600 performs the following operations in parallel: fetching instructions, decoding instructions, executing instructions, checking the data type of operands, supporting garbage collection, and tagging the data type of the result.

The limited memory bandwidth is addressed by two means:

Reducing Memory Fetches. This is done in three ways. First, each memory word contains two 17-bit instructions. Second, a tag rather than a separate memory word stores specific data types. Third, the compact list representation previously described is used.

Storing Frequently Used Memory Words in a High-Speed Stack. Conventional computers often use a small, expensive, high-speed memory, called a cache, to store frequently referenced instructions or data. However, LISP programs often have an irregular reference pattern, making a cache less effective. The 3600 uses the recursive nature of LISP as the basis for predicting memory referencing patterns. It uses two 1024-word hardware stack buffers as a high-speed cache containing the top of the LISP control stack.

Table 3 summarizes aspects of the 3600 architecture that address the issues described in Table 1.

Connection Machine. The other two computer system examples (the connection machine and the Fifth-Generation Project) reflect the programming language used: LISP or logic languages. The connection machine, however, reflects the knowledge representation used. It provides tens of thousands to millions of processors with a programmable interconnection topology that may be configured to match the application program structure. The architecture is suited to applications that display a natural form of parallelism. Such applications consist primarily of a large number of elements that function similarly and communicate with one another through some special interconnection topology. For example, in a machine vision application the elements are pixels, and the interconnection topology is a
Figure 1. Two formats of Symbolics 3600 memory words.
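As a rough illustration of the tagged-word scheme, the following sketch decodes a 36-bit word. The exact bit positions are assumptions for illustration only; the text specifies just that 2 bits mark immediates and that 4 further bits extend the tag to 6 bits, enough to distinguish the 34 types.

```python
# Illustrative sketch of tagged-word decoding (bit layout assumed,
# not the exact Symbolics 3600 format).

def decode(word):
    """Decode a 36-bit memory word into (kind, ...) per its tag bits."""
    top2 = (word >> 34) & 0b11
    if top2 in (0b00, 0b01):                 # immediate 32-bit number
        kind = "immediate-fixnum" if top2 == 0b00 else "immediate-float"
        return (kind, word & 0xFFFFFFFF)
    # Otherwise 4 more tag bits follow, extending the tag to 6 bits;
    # the remainder of the word is an object address.
    tag6 = (word >> 30) & 0b111111
    return ("typed-pointer", tag6, word & ((1 << 30) - 1))

assert decode(42) == ("immediate-fixnum", 42)
word = (0b10 << 34) | (0b0101 << 30) | 123   # pointer word, tag 0b100101
assert decode(word) == ("typed-pointer", 0b100101, 123)
```

The point of the scheme is that the type travels with the datum itself, so the hardware can check it without a second memory fetch.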
Figure 3. Compact list representation of the Symbolics 3600, which uses sequential memory locations and two special bit patterns (CDR next and CDR nil) in the high-order two bits.
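The compact (CDR-coded) representation compared in Figures 2 and 3 can be simulated in a few lines. This is a software illustration, not Symbolics microcode; the encoding names follow the text.

```python
# Simulation of CDR-coded list storage: each word carries a 2-bit CDR
# code; "next" means the cdr is simply the following memory word.

CDR_NORMAL, CDR_NEXT, CDR_NIL = 0, 1, 2

def store_compact(items):
    """Store a non-empty list in consecutive words using CDR codes.
    No pointer words are needed, halving storage versus cons pairs."""
    words = []
    for i, item in enumerate(items):
        code = CDR_NIL if i == len(items) - 1 else CDR_NEXT
        words.append((code, item))
    return words

def read_compact(words, addr=0):
    """Walk a CDR-coded list starting at word `addr`."""
    out = []
    while True:
        code, item = words[addr]
        out.append(item)
        if code == CDR_NIL:          # cdr field is nil: end of list
            return out
        addr += 1                    # CDR_NEXT: cdr is the next word

mem = store_compact(["a", "b", "c"])
assert len(mem) == 3                 # 3 words, vs. 6 for car/cdr pairs
assert read_compact(mem) == ["a", "b", "c"]
```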
COMPUTER SYSTEMS
Table 3. How the Symbolics 3600 Addresses the Issues

Data component: Hardware supports storage of LISP lists by reserving two high-order bits (see Fig. 1) for use in a compact list representation (see Figs. 2 and 3). In the representation, lists are stored in sequential memory locations to avoid the need for pointers.

Knowledge base component: No explicit support.

Control strategy component: No explicit support.

Human interface component: High-resolution terminal and a mouse pointing device are provided.

All components

Storage management: Hardware provides support for garbage collection in parallel with other operations.

Dynamic data typing: Two or six bits in memory words are reserved for a tag to denote 1 of 34 types so that the hardware can distinguish, e.g., strings from complex numbers (see Fig. 1).

Generic operations: All instructions are generic, working on all appropriate data types. At run time the type of each instruction operand is determined. Parallelism in data type checking, instruction execution, and result tagging reduces the performance penalty.

Memory management: Three caches are used to avoid a memory bottleneck. One is for instructions and the others are for control stacks. The virtual memory mechanism keeps the local environment of active processes in the stack caches automatically.

Instruction set level: The instructions reflect LISP operations. For example, there are instructions for the predicates eq and not as well as the list operations car and cdr.
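The generic-operation mechanism in Table 3 can be sketched as follows. This is only a software analogy: on the 3600 the tag check runs in hardware, in parallel with an optimistic integer add, and the trap is to microcode.

```python
# Software analogy of a generic-add instruction: fast path for the
# common (integer) case, "trap" to a slower routine otherwise.

def trap_to_microcode(a, b):
    # Slower handler for less frequently used types (floats, etc.).
    return a + b

def generic_add(a, b):
    # Tag check; in hardware this happens in parallel with the add.
    if type(a) is int and type(b) is int:
        return a + b
    return trap_to_microcode(a, b)

assert generic_add(2, 3) == 5        # fast path
assert generic_add(2.5, 1) == 3.5    # trapped path
```

The payoff named in the text follows directly: one opcode covers all operand types (compact code), and the check costs nothing when it runs alongside the add.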
grid. In a VLSI simulation the elements are transistors, and the interconnection topology represents the wires connecting the transistors. The connection machine represents each application problem cell by a single-bit processor and logically connects the processor to other processors in a manner matching the application topology.

The connection machine was proposed in 1981 at the MIT AI Laboratory (20). The connection machine described in Ref. 1 and discussed here is now being manufactured by Thinking Machines, Inc. of Cambridge, Massachusetts. The prototype consists of 65,536 physical processors, each with 4096 bits of memory. These are physically connected in a "Boolean 16-cube," which is based on the fact that 16 bits are needed to address 65,536 processors. This interconnection has the property that a message sent from one processor reaches any other processor within 16 steps. Each processor is connected to the 16 other processors whose addresses in binary differ in 1 of the 16 bits.

However, neither the number of physical processors nor their physical interconnection limits the application program size the machine can handle. A program may reconfigure the machine by specifying how many virtual processors are to be mapped to each physical processor. Furthermore, although physical processors are physically connected to only 16 other processors, every processor can communicate with any other processor through routers. Each router is hardware responsible for receiving messages from a processor or another router and forwarding the message to the destination processor or to a
Figure 4. Connection machine organization.
router closer to the destination processor. Thus, a program may establish a logical interconnection among processors which matches the way the elements of the application program are connected.

The overall machine organization is shown in Figure 4. The two major parts are the connection machine computer and the front end. The front end, which is connected to a disk and terminal, is either a Symbolics 3600 or a Digital VAX computer. The front end provides an operating system and user interface. The application program is also stored in the front end. Programs are written in extensions of LISP or C, called CM LISP and C*, respectively.

Table 4. How the Connection Machine Addresses the Issues

Data component: Connection machine processors are logically connected to match the natural structure of the application. The hardware offers support for a variety of data structures, including three representations of sets as well as trees, strings, arrays, matrices, and graphs.

Knowledge base component: No explicit support.

Control strategy component: Each operation normally applies to all data in the connection machine in parallel. However, each processor will conditionally execute an instruction depending on the internal state of one of its flags.

Human interface component: Not applicable, since the user accesses the connection machine through a front end, which is a single Symbolics 3600 or VAX computer.

All components

Storage management: Several storage allocation mechanisms are provided to allocate an idle processor. Among these is free-list allocation, which maintains a list of free processors. A much slower method is waves, in which a processor broadcasts a request to have any unused processor send its address back. In this way a processor can find an idle processor that is physically close to it, thereby shortening the communication time.

Generic operations: All instructions operate on the entire network and thus apply to whatever data is in all processors.
Memory management: Almost no management is done. Each physical processor has 4096 bits of memory, part of which is a stack area.

Processor scheduling: Several virtual processors are simulated by each physical processor.

Instruction set level: The prototype provides flexibility in the definition of the instruction set because instructions from the front end are expanded into nano instructions. Processors receiving the nano instructions implement all possible 256 Boolean operations on the three bits of data they operate on.
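The Boolean 16-cube interconnection described above can be sketched in software. This illustrates only the addressing scheme (neighbors differ in one address bit; a message needs at most one hop per differing bit), not the router hardware.

```python
# Sketch of Boolean n-cube addressing: 65,536 processors with 16-bit
# addresses, each wired to the 16 processors one address bit away.

N_BITS = 16

def neighbors(addr):
    """The 16 processors whose addresses differ from addr in one bit."""
    return [addr ^ (1 << i) for i in range(N_BITS)]

def route(src, dst):
    """Greedy hypercube routing: fix one differing bit per step.
    The hop count equals the Hamming distance, at most 16."""
    path = [src]
    cur = src
    for i in range(N_BITS):
        if (cur ^ dst) & (1 << i):
            cur ^= 1 << i
            path.append(cur)
    return path

assert len(neighbors(0)) == 16
assert len(route(0, 0b1011)) - 1 == 3   # 3 differing bits -> 3 hops
assert route(7, 7) == [7]               # already at the destination
```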
Table 5. Fifth-Generation Research Projects

Languages

KL0 (14,15): sequential machine language
KL1 (14,15): parallel machine language
PARALOG (14): parallel logic language

Machines

Personal sequential inference machine (PSI) (14,15): inference machine
Parallel inference machine (PIM) (15): inference machine
DELTA (21): knowledge base
GRACE (22): knowledge base
A program executes in the following manner. First the connection machine is configured, that is, the program specifies the number of virtual processors it needs. Next it specifies the initial state of each virtual processor, consisting of two things: pointers to the processors it is connected to and whatever data the processor needs. Next the front end executes each instruction in the program. Instructions are either serial or parallel; serial instructions are performed by the front end, and parallel instructions are passed to the connection machine microcontroller. The microcontroller expands each instruction into a sequence of "nano instructions," which it broadcasts to all processors in parallel. Parallel instructions tell each processor either to compute locally or to pass information to another processor. Each processor functions by reading two single-bit operands from its 4096-bit memory and one bit of its internal flag register. Its arithmetic logic unit generates two bits; one overwrites an operand, and the other overwrites one bit of its flag.

The connection machine hardware is visible to the CM LISP programmer in two ways. First, data to be operated on in parallel in the connection machine are stored in a new data
KL0: The kernel language is based on PROLOG and contains operating system primitives, such as multiprocess control, interrupt handling, and input-output control.

KL1: A parallel version of KL0.

PARALOG: Based on PROLOG.

PSI: This initial machine will be used by researchers while they develop future machines. It is a single-user machine supporting unification, resolution, and performance measurement in hardware, with a performance goal of 20,000-30,000 LIPS.

PIM: Hardware supports OR, AND, and unification parallelism. In OR parallelism several machines execute multiple statements with the same goal. AND parallelism tries to achieve multiple subgoals of a statement in parallel, presenting difficulties because a consistent choice of arguments must be made to the dependent subgoals. Unification is the process of obtaining consistent instantiations of variables in multiple subgoals. Parallelism in unification involves generating several different instantiations at the same time.

DELTA: This connects to sequential inference machines via a local-area network or shared memory. Its three main subsystems are two-level memory, with moving and solid-state disks; relational database; and control processor, controlling concurrent transactions and interfacing the database to the inference machines.

GRACE: This is a relational-algebra machine. A major research problem is using hardware to perform joins efficiently.
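As a point of reference for the terminology above, here is a minimal sequential unification sketch. It is illustrative Python, not KL0/KL1; the PIM goal is to generate many such variable instantiations in parallel.

```python
# Minimal unification: find a consistent substitution for variables
# (strings beginning with '?') that makes two terms equal, or None.

def unify(x, y, subst=None):
    subst = dict(subst or {})

    def walk(t):
        # Follow bindings until we hit a non-variable or unbound variable.
        while isinstance(t, str) and t.startswith("?") and t in subst:
            t = subst[t]
        return t

    x, y = walk(x), walk(y)
    if x == y:
        return subst
    if isinstance(x, str) and x.startswith("?"):
        subst[x] = y
        return subst
    if isinstance(y, str) and y.startswith("?"):
        subst[y] = x
        return subst
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            subst = unify(xi, yi, subst)
            if subst is None:
                return None          # a subterm clashed
        return subst
    return None                      # constant clash

s = unify(("father", "?X", "bob"), ("father", "alice", "?Y"))
assert s == {"?X": "alice", "?Y": "bob"}
```

AND parallelism then corresponds to running such unifications for several subgoals at once while keeping the substitutions consistent with each other.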
structure, called the xector. Second, two new program annotations, denoted α and β, are used to convert LISP functions to parallel operations in CM LISP.

Overall, the connection machine has a raw computing power of a billion (10⁹) instructions per second and a message routing speed of 3 billion bits per second. Table 4 summarizes how the connection machine addresses the issues of Table 1.

Fifth-Generation Project. The Japanese Fifth-Generation computer system project is developing an architecture suited to logic programming. Warren describes the background of the Fifth-Generation Project in Ref. 23 (see also Ref. 24). The Fifth-Generation Project assumes that knowledge-based systems will be the important application area of the 1990s (25), in contrast to the evolution of distributed systems containing heterogeneous processors and cooperating, distributed processes. To build a knowledge-based system, research is being conducted in three areas (14): problem solving and inference machines, knowledge base management, and intelligent user interfaces.
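The xector and the α and β annotations described above can be emulated sequentially to show their semantics. This rendering is an assumption for illustration; CM LISP's actual syntax and xector machinery differ.

```python
# Sequential emulation of CM LISP's parallel annotations: a "xector"
# holds one element per (virtual) processor; alpha applies a function
# in every processor at once, beta combines a xector's elements.

from functools import reduce

def alpha(fn, *xectors):
    """α: apply fn elementwise across xectors (all processors at once)."""
    return [fn(*elems) for elems in zip(*xectors)]

def beta(fn, xector):
    """β: combine all elements of a xector into one value."""
    return reduce(fn, xector)

xs = [1, 2, 3, 4]          # a xector: one element per processor
ys = [10, 20, 30, 40]
assert alpha(lambda a, b: a + b, xs, ys) == [11, 22, 33, 44]
assert beta(lambda a, b: a + b, xs) == 10
```

On the real machine α is a single broadcast instruction and β exploits the routing network, which is where the parallel speedup comes from.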
Table 6. How the Fifth-Generation Project Addresses the Issues
Data component: Designing relational database machines (i.e., DELTA, GRACE) based on logic.

Knowledge base component: Implementing inference machines (i.e., PSI, PIM) based on various logic languages. Hardware supports unification and resolution.

Control strategy component: The parallel inference machine supports AND and OR parallelism in hardware.

Human interface component: A major objective is to allow human interaction in the form of natural language, speech, and pictures.

All components

Instruction set level: Machine languages KL0 and KL1 are based on logic.

Other issues: Insufficient details are available.
The research has several goals (6). The most ambitious is to build in the 1990s an inference machine with up to 1000 processing elements functioning at a rate of 10⁸-10⁹ LIPS. The knowledge-based machine is planned to have a storage capacity of 10¹¹-10¹² bytes. The intelligent-user interface goal is a 10,000-word vocabulary, 2000 grammar rules, 99% accuracy in syntactic recognition of natural languages, and a speech recognition system with 50,000 Japanese words and 95% recognition. Finally, the programming language, data description, and query language will use predicate logic.

Table 5 contains a summary of several language and architecture research activities that comprise the Fifth-Generation Project. Table 6 summarizes how these activities address the issues of Table 1.

Summary
The design of computer systems specifically for AI applications is in its infancy. Consequently, almost no theory and just a few systems exist. Their design is motivated by computational needs that are so great they can only be filled by exploiting parallelism at the hardware level (6,20). From the examples discussed here, several trends become evident.

Evolutionary versus Revolutionary Architectures. Architectures such as the Symbolics 3600 implement AI languages on a von Neumann model specifically designed for the language. The alternative is to devise a more revolutionary architecture. The fifth-generation machines and connection machine exploit parallelism in logic programs and in the application problem, respectively.

Orientation. The three examples discussed here are oriented to either LISP programs, logic programs, or parallelism in the application problem.

New Concepts. The Fifth-Generation Project tries to exploit AND, OR, and unification parallelism in hardware. The Symbolics 3600 employs generic operations, a compact list representation, and parallel garbage collection support. The connection machine allows the hardware to be configured to match the natural structure of the application through massive parallelism and programmable interconnections.

BIBLIOGRAPHY

1. W. D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985.
2. R. Kowalski, Logic for Problem Solving, Elsevier North Holland, New York, 1979.
3. G. McCalla and N. Cercone, "Guest editors' introduction: Approaches to knowledge representation," Computer 16(10), 12-18 (1983).
4. D. S. Nau, "Expert computer systems," Computer 16(2), 63-85 (1983).
5. T. Hickey and J. Cohen, "Performance analysis of on-the-fly garbage collection," CACM 27(11), 1143-1154 (1984).
6. E. Y. Shapiro, "The fifth generation project-a trip report," CACM 26, 637-641 (1983).
7. Symbolics 3600 Technical Summary, Symbolics, Inc., Cambridge, MA, February 1983.
8. G. L. Steele and G. J. Sussman, "Design of a LISP-based microprocessor," CACM 23(11), 628-644 (1980).
9. H. Hayashi, A. Hattori, and H. Akimoto, ALPHA: A High-Performance LISP Machine Equipped with a New Stack Structure and Garbage Collection System, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, IEEE Computer Society publication, Silver Spring, MD, pp. 342-348, 1983.
10. Y. Yamaguchi, K. Toda, and T. Yuba, A Performance Evaluation of a LISP-Based Data-Driven Machine (EM3), Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 363-370, 1983.
11. E. Tick, An Overlapped PROLOG Processor, Technical Note 308, SRI Int., Menlo Park, CA, October 1983.
12. E. Tick and D. H. D. Warren, Towards a Pipelined Processor, Proceedings of the International Symposium on Logic Programming, IEEE Computer Society publication, Silver Spring, MD, pp. 29-41, February 1984.
13. D. H. D. Warren, An Abstract PROLOG Instruction Set, Technical Note 309, SRI Int., Menlo Park, CA, October 1983.
14. T. Moto-oka and H. Stone, "Fifth-generation computer systems: A Japanese project," Computer 17, 6-13 (1984).
15. S. Uchida, Inference Machine, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 410-416, 1983.
16. S. Kasif, M. Kohli, and J. Minker, PRISM: A Parallel Inference System for Problem Solving, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, 1983, pp. 544-546.
17. C. Rieger, ZMOB: A Mob of 256 Cooperative Z80A-Based Microcomputers, Technical Report TR-825, Department of Computer Science, University of Maryland, College Park, MD, 1979.
18. S. J. Stolfo and D. E. Shaw, DADO: A Tree-Structured Machine Architecture for Production Systems, in Proceedings of the Second National Conference on Artificial Intelligence, Carnegie-Mellon University and University of Pittsburgh, Pittsburgh, PA, 1982, pp. 242-250.
19. T. Lehr, The Implementation of a Production System Machine, Technical Report, Carnegie-Mellon University, Pittsburgh, PA, May 1985.
20. W. D. Hillis, The Connection Machine, Artificial Intelligence Memo No. 646, MIT AI Laboratory, Cambridge, MA, September 1981.
21. K. Murakami et al., A Relational Data Base Machine: First Step to Knowledge Base Machine, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 423-425, 1983.
22. T. Moto-oka, Overview to the Fifth Generation Computer System Project, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 417-422, 1983.
23. D. H. D. Warren, A View of the Fifth Generation and Its Impact, Technical Note 265, SRI Int., Menlo Park, CA, July 1982.
24. K. Sorenson, "Fifth Generation: Slow to Rise," Infoworld, 35 (June 9, 1986).
25. P. C. Treleaven, The New Generation of Computer Architecture, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 402-409, 1983.

A. Agrawala and M. Abrams
University of Maryland
COMPUTERS IN EDUCATION: CONCEPTUAL ISSUES

It is futile to publish a factual overview of computers in education in a volume that will be read for more than a few years. The field is now at a watershed. Looking two years backward or forward gives views as different as those to the east and west of the Rocky Mountains. A factual report based on data from the past would be obsolete before it got into the hands of the readers. On the other hand, an encyclopedia is not the place for speculation and prophecy. So instead of choosing between obsolescence and prophecies, what follows are several concepts that will help students of the situation follow the rapid changes of scene.

Density

Some obvious parameters do not get full recognition in the literature. One of these is the density of computers in learning environments. At the time of this writing (1985), the average density of computers in grade schools in the United States is about 1 for 60 or 70 students (estimates vary). But these machines are not evenly distributed. Roughly 10% of schools have no computers. A handful of large city school systems have more than 1 computer per 30 students-which allows a student to have an average of 1 h of computer time a week. A handful of individual schools have a density sufficient for students to average an hour a day. Only a few experimental schools have more than this-the District Three Computer School in New York, the Hennigan School in Boston, and the WICAT School in Utah are prominent examples.

Density is not merely a quantitative factor. Quadrupling the density does not mean four times as much of the same. What can be done with computers at different densities is qualitatively different. Three 10-min periods a week is a significant dose of drill and practice in number facts. It is not a significant period of time for the computer to be used as an instrument for creative writing. No one can create on such a tight schedule.
Educational Ideology

Differing visions of schools with high densities of computers raise the second conceptual issue: educational ideology. A major line of cleavage separates people who see education as ideally being a highly structured process from those who see it as controlled by the learner. This cleavage existed long before computers, but their presence brings the controversy into much sharper focus.

Instruction versus Development. One side leads to the "back to basics" movement, with its emphasis on giving instruction in number facts, spelling, and the like. As students become more sophisticated, multiplication tables give way to quadratic and differential equations, and spelling is displaced by Shakespeare, but the emphasis on instruction remains unchanged. This philosophy of education places the brunt of responsibility on the teacher-to organize the material and transmit discrete packets of knowledge to the student.

The other side shows up under names such as "child-centered education," "open education," and, in the extreme case, "free schools." The emphasis here sees the responsibility of schooling as the encouragement of the individual's overall development. The acquisition of particular factual knowledge is seen as an easy part of the process for good learners and an impossible one for poor learners. Here, the teacher's task is to help students become better learners rather than teaching them that 5 × 7 is 35, not 57.

Becoming a good learner builds on such factors as self-confidence, believing in oneself, and having the opportunity to work on things that one likes. It is undermined by unpleasant learning experiences that lead to "hating math" (or spelling) in particular, to hating school in general, and to becoming disgusted with the learning process itself.

There are also many misleading "pop theories" about learning. For example, children often say that you learn best by making your mind a blank and saying "5 × 7 = 35" over and over again or by putting on the radio very loud-probably to drown out images of things the child would rather be doing.
On the other hand, trying to relate the material you are learning to your interests and to things you already know might be a very good way of learning. There is a whole body of knowledge called mnemonics that suggests this might be true. So one can either concentrate on finding out how to get number facts into the heads of children who may be poor learners, or one can concentrate on what can be done to improve the learning ability of students so one does not have to take heroic measures to get them to learn the number facts.

Structured, Metacognitive, and Developmental Perspectives. The presence of computers has given rise to at least two schools of thought about how schools can help students become better learners. The first is rooted in ideas about cognitive processes that stem directly from AI theory and research. In a nutshell, there is a growing tendency to believe that metacognitive knowledge helps one become a good learner by providing explicit knowledge about the learning process and strategies for learning.

The other approach puts a greater emphasis on a learner's self-directed activities. Here, the goal is to create an appropriate learning environment in which the learner can come to grips with the essential problems and find a personal way of dealing with them, enter into relationships with other people, both teachers and other students, and develop a more personal relationship with the knowledge being learned.

The contrast between these two schools of thought stands
out most clearly when one looks at three approaches to using the computer to improve a student's command of language. In the extreme case of the structured, instructional approach, there are many programs that pose questions about grammar, check the answer, and may also give feedback to the student. The metacognitive approach sees the main problem as the student's ability to structure-a story, for example-so there are programs that provide a framework in which the story can be mapped, drawing upon information about the "grammar" of story plots. The third and most developmental approach offers the student a good word processor. This frees the learner from the arduous-and for very young children, almost impossible-mechanics of writing text by hand. When writing is a laborious and slow process, the first draft is inevitably the final copy, and any corrections are messy ones. Being able to edit and print out clean copy provides the student, perhaps for the first time, with a product that can be looked at with pride, uncontaminated by the overall messiness of inexpert handwriting, scratched-out corrections, and smeared erasures.

Instruction Manuals, Videodisks, and Programs. The structured approach is obviously appropriate in situations that require quick learning of very specific material. Manufacturers of appliances supply books of instruction on how to use their products. These books are expected to enable people to learn specific facts quickly and reliably. Manufacturers of digital watches are not interested in promoting the general ability of their customers to the point where they could figure out for themselves how the watch works. When the "appliance" is complex-a task or a piece of software, for example-videodisks are rapidly replacing printed books as the standard for these instruction manuals. The techniques being used to program these systems are still very much in a state of infancy and flux. A number of control, or author, languages do exist.
However, it is quite clear that the art of producing such languages is not yet stable enough to warrant a detailed description of them.

One class of instructional programs deals with learning how to use computer software: a word processor or accounting system, for example. These tutorial programs typically show something on the screen and invite the learner to manipulate the keyboard in appropriate ways and observe what happens. They set problems: move the cursor down a paragraph, pick up this sentence, move it to the end, and so on. When a program is able to detect failures to carry out its instructions, it may also take appropriate action. In the simplest cases, it can insist that the learner try again; in more complex cases, it can give very elementary advice.

Much of what is included explicitly in the school curriculum can be presented in an instructional form. The curriculum lays down that children should learn the multiplication tables, know how to spell words and to punctuate, learn historical dates, and so on. Instructional programs have been designed for each of these areas-as well as many others. They will be no less effective than books, flash cards, or any other sorts of drill-and-practice technologies.

However, the computer can also be an integral part of the educational process-rather than serving as a computerized instruction manual. Of course, the line is blurry. A student using a word processor as part of a creative writing class might still require instruction on how to use it and might use the
same program that was designed for teaching an office worker how to use the latest system.

Computer Culture

But computerized instruction manuals of this sort are a minor instance of how computers are used in the school. The important issue is how the computer fits into the overall structure of education. The idea of creating an open, high-density learning environment in a school raises the third issue, the concept of a computer culture.

In the Computer School in New York, the teacher was explaining to an eighth-grade class the basics of the arrangement of electrons in the structure of atoms. His explanation of how the electrons are distributed among the successive shells was done by saying "Let's write a computer program that tells the electrons how to move." By getting into the process of writing a computer program, these students found the concept of rules for distribution of the electrons very much more concrete. This instance could happen only because the teacher and the students already had experience in writing programs. It was familiar, part of their culture.

The process of teaching often means relating a new idea to experiences in a shared culture. The computer presence means a new and particularly rich source of reference points for a large number of otherwise abstract ideas. Thus, the computer can play a role as an aid to instruction even when it is not physically present. The computer in the head can often be a more effective aid to instruction than the computer on the desk.

Locus of Learning

A fourth conceptual issue is the locus of learning-where does learning take place? Certain learning happens in the home before the child comes to school, such as learning to speak at least the colloquial language. Other learning happens traditionally inside the school: reading, writing, arithmetic, and so on. The presence of computers in the homes is already influencing the distribution of such learning.
Recent articles in The New York Times (1) note that it is possible today for somebody with a home computer to acquire all the credentials-from school diplomas to college degrees-at home. Here the computer serves mainly as a communications link with a centralized source (universities, correspondence schools, data banks, and libraries) as well as among students and teachers via electronic mail.

But when computers are used in the home by small children, a far more fundamental change in the kind of learning can also take place. The computer can be a means of exploring the world around them and of learning by experimentation-just as making mud pies enables children to learn about dirt. An example of this arising from developments in computer voice synthesizers (see Speech synthesis) is the "talking box" or "word factory." It allows children to combine symbols on a screen in order to produce spoken words. In one version they put letters together and see what sound it makes. So they can experiment. If they do not like what they hear, they can change the letters and try different spellings. It may not matter how close it sounds to "real" spoken English. The children do not care. They are used to cartoon characters who speak strangely. This sounds like robot language. That is okay. What they are learning to do experimentally is develop spellings that work-not necessarily the dictionary versions, but ones that follow phonetic rules for transcribing the sounds of English into written words. In some versions of this software, they can ask the computer if this is the dictionary spelling. This is far different from drill and practice that C-A-T spells cat. Here, the computer is channeling the children's interest in alphabetic language into energy for learning. It gives the child the same power that literacy gives adults, the power to make things happen with alphabetic language.

Another program now being used in preschools also builds a bridge between something children love to do-telling stories-and mastery of alphabetic language. This program allows them to create cartoons: designing characters, making paths for them to follow, and putting the characters' words or noises on the screen. So young children can "write" part of a story even if their mastery of words is very slight. They can set up the story by making the Wolf, Little Red Riding Hood, the Grandmother-and only need to have a few actual words, like "What big eyes you have."

Transformation of Curriculum

This versatility of the computer raises another conceptual issue: whether to use it as an aid to status quo learning of what has always been learned or to use it for learning new material that had previously been inaccessible, especially to young children. Two examples illustrating this (the first of which is taken from the arts) touch on another issue as well: whether the computer belongs to mathematics and the sciences, or to everyone.

A remarkable fact about music education in our society is that the instruction is confined to learning how to reproduce music composed by other people (see Music, AI in). This is true even for the most private, tutored musical education. All other domains are quite strikingly different.
In the visual arts, for example, everyone picks up a pencil and draws at some time. In literature, everyone is expected to have some experience at writing. But in music, only a few students compose their own material. One reason for this is easy to find. In order to compose effectively, a higher degree of performance competence is needed than most students ever acquire. Computers are now being used in a number of experimental projects to create environments in which children move easily into composition. The computer has a number of roles, one as a musical instrument. The computer will play what you describe to it, in symbols or drawing or some way that does not depend on your being able to produce the music with your fingers or lips in real time, so you can try out a musical idea. You can hear it in its purity, untrammeled by your own performance deficiencies. Second, the music can be edited. If you do not like it, you can get into it and change it. The word processor opened doors to editing, debugging, and writing more easily. The music editor can do the same.

The second example is from science. Motion has been a central theme of physics since the period of Galileo and Newton. Newton's laws of motion are crucial to an understanding of physics. Yet in schools, motion (or dynamics) is taught very late and understood very poorly. The teaching of physics begins not with dynamics, but with statics.
Why this curious reversal? Why is the less important field taught first? The reason lies in the kinds of technologies we have had for representing motion. As long as the technology was static, pencil and paper, the representation of dynamics required elaborate formalisms. First you go through algebra, then through calculus and differential equations, until finally someone can say F = ma, and the door is opened to Newton's laws, unless the student has been turned off and dropped out long before.

The computer turns this situation around because it is essentially a dynamic entity. One can begin to study motion at the earliest age by writing simple programs to control moving objects on the screen. The idea of a law of motion becomes very concrete, tangible, and accessible. Thus, one can look for a turnabout, and signs of such a turnabout in what is being taught are already evident.

A Different Kind of Educational Technology

A related issue is the similarity or dissimilarity between computers and other technologies. Is the computer just another piece of educational technology? Or is it something that is transforming the world as we know it, including both schools and education? Inevitably, the computer is compared with educational television and other technologies: with Sesame Street in the case of early childhood education; in the case of schools, with audiovisual aids, language labs, and the like. Many of these technologies are generally considered to have been only partially successful at best and to have only a limited role and value in the classroom. At worst, they have ended up collecting dust in some distant closet.

What specific capabilities does the computer have? Is it unique? Perhaps most obvious is that the computer can be used in more widely varied ways than these other technologies. But what is really fundamental is that only the computer is at least potentially under the control of the learner. Watching television is like listening to a fancy lecture.
The accompanying images are often beautiful, impressive, and more informative than one usually finds in a classroom. The lecturer on television was no doubt selected for having special knowledge and special talents, and could put more effort into preparation than the teacher standing in front of a classroom. So maybe he does a better job than the teacher standing in front of the classroom, but he is doing a job of the same kind. In certain ways it is a better job, but in others it is not. The television lecture is less matched to the particular knowledge of those particular students, and there is no opportunity at all for interaction. At best, it is a slicked-up version of the same kind of thing. The speaker can be chosen more carefully, there is more time for preparation, errors can be edited out, and so on. This does not make it a different kind of experience and probably cannot make up for the possibilities of personalization and interaction.

The computer can create a different relationship to knowledge. Children who are discovering mathematical ideas by experimenting with the computer are in a different kind of relationship to mathematics than someone listening to an explanation from even the most skilled and interactive teacher. Compared with traditional teaching, the computer goes in the opposite direction from the change these other technologies can bring. Where television has some virtues, its ultimate weakness is that it pushes in the direction of passive learning. Whereas the computer, whatever its weakness, pushes in the direction of active learning. The two are not comparable technologies; they are opposites.
Alienated Versus Syntonic Learning

Conflicting claims about the role of computers in the classroom raise another issue. Do computers focus on factual and cognitive knowledge alone, or do they also touch on feelings and relationships? Mathematics can be used as an example that will clarify this issue.

Most people come out of school with extremely negative attitudes to mathematics. Many find their mathematical learning experience unpleasant. Quite a number develop what has come to be known as mathophobia. The reasons for this are complex and not completely understood, but certainly one of them is the fact that as early as elementary school, children feel that mathematics is something imposed on them from the outside, something that serves no clear purpose. Math is felt as something alienated, something one does under pressure, something one is forced to do, not something one does with a sense of pleasure and delight, or that has relevance to one's own interests, and certainly not something one spontaneously chooses to do.

Using the computer for drill and practice will only further alienate the children's experience of mathematics. A different approach seeks to solve this problem by changing the child's relationship to mathematical knowledge. The computer language Logo and turtle graphics allow children to create designs or animations on the screen. In order to do so, they must acquire certain mathematical ideas. One of these is numbers: they need to develop an intuitive sense of the relative sizes of various numbers. At first, children are not very good at guessing whether a particular line is 10 or 15 or 100 units long. But with experience, they become quite expert. A more technical example is understanding the notion of angle. To make the turtle face another direction, you have to say how many degrees it should turn: for example, with the command RIGHT 90 or LEFT 25. As traditionally taught, the idea of measuring angles in degrees is still very abstract for children as old as 11 and 12. By contrast, children who work with Logo and its turtle are engaged in a different relationship with mathematics. They enjoy what they are doing, and it is relevant to their immediate goal: for example, to program a video game or a cartoon or a drawing on the screen. With Logo, children as young as 7 and 8 (and maybe even preschool ages) become completely fluent in these ideas.

This different relationship to mathematics opens the possibility of a different kind of learning: syntonic learning, as contrasted with alienated learning. The word "syntonic" derives from psychoanalysis, where it refers to the feeling that certain activities are in harmony with one's goals and values and innermost self. It has always been clear to psychologists and some educators that syntonic learning is much more effective than alienated learning. Some people also find that educational ideology can create marked differences in which type of learning takes place. In any case, syntonicity is a major issue for understanding the goals of many of the uses of computers in education.

Social Interaction

The final conceptual issue is social interaction. Critics of the computer jump quickly to the conclusion that computers in the educational environment will lead to isolation of the individual. Parents easily fear that their children will "spend all day sitting in front of the computer." What is the reality behind these fears? The answer is not a simple one.

A few studies on social interactions in schools where computers are present indicate that, at least statistically, the computer presence increases the amount of interaction among the children. It allows for projects that encourage cooperation; it allows children to make something they are excited about and talk about with others; it allows a new medium of communication through computer mail or leaving messages on one another's files. However, there are individual cases of people of all ages who get lost in the computer, who do withdraw from interaction with other people and focus their energies on themselves and their computer. Which will prevail? The answer leads to a final and most important concept about the computer.

The "Effect" of the Computer. The computer is not an agent. The computer does not "cause" more social behavior or less. It does not "cause" better or worse mathematical learning. The computer is a material that enters into the learning environment, and more generally into the whole social environment, and can do so in many different ways. Noted above are ways in which the computer can be used in support of the most diametrically opposed theoretical approaches to education. It can be used to make structured education more structured. It can be used to make open education more open. In all three cases it is not the computer that has the particular effect but the ways it is used. Nowhere is this more true than in the role computers can play in human interactions. Following are some examples of measures that have been used to increase the computer's influence as a socializing, rather than an isolating, instrument.

Learning Environments That Strengthen Interaction. In the experimental school at the Learning Research and Development Center in Pittsburgh, Logo teacher Leslie Thyberg runs a very open class for children in the lower grades, K through 3. For a variety of reasons, she introduced a simple rule: "Ask three before you ask me." This usually meant asking three other children, which had many results, all good. Asking each other questions created a particularly rich environment for communication and for cross-fertilization of ideas. This is the kind of climate in which a computer culture can (and does) thrive. Answering each other's questions also allowed them to function as experts and as teachers. Leslie herself found that the number of trivial questions decreased markedly. This freed her time and energies to focus on more subtle things: spotting that one child is in trouble, another is blocking, a third could be doing something more exciting, and so on.

Another example concerns Henry, a boy whose early personality development sets him up to get lost in the computer (2). Long before he met a computer, he had difficulty in social relations. He was a dreamer who lived in fantasies and preferred his own dream fantasies of space travel to playing games with other children. For him, as for many like him, the arrival of the computer was an opportunity to intensify his social withdrawal.
However, Henry was in a school that was set up to favor a very different pattern of development. Children were encouraged to act as experts and advisors to the other children when they had special knowledge. The computers were located "out in the open" rather than in computer labs or in classrooms where quiet was imposed. This made it much easier to see what other children were doing and to interact with anyone doing intriguing work. Thus, it was not the computer as such but the computer culture of the school that drew Henry into a situation where he was in demand. So this young man who had always been afraid of pursuing contacts with other children found himself being pursued.

Finally, a more subtle example is drawn from the author's work with Logo. From the outset this language was designed to encourage communication between users. Logo programs are modular so they can be borrowed and shared. Logo is also designed to make it as easy as possible to talk about how you made your program work: what the bugs were, what the difficulties were, and how you solved them. Thus, the content of actual computer work, even on what might seem like a very technical level such as designing a computer language, is a factor that can make for greater socialization or greater isolation.

In all these conceptual issues one needs to remember one thing. Any question such as "What effect will the computer have upon this or that?" is a badly posed question. It is not the computer. In each case it is not what the computer will do to one, it is what one will do with the computer.

BIBLIOGRAPHY

1. The New York Times, Section 12, Education, Sunday, April 14, 1985.
2. S. Turkle, The Second Self: Computers and the Human Spirit, Simon and Schuster, New York, pp. 129-136, 1984.

General References

C. Daiute, Writing and Computers, Addison-Wesley, Reading, MA, 1985.
R. Lawler, Computer Experience and Cognitive Development: A Child's Learning in a Computer Culture, Ellis Horwood Ltd., distributed by John Wiley & Sons, New York, 1985.
T. O'Shea, Learning and Teaching with Computers, Prentice-Hall, Englewood Cliffs, NJ, 1983.
S. Papert, Mindstorms: Children, Computers, and Powerful Ideas, Basic Books, New York, 1980.
D. Peterson, The Intelligent Schoolhouse, Reston Publishing, Reston, VA, 1984.
S. Weir, Cultivating Minds: A Logo Casebook, Harper & Row, New York, 1986.

S. PAPERT
Massachusetts Institute of Technology

CONCEPT LEARNING

What Is Concept Learning?

Among the fundamental characteristics of intelligent behavior are the abilities to pursue goals and to plan future actions. To exhibit these characteristics, an intelligent system, human or machine, must be able to classify some objects, behaviors, or events as equivalent for achieving given goals and some others as differing. For example, to satisfy hunger, an animal must be able to classify some objects as edible despite the great variety of their forms and the changes they undergo in the environment. Thus, an intelligent system must be able to form concepts, that is, classes of entities united by some principle. Such a principle might be a common use or goal, the same role in a structure forming a theory about something, or just similar perceptual characteristics. In order to use the concepts, the system must also develop efficient methods for recognizing concept membership of any given entity. The question then is how concepts and concept-recognition methods are learned. The study and computer modeling of processes by which an intelligent system acquires, refines, and differentiates concepts is the subject matter of concept learning.

Concept learning is a subdomain of machine learning (qv). The research in this area originated with studies of concept development in humans (e.g., Refs. 1-3). It subsequently continued in the context of both AI efforts to build machines with concept-learning capabilities and cognitive science studies to construct computational models of learning. Selected publications covering this development are listed in Refs. 4-23. At present, concept learning is one of the central research topics in machine learning, a subarea of AI concerned with the development of computational theories of learning and the building of learning machines (see Machine learning).

In research on concept learning, the term "concept" is usually viewed in a more narrow sense than outlined above, namely, as an equivalence class of entities, such that it can be comprehensibly described by no more than a small set of statements. This description must be sufficient for distinguishing this concept from other concepts. Individual entities in the class are called instances of the concept. The assumption that a concept is an equivalence class implies that its every instance is equally representative of the concept. (It also implies that a concept description states necessary and jointly sufficient conditions and thus excludes a disjunctive description.) Such an idealization greatly facilitates research on concept learning, as it defines the learning task simply as the acquisition of a formal structure describing an equivalence class. It is, however, only a very rough approximation that ignores many important aspects of the human notion of a concept (24). At the conclusion of this entry the weaknesses of this definition are briefly addressed, and ideas are pointed out that attempt to capture the notion of a concept more adequately.

Within research on concept learning two major orientations can be distinguished: cognitive modeling and the engineering orientation. The cognitive modeling orientation attempts to develop computational theories of concept learning in humans or animals, and computer programs embodying those methods. In contrast, the engineering approach attempts to explore and experiment with all possible learning mechanisms, irrespective of their occurrence in living organisms.
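The narrow-sense view discussed above, a concept as an equivalence class that is comprehensibly described by a small set of statements together with an effective membership test, can be illustrated with a short Python sketch. The attribute names, the `describes` function, and the cup description (an open, stable, liftable vessel, an example that appears later in this entry) are all invented for illustration; no particular learning system from the literature is implied.

```python
# Toy illustration of the narrow-sense definition of a concept:
# a description is a small conjunction of attribute conditions,
# and membership testing checks every condition.

def describes(description, instance):
    """True if the instance satisfies every condition in the description."""
    return all(instance.get(attr) in allowed
               for attr, allowed in description.items())

# A concept description: each attribute maps to its set of allowed values.
cup = {"open": {True}, "stable": {True}, "liftable": {True}}

# Two candidate instances, described by attribute-value pairs.
mug = {"open": True, "stable": True, "liftable": True, "color": "red"}
vase = {"open": True, "stable": True, "liftable": False}

print(describes(cup, mug))   # mug satisfies all three conditions
print(describes(cup, vase))  # vase fails the "liftable" condition
```

Because the description is a single conjunction, every instance is treated as equally representative, which is exactly the idealization (and the limitation) noted above.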
Concept Learning Can Be Classified by Type of Inference Performed
In any learning process the student applies the knowledge possessed to information obtained from a source, for example, a teacher, in order to derive new useful knowledge. This new knowledge is then stored for subsequent use. Learning a new concept can proceed in a number of ways, reflecting the type of inference the student performs on the information supplied. For example, one may learn the concept of a butterfly by being given a description of it, by generalizing examples of specific butterflies, by constructing this concept in the process of observing and analyzing different types of insects, or by yet another way. The type of inference performed by the student on the information supplied defines the strategy of concept learning and constitutes a useful criterion for classifying learning processes.

Several basic concept-learning strategies have been identified in the course of machine-learning research. These are presented below in the order of increasing complexity of inference performed by the learner. In some general sense, this order reflects the increasing difficulty for the student to learn the concept and the decreasing difficulty for the instructor to teach the concept. In any practical act of learning, more than one strategy is often simultaneously employed. It should also be noted that this classification of strategies applies not only to learning of concepts but also to any act of acquiring knowledge.

Direct Implanting of Knowledge. This is an extreme case in which the learner does not have to perform any inference on the information provided. The knowledge supplied by the source is directly accepted by the learner. This strategy, also called rote learning, includes learning by direct memorization of given concept descriptions and learning by being programmed or constructed. For example, this strategy is employed when a specific algorithm for recognizing a concept is programmed into a computer or a database of facts about the concept is built. In Samuel's CHECKERS program (5) rote learning was employed to save the results of previous game tree searches in order to deepen and speed up subsequent searches.

Learning by Instruction (or Learning by Being Told). Here the learner acquires concepts from a teacher or other organized source, such as a publication or textbook, but does not directly copy into memory the information supplied. The learning process may involve selecting the most relevant facts and/or transforming the source information to more useful forms. The system NANOKLAUS (25), which builds a hierarchical knowledge base by conversing with a user, is an example of machine learning employing this strategy.

Learning by Deduction. The learner acquires a concept by deducing it from the knowledge given and/or possessed. In other words, this strategy includes any process in which knowledge learned is a result of a truth-preserving transformation of the knowledge given, including performing computation. A very simple example of this strategy is determining that the factorial of 6 is 720 by executing an already known algorithm and saving this fact for future use. This technique is called "memo functions" (26). A form of learning by deduction is explanation-based learning, which transforms an abstract, not directly usable, concept definition to an operational definition using a concept example for guidance (27). In general, deductive learning is performing a sequence of deductions or computations on the information given and/or stored in background knowledge, and memorizing the result.

More advanced deductive learning is exemplified by analytic or explanation-based learning methods (e.g., 27). These methods start with the abstract concept definition and domain knowledge, and by deduction derive an operational concept definition. A concept example is used to guide the deductive process. For instance, knowing that a cup is an open, stable, and liftable vessel, an explanation-based method can produce an "operational" description of a cup. Such a description characterizes the cup in terms of lower level, more measurable features, such as the presence of concavity, of a handle, and a flat bottom. Current research attempts to combine such analytical learning with inductive learning in order to learn concepts when the domain knowledge is incomplete, intractable, or inconsistent.

Learning by Analogy. The learner acquires a new concept by modifying the definition of a known similar concept. That is, rather than formulating a rule for a new concept from scratch, the student adapts an existing rule by modifying it appropriately to serve the new role. For example, if one knows the concept of an orange, learning the concept of a tangerine can be accomplished easily by just noting the similarities and distinctions between the two. Another example is learning about electric circuits by drawing analogies from pipes conducting water. Learning by analogy can be viewed as inductive and deductive learning combined and for this reason is placed between the two. Through inductive inference (see below) one determines general characteristics or transformations unifying concepts being compared. Then, by deductive inference, one derives from these characteristics features expected of the concept being learned. Winston (18) describes a method for learning concepts by analogy based on matching semantic networks. Learning by analogy plays an important role in problem solving (e.g., Ref. 22).

Learning by Induction. In this strategy the learner acquires a concept by drawing inductive inferences from supplied facts or observations. Depending on what is provided and what is known to a learner, two different forms of this strategy can be distinguished: learning from examples and learning from observation and discovery.

Learning from Examples. The learner induces a concept description by generalizing from teacher- or environment-provided examples and (optionally) counterexamples of the concept. It is assumed that the concept already exists; it is known to the teacher, or there is some effective procedure for testing the concept membership. The task for the learner is to determine a general concept description by analyzing individual concept examples. An example of this strategy takes place when a senior doctor examines medical records and makes interviews with patients in the presence of one or more interns, noting that "this is a patient with hepatitis"; "this is another patient with hepatitis, but notice that . . .", and so on. The latter part of this entry briefly discusses a few methods for learning from examples.

Learning by Observation and Discovery. In this strategy the learner analyzes given and/or observed entities and determines that some subsets of these entities can be grouped usefully into certain classes (i.e., concepts). Because there is no teacher who knows the concepts beforehand, this strategy is also called unsupervised learning. Once a concept is formed, it is given a name. Concepts so created can then be used as terms in subsequent learning of other concepts.

An important form of this strategy is clustering (i.e., partitioning a collection of objects into classes) and the related process of constructing classifications. Classifications are typically organized into hierarchies of concepts. Such hierarchies exhibit an important property of inheritance. If an object is recognized as a member of some class, the properties associated specifically with this class, as well as with classes at the higher level of hierarchy, are (tentatively) assigned to the given object. For example, if one learns that Freddy is an elephant, then, without seeing Freddy, one will typically assume that Freddy has four legs, a trunk, and all the distinguishing properties of elephants, vertebrates, and, generally, animals. Hierarchical classifications vary in height: Some may be tall, like the classification of living organisms, and some more flat, like the social hierarchy. The topics of clustering (in particular, conceptual clustering) and classification construction are treated in a separate entry in the encyclopedia (see Clustering).

Another form of learning by observation and discovery is descriptive generalization. This form is concerned with discovering regularities and formulating new concepts and rules characterizing collections of any entities (objects, events, processes, etc.). It produces statements such as "most people are honest," "whenever there are independent events, the normal distribution should hold," or "John is in the habit of amblin' down to the soda fountain every day about now." Examples of research on this topic are two programs by Lenat (15,23): AM, which searches for and develops new "interesting" concepts after being given a set of heuristic rules and initial concepts in elementary mathematics and set theory, and EURISKO, which formulates new heuristics. Another example is the BACON system (e.g., Ref. 28), which synthesizes mathematical expressions representing chemical or physical laws on the basis of given empirical data.

In the AI literature the term "concept learning" is frequently used in a more narrow sense than it is here, namely, to mean solely learning concepts from examples. One reason for this is historical, as this strategy was studied first, and most is known about it. It subsequently served as the springboard for studies of other strategies, but it continues to be the area most intensively investigated. Learning from examples and learning from observation and discovery (i.e., inductive learning in general) are fundamental forms of concept learning. When acquiring any abstract concept, examples are typically needed to achieve a deeper understanding of the concept; and initial learning of any concepts and natural laws is typically achieved by generalizing from our sensory observations. For these reasons the remainder of this entry concentrates on inductive learning. For coverage of other strategies the reader is advised to consult other references, in particular Ref. 29. The nature of inductive inference, which is the core of inductive learning processes, is explored below in more detail.

Inductive Inference Generates Hypotheses from Facts and/or Other Hypotheses

Inductive inference is the primary vehicle for creating new knowledge and predicting future events. It is usually characterized as reasoning from specific to general, from particular to universal, or from part to whole. Such a characterization is simple but not too informative. It does not identify all the components playing a role in the inductive process, nor does it explain how this inference is possible. To understand this inference more precisely, its major components are distinguished, and the properties of its conclusions are specified.

Given:

premise statements (facts, specific observations, intermediate generalizations) that provide information about some objects, phenomena, processes, and so on;

a tentative inductive assertion, which is an a priori hypothesis held about the objects in the premise statements (in some acts of inductive inference there may not be any tentative hypothesis; if there is such a hypothesis, the inductive process may be simplified, as it may involve merely a modification of the tentative hypothesis rather than creating a new hypothesis from scratch); and

background knowledge, which contains general and domain-specific concepts for interpreting the premises and inference rules relevant to the task of inference; it includes previously learned concepts, domain constraints, causality relations, assumptions about the premise statements and candidate hypotheses, goals for inference, and methods for evaluating the candidate hypotheses from these goals' viewpoints (specifically, the preference criterion or bias).

Determine:

an inductive assertion (a hypothesis) that strongly or weakly implies the premise statements in the context of background knowledge and is most preferable among all other such hypotheses.

A hypothesis strongly implies premise statements in the context of background knowledge if, by using background knowledge (and standard rules of inference), the premise statements can be shown to be a logical consequence of the hypothesis. In other words, the assertion

Hypothesis & Background knowledge ⇒ Premise statements

is valid, that is, true under all interpretations (the symbol ⇒ denotes implication). A hypothesis that satisfies this condition is called a strong candidate hypothesis. In contrast, a weak hypothesis is one that only weakly implies premise statements, that is, these statements are a plausible, but not certain, consequence of the hypothesis. The following two-part example illustrates both types of hypotheses.

Example: Part 1.

Premise statements: Socrates was Greek. Aristotle was Greek. Plato was Greek.

Background knowledge: Socrates, Aristotle, and Plato were philosophers. They lived in antiquity. Philosophers are people. Greeks are people.

Preference criterion: Prefer the hypothesis that is short and useful for deciding the nationality of philosophers.
Candidate hypotheses (a selection):

1. Philosophers who lived in antiquity were Greek.
2. All philosophers are Greek.
3. All people are Greek.

Preferred hypothesis:

2. All philosophers are Greek. (It is shorter than 1 and more specific than 3; it allows one, unlike 1, to determine the nationality of all philosophers.)

It can be seen that the original premise statements are a logical consequence of the generated hypothesis and the background knowledge. That the generated hypothesis is too general results from the poverty of the background knowledge and/or the premise assertions.

Example: Part 2. Suppose that the stock of facts has been enlarged with statements such as "Spencer was British" and "Hume was British" and that the background knowledge also includes the statement "Hume and Spencer were philosophers." In this case a strong candidate hypothesis would be "All philosophers were Greek except Spencer and Hume, who were British." A weak hypothesis would be "Most (or some) philosophers were Greek." Given the fact that Plato was a philosopher, the new hypothesis, in contrast to the old one, does not allow one to conclude strongly that he was Greek. It allows one only to say that it is likely (or that it is possible) that he was Greek. However, unlike the first hypothesis, it will also not conclude strongly that the philosopher Russell was Greek!

This example illustrates important properties of inductive inference. One is that it may not be truth preserving; that is, its conclusions may be incorrect even though the premise statements are correct. Going back to the first hypothesis: though Socrates, Aristotle, and Plato were Greek, it certainly does not follow that all philosophers were Greek. This quality of non-truth-preservation contrasts inductive inference with truth-preserving deductive inference. Figure 1 illustrates the relationship between deductive and inductive inference.

Figure 1. Relation between deduction and induction.

Inductive inference that produces strong hypotheses is falsity preserving. This means that if the original premise statements are false, the derived hypothesis will be false also. For example, if it were not true that Socrates was Greek, then clearly the first hypothesis, "All philosophers were Greek," could not be true either. Hypotheses generated by inductive inference have unknown truth status. They must be tested and verified before they become rules or accepted theories (see the section on hypothesis verification).

The premise statements, background knowledge, and derived hypotheses need to be expressed in some language. In human inference it is the language of the mind, a "mentalese," that at the surface level takes the form of natural language augmented with special representations of sensory stimuli, such as drawings, pictures, sounds, or gestures. In machine inference it is a formal language, such as propositional logic, predicate calculus, or another logic-style formalism, or a knowledge representation system, such as semantic networks, mathematical expressions, frames, scripts, or conceptual structures (30). Sometimes expressing the premise statements is easier in one language and expressing hypotheses is easier in another language.

In concept learning from examples (concept acquisition) the main concern is with a special case of inductive inference, called inductive generalization. Here the premise statements and the hypothesis are interpretable either as descriptions of sets (in this case there is instance-to-class generalization) or as descriptions of components of some object or process (in the latter case there is part-to-whole generalization). In instance-to-class generalization, properties known to hold for a set of objects are assigned to a larger set of objects. This form can be seen in the example above, in which a property (the nationality) assigned by premise statements to a few individuals was assigned to all individuals in some class (all philosophers).
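The instance-to-class pattern above can be sketched in a few lines of Python. This is a hypothetical illustration only; the data structures and function names are invented, not taken from any system described in this entry.

```python
# Premise statements: nationality facts about specific individuals.
facts = {("Socrates", "Greek"), ("Aristotle", "Greek"), ("Plato", "Greek")}

# Background knowledge: class membership of those individuals.
is_philosopher = {"Socrates", "Aristotle", "Plato"}

def generalize(facts, members):
    """Turn constants into variables: if every observed member of the class
    has the same property value, hypothesize that value for the whole class."""
    values = {v for ind, v in facts if ind in members}
    return values.pop() if len(values) == 1 else None

def entails_premises(hypothesis, facts, members):
    """Check that the hypothesis, together with the background knowledge,
    implies each premise statement."""
    return all(ind not in members or v == hypothesis for ind, v in facts)

hypothesis = generalize(facts, is_philosopher)   # "All philosophers are Greek."
print(hypothesis)                                # Greek
print(entails_premises(hypothesis, facts, is_philosopher))  # True
# Not truth preserving: the hypothesis also assigns a nationality to
# philosophers never mentioned in the premises (e.g., Russell).
```

Note that the premises follow deductively from the hypothesis plus background knowledge, while the hypothesis itself remains only tentative, exactly as the text describes.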
In part-to-whole generalization the premise statements describe parts of some object, and the goal is to hypothesize a description of the whole object. For example, the following is a part-to-whole generalization. Premise: His hands and his legs are strong. Background knowledge: Hands and legs are parts of a body. Hypothesis: His whole body is strong. An important form of part-to-whole generalization is sequence or process prediction (31,32).

Inductive inference was defined as a process of generating descriptions that imply the original facts in the context of background knowledge. Such a general definition includes inductive generalization and abduction as special cases. The term "abduction" was coined by the American logician Peirce (33). In abduction, the generated descriptions are specific assertions implying the facts (in the context of background knowledge) rather than generalizations of them. For example, given the premise assertion "these roses are purple" and the background knowledge "all roses in Adam's garden are purple," an abductive assertion would be "perhaps these roses are from Adam's garden." A description that implies some facts can be viewed as an explanation of these facts. The most interesting form of an explanation is one that provides a causal, goal-oriented characterization of the facts. To derive such an explanation, background knowledge must contain, along with other inference rules, causal inference rules as well as a specification of the
goal(s) of inference. Generating causal explanations can thus be viewed as a form of inductive inference.
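The contrast between generalization and abduction can be illustrated with a toy sketch; the rule encoding below is an invented simplification, not a described system.

```python
# Background knowledge as implications: antecedent -> consequent.
rules = [("these roses are from Adam's garden", "these roses are purple")]

def abduce(observation, rules):
    """Return every antecedent that, with the background knowledge,
    would imply the observed fact -- a candidate explanation of it."""
    return [ante for ante, cons in rules if cons == observation]

print(abduce("these roses are purple", rules))
# ["these roses are from Adam's garden"] -- i.e., "perhaps these roses
# are from Adam's garden"
```

The abduced assertion is specific rather than general, but like an inductive generalization it implies the observed fact and therefore has only tentative status.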
Inductive Inference Can Be Performed by Rules

One of the important results of research on inductive inference is the development of the concept of an inductive inference rule. An inductive inference rule performs some elementary act of inductive inference: It takes one or more assertions and generates an assertion that tautologically implies them. The concept of an inductive inference rule permits one to view inductive inference, at least conceptually, as a rule-guided process that starts with initial premises and background knowledge and ends with an inductive assertion (34). Here are a few examples of such rules:
Dropping conditions (removing a conjunctively linked condition from a statement; e.g., replacing the statement "a nation is strong if it has a strong economy and high determination" by "a nation is strong if it has high determination").

Turning constants into variables (e.g., generalizing the statement "this apple tastes good" into "all apples taste good").

Adding options (generalizing a statement by adding a disjunctively linked condition; e.g., generalizing the statement "peace will be preserved if all nations have peaceful intentions" into "peace will be preserved if all nations have peaceful intentions or if nonaggressive nations are much stronger than the aggressive ones").

Climbing a generalization tree (replacing a less general term by a more general term in a statement; e.g., generalizing the statement "I like oranges" into "I like citrus fruits").

A systematic presentation of inductive rules is in Ref. 34.

Instance Space versus Description Space

Earlier, two forms of inductive learning were distinguished: learning from examples and learning by observation. Learning a concept from examples is a process of constructing a representation of a designated class of entities by observing only selected members of that class and, optionally, nonmembers (counterexamples). Learning from observation involves creating concepts as useful classes for characterizing observations or any given facts. Both processes depend on the learner's background knowledge, in particular, on the type of description language the learner uses for characterizing examples and learned concepts.

In this context it is instructive to distinguish between an instance space and a description space. The instance space consists of all possible examples and counterexamples of the concepts to be learned. Actually observed positive and negative examples constitute subsets of such an instance space. The description space is the set of all descriptions of instances, or classes of instances, that are possible using the description language specified by the learner's background knowledge. Learning a concept involves an interaction between the two spaces. Such an interaction may involve reformulation or transformation of initial assertions as well as experimentation and active selection of training examples (Fig. 2).

Figure 2. Interaction between instance space and description space.

Consider a simple case where examples of a concept (positive examples) and counterexamples (negative examples) are represented by attribute vectors, that is, by lists of values of certain attributes. Considering attributes as dimensions spanning a multidimensional space, each example maps to a point in this space. Points that do not correspond to any observed example represent potential examples. Such a space is called a feature space or an event space and can be viewed as a geometric model of an instance space.

One may ask where the attributes come from. In simple methods the attributes are defined by the teacher. Such methods are called selective because the learned concept does not include any new attributes but only those defined by the teacher. In more sophisticated methods the system is provided with some initial attributes plus various rules of inference, heuristics, or procedures that the learner uses for generating new attributes. The latter methods are called constructive (34,35).

Different subsets of the instance space correspond to different concepts. Descriptions of those concepts are elements of the description space. For simplicity, assume that the description space is the set of all logical expressions involving the attributes used in characterizing examples. Depending on the constraints imposed on these expressions, all (or only some) subsets of the instance space can be represented by an expression in this language. Usually, any concept corresponds to a subset of (logically equivalent) descriptions in the description space.

A concept description is consistent with regard to the examples if it covers some or all positive examples and none of the negative examples. A concept description is complete with regard to the examples if it covers all positive examples. A description of a concept that is both complete and consistent with regard to all examples is a candidate hypothesis. The requirement for completeness and consistency follows from the assumption that the hypothesis should imply the initial examples (see Ref. 34).

The set of all candidate hypotheses is called the candidate hypothesis space or the version space. The candidate hypothesis space can be partially ordered by the relation of generality, which reflects the set inclusion relation between the corresponding concepts. The most general hypothesis describes the concept that is the complement of the union of the negative examples, and the most specific hypothesis describes the concept that is the union of all positive examples. Because the candidate hypothesis space is usually quite
large, a preference criterion is used to decide which candidate hypothesis to choose. Such a criterion may favor, for example, hypotheses that are short, hypotheses that require the least effort to measure the attributes involved, or, generally, hypotheses that best reflect the goal of learning. If the concept representation language is incomplete (for example, allows one to express only conjunctive hypotheses) and a sufficient number of positive and negative examples is supplied, the resulting version space may contain only one candidate hypothesis. In such a case the preference criterion is not needed (17). In summary, learning a concept can be described as a heuristic (qv) search (qv) through the description space for a most preferred hypothesis among all those that are consistent and complete with regard to the training examples.

Selected Methods of Inductive Learning

An important characteristic of learning methods is the way in which descriptions in the description space are generated and/or searched in relation to the examples or facts in the instance space. Three types of methods can be distinguished: data driven, model driven, and mixed. A data-driven method starts by selecting one or more examples, formulates a hypothesis explaining them, and then generalizes (and occasionally specializes) the hypothesis to explain further examples. A model-driven method starts with some very general hypotheses and then specializes (and occasionally generalizes) them to fit all the examples. Roughly speaking, data-driven methods proceed from specific to general, and model-driven methods proceed from general to specific. A mixed method has elements of both: It uses an example (or examples) to jump to one or more general hypotheses, tests the hypotheses, and then modifies them to fit other examples. Data-driven methods tend to be more efficient, and model-driven methods tend to be more tolerant of errors in data (29). Below are examples of the three types of methods.
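Before turning to specific systems, the notions of completeness, consistency, and preference defined above can be made concrete in a small sketch. The attribute vectors and candidate hypotheses below are invented toy data, and shortness is used as the preference criterion.

```python
positive = [{"shape": "arch", "material": "stone"},
            {"shape": "arch", "material": "wood"}]
negative = [{"shape": "tower", "material": "stone"}]

# A hypothesis is a conjunction of attribute = value conditions.
def covers(hypothesis, example):
    return all(example.get(a) == v for a, v in hypothesis.items())

def complete(h):
    return all(covers(h, e) for e in positive)      # covers all positives

def consistent(h):
    return not any(covers(h, e) for e in negative)  # covers no negative

candidates = [{"shape": "arch"},
              {"shape": "arch", "material": "stone"},
              {"material": "stone"}]

version_space = [h for h in candidates if complete(h) and consistent(h)]
best = min(version_space, key=len)   # preference criterion: shortest hypothesis
print(best)                          # {'shape': 'arch'}
```

Here {"material": "stone"} is rejected as inconsistent (it covers the negative example) and the longer conjunction is rejected as incomplete (it misses the wooden arch), leaving a single preferred hypothesis.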
Data-Driven Methods

Winston's Block World: Learning by Incremental Generalization and Modification. Winston's program (36) is an excellent representative of a data-driven method of concept learning. It learns structural descriptions of concepts in a blocks world (e.g., the concept of an arch) from representative examples and counterexamples provided by a teacher. The program represents examples and concepts in the form of a semantic network. At each step of learning it maintains only one working hypothesis. In searching for the final hypothesis, it uses a simple form of the best-first search method. The basic algorithm can be described as follows:

1. Take the first positive example of the concept and assume that it is the concept description.
2. If the next example is positive and does not satisfy the current concept description, generalize the description so that it includes the example.
3. If the next example is negative but satisfies the current description, specialize the description so that it excludes the example.
4. Repeat steps 2 and 3 until the process converges on a stable concept description.

The generalization step (step 2) applies such operators as dropping conditions, turning constants into variables, or climbing a generalization tree. When confronted with multiple choices in generalizing, the program chooses the least "drastic" change to the current concept description. For example, it will replace a less general term by a more general term rather than drop a term. The specialization step (step 3) adds more conditions and introduces exceptions or must-not conditions to the currently held hypothesis. There are usually many ways to specialize a hypothesis so that it does not cover a given negative example (as many as there are differences between the example and the hypothesis). For that reason the program favors near misses, that is, negative examples that differ from the hypothesis in only a few or, in the best case, in only one aspect.

Other examples of data-driven methods are the candidate elimination algorithm (17,37) for learning from examples and the method for learning from observation embodied in the BACON system (28). The latter method discovers equations characterizing empirical laws.

Model-Driven Methods

Learning by Incremental Specialization and Modification: The Meta-DENDRAL Program. This program implements a model-driven method for discovering rules characterizing the operation of a mass spectrometer (38). These so-called cleavage rules predict which bonds in the molecular structure of a chemical compound will likely break when bombarded by electrons in the mass spectrometer. To avoid undue technical details of the specific domain, the rule-learning process is presented at a level of abstraction. This process consists of two phases. First, the rule generation phase conducts a general-to-specific search of the space of possible cleavage rules (subprogram RULEGEN). Next, the rule modification phase makes the rules so obtained more precise and less redundant by performing local hill-climbing searches (subprogram RULEMOD). Training examples can be viewed as attribute vector descriptions of the environment of individual bonds in a molecule. Among the attributes are the type of atoms on both sides of the bond, the number of hydrogen and nonhydrogen atoms bound to each atom, the number of unsaturated valence electrons of the atom, and so on. With each example is associated a decision as to whether the corresponding bond will break in the mass spectrometer. An important feature of this application is a large-sized, error-laden set of input examples.

The rule generation phase starts with the most general rule, stating that every bond will break. Abstracting from the specific domain-dependent notation, such a rule can be written:

If a bond is any bond, then it will break.

The next step specializes the left side of the parent rule by making a change to atoms at a specified distance from the bond. A change may involve changing properties of an atom or adding a new atom. New rules so obtained are then tested to see if they perform better in predicting the breaks in the given set of examples. This two-step process of rule specialization and testing repeats until a local optimum of performance is achieved. The resulting rules can be characterized as:
If a bond environment has properties so and so, then it will break.

Meta-DENDRAL was an important learning system that worked well in a real-world domain with noisy data. In addition to the process of rule development outlined above, it also performed a sophisticated transformation of the initial data (the input spectrum) to usable training instances (the bond environment descriptions). In all aspects of its operation the program relied on a large amount of domain-specific knowledge.

Another example of a model-driven method is the concept-learning program CLS (3) and its modified version, ID3 (39). The program starts by attempting to find the best one-attribute rule characterizing the given examples. If this is not possible, it builds a decision tree of such rules that classifies all input examples. In such a tree, nodes correspond to attributes, emanating branches to the attribute values, and leaves to classes.
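The tree-building idea can be sketched as a miniature ID3-style learner. This is a hypothetical reconstruction, not the published program: attribute selection uses ID3's information-gain criterion, the toy weather-style examples are invented, and the sketch assumes the attributes suffice to separate the classes.

```python
import math
from collections import Counter

def entropy(examples):
    """Shannon entropy of the class labels in a list of (attributes, class) pairs."""
    counts = Counter(cls for _, cls in examples)
    total = len(examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_attribute(examples, attributes):
    """Pick the attribute with the highest information gain
    (equivalently, the lowest expected entropy after the split)."""
    def remainder(a):
        r = 0.0
        for v in {e[0][a] for e in examples}:
            subset = [e for e in examples if e[0][a] == v]
            r += len(subset) / len(examples) * entropy(subset)
        return r
    return min(attributes, key=remainder)

def build_tree(examples, attributes):
    """Nodes are attributes, branches are attribute values, leaves are classes."""
    classes = {cls for _, cls in examples}
    if len(classes) == 1:
        return classes.pop()                       # leaf: a single class
    a = best_attribute(examples, attributes)
    rest = [x for x in attributes if x != a]
    return (a, {v: build_tree([e for e in examples if e[0][a] == v], rest)
                for v in {e[0][a] for e in examples}})

examples = [({"outlook": "sunny", "windy": "no"},  "+"),
            ({"outlook": "sunny", "windy": "yes"}, "-"),
            ({"outlook": "rain",  "windy": "no"},  "-")]
tree = build_tree(examples, ["outlook", "windy"])
print(tree)  # e.g., ('outlook', {'sunny': ('windy', {'no': '+', 'yes': '-'}), 'rain': '-'})
```

Classifying a new example then amounts to following the branch matching its attribute value at each node until a leaf (class) is reached.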
Mixed Methods

Learning by Rapid Generalization and Stepwise Specialization: AQ11. Inductive concept learning can be viewed as a generate-and-test process. The "generate" part creates or modifies hypotheses, and the "test" part tests how well the hypotheses fit the data. In data-driven methods the "generate" part is sophisticated and the "test" part is simple, whereas in model-driven methods the opposite holds. A mixed method, implemented in the program AQ11, attempts to emphasize the "generate" and "test" parts more equally.

AQ11 is a multipurpose learning program that formulates general rules describing various classes of examples (40). Input to the program consists of attribute value vector descriptions of examples from different classes. It also includes background knowledge about the application domain and a hypothesis preference criterion. The output can be viewed as rules of the form "if condition, then class," where "condition" is a disjunction of conjunctions such that it describes all entities assigned to "class." A simplified version of the algorithm, called AQ, which underlies the nonincremental learning part of the program, is as follows:

1. Select at random one positive example (called the seed).
2. Comparing the seed with the first negative example, generate all maximally general hypotheses that cover the seed and exclude the negative example.
3. Specialize the hypotheses to exclude all negative examples. This is done by considering one negative example at a time and adding, whenever necessary, additional constraints to the hypotheses. After each step of specialization the newly generated hypotheses are ranked according to how well they classify the remaining examples and according to other aspects defined in the preference criterion. Only the most promising hypotheses are kept. The set of hypotheses obtained at the end of the specialization process is called a star.
4. Select from the star the best-ranked hypothesis. If this hypothesis covers all positive examples, exit (a solution has been found). Otherwise, find the positive examples that remain uncovered.
5. Repeat steps 1-4 for the remainder set. Continue until all positive examples are covered. The disjunction of the hypotheses selected at the end of each cycle is a consistent and complete description of all the positive examples and maximizes the preference criterion.

Thus, the program builds a disjunctive description of a concept when a conjunctive description is not possible. The individual conjuncts in such a disjunction may differ significantly as to the size of their coverage of the training examples. This allows for an interesting interpretation: The conjunct that covers most of the events could be viewed as a characterization of the typical, or "ideal," members, and those with light coverage as a characterization of exceptional cases. The incremental part of the program performs operations of modifying generated descriptions to fit new examples. The background knowledge of the program contains information about the properties of the attributes used to describe examples and various domain constraints. The program has been applied to various problems in medicine, agriculture, chess, and other areas. A more advanced version of the program, INDUCE (34), is capable of learning not only attribute-based but also structure-based concept descriptions. These descriptions characterize concepts as structures of components bound by various relationships and are expressed in an extended predicate calculus. The program has the ability to utilize general and domain-specific knowledge to generate new attributes.

How Are Learned Concepts Validated?

Although inductive inference represents the basic method for acquiring knowledge about the world and is one of the most common forms of inference, it suffers from a fundamental weakness. Except for special cases, results of this inference are inherently insusceptible to complete validation. This is because an inductively acquired hypothesis may have an infinite number of consequences, but only a finite number of tests can be performed. This property of inductive inference was observed early on by the Scottish philosopher David Hume and subsequently analyzed by twentieth-century thinkers such as Popper (e.g., Ref. 41). Consequently, one typically assumes that concept descriptions learned inductively have only a tentative status. When new examples become available, these descriptions are tested on them and, if necessary, appropriately modified. A standard method for testing inductively acquired descriptions (rules) is to apply them to testing examples and compute a confusion matrix. Such a matrix records the number of correct and incorrect classifications of the testing examples by the rules.

Extended Notions of a Concept

The basic ideas and a few selected methods of concept learning have been described here. These methods were based on the notion that concepts are classes of entities describable by a logic-style description. This means that concept descriptions have sharp boundaries and all members are equal representatives of a concept. As pointed out above, this simplification,
though useful for research, misses some important aspects of the human notion of a concept. Human concepts, except for special cases occurring predominantly in science (concepts such as a triangle, a prime number, a vertebrate, etc.), are structures with flexible and/or imprecise boundaries. They allow a varying degree of match between them and observed instances and have context-dependent meaning. Flexible boundaries make it possible to "fit" the meaning of a concept to changing situations and to avoid precision when it is not needed or not possible. The varying degree of match reflects the varying representativeness of a concept by different instances. Instances of a concept are rarely homogeneous. Among the instances of a concept, people usually distinguish a "typical instance" and a "nontypical instance," or, generally, they rank instances according to their typicality. By the use of context, the meaning of almost any concept can be expanded in a multitude of directions that cannot be predicted in advance. An imaginative discussion of this property is by Hofstadter (42), who shows how a seemingly well-defined concept, such as "First Lady," can express a great variety of meanings depending on the context in which it is applied.

Despite various efforts, the issue of how to represent concepts in such a rich and context-dependent sense remains open. This issue is, of course, crucial for concept learning because to learn concepts, the learner must be able to represent them. In view of this, a brief review of basic approaches to concept representation may be useful for understanding the current research limitations and directions in concept learning. Smith and Medin (43) distinguish between three approaches: the classical view, the probabilistic view, and the exemplar view.
The classical view assumes that concepts are representable by features that are singly necessary and jointly sufficient to define a concept. This view is a special case of the one assumed in this entry, as it does not allow disjunctive concept descriptions. The probabilistic view represents concepts as weighted, additive combinations of features. Using the aforementioned notion of a feature space, this means that concepts should correspond to linearly separable subareas in such a space. Experiments indicate, however, that this may be too limiting a view (43). The exemplar view represents concepts by one or more typical exemplars rather than by generalized descriptions. The notion of typicality can be captured by a measure called family resemblance. This measure represents the sum of the frequencies with which different features occur in different subsets of a superordinate concept, such as furniture, vehicle, and so on. The individual subsets are represented by typical members. Nontypical members are viewed as corruptions of the typical, differing from them in various small aspects, as children differ from their parents (e.g., Refs. 44 and 45).

Another approach uses the notion of a fuzzy set as a formal model of a concept (46). Members of such a set are characterized by a gradual numerical set membership function rather than by the in-out function seen in the classical notion of a set. This set membership function is defined by people describing the concept and thus is subjective. This approach allows one to express the varying degree of membership of entities in a concept but does not have mechanisms for expressing the context dependence of the concept meaning.
Elements of the above approaches have been unified in a more recent idea, which postulates that a concept is characterized by a well-defined description, but the use of this description is flexible (47). If an entity does not satisfy the description precisely, a consonance degree is computed that specifies the degree to which the description is satisfied. Thus, objects precisely satisfying the formal description can be considered typical concept members, and those that satisfy it approximately, less typical, with the degree of membership defined by the consonance degree. In the case of disjunctive descriptions, the component (conjunction) that explains most of the examples can be viewed as representing the ideal form of a concept. Other components then represent exceptional cases. The method of computing the consonance degree can be shared by many concepts; therefore, there is no need to store a set membership function with each concept, as in the case of fuzzy sets. The dependencies among the attributes characterizing a concept and its relationship to other concepts can be expressed in the same logic-based formalism. Thus, in such a "flexible logic" approach the total meaning of a concept is distributed between its formal description and the function evaluating the degree of consonance. The description gives the basic meaning to a concept, and the evaluation function allows for its flexibility. Major questions, then, are how to properly distribute the concept meaning between these two components and how to express context-dependent meaning.
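The consonance-degree idea can be sketched as a simple partial-match score. The fractional measure below is an invented stand-in for the one in Ref. 47, and the bird attributes are toy data.

```python
# A crisp concept description, kept as in the classical view.
description = {"has_wings": True, "flies": True, "sings": True}  # "songbird"

def consonance(entity, description):
    """Fraction of the description's conditions the entity satisfies.
    1.0 = typical member; intermediate values = less typical members."""
    matched = sum(entity.get(a) == v for a, v in description.items())
    return matched / len(description)

robin   = {"has_wings": True, "flies": True,  "sings": True}
penguin = {"has_wings": True, "flies": False, "sings": False}

print(consonance(robin, description))    # 1.0   -> typical member
print(consonance(penguin, description))  # ~0.33 -> marginal member
```

Note that the same `consonance` function serves every concept; only the crisp description is stored per concept, which is the economy the text contrasts with fuzzy sets.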
An adequate concept representation should include not only a description that permits one to recognize the given concept among other concepts or to evaluate the typicality of its members but also a number of other components. It should specify the constraints and correlations among the defining or characteristic attributes, the relationship of the concept to other concepts, its typical and nontypical examples, the dependence of its meaning on different contexts, the purpose and use of the concept, and its position and role in the knowledge structures and theories in which it is embedded. Many of these components are present in the representation described in Ref. 48. Murphy and Medin (24) argue that the role a concept plays in a theory that uses it provides a basis for conceptual coherence, that is, for explaining why certain classes of entities constitute a meaningful concept and some others do not. Further progress on concept learning is predicated on progress in concept representation.
Conclusion

Concept learning has been presented as a process of constructing a concept representation on the basis of information provided by an external source, a teacher, or an environment. The type of transformation performed by the learner defines the learning strategy. The main emphasis of this entry is on inductive learning, which is divided into learning from examples and learning from observation and discovery. Principles are described that underlie inductive inference, and several methods are presented for concept learning from examples. A number of topics in concept learning have not been covered. Among these are methods for creating new concepts, noninductive learning strategies, techniques for evaluating learned concept descriptions, and learning from noisy or incompletely defined examples. The general references include papers on these topics.
BIBLIOGRAPHY
1. C. I. Hovland, "A 'communication analysis' of concept learning," Psychol. Rev. 59(6), 461-472 (1952).
2. J. S. Bruner, J. J. Goodnow, and G. A. Austin, A Study of Thinking, Wiley, New York, 1956.
3. E. B. Hunt, J. Marin, and P. J. Stone, Experiments in Induction, Academic Press, New York, 1966.
4. A. Newell, J. C. Shaw, and H. A. Simon, A Variety of Intelligent Learning in the General Problem Solver, Rand Corporation Technical Report, Santa Monica, CA, 1959.
5. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Dev. 3, 210-229 (1959); reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 71-105.
6. M. Kochen, "Experimental study of 'hypothesis formation' by computer," in C. Cherry (ed.), Information Theory: 4th London Symposium, Butterworth, London and Washington, DC, 1961.
7. S. Amarel, On the Automatic Formation of a Computer Program which Represents a Theory, in M. Yovits, G. Jacobi, and G. Goldstein (eds.), Self-Organizing Systems, Spartan Books, Washington, DC, 1962, pp. 102-178.
8. R. B. Banerji, Computer Programs for the Generation of New Concepts from Old Ones, in K. Steinbuch and S. Wagner (eds.), Neuere Ergebnisse der Kybernetik, Oldenbourg-Verlag, Munich, 1964, p. 336.
9. N. Bongard, Pattern Recognition, Spartan Books, New York, 1970 (translation from a Russian original published in 1966).
10. S. Watanabe, Pattern Recognition as an Inductive Process, in Methodologies of Pattern Recognition, Academic Press, New York, 1968.
11. M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, MA, 1969.
12. P. H. Winston, Learning Structural Descriptions from Examples, Ph.D. Thesis, Report No. TR-231, AI Laboratory, MIT, 1970; reprinted in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
13. B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, A Heuristic Programming Study of Theory Formation in Sciences, Proc. of the Second International Joint Conference on Artificial Intelligence, London, 1971, pp. 40-48.
14. R. S. Michalski, A Variable-Valued Logic System as Applied to Picture Description and Recognition, in F. Nake and A. Rosenfeld (eds.), Graphic Languages, North-Holland, Amsterdam, 1972, pp. 20-47.
15. D. B. Lenat, AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search, Ph.D. Dissertation, Stanford University, 1976.
16. P. Langley, BACON: A Production System that Discovers Empirical Laws, Proc. of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977, pp. 344-346.
17. T. M. Mitchell, Version Spaces: A Candidate Elimination Approach to Rule Learning, Proc. of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977, pp. 305-310.
18. P. H. Winston, "Learning and reasoning by analogy," CACM 23(12), 689-703 (1979).
19. J. R. Anderson, A Theory of Language Acquisition Based on General Learning Principles, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, British Columbia, August 1981, pp. 97-109.
20. R. S. Michalski and R. E. Stepp, Learning from Observation: Conceptual Clustering, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 331-363.
21. R. C. Schank, Looking at Learning, Proceedings of the European Conference on Artificial Intelligence, Orsay, France, July 1982, pp. 11-18.
22. J. G. Carbonell, Learning by Analogy: Formulating and Generalizing Plans from Past Experience, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 137-161.
23. D. B. Lenat, The Role of Heuristics in Learning by Discovery: Three Case Studies, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 243-306.
24. G. L. Murphy and D. L. Medin, "The role of theories in conceptual coherence," Psychol. Rev. 92(3), 289-316 (1985).
25. N. Haas and G. G. Hendrix, Learning by Being Told: Acquiring Knowledge for Information Management, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 405-427.
26. D. Michie, "Memo functions and machine learning," Nature 218(5136), 19-22 (1968).
27. T. M. Mitchell, R. M. Keller, and S. T. Kedar-Cabelli, "Explanation-based generalization: A unifying view," Machine Learning 1(1), 47-80 (1986).
28. P. Langley and G. L. Bradshaw, Rediscovering Chemistry with the BACON System, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 307-329.
29. T. G. Dietterich and R. S. Michalski, A Comparative Review of Selected Methods for Learning from Examples, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 41-81.
30. J. F. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
31. H. A. Simon and G. Lea, Problem Solving and Rule Induction: A Unified View, in L. W. Gregg (ed.), Knowledge and Cognition, Lawrence Erlbaum, Potomac, MD, 1974, pp. 105-127.
32. T. G. Dietterich and R. S. Michalski, Learning to Predict Sequences, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. 2, Morgan Kaufmann, Los Altos, CA, 1986, pp. 63-106.
33. C. S. Peirce, Essays in the Philosophy of Science, The Liberal Arts Press, New York, 1957.
34. R. S. Michalski, Theory and Methodology of Inductive Learning, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 83-134.
35. L. A. Rendell, Substantial Constructive Induction: Feature Formation in Search, Proc. of the Ninth IJCAI, Los Angeles, CA, August 1985, pp. 650-658.
36. P. H. Winston, Learning Structural Descriptions from Examples, in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975, ch. 5.
37. T. M. Mitchell, P. E. Utgoff, and R. Banerji, Learning by Experimentation: Acquiring and Refining Problem-Solving Heuristics, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 163-190.
38. B. G. Buchanan and E. A. Feigenbaum, "Dendral and Meta-Dendral: Their applications dimension," Artif. Intell. 11, 5-24 (1978).
39. J. R. Quinlan, Learning Efficient Classification Procedures and Their Application to Chess End Games, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artifi-
CONCEPTUALDEPENDENCY cial Intelligence Approach, Tioga, palo Alto, CA, 1gg3, pp. 463-
482. 40. R. S. Michalski and J. B. Larson, Selectionof Most Representative Training Examples and Incremental Generation of VLl Hypotheses:The Underlying Methodology and a Description of Programs ESEL and AQ11, Report 867, Department of Computer Science, University of Illinois, Urbana, 1928. 41. K. R. Popper, objectiue Knowledge: An Euolutionary Approach, Oxford, Clarendon Press, 1929. 42. D. R. Hofstadter, Metamagical Themas: Questing for the Essence of Mind and Pattern, Basic Books, New York, 198b, Chapter 24. 43. E. E. Smith and D. L. Medin, Categoriesand Concepfs,Harvard University Press,Cambridge,MA, 1981. 44. L. Wittgensteirt, TractatusLogico-Philosophicus,Routledge& Kegan Paul, London, 1921. 45. E. Roschand C. B. Mervis, "Family resemblances:Studies in the internal structure of categories," Cog. Psychol., 7(4), bZB-60b (1e75). 46. L. A. Zadeh, "A Fuzzy-algorithmic approach to the definition of complex or imprecise concepts,"Int. J. Man-Machine Stud. 8(B), 249-291 (1976). 47. R. S. Michalski and R. L. Chilausky, "Learning by being told and learning from examples:An experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybeandiseasediagnosis,"PoI. Anal. Inform. Sys. 4(2), L25-L61 (June 1980). 48. D. Lenat, M. Prakash, and M. Shepherd,"CYC: Using common senseknowledge to overcomebrittleness and knowledge acquisition bottlenecks,"AI Mag. 6(4),65-85 (1986). General References T. G. Dietterich, B. London, K. Clarkson, and G. Dromey, Learning and Inductive Inference, in P. R. Cohen and E. A. Feigenbaum (eds.), Handbook of Artifi,ciq,l Intelligence, Vol. 3, 325-511, W. Kaufmann, Los Altos, CA, 1982. J. McCarthy, Programs with Common Sense,Proceedingsof the Sy*posium on the Mechanization of Thought Processes,Vol. 1, National Physical Laboratory, 1958. N. Zagoruiko, Empirical Prediction Algorithms, Computer Oriented J. C. Simon (ed.), Noordhoff, Leiden, The Learning Processes, Netherlands. 1976. R. S. 
R. S. MICHALSKI
University of Illinois

This work was supported in part by the NSF under grant No. DCR 84-06801, by the ONR under grant No. N00014-82-K-0186, and by DARPA under grant No. N00014-K-85-0878.
CONCEPTUAL DEPENDENCY

Conceptual dependency (CD) is a theory of natural language and of natural-language processing (see Natural-language generation; Natural-language understanding). It has been developed by Schank with the motivation to enhance one's ability to construct computer programs that can understand language well enough to summarize it, translate it into another language, and answer questions about it. At the heart of the theory lies the conjecture that language is a medium whose purpose is communication. Therefore, the central issue dealt with by the theory is the kinds of things that can be communicated, the meaning content of the communication.
What inferences are made? When are these inferences made? Where do they come from? For example, most people would agree that the sentence "John sold his old car" contains a reference to money even though the word "money" is not mentioned in the sentence. Furthermore, most people would agree that as a consequence of John's action, he no longer owns that car. Any computer program that understands this sentence must answer no to the question "Does John own the car?" and yes to the question "Did John receive money?" How could a program know that? To model language understanding on a computer, one needs a strong theory of human inference that operates on the level of conceptual manipulations. Furthermore, in order for a theory of language to have relevance in the field of AI, it must provide a representation of meaning as well as the means to map into and out of that representation (see Representation, knowledge). Conceptual dependency theory is a theory of the representation of meaning. It is a representation of everyday concepts and events in a way that reflects natural thinking and communication about those concepts and events. At the time of its development, the approach taken by Schank was not considered unusual within the AI framework. Since AI is largely an experimental field, the theory and its computer implementations were viewed as an investigation into the dynamics of natural-language understanding. However, in the field of linguistics, thoughts about the nature and the purpose of language were oriented in a direction opposite to that reflected by Schank's theory, and the latter was considered radical.

Conceptual Structures

Conceptual dependency theory views understanding of natural language as a process of mapping linear strings of words into well-formed conceptual structures. A conceptual structure is defined as a network of concepts, where certain classes of concepts can be related in specific ways to other classes of concepts (see also Semantic networks).
The basic axiom of the theory is: For any two sentences that are identical in meaning, regardless of language, there should be only one representation. A corollary that derives from it is: Any information in the sentence that is implicit must be made explicit in the representation of the meaning of that sentence.

The rules by which classes of objects combine may be viewed as conceptual syntax rules. It is important to note that these rules underlie the language, but they are independent of it. They are rules of thought as opposed to rules of a language. The initial framework consists of the following rules (1): The meaning of a linguistic proposition is called a conceptualization or CD form. A conceptualization can be active or stative. An active conceptualization consists of the following slots: actor; action; object; and direction, with a source (from) and a destination (to), plus an optional instrument.
A stative conceptualization consists of the following slots: object, state, and value.

Each CD form has associated semantic constraints on the kinds of entities that can fill its slots. These semantic constraints reflect different levels of specificity. For example, some rules may be applied to any object that plays the actor role in any action. On the other hand, other rules will be very specific to a particular action and its slot values.

Conceptual Dependency Rules

The CD rules prefer combinations of concepts that go along with experience over those that violate experience. Of course, it is possible for the CD rules to be idiosyncratic, but most people share enough of them to be able to communicate. What is usually referred to as semantics (qv) in linguistics is the set of operations at the conceptual level. When the word "semantics" is used in the context of CD theory, it means the experiential laws that allow for concept combinations. The vocabulary that expresses conceptual rules makes use of the conceptual categories of types of objects described below.

PPs: Picture Producers or Conceptual Nominals. Only physical objects are PPs. PPs may serve in various roles in the conceptualization. PPs that are animate or have animate properties (like machines) or that are natural forces (wind, gravity) may be actors. Any PP may serve in the role of an object. A PP in the role of source or destination refers to the location of that PP. Animate PPs may also serve as recipients.

ACTs: Actions. Actions can be done by an actor to an object. The major primitive ACTs are given below.

LOCs: Locations. Every physical ACT has a location that modifies the place of occurrence of the conceptualization that included it. Locations are considered to be coordinates in space. LOCs can modify conceptualizations as well as serve as sources and destinations.

Ts: Times. Most conceptualizations have a time. The time is considered to be a point or a segment on a time line. This point or segment may be measured on some absolute scale (e.g., 2 p.m. on April 1, 1983) or relative (after last Christmas).

AAs: Action Aiders. Action aiders are modifications of features of an ACT. For example, PROPEL has a speed factor, which is an AA. Very few AAs have been developed.

PAs: Picture Aiders, or Attributes of an Object. Every physical object can be defined by a set of attribute states with specific values. A PA is an attribute characteristic such as color or size plus a value for that characteristic, for example, blue or 5 ft (1.5 m).

As part of the CD theory, Schank specified different ways in which the above conceptual categories can combine. These rules may be viewed as formulating partial semantics of the knowledge representation since they specify some of the meaning incorporated in a given dependency between concepts. In the following are six such rules. A more complete list and a detailed discussion of this important issue is given in Refs. 1 and 2.

Rule 1. Certain PPs Can ACT. For example, the sentence "Kevin walked" may be represented using the primitive act PTRANS (see below) as

  Actor      Kevin
  Action     PTRANS
  Object     Kevin
  Direction  From: unknown
             To: unknown

The graphic notation is

  Kevin <=> PTRANS <-o- Kevin

Here there is a mutual dependency link between the PP and the ACT, and it is represented graphically by the double arrow (<=>). Exactly which PP can do what ACT is to be determined in each case by the semantic nature of the two objects.

Rule 2. PPs and Some Conceptualizations Can Be Described by an Attribute. For example, the sentence "Nancy is heavy" may be represented using the following stative conceptualization:

  Object  Nancy
  State   WEIGHT
  Value   Above average

Graphically,

  Nancy <≡> WEIGHT (above average)

This dependency between the object and the state is represented graphically by the triple arrow (<≡>).

Rule 3. ACTs Have Objects. For example, the sentence "Perry kicked the cat" may be represented using the primitive act PROPEL (see below) as

  Actor      Perry
  Action     PROPEL
  Object     cat
  Direction  From: unknown
             To: unknown

Graphically,

  Perry <=> PROPEL <-o- cat

The dependency between the act and the object is represented by the arrow labeled o.

Rule 4. ACTs Have Directions. For example, the sentence "Bill fell from the ladder" may be represented using the primitive act PTRANS (see below) as

  Actor      Bill
  Action     PTRANS
  Object     Bill
  Direction  From: ladder
             To: ground
Graphically,

  Bill <=> PTRANS <-o- Bill <-D-  to: ground
                                  from: ladder

The dependency between the act and the direction is represented graphically by the arrow labeled D.

Rule 5. ACTs Have Recipients. For example, the sentence "John donated blood to the Red Cross" may be represented using the primitive act ATRANS (see below) as

  Actor      John
  Action     ATRANS
  Object     blood
  Direction  From: John
             To: Red Cross

Graphically,

  John <=> ATRANS <-o- blood <-R-  to: Red Cross
                                   from: John

The dependency between the act and the recipient is represented graphically by the arrow labeled R.

Rule 6. ACTs Can Have Instrumental ACTs. For example, "John hit Bill with his hand" would be

  Actor       John
  Action      PROPEL
  Object      Bill
  Instrument  Actor: John
              Action: MOVE
              Object: hand
              Direction: To Bill
                         From John

Graphically,

  John <=> PROPEL <-o- Bill
       <-I-  John <=> MOVE <-o- hand

The dependency between an act and its instrumental conceptualization is represented graphically by the arrow labeled I.
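The slot-and-filler tables used in Rules 1-6 map directly onto simple record structures. The following is a hypothetical sketch in Python, not part of the original article; the helper names and dictionary layout are assumptions for illustration only:

```python
# Hypothetical record encoding of CD conceptualizations (slot names
# follow the tables in the text; the helper functions are assumptions).
def active(actor, action, obj, frm="unknown", to="unknown", instrument=None):
    """Active conceptualization: actor, action, object, direction (+ instrument)."""
    form = {"actor": actor, "action": action, "object": obj,
            "direction": {"from": frm, "to": to}}
    if instrument is not None:
        form["instrument"] = instrument  # itself a complete conceptualization
    return form

def stative(obj, state, value):
    """Stative conceptualization: object, state, value."""
    return {"object": obj, "state": state, "value": value}

# Rule 2: "Nancy is heavy"
heavy = stative("Nancy", "WEIGHT", "above average")

# Rule 4: "Bill fell from the ladder"
fell = active("Bill", "PTRANS", "Bill", frm="ladder", to="ground")

# Rule 6: "John hit Bill with his hand": the instrument slot holds
# a nested conceptualization of its own.
hit = active("John", "PROPEL", "Bill",
             instrument=active("John", "MOVE", "hand", frm="John", to="Bill"))

print(fell["direction"]["to"])      # → ground
print(hit["instrument"]["action"])  # → MOVE
```

Because an instrumental conceptualization is just another record, the nesting of Rule 6 can continue to arbitrary depth.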
The Need for Conceptual Primitives

The requirement that sentences that have the same meaning be represented in the same way cannot be satisfied without some set of primitive ACTs. The ACTs presented here are not category names for verbs. Rather they can be considered to be elements of those verbs. An analogous situation is the formation of compounds from the basic elements in chemistry. As is demonstrated below, the use of such primitives severely reduces the inference problem in AI since the inference rules need only be written once for any ACT rather than many times for each verb that references that ACT. For example, in a situation that involves the transfer of information, one rule may state that "if you MTRANS something to your long-term memory, then it is present there (i.e., you know it)." This rule is true whether the MTRANSing was expressed using the verbs "see," "learn," "hear," "inform," or "remember." This is because the inference comes from the ACT rather than the verb.

The need to introduce a vocabulary of primitives into the knowledge representation scheme originates from the need to write general rules in the most succinct way. For example, when two sentences describe the same event and have the same overall meaning but a different form, the CD representation has to be identical. To illustrate this point, consider the following example (3):

1. John gave Mary a book.
2. Mary took a book from John.
3. Mary received a book from John.
4. John sold Mary a book.
5. Mary bought a book from John.
6. Mary traded a cigar to John for a book.

Undoubtedly, these six sentences have common meaning elements. To uncover this fact, a stepwise analysis of the conceptual structures underlying the first two is presented in the following. As was mentioned earlier, in a CD representation an event is always described by a combination of actors, actions, objects, and directions. Thus, to represent the event "John gave Mary the book," one starts with:

  Actor      Unknown
  Action     Unknown
  Object     Unknown
  Direction  Unknown

Not all of the fillers for the slots are given in the sentence. What one is told, though, is that the actor is John and the object is a book. For the moment one assumes that the action is "give," whose direction is the animate object Mary. This produces:

  Actor      John
  Action     Give
  Object     Book
  Direction  Mary

Consider now the analysis of the sentence "Mary took a book from John." The actor here is Mary, the object is again a book, and the direction would seem to be John. But at this point it seems that there is something missing in this single-valued description of direction since there is a difference between "from" and "toward" that this representation does not capture. The problem is that directions must have at least two parts in order to be specified. If the direction is from John, it has to be to somebody else. Likewise, in the earlier sentence, just because there was only "to Mary" does not mean that there is no "from" part to be provided by the remainder of the text or to be figured out from previous knowledge. Therefore, the analysis must be revised as follows:

  Actor: John                 Actor: Mary
  Action: give                Action: take
  Object: book                Object: book
  Direction: To Mary          Direction: To unknown
             From unknown                From John

At this stage of the analysis the meaning representation of the two sentences does not look very much alike. This is because the meaning of "give" and "take" has not yet been dealt
with. To consider what their meanings might be, one can attempt to fill the empty slots in the direction roles.

  Actor: John                 Actor: Mary
  Action: give                Action: take
  Object: book                Object: book
  Direction: To Mary          Direction: To Mary
             From John                   From John
The two sentences have different actors, but they involve an event whose overall consequence is the same. The effect of the event is the transfer of possession of the book from John to Mary. The only difference between them is the focus on the actor of the event. In CD, transfer of possession is called ATRANS, and the representation for the two sentences now looks as follows:

  Actor: John                 Actor: Mary
  Action: ATRANS              Action: ATRANS
  Object: book                Object: book
  Direction: To Mary          Direction: To Mary
             From John                   From John

It is important to notice that the words "give" and "take" are not lost during the analysis but can be recovered from the direction information in the ATRANS conceptualization. A natural-language generator that can do that is mentioned below.

What are the benefits from this analysis?

Economy. Every time the system sees ATRANS, it can make the appropriate inferences by applying the inference rules attached to this primitive action. Consider a case where the inferences were attached to the verbs themselves. For example, "give" would have attached to it the inference "if you give somebody something, then they have it," and "take" would have attached to it the inference "if somebody takes something, then they have it." This situation would involve tremendous duplication for every verb that inferred transfer of possession. This loss of economy would be considerable as it applies not only to "give" and "take" but to hundreds of other verbs that involve transfer of possession as well.

Similarity in Meaning. The representation captures the similarity in meaning between the two sentences. The perspective is slightly different in each case since the actors in the events are different, but the overall meaning is very close and the meaning representation reflects this fact.

To consider a further ATRANS example, examine next the sentence "Mary bought a book from John":

  Actor: John                 Actor: Mary
  Action: ATRANS              Action: ATRANS
  Object: book                Object: money
  Direction: To Mary          Direction: To John
             From John                   From Mary

Here the two conceptualizations are joined by a double causal link. The <= is a notation for causality: A <= B means A caused B. The double causal indicates that the two events caused each other. What is the analysis of "John sold a book to Mary"? It seems obvious that the meaning is very close. [Context may introduce deviations in the meanings, but these are dealt with using higher level memory structures, such as scripts and plans (4,5).]

The sentence "Mary traded a cigar to John for a book" has a similar representation except that cigar replaces money in the object slot of the structure on the right.

The Primitive ACTs

The following is a list of the most important primitive ACTs and examples of their use.

ATRANS is the transfer of an abstract relationship, such as possession, ownership, or control. For example, one sense of the verb "give" is to ATRANS something to someone else; one sense of the verb "take" is to ATRANS something to oneself. The verb "buy" is made up of two conceptualizations that cause each other: one is an ATRANS of money, and the other is an ATRANS of the object being bought.

PTRANS is the transfer of the physical location of an object. For example, the action "go" is to PTRANS oneself to a place; the action "put" is to PTRANS an object to a place. In certain cases certain words only imply PTRANS during the analysis. For example, the verb "throw" means PROPEL (see below), and the PTRANS caused by it has to be inferred. Since most things that are PROPELed are also PTRANSed, the inference mechanism will have to decide in each case of PROPEL if PTRANS is true too.

PROPEL is the application of a physical force to an object. PROPEL is used whenever any force is applied, regardless of whether a movement (PTRANS) took place. For example, the verbs "push," "pull," "throw," "shove," and "kick" have PROPEL as part of them. The sentence "John pushed the table to the wall" is a PROPEL that causes a PTRANS. The sentence "John threw the ball" is a PROPEL that involves a simultaneous ending of a GRASP act. Often words that do not necessarily mean PROPEL can imply PROPEL. For example, "break," which means to do something that causes a specific kind of physical-state change, often implies that PROPEL caused the state change.

MOVE is the movement of a body part of an animal by that animal. MOVE is nearly always the ACT in the instrumental conceptualization for other ACTs. For example, in order to "throw," it is necessary to MOVE one's arm; in order to "kick," it is instrumental to MOVE one's foot; in order to "hand something," it is instrumental to MOVE one's hand. Noninstrumental uses of MOVE are verbs such as "raise your hand" and "scratch."

GRASP is the grasping of an object by an actor. For example, the verbs "hold," "grab," "let go," and "throw" involve GRASP or the ending of GRASP.

INGEST is the taking of an object by an animal to the inside of that animal. Most commonly, the objects of INGEST are food, liquid, and gas. Thus, the verbs "eat," "drink," "smoke," and "breathe" are common examples of INGEST.

EXPEL is the expulsion of an object from the body of an animal into the physical world. Whatever is EXPELed is very likely to have been previously INGESTed. Words for excretion and secretion are described by EXPEL. The verbs "sweat," "spit," and "cry" are common examples of EXPEL.

MTRANS is the transfer of mental information between animals or within an animal. For the purposes of the analysis here, memory is partitioned into three locations: the CP (conscious processor), where things are thought of; the LTM (long-term memory), where things are stored; and the IM (intermediate memory), where the current context is stored. The various sense organs can also serve as sources in an MTRANS. Thus, the verb "tell" means MTRANS between people, the verb "see"
means MTRANS from eye to CP, the verb "remember" means MTRANS from LTM to CP, the verb "forget" means the inability to MTRANS from LTM to CP, and the verb "learn" means the MTRANSing of new information to LTM.

MBUILD is the construction by an animal of new information from old information. The verbs "decide," "conclude," "imagine," and "consider" are common examples of MBUILD.

SPEAK is the action of producing sounds. Many objects can SPEAK. In the case of humans SPEAKing is usually instrumental for MTRANSing. The verbs "say," "sing," "purr," and "scream" are common examples involving SPEAK.

ATTEND is the action of attending or focusing a sense organ toward a stimulus. For example, the verb "listen" means ATTEND ear, and the verb "see" means ATTEND eye. ATTEND is almost always referred to as the instrument of MTRANS. Thus, for example, in CD "see" is treated as MTRANS to CP from eye by the instrument of ATTEND eye to object.

Stative CD Forms

Stative conceptualizations that are attribute-value statements use a large number of scales. These scales often run from -10 to 10 and can also be used to indicate changes in states. Some of the scales that are used are shown below:

Health (-10 = dead, -3 = under the weather, -d (a negative increment) = got sick, 10 = perfect health).
Anticipate (-10 = terrified, -2 = nervous, +5 = hoping).
Anger (goes from -10 to 0).
Mental state (-5 = depressed, +2 = happy, +9 = ecstatic).
Physical state (-10 = end of existence, -5 = damaged, +10 = complete).
Awareness (-10 = dead, -7 = unconscious, -2 = asleep, +5 = alert).

CD-Based Computer Understanding

The development of the theory of CD went hand in hand with the development of the computer programs that implemented it. As is often the case in AI research, the computer program served as an experimental device that tested the theory and suggested modifications. The first step in the evolution of the computer implementation of the CD theory was the MARGIE system (6).
It had three distinct pieces: a parser that mapped sentences into CD forms, a memory mechanism that generated and stored inferences about the meanings generated by the parser, and a generator that translated the conceptual meanings back into natural language. The following is an example of MARGIE input and output (3):

Input: John gave Mary an aspirin.
Output 1: John believed that Mary wants an aspirin.
Output 2: Mary is sick.
Output 3: Mary wants to feel better.
Output 4: Mary will ingest the aspirin.

Schank and his co-workers recognized at the time that the above division into modules was somewhat unrealistic since
the three modules must share data and processes with each other. However, their intention was to test how CD could function as a meaning representation language, and the above modular design was sufficient for that purpose. MARGIE was the first computer program that made inferences from input sentences in the context of an overall theory of the inference process. A full discussion of the MARGIE system and its modules is given in Ref. 6.

To get the flavor of the computer programs that implement the CD theory, consider some examples of procedures that handle some of the conceptual information processing involved in using the primitive ACT ATRANS. The code for these procedures, which is written in the programming language LISP (qv), is taken from the programs featured in Ref. 3.

Making Inferences. The primitive ACTs may be represented in the program as a data structure. For example, the following LISP function, when called with its four arguments, will build a conceptual structure for ATRANS.

  (DE ATRANS (ACTOR OBJECT TO FROM)
    (LIST 'ATRANS (LIST 'ACTOR ACTOR)
                  (LIST 'OBJECT OBJECT)
                  (LIST 'TO TO)
                  (LIST 'FROM FROM)))

Organized around each primitive ACT, there should be inference-generating procedures. For example, to compute the consequences of an ATRANS, one may have the following:

  (DE ATRANS-CONSEQS ()
    (NOTICE $(ACTOR) -CD-)
    (NOTICE $(ACTOR) (ADD-CONSEQ (HAS $(TO) $(OBJECT))))
    (NOTICE $(ACTOR) (ADD-CONSEQ (IS-AT $(OBJECT) $(TO))))
    (COND ($(FROM)
           (NOTICE $(ACTOR)
                   (ADD-CONSEQ (NEGATE (HAS $(FROM) $(OBJECT))))))))

In this code the dollar sign indicates a function that can fetch the filler of the slot (role) that follows it. The function ATRANS-CONSEQS essentially says the following about the consequences of an ATRANS: the actor of an ATRANS knows it happened; the actor knows that there is a resulting change of possession due to the ATRANS; the object that was ATRANSed changed location; and the filler of the FROM slot knows he no longer has the object.
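The division of labor in this code, where verbs map into a primitive and the inference rules hang off the primitive, can be re-sketched compactly in a modern language. The following Python fragment is a hypothetical illustration, not part of the original article; the function names and the tuple form of the consequences are assumptions:

```python
# Hypothetical re-sketch: "give" and "take" both build the same ATRANS
# structure, so consequence rules written once for ATRANS cover both verbs.
def atrans(actor, obj, to, frm):
    """Build a conceptual structure for ATRANS (cf. the LISP ATRANS)."""
    return {"act": "ATRANS", "actor": actor, "object": obj,
            "to": to, "from": frm}

def give(actor, obj, recipient):
    return atrans(actor, obj, to=recipient, frm=actor)

def take(actor, obj, source):
    return atrans(actor, obj, to=actor, frm=source)

def atrans_conseqs(cd):
    """Consequences of an ATRANS, mirroring ATRANS-CONSEQS above."""
    conseqs = [("has", cd["to"], cd["object"]),
               ("is-at", cd["object"], cd["to"])]
    if cd["from"] is not None:
        # The filler of the FROM slot no longer has the object.
        conseqs.append(("not-has", cd["from"], cd["object"]))
    return conseqs

gave = give("John", "book", "Mary")   # "John gave Mary a book"
took = take("Mary", "book", "John")   # "Mary took a book from John"
print(atrans_conseqs(gave) == atrans_conseqs(took))  # → True
```

Because both sentences reduce to the same ATRANS slots, a single rule set yields identical inferences for them, which is exactly the economy argument made above.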
This list of consequences is obviously not complete, but it should serve as an illustration of the types of knowledge organized around the primitive actions.

Language Generation. In English, ATRANS may be expressed as "take" if the filler of the actor slot equals the filler of the TO slot, or as "give" otherwise. This can be handled by the following simple procedure:

  (DSP ATRANS
    (COND ((EQUAL $(ACTOR) $(TO))
           (SAY-SUBJ-VERB '(ACTOR) 'TAKE)
           (SAY-FILLER '(OBJECT))
           (SAY-PREP 'FROM '(FROM)))
          (T (SAY-SUBJ-VERB '(ACTOR) 'GIVE)
             (SAY-FILLER '(TO))
             (SAY-FILLER '(OBJECT)))))

The function DSP attaches the procedure under the name of the ACT so that it can be evaluated by the generator when the
ACT is to be expressed. It should be noticed that the information contained in this word definition comes from both conceptual as well as language-specific sources.

Parsing. The conceptual parser, like the generator, uses information from many different sources. Conceptual parsers use expectations to guide their processing (see Parsing, expectation driven). Since the knowledge representation contains semantic constraints around the primitive acts and their role fillers, the parser uses this knowledge for effective focusing of attention and context-dependent disambiguation. As an example of a lexical entry that can be used by a conceptual parser, consider the lexical definition of the verb "take." In this definition the verb "take" means that someone ATRANSed something to the subject. "Take" looks for a noun phrase to fill the object slot.

  (DEF-WORD TAKE
    ((ASSIGN -PART-OF-SPEECH- 'VERB
             -CD-FORM- (ATRANS ?GET-VAR3 ?GET-VAR2 ?GET-VAR1 ?GET-VAR3)
             GET-VAR1 -SUBJECT-
             GET-VAR2 NIL
             GET-VAR3 NIL)
     (NEXT-PACKET
      ((TEST (EQUAL -PART-OF-SPEECH- 'NOUN-PHRASE))
       (ASSIGN GET-VAR2 -CD-FORM-)))))

The function DEF-WORD stores a definition of a word under that word. The definition consists of a list of requests that represent different expectations. Further definitions of the concepts as well as implementation details of the parser are shown in Ref. 3.

In both the parser and the generator the language-specific parts can be changed independently of the parts that contain the knowledge sources. This fact considerably facilitates the task of multilingual parsing and generation.

Concluding Remarks

The theory of CD and its various computer implementations have had an impact on the way natural-language processing is perceived. It brought forward the notion of language-independent conceptual primitives. It provided a content-based knowledge organization scheme that facilitated expectation-based language processing. As any AI theory should, CD theory suggests extensions. Higher level memory structures, such as scripts and memory organization packets, were later introduced into the theory of language processing. These processing structures are essential for effective understanding and learning in situations that involve complicated but commonly occurring sequences of CDs (3,5). In retrospect, the theory of CD can be viewed as a part of a commonsense theory of language processing.

BIBLIOGRAPHY

1. R. C. Schank, "Conceptual dependency: A theory of natural language understanding," Cogn. Psychol. 3(4), 552-631 (1972).
2. R. C. Schank, Identification of Conceptualizations Underlying Natural Language, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, CA, 1973.
3. R. C. Schank and C. K. Riesbeck, Inside Computer Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1981.
4. R. C. Schank and R. P. Abelson, Scripts Plans Goals and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977.
5. R. C. Schank, Dynamic Memory: A Theory of Learning in Computers and People, Cambridge University Press, Cambridge, United Kingdom, 1982.
6. R. C. Schank, Conceptual Information Processing, Elsevier, New York, 1975.

General References

R. C. Schank and P. G. Childers, The Cognitive Computer, Addison-Wesley, Reading, MA, 1984.

S. L. Hnnrr
SUNY at Buffalo

CONNECTION MACHINES

Connection machines are a class of parallel computers designed for symbolic computation, especially AI. Many current AI programs run so slowly on conventional serial machines that they are impractical for everyday use and difficult to put to the test. Connection machines promise to increase the speed of these programs and to make possible the development of the even more complex, and hence slower, programs that will be written in the future.

In typical current machines, the CPU and the memory are separate. Data are stored in the memory, and all computation is performed in the CPU. Since the CPU can only do one thing at a time, the time to process a certain amount of data increases almost linearly with the amount of data. AI programs often have very large databases that take a long time to manipulate. The guiding idea behind the development of connection machines is that every piece of data should be able to do its own computation. The processing power is more uniformly distributed throughout the memory so that larger databases can have more computation devoted to them.
Hardware

Connection machines are composed of a very large number (50,000–1,000,000) of small processors connected by a high-speed communication network. Each processor stores a very small piece of data, such as might be stored in a few words on a conventional machine. The communication network, or "router," permits any processor to send a message to any other processor, with a delay of only a few dozen machine cycles. Each processor is too small to fetch its own instruction stream. Instead, all processors receive the same instruction stream, which is generated by a conventional serial machine, called the "host machine." The programmer does not always want all processors to do the same thing. To allow different processors to perform different actions, processors may selectively ignore the instruction stream. Each processor contains a hardware flag, which, when set, causes the processor to ignore the instruction stream. It is also possible for the host machine to issue unconditional instructions that are not blocked out by this flag.
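This broadcast-with-flags execution model can be made concrete with a toy simulation. The class and function names below are invented for illustration and are not actual Connection Machine instructions; the sketch only shows how one broadcast instruction stream, per-processor flags, and unconditional instructions interact.

```python
# Illustrative sketch (not the real instruction set): every processor
# receives the same broadcast instruction, but a set "ignore" flag makes
# it a no-op unless the instruction is marked unconditional.

class Processor:
    def __init__(self):
        self.memory = 0      # the processor's small local datum
        self.ignore = False  # hardware flag: when set, skip instructions

def broadcast(processors, op, unconditional=False):
    """The host sends one instruction; each processor applies it locally."""
    for p in processors:
        if p.ignore and not unconditional:
            continue  # flagged processors sit out conditional instructions
        op(p)

procs = [Processor() for _ in range(8)]
for i, p in enumerate(procs):
    p.memory = i

# Flag the odd-valued processors, then broadcast "double your datum".
broadcast(procs, lambda p: setattr(p, "ignore", p.memory % 2 == 1))
broadcast(procs, lambda p: setattr(p, "memory", p.memory * 2))
print([p.memory for p in procs])  # [0, 1, 4, 3, 8, 5, 12, 7]

# An unconditional broadcast reaches even flagged processors (e.g., to clear flags).
broadcast(procs, lambda p: setattr(p, "ignore", False), unconditional=True)
```

Only the even-valued processors execute the doubling instruction; the flagged ones keep their data untouched until an unconditional instruction clears their flags.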
The host machine provides a front end for the connection machine. It provides features that are not practically performed directly on the connection machine. These include programmer support such as text editors, operating systems, compilers, and network and terminal interfaces. Any task that cannot be done in parallel is faster on the host machine than on the connection machine. The connection machine can be thought of as a "symbolic manipulation accelerator" for the host machine, analogous to a floating-point accelerator.

A single connection machine processor contains an ALU whose data paths are 1 bit wide. A 1-bit-wide ALU is capable of performing any computation that can be done by wider ALUs, albeit more slowly. Arithmetic operations that can be performed in a single clock cycle on conventional processors take time proportional to the length of the arguments on a connection machine. For example, to add two 16-bit numbers in every processor of a connection machine would require 16 clock cycles. For this reason, connection machines are not particularly distinguished as "number crunchers."
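The bit-serial arithmetic described above can be sketched with a toy model of a 1-bit full adder, where each loop iteration stands in for one clock cycle; the function name and structure are illustrative, not hardware documentation.

```python
# Toy model of a 1-bit ALU: add two 16-bit numbers one bit per clock
# cycle, carrying a single bit between cycles (16 cycles total).

def bit_serial_add(a, b, width=16):
    result, carry, cycles = 0, 0, 0
    for i in range(width):          # one loop iteration = one clock cycle
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        s = abit ^ bbit ^ carry     # 1-bit full adder: sum bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))  # carry out
        result |= s << i
        cycles += 1
    return result & ((1 << width) - 1), cycles

total, cycles = bit_serial_add(12345, 6789)
print(total, cycles)  # 19134 16
```

A 32-bit add would take 32 such cycles, which is why word-parallel machines outrun connection machines on raw arithmetic.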
Software

Symbolic data structures are formed out of objects connected by pointers. To manipulate these structures in parallel requires not only the ability to manipulate the data stored in each individual object simultaneously, as can be done on conventional array processor systems, but also the ability to perform operations on data spread out over several objects connected by pointers. In connection machines the router serves as a communication path between processors. Each object is stored in a single processor. If an object has a pointer to another object, this means it knows the address of the other object and can thus send messages to it. By using object-oriented programming methods, the activities of multiple objects can be coordinated to produce the desired computation. If different programs are to be run by different types of data objects, the instructions can be sent out for the various programs, but every processor is told to block out any program that does not apply to the data object stored in it. If the user is writing in a high-level language, this method of instruction delivery is hidden, and it appears that different types of processors run different programs.

Algorithms that benefit from running on connection machines are those that require large numbers of similar operations on a large symbolic database. Many AI tasks are of this type. Examples include sorting, unification, production systems, and retrieval from semantic networks. The early stages of vision (qv), such as feature extraction (qv) and line detection, can be computed very quickly on connection machines by dedicating one processor to each pixel in the image. Connection machines include special hardware for fast communication between adjacent processors representing pixels in order to further accelerate vision computations.

Connection machines are currently under development at the MIT Artificial Intelligence Laboratory and at Thinking Machines Corporation.

General References

A. Bawden, What a Parallel Programming Language Has To Say, Memo No. 796, MIT AI Laboratory, Cambridge, MA, 1984.
W. D. Hillis, "The connection machine: A computer architecture based on cellular automata," Physica 10D, 213–228 (1984).
W. D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985.

C. Feynman
Thinking Machines Corp.

CONNECTIONISM

Connectionism is a highly parallel computational paradigm that appears to promise efficient support of intelligent activities such as vision (qv) (1–3), knowledge representation (4–8), natural-language understanding (qv) (9–11), learning (qv) (12–14), and motor control (15–17). Connectionism suggests that pieces of information be represented by very simple computing elements that communicate by exchanging simple messages. Complex computations are carried out by virtue of massively parallel interconnection networks of these elements. The approach is markedly different from the standard (von Neumann) model of computing, in which information is represented as passive patterns and the complexity of computations depends on the complexity of the processors (programs) that use this information.

Motivations

Connectionism was born out of the difficulties in programming von Neumann computers to perform intelligent tasks and the recognition that the computational paradigm underlying the brain appears to be quite different from that of traditional computers. Intelligent activities require the integration and resolution of large numbers of interacting constraints and pieces of knowledge. For example, visual recognition consists of mapping the many bits of information (intensities at points, line segments, color patches, etc.) in the image into an internal model of the object. The integration of these pieces of knowledge is subject to a great many interacting constraints imposed by world knowledge. The difficulty in visual recognition lies in quickly reducing the combinatorial number of possible interpretations to the one that "best" fits the input and constraints. Another example of the complexity of intelligent behavior is the task of understanding the spoken sentences "I saw the Grand Canyon flying to New York" or "The cotton clothing is made of is grown in Mississippi." Understanding these sentences requires the interaction of many levels of knowledge, from low-level knowledge that helps parse streams of morphemes into words all the way to higher levels of knowledge that tell you that the Grand Canyon does not fly and that clothing is not grown.

To better understand the weakness of the von Neumann model for programming such tasks, consider the simple, low-level vision problem of finding edges in an image. A reasonable image size is 1K x 1K, and edge finding in such an image typically takes on the order of 10^7 machine cycles. On a state-of-the-art computer this operation takes 500 ms. The human brain, on the other hand, carries out the entire recognition task in about 200 ms. The difference is really startling when one finds that the switching time of a gate in a computer is on the order of 10 ns, and the switching time of a neuron is on the order of 1 ms; a neuron is 10^5 times slower! Hence the belief that the success of the brain at complex tasks must hinge on its architecture. The brain, it appears, functions in a highly parallel, distributed manner. The computing elements in the brain, the neurons, are relatively simple (although this is a matter of some dispute), and the complexity of behavior appears to arise from massively parallel neuronal interconnection schemes. Connectionism as a field is an attempt to formalize such a computational paradigm and to examine how it furthers our understanding of intelligent systems. It is important to note, however, that the brain is far more complex than the connectionist paradigm, and it would be an overstatement to emphasize the analogy between the brain and the connectionist models as they exist today.
Basic Model

The basic computing elements in connectionism are called units. A single unit (or perhaps a small group of units) represents a piece of knowledge (e.g., a symbol, a feature, or a concept). Units maintain an internal potential or activation level. The potential is restricted to a small finite subset of the real line [e.g., (-1, 1)]. Units are connected to other units in that they send and receive messages from other units. These messages are usually restricted to a small subset of the integers [e.g., (-10, 10)] and are simple functions of the potential. A single cycle of a unit consists of accepting input messages, updating the potential based on a simple function of the inputs, and generating output messages. By "simple" it is meant that the complexity of computation is strictly limited, multiplication and thresholding are about the most complex operations allowed, and the number of operations is on the order of the number of input lines to the unit. It is important to note that the update computation usually integrates the input over many cycles.

For example, consider a fragment of a network that recognizes the written words BIT and TAB (Fig. 1). The lower layer of the network contains three sets of units. Units in the first set represent possible letters in the first position of the input, units in the second set represent possible letters in the second position of the input, and so on. By "represent" it is meant that the potential of a unit specifies the confidence that the letter represented by the unit occurred in that position in the input. For example, high potential for the "B" unit in the first set signifies high confidence that the first letter is a B. These units generate their outputs by suitably scaling and truncating their potentials. The higher layer consists of two units, one each for the words BIT and TAB. The potentials of the word units specify the confidence that the input is that word. The word units update their potentials in direct proportion to their inputs. The connections in Figure 1, as specified by the arrows, are from the lower to the higher level. The operation of the network is as follows: If the net is presented with BIT as the input, the B, I, and T units in the first, second, and third position sets slowly integrate the input and increase in potential. This increase is reflected in an increase in the output messages of these units. The higher level units are continuously integrating their inputs, and hence the BIT unit increases its potential. The TAB unit, not getting any input, will remain at low potential. Eventually the BIT unit will saturate (reach maximum potential), and the input is then said to be recognized. This particular scenario is really quite simplistic; noise in the input will easily cause both units to saturate, but as shown below, connectionism has more machinery that helps build realistic networks.

Units are formally defined in terms of a 7-tuple (q, i, u, p, f, g, h) (e.g., Ref. 18):

q, a small (<10) set of states.
i, a small (~10) set of input tokens (usually the numbers 0, 1, . . . , 10).
u, a small set of outputs similar to the inputs.
p, a small subset of the real line (e.g., [0, 1]) that constitutes the internal potential of the unit.
f: q x i x p → p, a next potential function.
g: q x i x p → q, a next state function.
h: q x i x p → u, a next output function.

The function f usually adds a fraction of the normalized sum of the inputs to the current potential; g is typically based on potential; and h is a simple real-to-integer conversion of the potential. Note that even though p theoretically has infinite points, once implemented on a digital machine a unit becomes a finite-state automaton.

A link is defined by a triple (S, D, W) (e.g., Ref. 18):

S, the source unit of the connection.
D, the destination unit of the connection.
W, a number from a small subset of the real line, called the weight of the link.

Typically weights are both negative and positive; negative weights are termed inhibitory and positive weights excitatory. A link works by taking the output of the source unit, multiplying it by the weight, and delivering it as input to the destination unit. Quite often, the weight of a link is associated with the input site, reducing a link to a pair (S, D).

A connectionist network is said to have completed a computation when it has formed a stable coalition (18). When a correctly set up network is given an input, the units spend a certain amount of time exchanging messages and updating their potentials. Eventually the network settles into a stable state. In this state the only units that have high potential are those that constitute the object that has been identified. All of the other units are at a low potential. The units that are active form a mutual support group and keep each other active. Such a mutual support group is termed a stable coalition.

The framework of connectionism as described above is sparse, leaving many details to the individual researcher.
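The BIT/TAB fragment can be simulated in a few lines in the spirit of the (q, i, u, p, f, g, h) formalism. The update constant (0.2), the cycle count, and all names below are invented for illustration; f here simply adds a fraction of the summed input and clamps the potential to [0, 1].

```python
# Minimal sketch of the letter/word network of Figure 1. Constants and
# update functions are illustrative, not taken from the article.

class Unit:
    def __init__(self, name):
        self.name, self.potential = name, 0.0
        self.links = []                    # (source unit, weight) pairs

    def step(self, external=0.0):
        total = external + sum(src.potential * w for src, w in self.links)
        # f: add a fraction of the summed input; clamp to the allowed range
        self.potential = min(1.0, max(0.0, self.potential + 0.2 * total))

letters = {pos: {c: Unit(f"{c}{pos}") for c in "BITA"} for pos in (1, 2, 3)}
words = {w: Unit(w) for w in ("BIT", "TAB")}
for w, unit in words.items():
    for pos, c in enumerate(w, start=1):
        unit.links.append((letters[pos][c], 1.0))  # letter -> word, excitatory

# Present the input "BIT": drive the matching letter units each cycle.
for _ in range(20):
    for pos, c in enumerate("BIT", start=1):
        letters[pos][c].step(external=1.0)
    for unit in words.values():
        unit.step()

print(words["BIT"].potential, words["TAB"].potential)  # 1.0 0.0
```

After a few cycles the B, I, and T units saturate, the BIT word unit integrates their output and saturates in turn, and the TAB unit, receiving no input, stays at zero: a (trivial) stable coalition.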
Figure 1. A simple network to recognize the words BIT and TAB (higher level: word units; lower level: letter units for positions 1, 2, and 3).
However, a core set of issues must be considered in the development of any connectionist system: representation of knowledge, choice of the three functions of the units, construction of the network, and the behavior prediction of the network. The first of these issues (representation) divides current connectionist research into two distinct approaches. The "localist" approach advocates the "unit-value" principle, which suggests that every concept or hypothesis be represented by a unique unit (18). The "distributionist" approach suggests that concepts and hypotheses are represented by patterns of activations over large numbers of units (12). The architectures and behaviors of localist and distributionist networks differ enough that it is useful to divide further details of connectionist nets into these two categories.
Localist Connectionism

Localism requires that each concept or hypothesis about the world be represented by one unit (as in Fig. 1). Links represent one of two relationships, support (excitatory) or opposition (inhibitory), between hypotheses or concepts, akin to the logical A → B and A → NOT B. Localist nets tend to form hierarchical layers of units. This reflects the hierarchical nature of most problem spaces, the efficiency of hierarchical computations, and the ease with which IS-A and PART-OF relationships can be wired up using the one-unit one-concept approach and excitatory links. Building localist networks has a strong empirical component to it. This is primarily because formally deriving connectionist structures for a given task is difficult (with some exceptions, e.g., Ref. 8). However, there exist powerful heuristics and techniques for building localist networks.

Lateral Inhibition and Positive Feedback. With noisy input all the letter units in Figure 1 will have some potential, and the whole network will eventually saturate at the peak potential. Lateral inhibition is a powerful technique to overcome such saturation effects (18) and is based on the idea that units that represent competing hypotheses should compete directly with each other. In the example presented the individual letter units in each position set are competing hypotheses; each position in the input can have only one letter. Lateral inhibition sets up the competition among the units of the position set by allowing them to mutually inhibit each other. Lateral inhibition consists of setting up a complete graph of links between the units of a position set such that the weight of every link of the graph is -1 (see Fig. 2). This mutually inhibitory network is called a winner-take-all (WTA) net because it can be shown that it allows only one unit to reach peak potential; this unit is the one with the highest integral of input over an initial portion of the network's operation. WTAs sharpen the competition among the units of a class and significantly lower the overall level of activation in the network. The ideas underlying lateral inhibition can be extended to interactions between position sets. For example, given that the vocabulary does not consist of all possible combinations of the letters, the existence of certain letters in one position inhibits/supports the existence of other letters in different positions of the input. These interset links also increase the speed of decision making and help alleviate saturation effects.

Consider a recognition task in which a particular feature of the input is obscured by noise and a spurious feature is present nearby. With the usual bottom-up excitation and lateral inhibition, the spurious feature will quickly rise to peak confidence and the actual feature will be suppressed (see Processing, bottom-up and top-down). The networks described thus far will not recover from this state. In conjunction with modified potential and output functions, positive feedback can help circumvent such states. In positive feedback, also called top-down excitation, links run from high- to low-level units, thus allowing high-level knowledge (perhaps from context information) to guide the behavior of lower units (3,11,16). Units are defined to include a second state that is entered when the potential reaches the peak value. The potential function (f) of the first state is modified to include a spontaneous decay component. In the second state f is set to nothing but decay. The output function of the second state is modified to clamp the output to zero. Hence, in the manner of the refractory period of neurons, units accept no input and generate no output in the second state. A unit in the second state will therefore take no part in the computation and will decay in its potential until it falls back into the first state. With these modifications a correct stable coalition will gently oscillate between the two states. In the noisy case it can be shown that if a spurious concept goes high, it enters the refractory state. During this time it can no longer inhibit the correct concept. Further, positive feedback can help raise the potential of the correct concept to a level where it is competitive with the spurious concept. Figure 3 shows the earlier example, complete with positive feedback, as it might look in a fully implemented connectionist net.

Cross-Talk, Binding, and Learning. Representing two new concepts at the same time, for example, red ball and green stick, leads to difficulties in localist nets. Given that the concepts red, green, ball, and stick exist in the system, the first problem is actually linking red to ball and green to stick and is called the binding problem. The second, called the cross-talk problem, is preventing the linking of red to stick and green to ball.

Cross-talk is usually prevented by detecting spatial or temporal order in the input (3). A possible mechanism for binding consists of forming dynamic links between concepts via a multilayered network in which each unit in a layer is connected to a fixed number of units in the next layer, and so on. If the last
Figure 2. Lateral inhibition in the example of Figure 1.
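The winner-take-all dynamics of such a mutually inhibitory position set can be sketched as follows. The gain, iteration count, and input values are illustrative assumptions; each rival contributes through a weight -1 link, and potentials are clamped to [0, 1].

```python
# Sketch of a winner-take-all (WTA) net: the units of one position set
# are fully connected by weight -1 links and mutually inhibit each other.
# Constants (gain, step count, inputs) are illustrative.

def wta_step(potentials, inputs, gain=0.1):
    new = []
    for i, p in enumerate(potentials):
        inhibition = sum(q for j, q in enumerate(potentials) if j != i)
        net = inputs[i] - inhibition          # weight -1 from every rival
        new.append(min(1.0, max(0.0, p + gain * net)))
    return new

# Noisy input: unit "B" receives only slightly more evidence than its rivals.
potentials = [0.0, 0.0, 0.0]                  # units B, T, A in one position set
inputs = [0.6, 0.5, 0.4]
for _ in range(100):
    potentials = wta_step(potentials, inputs)

print([round(p, 2) for p in potentials])      # only the strongest unit stays high
```

Even with closely matched inputs, the unit with the largest integrated input climbs to peak potential while suppressing its rivals to zero, which is the winner-take-all property described above.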
Figure 3. Network with positive-feedback links added.
layer wraps around to the first, any unit in the first layer can set up a link with any other unit in that layer even if the units have limited fan-out. Unfortunately, once a link has been set up, the units in that link can no longer participate in other links, and the efficacy of the scheme drops very quickly (18).

Learning in connectionist nets is based on changing the weights on links. There is a long history of research into learning networks based on the Hebbian notion that the link between two units should be strengthened if the two units are active together (19). However, making arbitrary connections is limited by the fan-out of the units, although the dynamic link approach can be used to extend the fan-out of units (18). Another approach is to ensure that the network has many "free" units that are randomly connected to concept units. These free units are recruited in order to form a new combination out of existing concepts (13). Unfortunately, none of these schemes have proved to be practically useful.

Predictability and Convergence. The previous discussion refers to networks "completing" their computations. It is actually a difficult matter to decide when a network has generated its answer because units continue to compute and communicate in the final state. A closely related issue is that of proving, from the network structure, that the net will actually converge to the desired answer in any situation. In practice, the final state of the system is obvious in its relative stability; building networks to perform given tasks is not a black art. Most localist systems are empirical in nature, and arguments about their predictability are rooted in the experimental verifications of the behaviors of the systems and their substructures (e.g., Refs. 2 and 3). Formal proofs of behavior have yet to be developed for localist nets, though they will certainly prove useful as systems get larger.
However, it must be pointed out that the units and networks of connectionism are nonlinear elements, and it is well known that convergence proofs for nonlinear systems are almost nonexistent. The directions toward a theory of convergence lie in the analysis of f (and perhaps g). The key appears to be to consider inputs to a unit to be pieces of evidence that must be integrated to make a decision. Unfortunately, existing theories of evidence (e.g., Ref. 20) have not been helpful in analyzing localist nets. Feldman and Shastri (8) have developed a theory of evidential reasoning that appears to be successful for a limited class of networks used in knowledge representation.

Distributed Connectionism

The distributed approach to connectionism is based on representing concepts as patterns over large numbers of units (e.g., Ref. 12). Each unit is said to represent a microfeature, a part of a concept that is too small to have a name. Concepts that are similar share microfeatures (units), and therefore similar concepts have similar patterns. The sharing of units also results in efficiency of representation. (In the limit, N concepts require N binary units in a localist representation but only log2 N units in a distributed representation.) Representing concepts as patterns over large numbers of microfeatures also increases the reliability of the network. The net can now suffer limited local damage without impairing performance significantly. The links represent microrelations (inferences, constraints), parts of a regular inference that are too small to be individually named. Though the actual update functions used by the units are different, the mechanisms of computation (i.e., the units exchange activations and update potentials) are much the same for distributed and localist networks. As in the localist approach, a distributed network finally settles on a stable coalition of active units, a pattern that least violates the microconstraints imposed by the links. The answer is the entire pattern of activation instead of one high-level unit as in localist nets.

Representing concepts as patterns over shared microfeatures admits a first-order solution to the problem of generalizing concepts. Activating or changing a concept in a computation consists of changing the pattern of activity and weights over the constituent units and links. Since these units and links are shared by similar concepts, the change is transferred to them. If the concepts were entirely unrelated, such transference would be interference, but since in reality the concepts are interrelated, the effect is a form of generalization.

The IS-A structure is relatively easy to implement in distributed networks. A pattern that represents a concept can be thought of as consisting of many subpatterns, each subpattern naming and providing the distinguishing information for its class in the hierarchy. For example, one subpattern distinguishes the concept within its immediate peer group, for example, SPRINGER and ENGLISH among SPANIELS. Another distinguishes SPANIELS within DOGS and so on. The shared microfeatures (the subpattern SPRINGER is shared with the concept SPANIEL) also give rise to inheritance via the transference mechanism described above. Unfortunately, once the pattern-subpattern relationship is used to represent IS-A information, it becomes very difficult to represent the PART-OF structure within distributed nets. Additionally, concepts are represented by entire patterns of activation, and it becomes difficult to represent concept and subconcept at the same time. A possible approach to representing PART-OF structure is to recognize that the subconcepts that compose a concept are actually role-filler pairs and to allow the subpatterns to represent such pairs. The representation now becomes significantly more complex than before, and the cross-talk problem restricts the network to activating only one concept at a time.

Construction of Networks. Hand-wiring a distributed network to perform a given task is practically impossible for obvious reasons: Given that units and links have no immediately accessible semantics, there is no basis for choosing patterns of activation or interconnection. It is therefore clear that a successful distributed approach must be based on networks learning the appropriate representations. The approach must include a formal framework from which it must be possible to derive the learning rules and the proofs of convergence of the system once it has learned. An approach that has had some success in this is one that is based on statistical mechanics (12). Consider a network built of units that have only two states, [0, 1], with the links still representing microinferences. If the structure of the links is such (e.g., completely symmetrical connections) that each unit can decide if the global sum of constraint violations (or relation matches) is changed by switching state, and if such updates are carried out asynchronously, it can be shown that the system will converge to a network optimum. Unfortunately, the optimum may be local in the space of network patterns. Overcoming local minima is made possible by introducing noise in the form of a global parameter called temperature. Each unit now switches to a better state with a probability dependent on the goodness of the state and the temperature:

p(switch) = 1/(1 + e^(-E/T))

where E is the goodness of the state and T the temperature. When T is high, units flip even if doing so takes the state away from an optimum point. When T is low, units flip only toward an optimum. The activation or confidence of a microfeature is now represented by the probability that its state is 1. The claim made is that, for any given probability p, there exists a temperature trajectory in time from high T to low T such that "cooling" the system will result in a probability p of the system being in its globally optimum state. The behavior of such a system is akin to many systems analyzed in statistical mechanics, and the cooling referred to above is exactly the process of annealing for atomic systems. It is also possible to derive a weight change algorithm, which offers one form of learning in such networks, from the physics of such systems (12). Unfortunately, although the approach is very elegant and offers the first formal convergence and learning theories in connectionism, in practice it has proved difficult and time consuming to get a system to learn even simple patterns.

Applications and Summary

The ideas of connectionism date back to the work of McCulloch and Pitts (21) and the perceptrons (qv) of Rosenblatt (22). These early computational models of neural networks used computing elements that were simple threshold logic units (TLUs). TLUs are devices that have a binary output that goes high if the weighted sum of the inputs exceeds a threshold. Rosenblatt's work consisted primarily of showing that systems that were composed of multiple layers of such TLUs exhibited learning behavior in pattern association tasks. Unfortunately, the behavior of these networks did not scale to interesting tasks, and Minsky and Papert (23) later showed that this was because the TLUs were too simple. Note that the elements of modern connectionism are more complex; they are multistate, continuous potential machines.
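The temperature-dependent flip rule described in the statistical-mechanics approach above can be illustrated numerically, assuming the logistic form p(switch) = 1/(1 + e^(-E/T)); the function name and the sample values are invented for illustration.

```python
import math

# Sketch of the temperature-dependent flip rule: E is the goodness gained
# by switching state, T the global temperature (values are illustrative).

def p_switch(goodness_gain, temperature):
    return 1.0 / (1.0 + math.exp(-goodness_gain / temperature))

# A bad move (negative gain) is often accepted at high T, almost never at low T.
print(round(p_switch(-1.0, 10.0), 3))  # high T: close to 0.5 -- escapes local minima
print(round(p_switch(-1.0, 0.1), 3))   # low T: close to 0 -- bad moves rejected
print(round(p_switch(+1.0, 0.1), 3))   # low T: good moves still taken
```

Cooling corresponds to lowering T over time: early on the network explores freely (even downhill), and as T falls it settles greedily toward an optimum, which is the annealing schedule referred to above.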
Around the same time Quillian (4) proposed using marker propagation between nodes of a semantic network to carry out associative matches. The computation was a simple digital version of connectionism with the drawback that the network was quickly flooded with markers. A similar idea reappeared later in the form of NETL, a system developed by Fahlman (6) that also used marker passing to carry out limited inference in a semantic network. Collins and Loftus (5) attempted to overcome the problem of marker saturation by using analog messages and computations. They used decay in the activation to control saturation and, although the approach worked, it was fragile in that the behavior was very sensitive to the setting of the decay parameter. More recently Feldman and Shastri (8) invoked much of modern connectionism, developed a theory of evidential reasoning, and proposed an inference mechanism that not only controls saturation but also attempts to provide a framework for evidential reasoning.

Vision was the arena in which modern connectionism made its initial appearance. Low-level vision computations appear to be particularly well suited to connectionist computations because they require the integration of large numbers of simple, interacting pieces of knowledge. Ballard (2) shows how different formulations of connectionist nets can be used for low-level computations such as shape from shading, computing optical flow (qv), and so on. It is possible to argue that connectionism is well suited to the entire vision enterprise. It appears that, looked at in one way, there is not much difference between the kinds of computations performed at the low, mid, and high levels of vision. Sabbah (3) describes a system in which the basic connectionist paradigm is used for all levels of the vision task, from line finding to object recognition. Although the system appears to show that connectionist vision is viable, full generality of connectionism for all of vision has yet to be demonstrated. Feldman (24) attacks the interaction and representation issues for the highest levels of knowledge in vision. He attempts to provide a framework for representing knowledge of both individual objects and their spatial relationships in specific situations. The proposal has yet to be fully worked out but is the first approach to integrating a computational model of low- and mid-level vision with the higher levels of knowledge that a system must possess.

Natural language is much like vision in that early stages of processing appear to require much interaction between many pieces of different kinds of knowledge. Cottrell and Small (9) propose a connectionist scheme for the disambiguation of word senses in a sentence. Waltz and Pollack (10) also describe a connectionist architecture for parsing sentences. Early experiments show that the approaches do produce appropriate behavior (e.g., handling garden-path sentences), but current implementations are quite small.

Connectionism has also provided a computational framework for the study of both high- and low-level motor control (15-17). Rumelhart and Norman (15) developed a connectionist system that simulated the interactions of high-level movement commands for finger placement in typing. Their model replicated many of the common error patterns found in skilled typists. Addanki (17) completes the circle by showing that modern connectionism is a viable computational paradigm for the study of neural motor control circuits. The study shows how connectionist models can account for some of the nonlinearities and anomalies inherent in the oculomotor system of the higher primates.

Connectionism, then, offers a computational paradigm that appears to be well suited for the synthesis of intelligent systems (see Refs. 25 and 26 for more papers in connectionism). Although formal theories of convergence, learning, and computation are still immature and the many systems that exist today attack only a small portion of their domains, the wide range of applicability and the feasibility of overcoming the combinatorial nature of AI problems make connectionism very attractive. The simplicity of the computing elements permits implementing connectionist nets in the technologies of tomorrow, and though it may be argued that connectionist nets do rely on massive amounts of communication (expensive in current architectures), the expense may be outweighed by the gains in controlling combinatorial growth. In summary, it seems clear that connectionism holds much promise for future research in AI and that this promise will be fulfilled as the properties of connectionist nets are better understood.
BIBLIOGRAPHY

1. C. M. Brown, "Computer vision and natural constraints," Science 224(4655), 1299-1305 (1984).
2. D. H. Ballard, "Parameter nets: Towards a theory of low-level vision," AI J. 22, 235-267 (1984).
3. D. Sabbah, "Computing with connections in visual recognition of origami objects," Cog. Sci. 9, 25-50 (1985).
4. R. M. Quillian, Semantic Memory, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 227-270, 1968.
5. A. M. Collins and E. F. Loftus, "A spreading activation theory of semantic processing," Psychol. Rev. 82, 407-428 (1975).
6. S. E. Fahlman, NETL: A System for Representing and Using Real World Knowledge, MIT Press, Cambridge, MA, 1979.
7. G. E. Hinton, Implementing Semantic Networks in Parallel Hardware, in G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Erlbaum, Hillsdale, NJ, 1981.
8. L. Shastri and J. A. Feldman, Evidential Reasoning in Semantic Networks: A Formal Theory, Proceedings of the Ninth IJCAI, Los Angeles, CA, pp. 465-474, 1985.
9. G. W. Cottrell and S. L. Small, "A connectionist scheme for modeling word sense disambiguation," Cog. Brain Theory 6, 89-120 (1983).
10. D. L. Waltz and J. B. Pollack, "Massively parallel parsing: A strongly interactive model of natural language interpretation," Cog. Sci. 9, 51-74 (1985).
11. J. L. McClelland and D. E. Rumelhart, "An interactive model of context effects in letter perception. Part I: Basic findings," Psychol. Rev. 88, 375-407 (1981).
12. G. E. Hinton, T. Sejnowski, and D. Ackley, "Boltzmann machines: Constraint satisfaction machines that learn," Cog. Sci. 9, 147-169 (1985).
13. J. A. Feldman, "Dynamic connections in neural networks," Biol. Cyber. 46, 27-39 (1982).
14. D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning," Cog. Sci. 9, 75-112 (1985).
15. D. E. Rumelhart and D. A. Norman, Simulating a Skilled Typist: A Study of Skilled Cognitive-Motor Performance, Technical Report 8102, Institute for Cognitive Science, UCSD, 1981.
16. D. H. Ballard, Task Frames in Robot Manipulation, Proceedings of the Fourth National Conference on AI, Austin, TX, pp. 16-22, 1984.
17. S. Addanki, Applications of Connectionist Techniques to Simulations of Motor Control Systems, Ph.D. Thesis, Computer Science Department, University of Rochester, Rochester, NY, 1983.
18. J. A. Feldman and D. H. Ballard, "Connectionist models and their properties," Cog. Sci. 6, 205-254 (1982).
19. D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
20. G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.
21. W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in neural nets," Bull. Math. Biophys. 5, 115-133 (1943).
22. F. Rosenblatt, Principles of Neurodynamics, Spartan, New York, 1962.
23. M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, MA, 1969.
24. J. A. Feldman, "Four frames suffice: A provisional model of vision and space," Behav. Brain Sci. 8, 265-289 (1985).
25. J. A. Feldman (ed.), Cog. Sci. 9(1) (1985). Special issue on connectionist models and their applications.
26. J. A. Feldman, D. H. Ballard, C. M. Brown, and G. S. Dell, Rochester Connectionist Papers: 1979-1985, Technical Report 172, Computer Science Department, University of Rochester, Rochester, NY, 1985.

General References

G. S. Dell, A Spreading Activation Theory of Retrieval in Sentence Production, Technical Report 21, Department of Psychology, University of Rochester, Rochester, NY, 1984.
M. Fanty, A Connectionist Simulator for the Butterfly, Technical Report 164, Computer Science Department, University of Rochester, Rochester, NY, 1986.
S. L. Small, L. Shastri, M. L. Brucks, S. G. Kaufman, G. W. Cottrell, and S. Addanki, ISCON: A Network Construction Aid and Simulator for Connectionist Models, Technical Report 109, Computer Science Department, University of Rochester, Rochester, NY, 1983.
W. Kornfeld, Using Parallel Processing for Problem Solving, AI Memo 561, MIT AI Labs, Cambridge, MA, 1979.

S. Addanki
IBM
CONNIVER

CONNIVER is a model and language for general problem solving (qv) developed by Sussman and McDermott at the MIT AI Lab for overcoming PLANNER's backtracking problem by introducing multiprocessing control in which a process with a certain data environment stays around to be resumed later as long as it is directed by some external program (see also Cybernetics). See G. Sussman and D. V. McDermott, "Why Conniving Is Better than Planning," MIT AI Lab Memo No. 255A (1972), and G. Sussman and D. V. McDermott, "From PLANNER to CONNIVER: A genetic approach," in Proc. Fall Joint Comp. Conf., Anaheim, CA, Dec. 5-7, 1972, AFIPS Press, Reston, VA, pp. 1171-1179 (1972).

A. Hanyong Yuhan
SUNY at Buffalo
CONSTRAINT SATISFACTION

Constraint satisfaction is an umbrella term for a variety of techniques in AI and related disciplines. In this entry attention is focused on the main approaches, such as backtracking, constraint propagation, and cooperative algorithms, with some consideration given to the motivations and techniques underlying other constraint-based systems.
The first class of constraint satisfaction problems considered are those in which one has a set of variables, each to be instantiated in an associated domain, and a set of Boolean constraints limiting the set of allowed values for specified subsets of the variables. This general formulation has a wide variety of incarnations in various applications: it is a general search (qv) problem. One standard approach involves backtracking (qv); various forms of "intelligent" backtracking are surveyed. A complementary approach based on the class of consistency algorithms has some nice properties that are described and illustrated. The second class of problems considered are the numerical optimization problems that arise when one is designing a system to maximize the extent to which the solutions it provides satisfy a large number of local constraints. Algorithms for their solution are based on generalizations of the consistency algorithms for applications primarily in computational vision (qv). These algorithms, which have a high degree of potential parallelism, are variously known as cooperative or probabilistic relaxation algorithms.

One can call these two problem classes Boolean constraint satisfaction problems and constrained optimization problems, respectively. As with all dichotomies, this one is not absolute. Some approaches lie between these two poles; others combine them. There are, in fact, many other dimensions along which one could categorize the area, but this is the best first cut.

Boolean Constraint Satisfaction Problems

A Boolean constraint satisfaction problem (CSP) is characterized as follows: given is a set V of n variables {v1, v2, . . . , vn}; associated with each variable vi is a domain Di of possible values. On some specified subsets of those variables there are constraint relations given that are subsets of the Cartesian product of the domains of the variables involved. The set of solutions is the largest subset of the Cartesian product of all the given variable domains such that each n-tuple in that set satisfies all the given constraint relations. One may be required to find the entire set of solutions or one member of the set or simply to report if the set of solutions has any members (the decision problem). If the set of solutions is empty, the CSP is unsatisfiable.

A surprisingly large number of seemingly different applications can be formalized in this way. Some of them are enumerated below. Of particular theoretical interest is the map-coloring problem. Consider, for example, the problem of deciding if three colors suffice to color a given planar map such that each region is a different color from each of its neighbors. This is formulated as a Boolean CSP by creating a variable for each region to be colored, associating with each variable the domain {red, green, blue}, and requiring for each pair of adjacent regions that they have different colors. Since the map-coloring problem is known to be NP-complete and is therefore believed inherently to require exponential time to solve, one does not expect to find an efficient polynomial time algorithm to determine if a general CSP is satisfiable.

Various restrictions on the general definition of a CSP are possible. For example, the domains may be required to have a finite number of discrete values. If this is the case, the constraining relations may be specified extensionally as the set of all p-tuples that satisfy the constraint. One may further require that all the relations be unary or binary, that is, that they only constrain individual variables or pairs of variables. These restrictions apply to the map-coloring example above. However, they are not necessary for some of the techniques reported here to be applicable. For example, suppose one were planning the layout of furniture in an office. The position of each item of furniture would be a variable, with an associated domain that would contain an infinite number of pairs (or triples, if rotations are allowed) of real values. Those domains would have to be described intensionally by, for example, describing the boundaries of the connected subspaces permitted for that item. The constraints, such as "The wastebasket must be within three feet of the chair. The door must be unobstructed," must also be specified intensionally using, perhaps, algebraic inequalities on the values of the constrained variables. Moreover, one might have p-ary relations such as "The desk must be between the chair and the door."

Crossword puzzles are used here as a tutorial example of the concepts of constraint satisfaction. Consider the puzzle in Figure 1. To simplify the presentation, assume that one is required to find in the given word list the eight words that correspond to 1 across, 2 down, and so on, with duplicates allowed. The reader should try to solve this simple CSP now, introspecting on the methods used as one goes through the process of looking for a solution.

Word list: Aft, Ale, Eel, Heel, Hike, Hoses, Keel, Knot, Laser, Lee, Line, Sails, Sheet, Steer, Tie.

Figure 1. A constraint satisfaction problem: Solve the crossword.

In general, one may represent the satisfiability decision problem for a CSP as equivalent to determining the truth value of a well-formed formula in first-order predicate logic (qv):

(∃x1)(∃x2) · · · (∃xn) (x1 ∈ D1)(x2 ∈ D2) · · · (xn ∈ Dn)
  ∧ P1(x1) ∧ P2(x2) ∧ · · · ∧ Pn(xn)
  ∧ P12(x1, x2) ∧ P13(x1, x3) ∧ · · · ∧ Pn-1,n(xn-1, xn)   (1)

Here Pij is only included in the formula if i < j since it is assumed that Pij(xi, xj) = Pji(xj, xi). Initially here, only constraints representable as unary and binary predicates are considered. For the crossword puzzle the unary constraints {Pi} specify the word length. P1 requires that the word starting at 1 across have five letters. The binary constraints arise when a word across intersects a word down. For example, P12 requires that the third letter of word 1 across be the same as the first letter of word 2 down. In general, but not for this example, p-ary predicates (1 ≤ p ≤ n) are required.

For binary predicates another convenient problem representation is a network consisting of a graph with a vertex for each variable with its associated domain attached and an edge
between the vertices corresponding to each pair of directly constrained variables. In the crossword puzzle constraint network shown in Figure 2, the initial domain of words for each variable is shown inside the vertex for that variable. Note that only words satisfying the unary word length constraint are shown. In general, for p-ary constraints (p > 2) a hypergraph representation with a hyperedge for each constraint connecting the p vertices involved is required.

Figure 2. The crossword puzzle constraint network.

Backtracking and Consistency Algorithms for Constraint Satisfaction Problems

Generate and Test. Assuming finite discrete domains, there is an algorithm to solve any CSP. The assignment space D = D1 × D2 × · · · × Dn is finite, and so one may evaluate the body of formula (1) on each element of D and stop if it evaluates to true. This generate-and-test algorithm is correct but slow. In the crossword puzzle the number of different assignments to be tested is 5^8, or 390,625.

Backtracking Algorithms. Backtracking algorithms systematically explore D by sequentially instantiating the variables in some order. As soon as any predicate has all its variables instantiated, its truth value is determined. Since the body of formula (1) is a conjunction, if that predicate is false, that partial assignment cannot be part of any total valid assignment. Backtracking then fails back to the last variable with unassigned values remaining in its domain (if any) and instantiates it to its next value. The efficiency gain from backtracking arises from the fact that a potentially very large subspace of D, namely, the product space of the currently unassigned variable domains, is eliminated by a single predicate failure. The reader is invited to solve the crossword puzzle by backtracking, instantiating the words in the order 1-8. Start with word 1 across as "hoses"; try word 2 down as "hoses"; P12 is not satisfied, so all potential solutions with these two choices for 1 and 2 are illegal. Next try word 2 as "laser," and so on.

The efficiency of backtracking has been investigated empirically (1-4). Good analytical results are hard to come by, but see Refs. 4-7. Other factors being equal, it pays to preorder the variables in terms of increasing domain size; one thereby maximizes the average size of the subspace rejected by the failure of a predicate. This principle has been extended to dynamic reordering (2,8) involving one, two, or more levels of look-ahead search to find the variable with the smallest domain of acceptable values to instantiate next.

Regardless of the order of instantiation, one almost always observes thrashing behavior in backtrack search (9). Thrashing can be defined here as the repeated exploration of subtrees of the backtrack search tree that differ only in inessential features, such as the assignments to variables irrelevant to the failure of the subtrees (10,11). This ubiquitous phenomenon is indeed observed, in abundance, as one develops the search tree for the crossword puzzle. Many of the techniques reported in this section and the next are designed to reduce or eliminate thrashing, essentially by providing the algorithms with better memories.
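The backtracking scheme just described can be sketched in a few lines. The code below is a minimal illustration on the map-coloring formulation given earlier, not the article's own implementation; the toy map of four regions and the helper names (`consistent`, `backtrack`) are assumptions made for the example.

```python
def consistent(var, value, assignment, constraints):
    """A binary constraint (a, b, pred) is testable as soon as both
    of its variables are instantiated; check all such constraints."""
    for a, b, pred in constraints:
        if a == var and b in assignment and not pred(value, assignment[b]):
            return False
        if b == var and a in assignment and not pred(assignment[a], value):
            return False
    return True

def backtrack(variables, domains, constraints, assignment):
    """Instantiate variables in order; fail back to the last variable
    with untried values as soon as a predicate is violated."""
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]
    for value in domains[var]:
        if consistent(var, value, assignment, constraints):
            assignment[var] = value
            result = backtrack(variables, domains, constraints, assignment)
            if result is not None:
                return result
            del assignment[var]          # undo and try the next value
    return None                          # exhausted: fail back further

# Toy map: regions A, B, C mutually adjacent; D adjacent to A only.
ne = lambda x, y: x != y
regions = ["A", "B", "C", "D"]
colors = {r: ["red", "green", "blue"] for r in regions}
borders = [("A", "B", ne), ("B", "C", ne), ("A", "C", ne), ("A", "D", ne)]
solution = backtrack(regions, colors, borders, {})
```

A single predicate failure here prunes the whole product space of the remaining variables' domains, which is the source of backtracking's advantage over generate and test.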
One form of so-called intelligent backtracking uses varying degrees of look-ahead to delete unacceptable values from the domains of all the uninstantiated variables (4,12). Another form of intelligent backtracking identifies the latest instantiated variable causing the failure and fails back to it, possibly across many intervening levels (3,10,13). Gaschnig's (14) backmarking algorithm is another potential improvement on backtracking that looks backward to remember value combinations that guarantee failure or success so that they are not retried elsewhere in the tree. Similar techniques are exploited in dependency-directed backtracking (15) and truth or belief maintenance systems (16) (see Backtracking, dependency-directed; Belief systems). Those systems generally abandon the chronological stack-based control discipline of pure backtracking, allowing choices to be undone independent of the order in which they were made. The AI programming languages Micro-Planner and PROLOG are based on automatic backtrack control structures. The possibility of providing some of the techniques surveyed in this entry as general AI tools should not be overlooked (10,11,16).

Consistency Algorithms. Another family of algorithms complementary to the class of backtracking algorithms has been characterized as the class of consistency algorithms (11). By analyzing the various causes of thrashing behavior in backtracking, various authors have described algorithms that eliminate those causes (11,17-20). They are most easily described in the network model of CSPs given earlier. For binary constraints each edge in the graph between vertices i and j is replaced by arc (i, j) and arc (j, i).
Node i, composed of vertex i and the associated domain of variable vi, is node consistent iff

(∀x)[x ∈ Di ⊃ Pi(x)]

Each node can trivially be made consistent by performing the domain restriction operation

Di ← Di ∩ {x | Pi(x)}

In the crossword puzzle this corresponds to the obvious strategy of deleting from each variable's domain any word with the wrong length (and, in a real crossword puzzle, any word that does not fit the clue). Similarly, arc (i, j) is arc consistent iff

(∀x)[x ∈ Di] ⊃ (∃y)[(y ∈ Dj) ∧ Pij(x, y)]

that is, if for every element in Di there is at least one element in Dj such that the pair of elements satisfy the constraining predicate. Arc (i, j) can be made arc consistent by removing from Di all elements that have no corresponding element in Dj with the following arc consistency domain restriction operation:

Di ← Di ∩ {x | (∃y)((y ∈ Dj) ∧ Pij(x, y))}   (2)

In the language of relational database theory this operation is known as a semijoin (21). A network is node and arc consistent iff all its nodes and arcs are consistent. A given network for a CSP can be made node consistent in a single pass over the nodes. However, a single pass of the arc consistency operation over the arcs will not guarantee that the network is arc consistent. One must either repeat that pass until there is no reduction in any domain in a complete pass or use a more selective constraint propagation technique that examines each of the arcs, keeping track of the arcs that may have become inconsistent as a result of deletions from the domain at their destination node (11,18). The first approach is a symbolic relaxation algorithm and suggests parallel implementation techniques (22). The second is usually more efficient on a single processor. The Waltz (18) filtering algorithm uses the second approach (see Waltz filtering). That arc consistency algorithm requires time linear in the number of constraints to make the network arc consistent (23). The best framework for understanding these algorithms is to see them as removing local inconsistencies from the network which can never be part of any global solution. When those inconsistencies are removed, they may cause inconsistencies in neighboring arcs that were previously consistent. Those inconsistencies are in turn removed, so the algorithm eventually arrives, monotonically, at a fixed-point consistent network and halts. An inconsistent network has the same set of solutions as the consistent network that results from applying a consistency algorithm to it, but if one subsequently applies, say, a backtrack search to the consistent network, the resultant thrashing behavior can be no worse and may be much better. The result of applying algorithm AC-3, a serial arc consistency algorithm (11), to the crossword puzzle constraint graph is shown in Figure 3.
The arcs to be initially examined are put on a queue in the order 12, 21, 13, 31, 42, 24, 43, . . . , 86, 68, and the deleted words are italicized. When words are deleted from a domain at a node, all the arcs into that node not currently waiting on the queue (except the reverse of the arc causing the deletion) are added to the end of the queue. In Figure 3 the numbers following the deleted words give the order in which they are deleted. Since each domain is eventually reduced to a singleton set of one element, there is a unique solution to the puzzle, shown in Figure 4.

Figure 3. The arc consistent constraint network.

Figure 4. The crossword puzzle solution.

A generalization of this technique is to path consistency (11,19). A path of length 2 from node i through node m to node j is consistent iff

(∀x)(∀z)[Pij(x, z) ⊃ (∃y)((y ∈ Dm) ∧ Pim(x, y) ∧ Pmj(y, z))]

A path is made consistent by deleting entries in the relation matrix representing Pij if it is not. Analogous relaxation and propagation techniques apply.

A further generalization to p-ary relations is the concept of
k-consistency (1 ≤ k ≤ n) (20). A network is k-consistent iff, given any instantiation of any k - 1 variables satisfying all the direct constraints among those variables, it is possible to find an instantiation of any kth variable such that the k values taken together satisfy all the constraints among the k variables. Node, arc, and path consistency correspond to k-consistency for k = 1, 2, and 3, respectively. A network is strongly k-consistent iff it is j-consistent for all j ≤ k. Another generalization to p-ary relations (24) involves only arc consistency techniques.

Even though a network is strongly k-consistent for k < n, there is no guarantee that a solution exists unless each domain is reduced to a singleton. One approach to finding complete solutions is to achieve strong n-consistency (20), but that approach can be very inefficient as Freuder's algorithm for k-consistency is O(n^k) (23). A second approach is to achieve only strong arc consistency. If any node still has more than one element in its domain, choose the smallest such domain and recursively apply strong arc consistency to each half of it. Only the arcs coming into that node can initially be inconsistent in the two subproblems generated. A third and related approach is to instantiate the variable with the smallest domain that has more than one value in it and repeat arc consistency recursively, backtracking on failure. Again, initially only the arcs coming into that node can be inconsistent. Or, fourth, one can simply backtrack on the consistent network using any of the backtracking algorithms described above. This is the sense in which backtracking and consistency algorithms are complementary. Backtracking is a depth-first instantiation technique, whereas consistency is an elimination approach, ruling out all solutions containing local inconsistencies in a progressively wider context.
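The third approach above, instantiating the smallest multivalued domain and re-establishing arc consistency after each choice, can be sketched compactly. This is an illustrative hybrid, not an algorithm given in the text; the naive sweep-until-stable `arc_consistent` routine and the chained X < Y < Z example are assumptions made for the sketch.

```python
def arc_consistent(domains, constraints):
    """Naive relaxation: sweep all arcs until no domain shrinks.
    Returns False iff some domain is wiped out."""
    changed = True
    while changed:
        changed = False
        for (i, j), pred in constraints.items():
            keep = {x for x in domains[i]
                    if any(pred(x, y) for y in domains[j])}
            if keep != domains[i]:
                domains[i] = keep
                changed = True
    return all(domains.values())

def solve(domains, constraints):
    """Arc consistency interleaved with backtracking on the variable
    whose domain is smallest but still multivalued."""
    if not arc_consistent(domains, constraints):
        return None                      # wipeout: backtrack
    multi = [v for v in domains if len(domains[v]) > 1]
    if not multi:
        return {v: next(iter(d)) for v, d in domains.items()}
    var = min(multi, key=lambda v: len(domains[v]))
    for value in sorted(domains[var]):
        trial = {v: set(d) for v, d in domains.items()}
        trial[var] = {value}             # instantiate and recurse
        result = solve(trial, constraints)
        if result is not None:
            return result
    return None

# Example: X < Y < Z, all domains {1, 2, 3}; the unique solution
# is forced by arc consistency alone.
doms = {"X": {1, 2, 3}, "Y": {1, 2, 3}, "Z": {1, 2, 3}}
cons = {("X", "Y"): lambda x, y: x < y, ("Y", "X"): lambda y, x: y > x,
        ("Y", "Z"): lambda y, z: y < z, ("Z", "Y"): lambda z, y: z > y}
result = solve(doms, cons)
```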
Other names for the class of consistency algorithms include discrete relaxation, constraint propagation, domain elimination, range restriction, filtering, and full-forward look-ahead algorithms, but these terms do not properly cover the range of consistency techniques described here.

Applications. As surveyed in Refs. 4 and 11, various combinations of backtracking and consistency techniques have been suggested for, or actually applied to, finite assignment space puzzles such as cryptarithmetic problems, Instant Insanity, magic and Latin squares, and the n-queens problem (not to mention crossword puzzles). Other applications reported include map coloring, Boolean satisfiability, graph and subgraph homomorphism and isomorphism, database retrieval for conjunctive queries, theorem proving (qv), and spatial layout tasks. The first application in computational vision (qv) was to edge labeling (18) (see Edge detection), but there have been many others reported, including sketch map interpretation (24) and consistency for schema-based systems (26). In Ref. 27 arc consistency is used on a vision problem in which the domains are not discrete. In that application the domains correspond to a range of allowable surface orientations at various locations in an image of a smooth surface. In general, the only requirement for using consistency is that one be able to carry out restriction operations typified by Eq. (2) on the descriptions of the domains and relations, which may be intensional rather than extensional.

Various experimental and theoretical results on the running time of these algorithms have been reported (3,4,18,23,28-30), but the results must be interpreted with care since the authors are not always discussing the same algorithms, different measures of time are used, some results are task specific,
and some authors analyze the decision problem and others analyze the problem of synthesizing the global n-ary relation, reporting all solutions. More work needs to be done, but at this point the situation is that arc consistency techniques can markedly improve the overall efficiency of backtracking algorithms, as can the various intelligent backtracking enhancements. The general lesson is that by doing a limited amount of local computation at each level using, say, linear, quadratic, or cubic time, one can optimize backtracking search sufficiently to effect an overall substantial improvement in performance on some difficult problems; however, there is still no adequate theory of how the nature of the task constraints affects the performance of these techniques.

Relaxation Algorithms for Constrained Optimization Problems

The restrictions on the Boolean CSP paradigm can be relaxed in several ways. In computational vision and other AI domains one is often not just satisfying a set of Boolean constraints but rather optimizing the degree to which a solution satisfies a variety of conflicting continuous constraints. Several generalizations of the consistency techniques have been invented to cope with that problem. In Ref. 31 the labels in the discrete domains have associated weights in the unit interval [0, 1], and the relation matrices are allowed to have entries from [-1, 1]. These entries measure the extent to which two values from related domains are compatible. The algorithm looks at each variable domain in parallel, adjusting the weight of each label based on an updating rule that adjusts the weight's previous value using the strength of the connection from this variable to each of its neighboring variables, the compatibility coefficient between this label and each of its neighbor's labels, and the previous weight of that neighboring label.
This process iterates until a fixed point is reached when no significant change occurs in any weight or until some other stopping criterion applies. The details of the various updating and stopping rules used by these so-called relaxation-labeling algorithms can be found in the surveys in Refs. 32 and 33, where applications and other variations on this formulation are also given. An interpretation of the weights as probabilities and the compatibilities as Bayesian conditional probabilities was suggested; hence the term "probabilistic relaxation algorithms." The term "relaxation" was suggested by the loose analogy with the numerical methods used to solve, say, the heat equation for a steel plate. However, the probabilistic interpretation has several problems of semantics and convergence, and other interpretations are now preferred. For example, this class of algorithms can be seen as finding the optimal solution to a linear programming problem, as surveyed in Ref. 33.

Algorithms in this generic class are often termed cooperative algorithms (34,35). Here the sense is that compatible values in neighboring domains can cooperatively reinforce each other by increasing each other's weight. Simultaneously, incompatible values compete, trying to suppress each other. Each value in a domain is competing with each of the other values in that domain. This general class of algorithms is attractive because they are highly parallel, requiring only local neighborhood communication between uniform processors that need only simple arithmetic operations and limited memory. These features suggest various implementations for low-level perception (such as stereo vision) in artificial and biological systems, which are being explored (31,34-39).
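One iteration of such an updating rule can be sketched as follows. The specific form below (support summed from neighbors' weighted compatibilities, then normalized within each domain) is the classic relaxation-labeling style of Ref. 22, not a rule quoted from this entry; the two-variable, two-label numbers are illustrative assumptions.

```python
def relax_step(weights, compat):
    """One parallel update: weights[i][l] in [0, 1] sum to 1 per
    variable; compat[(i, j)][l][m] in [-1, 1] says how well label l
    at i fits label m at the neighboring variable j."""
    new = {}
    for i in weights:
        support = {}
        for l, w in weights[i].items():
            # Net support for label l at i from all neighbors' labels.
            s = sum(compat[(i, j)][l][m] * wm
                    for j in weights if j != i and (i, j) in compat
                    for m, wm in weights[j].items())
            support[l] = w * (1 + s)
        total = sum(support.values()) or 1.0
        new[i] = {l: v / total for l, v in support.items()}
    return new

# Two variables, labels {a, b}; compatible values reinforce each other.
weights = {"u": {"a": 0.6, "b": 0.4}, "v": {"a": 0.6, "b": 0.4}}
same_label = {"a": {"a": 1.0, "b": -1.0}, "b": {"a": -1.0, "b": 1.0}}
compat = {("u", "v"): same_label, ("v", "u"): same_label}
new = relax_step(weights, compat)
```

After one step the initially favored label "a" gains weight at both variables: compatible values cooperate while incompatible ones compete, exactly the behavior the cooperative-algorithm view emphasizes.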
The semantics of these algorithms (the specification of what is being computed) has been clarified (40,41). The best formal analysis and design of these algorithms is based on the concept of minimization of a figure of merit (or "energy") of the system under study. If that surface is everywhere a downward convex function of the configuration variables of the system, there is a unique global minimum, and steepest descent techniques will find it. If that requirement is not met, techniques such as simulated annealing based on the Metropolis algorithm and Boltzmann distributions (42) are useful (see Boltzmann machines).

In Ref. 37 an iterative shape-from-shading (see Shape from shading) algorithm is proposed in which a specific figure of merit is minimized. The algorithm is given an image of a smooth surface for which the dependence of the gray value on surface orientation is known. Since surface orientation at a point has two degrees of freedom, that single constraint is not sufficient. Accordingly, the additional requirement that the surface be as smooth as possible is introduced. The figure of merit is a weighted sum of measures of the extent to which these two constraints are violated. The requirement that it be minimized translates analytically to a very large, sparse set of equations on the values of surface orientation at each pixel in the image. That set of equations is solved by standard numerical iterative relaxation techniques using gradient descent, yielding a simple updating rule for approximations to the surface orientation values. Note, here, however, that the domains no longer consist of a discrete set of possible values with associated weights but simply the best current approximation to the value.
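The gradient descent relaxation idea can be illustrated in miniature. The sketch below minimizes a figure of merit that is a weighted sum of a data-fit term and a smoothness term over a one-dimensional signal; it is a deliberately simplified stand-in for the two-dimensional shape-from-shading formulation, and all the particular numbers (weight `lam`, `step`, `iters`, the noisy input) are illustrative assumptions.

```python
def merit(u, data, lam):
    """Figure of merit: smoothness violation plus lam times data misfit."""
    smooth = sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))
    fit = sum((u[i] - data[i]) ** 2 for i in range(len(u)))
    return smooth + lam * fit

def relax(data, lam=0.5, iters=200, step=0.1):
    """Plain gradient descent on the merit function: each pass yields
    a simple local updating rule at every sample point."""
    u = list(data)
    for _ in range(iters):
        grad = []
        for i in range(len(u)):
            g = 2 * lam * (u[i] - data[i])        # data term
            if i > 0:
                g += 2 * (u[i] - u[i - 1])        # left smoothness term
            if i < len(u) - 1:
                g -= 2 * (u[i + 1] - u[i])        # right smoothness term
            grad.append(g)
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u

noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
smoothed = relax(noisy)
```

As in the shape-from-shading case, each iteration needs only local neighborhood values, so the update parallelizes trivially across sample points.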
Systems Other Constraint-Based The constraint satisfaction approach has considerable attraetion both in AI and other areas of computer science.In graphics and simulation constraint propagation is the mechanism underlying two pioneering systems: Sutherland's Sketchpad (zil and Borning's Thinglab (43). Stefik's Molgen system (44) propagates constraints arising at different levels of planning abstraction to generate plans for gene-splicing experiments. Various systems have been implemented for domains such as circuit analysis (L5,45) and job shop scheduling (46). Other applications in computational vision are describedin Refs. 35, 47, and48. Constraint propagation and data flow as the design principles for new computational architectures are discussed in Ref. 49.Part of the appeal of logic programming (50) (qv) is that attention is focused more on the constraints of the problem and less on the way they are used. There is, for example, less of a distinction between input and output variables in a relational language like PROLOG than in a functional language like LISP. Personal computer spreadsheet systems based on Visicalc and its descendantsalready embody someof these constraint-based ideas. There the variables take only numeric values, and the constraints are simple algebraic formulas, but some of the latest systems allow relaxation for the solution of mutually dependent constraint sets. Conclusions The definition of the word "constraint" varies enormously. It has been taken to mean a relation over a Cartesian product of sets, a Boolean predicate, a fuzzy relatioo, & continuous figure of merit analogous to energy, &r algebraic equation, an in-
equality, a Horn clause in PROLOG, and various other arbitrarily complex symbolic relationships. Nevertheless,underlyit g this variety, a common constraint satisfaction paradigm is emerging. Much of one's knowledge of the world is best expressedin terms of what is allowed or, conversely,what is not allowed. On the other hand, most current artificial computational systems insist on a particular direction of use of that knowledge. This forcesthe designer or user to overspecifycontrol information, leading to undesirable representational redundatrcy, a rigid input-output dichotoffiy, and conceptual mismatch at the human-computer interface. The constraint satisfaction paradigm allows the system designer to concentrate on what, not how. In computational vision, for example, it is crucial to determine precisely how an image constrains the equivalence class of scenesthat could produce it and to identify other constraints that will further constrain the scene. The constraints implicit in other knowledge and data sources can be analyzed and represented. These constraints may be uniformly introduced and used in various directions depending on the curuent availability to the system of specific data and knowledge. BIBLIOGRAPHY 1. D. E. Knuth, "Estimatingthe efficiencyof backtrackprograms," Math.Comput.29,LZL-L36(1975). 2. J. R. Bitner and E. M. Reingold,"Backtrackprogrammingtech1975). niques,"Cornmun. ACM 18(11),651-656(November Measurement and Analysisof Certain 3. J. Gaschnig,Performance Departmentof ComSearchAlgorithms,Thesis,CMU-CS-?9-L24, puterScience, University,Pittsburgh,PA, t979. Carnegie-Mellon 4. R. M. HaralickandG. L. Elliott, "fncreasingtreesearchefficiency for constraintsatisfactionproblems,"Artif. Intell. 14, 263-313 (1980). 5. E. C. Freuder, "A sufficient condition for backtrack-free search," JACM L9,24-32 (1982). 6. P. W. Purdom, Jt., and C. A. 
Brown, Evaluating Search Methods Analytically, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, pp. 124-127, 1982.
7. B. Nudel, Consistent-Labeling Problems and Their Algorithms, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, pp. 128-132, 1982.
8. P. Purdom, C. Brown, and E. Robertson, "Multi-level dynamic search rearrangement," Acta Inform. 15, 99-114 (1981).
9. D. G. Bobrow and B. Raphael, "New programming languages for AI research," Comput. Surv. 6, 153-174 (1974).
10. G. J. Sussman and D. V. McDermott, "Why conniving is better than planning," Artificial Intelligence Memo No. 255A, MIT (1972).
11. A. K. Mackworth, "Consistency in networks of relations," Artif. Intell. 8(1), 99-118 (1977).
12. R. M. Haralick and L. Shapiro, "The consistent labeling problem: Part I," IEEE Trans. Pattern Anal. Machine Intell. PAMI-1, 173-184 (1979).
13. M. Bruynooghe, "Solving combinatorial search problems by intelligent backtracking," Inform. Proc. Lett. 12(1), 36-39 (1981).
14. J. A. Gaschnig, A General Backtrack Algorithm that Eliminates Most Redundant Tests, Proc. of the Fifth IJCAI, Cambridge, MA, p. 457, August 1977.
15. R. M. Stallman and G. J. Sussman, "Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis," Artif. Intell. 9(2), 135-196 (1977).
16. J. de Kleer, Choices without Backtracking, Proceedings of the
Fourth National Conference on Artificial Intelligence, Austin, TX, pp. 79-85, 1984.
17. J. R. Ullman, "Associating parts of patterns," Inform. Contr. 9(6), 583-601 (1966).
18. D. Waltz, "Understanding line drawings of scenes with shadows," in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, pp. 19-91, 1975.
19. U. Montanari, "Networks of constraints: fundamental properties and applications to picture processing," Inform. Sci. 7, 95-132 (1974).
20. E. C. Freuder, "Synthesizing constraint expressions," Commun. ACM 21, 958-966 (1978).
21. D. Maier, The Theory of Relational Databases, Computer Science Press, Rockville, MD, 1983.
22. A. Rosenfeld, R. A. Hummel, and S. W. Zucker, "Scene labeling by relaxation operations," IEEE Trans. SMC 6, 420-433 (1976).
23. A. K. Mackworth and E. C. Freuder, "The complexity of some polynomial network consistency algorithms for constraint satisfaction problems," Artif. Intell. 25(1), 65-74 (1984).
24. A. K. Mackworth, On Reading Sketch Maps, Proc. of the Fifth IJCAI, Cambridge, MA, pp. 598-606, 1977.
25. I. E. Sutherland, Sketchpad: A Man-Machine Graphical Communication System, MIT Lincoln Laboratory Technical Report 296, Cambridge, MA, 1965.
26. W. S. Havens and A. K. Mackworth, "Representing knowledge of the visual world," IEEE Comp. 16(10), 90-96 (1983).
27. R. T. Woodham, A Cooperative Algorithm for Determining Surface Orientation From a Single View, Proc. of the Fifth IJCAI, Cambridge, MA, pp. 635-641, 1977.
28. J. J. McGregor, "Relational consistency algorithms and their application in finding subgraph and graph isomorphisms," Inform. Sci. 19, 229-250 (1979).
29. R. Seidel, A New Method for Solving Constraint Satisfaction Problems, Proc. of the Seventh IJCAI, Vancouver, British Columbia, pp. 338-342, 1981.
30. R. Seidel, On the Complexity of Achieving k-Consistency, Technical Report 83-4, University of British Columbia, Department of Computer Science, Vancouver, British Columbia, 1983.
31. S. W. Zucker, R. A. Hummel, and A. Rosenfeld, "An application of relaxation labeling to line and curve enhancement," IEEE Trans. Comput. C-26, 394-403, 922-929 (1977).
32. L. S. Davis and A. Rosenfeld, "Cooperating processes for low-level vision: A survey," Artif. Intell. 17, 245-263 (1981).
33. D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
34. B. Julesz, Foundations of Cyclopean Perception, University of Chicago Press, Chicago, 1971.
35. D. Marr, Vision, W. H. Freeman, San Francisco, 1982.
36. H. G. Barrow and J. M. Tenenbaum, Recovering Intrinsic Scene Characteristics from Images, in E. M. Riseman and A. R. Hanson (eds.), Computer Vision Systems, Academic Press, New York, pp. 3-26 (1978).
37. K. Ikeuchi and B. K. P. Horn, "Numerical shape from shading and occluding boundaries," Artif. Intell. 17, 141-184 (1981).
38. S. W. Zucker, Cooperative Grouping and Early Orientation Selection, in O. J. Braddick and A. C. Sleigh (eds.), Physical and Biological Processing of Images, Springer-Verlag, Berlin, pp. 326-334, 1983.
39. G. E. Hinton, T. J. Sejnowski, and D. H. Ackley, Boltzmann Machines: Constraint Satisfaction Networks That Learn, Technical Report CMU-CS-84-119, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, 1984.
40. S. Ullman, "Relaxation and constrained optimization by local processes," Comput. Graph. Image Proc. 10, 115-125 (1979).
41. R. A. Hummel and S. W. Zucker, "On the foundations of relaxation labeling processes," IEEE Trans. Pattern Anal. Machine Intell. PAMI-5(3), 267-287 (1983).
42. S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science 220, 671-680 (1983).
43. A. Borning, Thinglab: A Constraint-Oriented Simulation Laboratory, Report No. CS-79-746, Computer Science Department, Stanford University, California, 1979.
44. M. Stefik, "Planning with constraints," Artif. Intell. 16, 111-140 (1981).
45. V. E. Kelly and L. I. Steinberg, The Critter System: Analyzing Digital Circuits by Propagating Behaviors and Specifications, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, pp. 284-289, 1982.
46. M. S. Fox, B. Allen, and G. Strohm, Job-Shop Scheduling: An Investigation in Constraint-Directed Reasoning, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, pp. 155-158, 1982.
47. R. A. Brooks, "Symbolic reasoning among 3-D models and 2-D images," Artif. Intell. 17(1-3), 285-348 (1981).
48. A. K. Mackworth, On Seeing Things, Again, Proc. of the Eighth IJCAI, Karlsruhe, FRG, pp. 1187-1191, 1983.
49. H. Abelson and G. J. Sussman, Structure and Interpretation of Computer Programs, MIT Press, Cambridge, MA, 1985.
50. R. Kowalski, Predicate Logic as a Programming Language, IFIP 74, North-Holland, Amsterdam, pp. 569-574, 1974.

A. K. MACKWORTH
University of British Columbia

CONTROL STRUCTURES

It is necessary to distinguish control structures from algorithms and from virtual machines. Control concerns what happens in a computational process. The word "control" has two different meanings. On the one hand, there is the problem of assuring that a process does as little work as necessary. Thus, one speaks of "search control" when one wants a search to explore a minimum of wrong paths. This is more properly a matter for the study of algorithms. On the other hand, there is the problem of specifying clearly what should happen in a computational process. The notion of a control structure concerns this problem of specification. It is a more general notion than that of an algorithm, though there is no precise line to be drawn.

One must also distinguish the notion of a control structure from the notion of a virtual machine. For simplicity, this entry assumes that a virtual machine is defined by a programming language, which might be a machine language. A virtual machine provides the programmer with a collection of primitive operations (no matter if they have a direct physical embodiment in the architecture of a real machine). More important, though, a virtual machine provides an ontology of objects and processes, on top of which programmers can build their own abstractions. The notion of control originated in the days when computers all had simple von Neumann architectures and a programmer needed no more sophisticated a metaphor for control than simply running one's finger through the code. The conclusion of this entry suggests that the notion of control can become inappropriate on a virtual machine that departs substantially from this model.

Finally, one should distinguish between a particular con-
trol structure and a whole philosophy and style of programming. For example, object-oriented programming is a style that requires a particular control structure, the familiar type-dispatching procedure call. (With a sufficiently rigid model of types, the outcome of this dispatch can be determined at compile time (1). Thus, a control structure can be entirely a fiction of the virtual machine.) This distinction is particularly important for the history of AI because of the frequency with which subtle and profound philosophies of programming are melted down to catalogs of control and data structures. A control structure must be analyzed in the context of a coherent philosophy of programming.

A control structure is a technique, especially one set down as a linguistic construct, that an algorithm can use in determining what happens when on some virtual machine. This entry does not exhaustively treat all the different control structures because each has its own entry in this encyclopedia. Instead, this entry outlines current issues and describes the history of AI researchers' attitudes toward process organization in general. (For discussion of particular control structures, see Agenda-based systems; Backtracking, dependency directed; Backtracking; Blackboard systems; Constraint propagation; Coroutines; Distributed problem solving; Languages, object-oriented; Logic programming; Means-ends analysis; Metaknowledge, metarules, and metareasoning; Parsing, word-expert; Processing, bottom up and top down; and Rule-based systems. For discussion of languages, systems, and machines for which control structures are a central issue, see Connection machine; Conniver; Eurisko; Hearsay II; LISP; LISP machines; Loops; Merlin; OPS-5; Planner; POP-2; PROLOG; Simula; and Smalltalk. For discussion of search techniques, see A* algorithm; Alpha-beta pruning; Beam search; Search, best first; Search; Search, bidirectional; Search, branch and bound; and Search, depth first.)

History.
The history of research into control structures is the history of the proposition that one can usefully pursue issues of control separately from issues of representation. The 1960s saw the development of good ways of implementing and using the basic techniques of serial machine programming: data abstraction, iteration and recursion, the procedure call, lexical scope, dynamic storage management, and the many varieties of search. During the 1970s AI researchers explored a variety of nonstandard virtual machines and control structures, including production system architectures, semantic networks and network-based constraint propagation, chronological and dependency-directed backtracking, declarative programming, and a large collection of LISP-embedded languages incorporating these and other ideas. In the search for methods of general value, control research was generally pursued with little regard for the peculiarities of individual domains. The 1980s have brought an increasing awareness of the value of clearly separating control issues from representation issues, with control generally taking a back seat to representation. Research has increasingly concentrated on the details of particular problems and particular domains, especially in the areas of vision, language comprehension, and motor control. At the same time interest has grown in programming languages, such as PROLOG, that provide a generic control strategy and are intended to allow the programmer to write down not algorithms but knowledge.
A thorough discussion of control structures must relate issues of control to issues of programming language design, representation design, and computer architecture. These three topics are considered in turn below.

Control Structures and Programming Language Design

Research into novel control structures has often produced novel programming languages that presuppose them. Evaluating such languages is a subtle matter (2). A programming language defines a virtual machine. Ordinarily this virtual machine will have a simple correspondence to the physical machine on which the language is implemented. This is not simply a lack of imagination. The wisdom of an efficiency decision is determined by the physical machine; a good compiler can compensate for many differences between the virtual and physical machine, but any gross divergences will require the programmer to outsmart the language. Consequently, the purpose of most programming language constructs, one might say, is to provide abbreviated ways of invoking common conventions in machine language programming. AI language research has a built-in tension: New languages often appeal to as yet unrealized computer architectures, but users of these languages are stuck with traditional von Neumann machines. This section concentrates on control structures in languages intended for traditional von Neumann machines.

There are, roughly, two purposes that linguistic support for a control structure can serve: It can support a philosophy of program modularity or it can permit some generalization of the serial nature of the language's semantics.

Philosophies of Modularity. Traditional languages like ALGOL and the early versions of LISP had a philosophy of modularity based on data abstraction and the procedure call. There was only a simple theory of data types, and individual procedures tended to be quite large.
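The type-dispatching procedure call named earlier in this entry can be sketched as a lookup keyed by both an operation and the type of its first argument. The sketch below is an illustrative reconstruction in modern notation, not code from any system discussed here; the names `METHODS`, `defmethod`, and `send` are invented.

```python
# Illustrative sketch of a type-dispatching procedure call
# (invented names; not code from Smalltalk, Flavors, or any
# system discussed in this entry).

METHODS = {}  # maps (operation, type) -> procedure

def defmethod(operation, typ, procedure):
    """Register a method for a given operation and receiver type."""
    METHODS[(operation, typ)] = procedure

def send(operation, obj, *args):
    """Dispatch on the operation name and the receiver's type."""
    # Walk the receiver's type hierarchy so subtypes inherit
    # methods from their supertypes.
    for typ in type(obj).__mro__:
        proc = METHODS.get((operation, typ))
        if proc is not None:
            return proc(obj, *args)
    raise TypeError("no method for %s on %s" % (operation, type(obj).__name__))

defmethod("area", int, lambda n: n * n)          # treat an int as a square's side
defmethod("area", tuple, lambda t: t[0] * t[1])  # treat a tuple as (width, height)

print(send("area", 5))       # dispatches on int
print(send("area", (3, 4)))  # dispatches on tuple
```

With a sufficiently rigid type model, the table lookup inside `send` could be resolved at compile time, which is the point made above about a control structure being a fiction of the virtual machine.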
Over time programmers have learned to impose fine modularities on their programs, so that a LISP procedure of over 30 lines is now generally considered bad form. There have been about three responses to this trend: an increasing emphasis on efficient procedure call implementations, increasingly sophisticated theories of data types, and attempts to institutionalize fine-modularity programming in the form of production systems.

The development of the Scheme language (3,4) is representative of the emphasis on efficient procedure call implementations. Scheme is a variant of LISP employing lexical scope and a heap-allocated stack. Sussman and Steele demonstrated that these features permit a programming style based on intensive use of procedure calls and procedural data objects (5,6). Other dialects of LISP based on dynamic scope are often difficult to optimize fully, and sequentially allocated stacks make it difficult to efficiently endow procedural objects with their proper semantics [this is called the funarg problem (7)].

The most widespread philosophy of program organization in the AI community is object-oriented programming (8,9). Bits of code (methods) are individuated by both the operation to be performed and the type of the object it is to be performed upon. Types can be built up in a tree (as in Smalltalk) or a general partial order (as in Flavors), with an assortment of conventions governing the combination of the different methods under the same operation in a given type. Much effort has
gone into finding efficient implementations of the resulting generalized procedure call (called message passing). It is important to separate the program-organizing role of object-oriented programming from the highly parallel connotations of the vocabulary of objects and messages. The latter aspect of object-oriented programming is central to the Actors formalism (10) but is not an element of everyday programming with Smalltalk and Flavors.

Production systems began as a model of the human mind (11,12). Since their inception, they have been used both for psychological modeling (13) and for system building (14-16). Production systems support a style of programming based on large numbers of small modules, each called a rule or production, arranged around a central database or blackboard. At the top level a production system is a loop, on each cycle selecting a production to run and then running it. The process of selecting a production to run has two steps. Each production has associated with it some indication, often called a trigger or left side, of when it is appropriate for that bit of code to be run. This trigger generally takes the form of a symbolic expression with some unfilled slots signified by variables. On the first step of production selection every production whose trigger matches an entry in the database becomes a candidate for execution. The second step, called conflict resolution, somehow selects one of the candidates. The virtual machine of a production system is parallel in the sense that all triggers are matched against the database in parallel. Production systems are also serial in the sense that only one production is run at a time. [Recent work relaxes this constraint as well as the requirement of a fixed conflict resolution scheme (17).]

With their fine modularity, production systems do not encourage intermediate levels of organization. Even the somewhat more structured "heterarchical" architectures, such as that of Hearsay-II (18,19), were to be criticized as unprincipled by competence-oriented AI researchers. (For a thorough treatment of heterarchical systems see Ref. 20.) The claim that large production systems actually support an effective fine modularity, in the sense that changes to a system can be localized to one or a few productions, is also open to question (see Ref. 21 for evidence on this point). Production systems have nonetheless proven a valuable vehicle for applications involving small rule sets; these questions only apply to large systems (22).

Generalizations of Serial Virtual Machines. It is a simple matter to add a new iteration construct to an ALGOL-like language. But giving a language a new construct that generalizes its virtual machine will have pervasive effects in the language's implementation. It is useful to classify extensions to serial languages in terms of the additional work required of the implementation.

In FORTRAN the compiler assigns every variable in every procedure to a fixed machine address. Once a language supports recursive procedure calls, a frame must be allocated to store the values of formal parameters and local variables for every procedure call. Such a scheme requires the architecture to efficiently implement a stack.

Most modern languages provide some dynamic storage allocation features, but it is in LISP that the matter has been most thoroughly pursued. But storage management is a module in the run time system; the implementation of arithmetic and the procedure call are independent of its sophistication. However, intermittent garbage collection can ruin the real-time properties of a system. LISP programming relies heavily on efficient dynamic storage allocation. This reliance is especially heavy in an implementation that allocates procedure call frames from the heap rather than on the stack. Efficiency considerations have led most LISP implementations to stack-allocate frames even though this scheme precludes making general use of procedures as data objects. Scheme (4) is an exception, and Interlisp's spaghetti stacks (23) are an attempt at a compromise.

Some languages [e.g., Sail (24)] support coroutining, a generalization of a serial virtual machine in which a number of processes sharing the same memory move through the same program. The single physical processor plays the part of the different virtual processors at different times. The compiler must keep track of what information must be saved and restored at various specific points in the program (say, at the calls to the intercoroutine communication constructs) when it is time for the physical processor to play a different virtual processor. It is common for an operating system to allow the processor to change virtual identities among various processes at arbitrary intervals at any place in the code outside of declared critical sections. This ability generally requires the compiled code to confine its local state to the registers.

Matters become more complicated if a language incorporates backtracking into its semantics, as did Planner (25). Bookkeeping is required if any information-losing operation is to be undone by backtracking. (For example, an assignment forgets the old value of the assigned variable and a branch forgets where it came from.) The POP-2 language's state-saving features are used in implementing not only chronological backtracking but also coroutining (26). This bookkeeping can grow immense if backtracking can always potentially reach arbitrarily far back. As Ref. 27 points out, Planner had no way of indicating that one was satisfied for all time with the result of some calculation. Additionally, Planner's chronological backtracking typically reversed far too many calculations; dependency-directed backtracking (28) is an attempt to automate a more accurate pinpointing of the choices that actually led to the difficulty that provoked backtracking.

The record for the most profound generalization of an underlying virtual machine is held by 3-LISP (29), which provides facilities for arbitrary run time modification of the underlying virtual machine. This ability is made possible by the existence of a simple metacircular 3-LISP interpreter, meaning, an interpreter for 3-LISP written in 3-LISP. The virtual machine of 3-LISP is an infinitely deep tower of 3-LISP interpreters, each one running the one above it. A user's program can "reflect," that is, it can ask the interpreter running it to apply an arbitrary procedure to its own internal state. Reflection allows many common control structures, like nondeterministic choice and LISP's catch-and-throw operations, to be implemented as user code. The implementation involves no infinite towers, of course, but rather a scheme for running only as many levels of interpretation above the hardware as necessary (30). (See Ref. 31 for a reconstruction of reflection that does not require the infinite-tower semantics.) Needless to say, heavy use of reflection makes a 3-LISP program hard to compile efficiently.

The utility of being able to modify or advise the virtual machine running one's program has long been understood, though only recently has technology for doing so been devel-
oped (32,33,17). Research must now seek to reconcile semantic flexibility with efficient compilation. One promising approach views compilation as a process of specializing an interpreter to run a particular program, treating the program as a constant to be folded into the interpreter (34,35).

Making a control structure implicit in the semantics of a language raises it to the status of a virtual machine. The difference is that it is no longer optional: If your language incorporates chronological backtracking or backchaining search, for example, so will your program. Users of such languages often find themselves fighting the language to prevent activity that they know to be useless or destructive. Certainly this was the case with Planner, and it is often observed with PROLOG. Conniver went to the opposite extreme, giving the user's program access to its own run time internals (36). This insight of Conniver lives on in its essentials in Scheme, which unlike Conniver can be efficiently compiled.

Control Structures and Representation Design

It has long been understood that giving a program more knowledge can simplify its reasoning tasks. For example, Waltz's scene-labeling program (37) was able to rule out most interpretations of a line drawing when given information about edges and vertices. When given additional information about shadows, it could rule out all but the correct interpretation, using only a simple, local constraint propagation algorithm. The developers of DENDRAL had a similar experience (38,39). Rule sets were added for interpreting a variety of tests, each of which constrains the identity of an unknown chemical compound. As new information was added, the number of possible identifications that could not be ruled out without search typically dropped from many millions to only a few. These programs did entirely without complex control schemes.
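The kind of simple, local constraint propagation just described can be sketched as arc-consistency filtering: each variable's set of possible labels is repeatedly pruned against its neighbors until nothing more can be discarded. The three-variable "all labels must differ" problem below is invented for illustration (Waltz's actual domains were line-junction labelings), and the function names are hypothetical.

```python
# Illustrative sketch of local constraint propagation (arc-consistency
# filtering). The problem instance is invented: three mutually adjacent
# regions whose labels must all differ, with region A's label known.

def propagate(domains, neighbors, consistent):
    """Repeatedly discard values that have no consistent support
    at some neighboring variable, until a fixed point is reached."""
    queue = [(x, y) for x in domains for y in neighbors[x]]
    while queue:
        x, y = queue.pop()
        # Keep only values of x supported by some value of y.
        supported = {vx for vx in domains[x]
                     if any(consistent(x, vx, y, vy) for vy in domains[y])}
        if supported != domains[x]:
            domains[x] = supported
            # x shrank, so every constraint into x must be rechecked.
            queue.extend((z, x) for z in neighbors[x])
    return domains

domains = {"A": {1}, "B": {1, 2}, "C": {1, 2, 3}}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
differ = lambda x, vx, y, vy: vx != vy  # the single, purely local constraint

# Knowing A alone forces B and C by propagation, with no search at all.
print(propagate(domains, neighbors, differ))
```

The point of the example mirrors the Waltz and DENDRAL experience above: adding knowledge (here, A's label) lets purely local filtering collapse the candidate set without any search.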
This experience suggested to many that complex control schemes are unnecessary in general, given sufficient study of a program's domain. Although this is certainly an open question, it is generally considered bad engineering to use complex control structures to compensate for inadequate domain representations.

Much has come to be understood about the relationship between the knowledge a system possesses and the mechanism by which the system deploys this knowledge. Attitudes toward the distinction are influenced by historical happenstance; the tendency to employ the distinction in order to ignore half of it is still widespread. These issues have been explored primarily in vision and linguistics, but they apply broadly.
Competence and Performance. The distinction between competence and performance was once poorly understood. Its principal origin is in Chomsky's distinction between competence and performance in linguistics (40) (see Competence, linguistic, and the various entries on the theories of grammar). Chomsky makes the working assumption (roughly) that there is a mental module responsible for the generation of parse trees from sentences of natural language and vice versa. (Actually, one ought to distinguish the strong claim that there is a physical module in the brain from the weaker claim that there is a module in the competence, meaning that whatever mechanism performs the relevant computations employs no nongrammatical knowledge in performing them.) It is conceivable that one parses sentences by referring to explicit representations of grammatical competence, and at one time it was difficult for linguists and nonlinguists alike to conceptualize an alternative (see Ref. 41 for discussion; see also Ref. 42). This view is now widely considered naive.

Computational theory has considerably clarified these issues. Central to this development was the school of vision research founded by Marr (43). Marr's importance was arguably less in specific theories than in his influential insistence on a clear distinction between computational theory, an algorithm instantiating that theory, and an implementation of that algorithm. One can caricature two approaches to vision research (likewise other perceptual skills), top-down and bottom-up (see Processing, bottom up and top down for more detailed discussion). Top-down research points to ambiguous percepts and insists on control schemes that can apply general cognition to perceptual interpretation. Bottom-up research prefers to consider unambiguous percepts and postulates self-sufficient ["encapsulated" (44)] modules subserving perceptual interpretation. As a matter of engineering, there is a trade-off: Modular perceptual interpretation gains efficiency at the price of the occasional illusion. The existence and experimental robustness ["cognitive impenetrability" (45)] of perceptual illusions is evidence for the modular view of human perceptual psychology.

Trend toward Priority of Competence Research. History has left the terms "top down" and "bottom up" with some unnecessary associations. Top-down research has been carried out more by engineers, and bottom-up research has been carried out more by psychologists. Top-down research has emphasized general-purpose control schemes without paying extensive attention to the percepts themselves or the processes that generate them. Bottom-up research has emphasized that deep understanding of a problem can often eliminate search and the need for complex control schemes in solving it. (This observation is also at the base of the philosophy of most present-day expert systems.) Logically, however, sophisticated representations and sophisticated control are compatible.

The movement toward competence-oriented AI research emphasizes an ambiguity in the term "representation." Much theory of representation attempted to design formalisms, called semantic networks, that allowed the meanings of arbitrary English declarative sentences to be captured (46-53). On this view, a representation may make ontological assumptions (e.g., that there exist individuals and concepts and relationships of instantiation and subsumption among them) but no empirical assumptions. An alternative view of representation is that a good representation exploits knowledge about the world to simplify descriptions of it. Were one to assume that all physical surfaces are flat, for example, one could represent the visual world using lists of corners of planar surface elements. For many researchers first-order logic serves as a general representation, and different ontologies are formulated for each domain (54,55). Future research must lay out the middle ground between the top-down and bottom-up stereotypes.

The Procedural-Declarative Controversy. These distinctions clarify the issues in the procedural-declarative controversy (56-58). At issue is whether the knowledge underlying human skills is best phrased in terms of propositions about the world or in terms of procedures for manipulating the world. Perhaps
the most cogently argued position on the matter is that of Hayes (59), for whom the question is one of apples and oranges, or, roughly, competence and performance. Hayes insists, furthermore, that the proper medium of expression for competence theories (in the sense of the competence-performance and top-down-bottom-up distinctions above) is formal logic, specifically some slight extension of first-order predicate calculus (fopc). Hayes points out that many semantic network formalisms are simply improved syntaxes for fopc. It is controversial whether competence theories are best expressed in fopc or in higher order or modal logics (e.g., deontic, temporal, or default logics). The question is beyond the scope of this entry except for one detail. Just as it was once routine to assume that a parser had to explicitly represent the transformations of its grammar, so it was once routine to assume that mechanized reasoning with a fopc theory must employ a general theorem prover. This is a possible view, of course, and certainly at least occasionally correct. But it is much less plausible when the logic is very much more general than fopc, for the proof theories of such logics are often computationally intractable. Instead, in designing a mechanism that deploys a competence theory in carrying out some task, one must design a control scheme that has the effect of formal reasoning without the expense of fully general inference.

Logic Programming. This is the proper context for discussion of logic programming languages such as PROLOG (60,61). Logic programming (qv) gives linguistic recognition to the distinction between competence and performance: the programmer writes domain competence in the form of a collection of logical expressions. Some mechanism then resolves queries by somehow traversing the space of logical inferences from the user's premises to some conclusion that answers the query. This mechanism could take any number of forms.
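One minimal form such a mechanism can take is sketched below: propositional backward chaining over Horn clauses, trying clauses in textual order as PROLOG does. The program itself is invented for illustration; real PROLOG clauses also contain variables, which require unification and a substantially more elaborate interpreter.

```python
# Illustrative sketch of one query mechanism for Horn clauses:
# propositional backward chaining, with clauses tried in textual
# order. (Invented example; real PROLOG adds variables and
# unification.)

# Each clause is (head, [body goals]); a fact has an empty body.
program = [
    ("parent_tom_bob", []),
    ("parent_bob_ann", []),
    ("ancestor_tom_bob", ["parent_tom_bob"]),
    ("ancestor_tom_ann", ["ancestor_tom_bob", "parent_bob_ann"]),
]

def prove(goal, program):
    """Backchain: a goal holds if some clause's head matches it and
    every goal in that clause's body can itself be proved."""
    for head, body in program:  # textual order fixes the search order
        if head == goal and all(prove(g, program) for g in body):
            return True
    return False

print(prove("ancestor_tom_ann", program))
print(prove("ancestor_bob_tom", program))
```

Because the clause list fixes the order in which alternatives are tried, the programmer's textual arrangement of the competence is already a piece of control advice, which is exactly the compromise the surrounding text describes.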
At one extreme this mechanism could be a general-purpose theorem prover of exceptional sophistication. This approach would be inefficient for systems of any size. At another extreme the programmer could additionally provide the query mechanism with extensive advice about how to search the inference space defined by the program. This advice would preferably be considered theorem-proving competence, itself expressed in logical expressions. Although this is a common idea (e.g., Ref. 59), MRS (32,62) is the only practical system that works on this principle. All practical PROLOGs adopt some variant of a compromise position: Code is limited to Horn-clause form, and the mechanism is simple backchaining search through inference space. Textual ordering of expressions determines which branches in the space are to be pursued first, and a construct (called Cut) is provided to block search in unpromising directions. An important argument for a fixed control structure is compilation: PROLOG can be efficiently compiled (63), but it is quite possible that systems (like 3-LISP and MRS) that take broad classes of advice about control cannot, in general, be efficiently compiled.

It is often argued that these amendments to PROLOG ruin the ideal of logic programming. A common response is that there is nothing wrong if real PROLOG corresponds as little to the ideal of logic programming as real LISP corresponds to "pure LISP." There is a deeper point, however. Until machines become infinite, efficiency-minded programmers will write logic programs with rough ideas about paths through inference space already in mind. This is not in itself
bad unless there is a difference between the most elegant or parsimonious formulation of the competence and the formulation that leads a particular proof mechanism to operate efficiently. More experience is required on this point.

Control Structures and Computer Architecture

Computer architectures have long been designed to improve the efficiency of particular ways of using them. This trend is accelerating as architectures continue to differentiate as to size and intended application. For 40 years, though, almost all architectures have been based on serial virtual machines (even when, as is common in large mainframes, there are several processors present). Recalling the distinctions elaborated in the introduction, architectural adaptations can be divided into two classes: specializations to the traditional serial architecture that accelerate particular styles of programming and nonserial architectures that directly support nonserial virtual machines.

Adaptation of Serial Machines to Symbolic Computation. Most specialized architectures have been concerned with numerical computations, for example, pipelining of operations over large arrays. Since AI is primarily concerned with symbolic programming, these architectures are of no interest here. Architectures did not make many allowances for symbolic programming until the LISP, Smalltalk, and PROLOG machines of the past 10 years. One arguable exception, the hardware support for block-structured programming languages in the Burroughs machines of the 1960s (64,65), was far ahead of its time. How does a serial architecture support a philosophy of programming? There are two broad answers that are relevant to AI practice, the first typified by the DEC PDP-10 (66) and the second typified by the IBM 801 (67). According to the PDP-10 philosophy, the most important adaptation an architecture makes to a style of programming is in its choice of instruction set.
By providing a clean, orthogonal instruction set, the architecture encourages compiled languages over hand coding. Single instructions are provided for each of the language's basic operations. The PDP-10's instruction set, though quite conventional, was nonetheless designed with symbolic computation in mind. The word size (36 bits) was exactly twice the size of an address, allowing a cons node to be cleanly implemented in a word. Thus, its half-word instructions implement LISP's CAR and CDR. Furthermore, it has single-instruction stack push and pop operations. The PDP-10's repertoire of specialized instructions forced later PDP-10 models, like most architectures of that era, to be microcoded. The IBM 801 reverses most of these positions. The 801 group observed that compiler technology improved greatly in the 1970s. Hand coding is much less necessary than before, and optimizers can reason well about unconventional instructions and large register sets. To take advantage of these advances, they designed their compiler (for a variant of PL/1) alongside the architecture. Freed from having to second-guess the compiler, they could implement a much smaller and less conventional instruction set without microcoding. The result is both improved language support and decreased cycle time. The modern history of specialized symbolic architectures begins with the MIT CONS machine (68), a conventional
CONTROL STRUCTURES
tagged architecture heavily adapted to LISP, and its successor, the CADR machine. The CADR design has given rise to three lines of development, at Symbolics, at Lisp Machines Inc., and at Texas Instruments. Perhaps the critical feature of all these machines is their hardware support for run-time type checking and fast procedure calls. (Run-time type checking is necessary because LISP is not type-safe.) The Symbolics 3600 (9) has hardware support for garbage collection and is optimized for message passing, which has become central to LISP systems programming. There is a line of Interlisp workstations developed at Xerox (23), all descended from the Alto (1973), a remarkable early workstation. Their architectures, however, are not specialized for LISP. All the LISP machines are designed from philosophies of program development and user interface. For example, the Symbolics 3600's designers counted fast incremental compilation among their original design goals, and the Xerox workstations are heavily optimized for fast redisplay of windows on their bitmap displays. Japanese researchers are developing workstations based on PROLOG. The first such machine, called PSI, is a conventional microprogrammed architecture that implements a version of the Warren instruction set (69). It speeds unification the same way the LISP machines speed list processing, using tagged data, a cache, and fast memory at the top of the stack. A more recent design, called PIM-R, is more ambitious (70). The various branches of the program's space of inferences are assigned to different "inference modules." The machine promises to deliver as much parallelism as is inherent in the program, but the usefulness of the machine depends on finding natural PROLOG coding styles that allow for large amounts of parallelism (see Ref. 71 for work on the inference distribution problem and Ref. 72 for work on a Concurrent PROLOG). All architectures are informed by the statistical properties of the way a style of programming uses a machine.
This principle has begun to be applied to AI architectures in a number of ways.
1. In Smalltalk it happens that on almost all occasions that message M is sent to object O, O is of the same type as the last object that was sent message M. (There may be M methods for each of several types, and locating the right method requires some sort of table lookup on every call.) The Smalltalk implementation of Ref. 73 takes advantage of this observation by caching the method corresponding to the most recent use of each message name; the Smalltalk chip (74) performs this caching in hardware.

2. The Scheme chip (75) has special support for heap-consed stacks (which, as mentioned above, are required for efficient support of Scheme's general use of procedures). There is a separate stack for each register that must be pushed, and the top element of each stack is stored in an adjacent register, thus saving the call to cons that would ordinarily be required to push it. Because the depth of each stack fluctuates greatly (rather than tending to grow and shrink monotonically over long periods), much dynamic storage management is saved.

3. On the Symbolics 3600 recently created cons nodes are stored in a special page of memory that keeps track of pointers to nodes within the page from outside the page. When this page fills up, nodes that are still pointed to are allocated words of ordinary memory. It has long been observed that most cons nodes allocated by most LISP programs become garbage almost immediately (76,77). By never actually allocating these short-lived nodes, the 3600's "ephemeral garbage collection" scheme can delay the headaches associated with ordinary garbage collection in large virtual address spaces (78).

Novel Architectures for Symbolic Computation. Much current research is concerned with architectures comprising large numbers of small processors. Designing such a machine is easy; designing one that can be programmed is not. Decomposing a problem into natural pieces is much easier on a serial machine than on a parallel machine because there is no requirement that the pieces make themselves useful simultaneously. Discussion of control structures on massively parallel machines requires new metaphors. On a serial machine one thinks of a program as having spatial extent; the program is a map over which a locus of control passes. On a massively parallel machine the metaphor of spatial extent passes from the program to the process, which might be thought of as "spread out" over the machine. A programmer tries to decompose a problem into pieces that can be implemented in parallel. Such a decomposition exists when the problem has a structure that can be mirrored in the structure of a process. There may be many problems with no such structure. Consequently, one principled approach to designing massively parallel machinery is to isolate a class of useful processes with a common structure and design a machine with that structure. One such structure is simple two-dimensionality, a trait shared by many problems, most notably in graphics and image processing. Two-dimensional machines are especially convenient to build because they are easily embedded in a three-dimensional physical space. There are several machines with two-dimensional connection topologies, including the NASA MPP (79), the CMU WARP systolic array machine (80), and the Connection Machine (CM) (which also has a more general message-routing network) (81). These architectures are designed to move information from a processor to its immediate neighbors. These machines have very simple organizations, but they can be difficult to program even on the simplest two-dimensional problems when the problem is not an even multiple of the machine size or when edge effects become clumsy. The FAIM-I architecture has a planar organization but is designed to move messages quickly across the plane without interrupting useful work along the way (82). The NON-VON (qv) machines (83,84) are specialized to tree-structured processes, particularly for applications involving large databases.

Other massively parallel machines can be classified according to the topology of the network by which the processors exchange information. Among machines whose processors are standard commercial microprocessors, the BBN Butterfly (85) uses a crossbar circuit of the same name, the Maryland ZMOB (86,87) uses a fast circular-shift register, and the Caltech Cosmic Cube (88) is arranged in a four-dimensional hypertorus. An alternative is represented by the CM, which consists of 64,000 very small processors. In addition to their two-dimensional connectivity, the CM's processors can communicate through a router whose topology (on the TMC prototype) is a Boolean hypercube. The CM (like the MPP) is a SIMD (single-instruction, multiple-data) machine, meaning that the processors do not have their own instruction fetch-and-decode circuits but instead share a common instruction bus. SIMD
operation increases the total computational power of the machine at the price of restricting it to quite homogeneous computations. Many AI applications have a homogeneous process structure, especially graph-based operations like semantic network lookup (89) and electric circuit simulation. Research on massively parallel architectures currently suffers from a severe shortage of understood programming techniques. It is inordinately difficult to find sensible decompositions of real problems into massively parallel forms. Often these decompositions are in fact brute force solutions of questionable advantage over their more sophisticated serial competitors. Especially sophisticated serial machines may be the most appropriate for these problems.
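The SIMD regime described above, one instruction stream broadcast to many processors each holding its own datum, can be caricatured in a few lines (a hypothetical Python model; `simd_step` and the activity mask are illustrative inventions, not any real machine's interface):

```python
# Toy SIMD model: every "processor" holds one datum, and a single
# broadcast instruction is applied to all of them at once.

def simd_step(instruction, states, mask=None):
    """Apply the same `instruction` to every processor's local state.

    `mask` models SIMD activity bits: processors whose bit is off sit
    the step out, which is how mostly homogeneous computations tolerate
    a little irregularity.
    """
    active = mask if mask is not None else [True] * len(states)
    return [instruction(s) if m else s for s, m in zip(states, active)]

states = [1, 2, 3, 4]                               # one datum per processor
states = simd_step(lambda x: x * 2, states)         # broadcast "double"
states = simd_step(lambda x: x + 1, states,
                   mask=[True, False, True, False])
print(states)   # [3, 4, 7, 8]
```

The cost the text mentions is visible here: masked-off processors do no useful work during a step, so irregular computations waste machine capacity.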
Conclusion

The first research into sophisticated control structures was largely motivated by the extraordinary flexibility of human thought. Computers were too rigid in their operation then, and they still are. There is now a widespread bias against sophisticated control structures that originates in a concern for good representations and principled engineering. The previous section has suggested that the problem lies in the very notion of control. If so, an alternative might be found in more general ideas about the overall structure of highly parallel processes. These ideas will ultimately be reflected in programming languages, representations, and architectures. Programming languages, in particular, have become very good at hiding details of an architecture from the programmer. But hiding the basic nature of an architecture behind a traditional virtual machine makes it impossible to design a process structure that fits comfortably into the structure of the real machine.
BIBLIOGRAPHY

1. B. Liskov, R. Atkinson, T. Bloom, E. Moss, J. C. Schaffert, R. Scheifler, and A. Snyder, CLU Reference Manual, Lecture Notes in Computer Science 114, Springer-Verlag, New York, 1981.
2. D. G. Bobrow and B. Raphael, "New programming languages for AI research," ACM Comput. Surv. 6, 155-174 (September 1974).
3. H. Abelson and G. J. Sussman, Structure and Interpretation of Computer Programs, MIT Press, Cambridge, MA, 1985.
4. G. L. Steele and G. J. Sussman, The Revised Report on Scheme: A Dialect of Lisp, Memo 452, MIT AI Laboratory, January 1978.
5. G. J. Sussman and G. L. Steele, Lambda: The Ultimate Imperative, Memo 353, MIT AI Laboratory, March 1976.
6. G. L. Steele, Lambda: The Ultimate Declarative, Memo 379, MIT AI Laboratory, November 1976.
7. J. Moses, The Function of Function in Lisp, or, Why the Funarg Problem Should Be Called the Environment Problem, Memo AI-199, MIT Project MAC, June 1970.
8. A. Goldberg and D. Robson, Smalltalk-80: The Language and Its Implementation, Addison-Wesley, Reading, MA, 1983.
9. Symbolics, 3600 Technical Summary, Symbolics, Cambridge, MA, 1983.
10. C. Hewitt, "Viewing control structures as patterns of passing messages," Artif. Intell. 8, 323-364 (June 1977).
11. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
12. R. Davis and J. King, An Overview of Production Systems, Memo AIM-271, Stanford AI Laboratory, 1975.
13. P. S. Rosenbloom, The Chunking of Goal Hierarchies: A Model of Practice and Stimulus-Response Compatibility, Ph.D. Thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, August 1983.
14. A. Newell, Production Systems: Models of Control Structures, in W. G. Chase (ed.), Visual Information Processing, Academic Press, New York, 1973.
15. C. L. Forgy, OPS5 User's Manual, Technical Report CMU-CS-81-135, Carnegie-Mellon University, 1981.
16. C. L. Forgy, "Rete: A fast algorithm for the many pattern/many object pattern match problem," Artif. Intell. 19, 17-37 (1982).
17. J. E. Laird, P. Rosenbloom, and A. Newell, Towards Chunking As a General Learning Mechanism, Proc. of the Fourth AAAI, Austin, TX, 1984, pp. 188-197.
18. L. D. Erman and V. R. Lesser, A Multi-level Organization for Problem Solving Using Many, Diverse, Cooperating Sources of Knowledge, Proc. of the Fourth IJCAI, Tbilisi, Georgia, 1975, pp. 483-490.
19. L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy, "The Hearsay-II speech-understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12, 213-253 (June 1980).
20. D. A. Waterman and F. Hayes-Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York, 1978.
21. J. Bachant and J. McDermott, "R1 revisited: Four years in the trenches," AI Mag. 5, 21-32 (Fall 1984).
22. L. Brownston, R. Farrell, E. Kant, and N. Martin, Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming, Addison-Wesley, Reading, MA, 1985.
23. Xerox, Interlisp Reference Manual, Xerox Corporation, Palo Alto, CA, October 1983.
24. J. Feldman et al., Recent Developments in SAIL: An Algol-Based Language for Artificial Intelligence, FJCC Proceedings, AFIPS Press, 1972.
25. C. Hewitt, Procedural Embedding of Knowledge in Planner, Proc. of the Second IJCAI, London, 1971, pp. 167-192.
26. R. M. Burstall, J. S. Collins, and R. J. Popplestone, Programming in POP-2, Edinburgh University Press, Edinburgh, U.K., 1971.
27. G. J. Sussman and D. McDermott, From Planner to Conniver: A Genetic Approach, Proc. FJCC, AFIPS Press, Vol. 41, pp. 1171-1179, 1972.
28. R. M. Stallman and G. J. Sussman, "Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis," Artif. Intell. 9, 135-196 (October 1977).
29. B. C. Smith, Reflection and Semantics in a Procedural Language, Report TR-272, MIT Laboratory for Computer Science, 1982.
30. J. des Rivieres and B. C. Smith, The Implementation of Procedurally Reflective Languages, Report CSLI-84-9, Stanford Center for the Study of Language and Information, 1984.
31. D. P. Friedman and M. Wand, Reification: Reflection without Metaphysics, 1984 ACM Symposium on Lisp and Functional Programming, Austin, TX, August 1984, pp. 348-355.
32. M. R. Genesereth, An Overview of Meta-level Architecture, Proceedings of the Third AAAI, Washington, DC, 1983, pp. 119-129.
33. D. B. Lenat, "Eurisko: A program that learns new heuristics and domain concepts," Artif. Intell. 21, 61-98 (1983).
34. N. D. Jones, S. Sestoft, and H. Sondergaard, An Experiment in Partial Evaluation: The Generation of a Compiler Generator, in G. Goos and J. Hartmanis (eds.), Rewriting Techniques and Applications, Lecture Notes in Computer Science 202, Springer-Verlag, Berlin, pp. 124-140, 1985.
35. A. Ershov, "Mixed computation: Potential applications and problems for study," Theor. Comput. Sci. 18, 41-67 (1982).
36. D. McDermott and G. J. Sussman, The Conniver Reference Manual, Memo AIM-259A, MIT AI Laboratory, 1974.
37. D. Waltz, "Understanding line drawings of scenes with shadows," in P. H. Winston (ed.), The Psychology of Computer Vision, MIT Press, Cambridge, MA, pp. 19-91, 1975.
38. B. Buchanan, G. Sutherland, and E. A. Feigenbaum, Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry, Machine Intelligence, Vol. 4, American Elsevier, New York, pp. 209-254, 1969.
39. R. E. Carhart, R. Smith, H. Brown, and A. Djerassi, "Applications of artificial intelligence for chemical inference XVII: An approach to computer-assisted elucidation of molecular structure," J. Am. Chem. Soc. 97, 5755 (1975).
40. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
41. R. C. Berwick and A. S. Weinberg, The Grammatical Basis of Linguistic Performance: Language Use and Acquisition, MIT Press, Cambridge, MA, 1984.
42. E. P. Stabler, "Berwick and Weinberg on linguistics and computational psychology," Cognition 12, 155-179 (1984).
43. D. Marr, Vision, W. H. Freeman, San Francisco, 1982.
44. J. A. Fodor, The Modularity of Mind, MIT Press, Cambridge, MA, 1983.
45. Z. W. Pylyshyn, "Computation and cognition: Issues in the foundation of cognitive science," Brain Behav. Sci. 3, 111-169 (1980).
46. R. C. Schank, Conceptual Information Processing, North-Holland Publishing, New York, 1975.
47. R. C. Schank, Identification of Conceptualizations Underlying Natural Language, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, pp. 187-247, 1973.
48. R. C. Schank and C. J. Rieger III, "Inference and the computer understanding of natural language," Artif. Intell. 5, 393-412 (Winter 1974).
49. R. C. Schank, "Language and memory," Cog. Sci. 4, 243-284 (July-September 1980).
50. W. A. Woods, What's In a Link?: Foundations for Semantic Networks, in D. G. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 35-82, 1975.
51. R. J. Brachman, On the Epistemological Status of Semantic Networks, in N. V. Findler (ed.), Associative Networks: Representation and Use of Knowledge by Computers, Academic Press, New York, pp. 3-50, 1979.
52. R. J. Brachman and J. G. Schmolze, "An overview of the KL-ONE knowledge representation system," Cog. Sci. 9, 171-216 (1985).
53. D. G. Bobrow and T. Winograd, An Overview of KRL: A Knowledge Representation Language, Xerox PARC Report CSL-76-4, Palo Alto, CA, July 1976.
54. P. J. Hayes, The Naive Physics Manifesto, in D. Michie (ed.), Expert Systems in the Micro-electronic Age, Edinburgh University Press, Edinburgh, pp. 242-270, 1979.
55. P. J. Hayes, Naive Physics I: Ontology for Liquids, Memo, Centre pour les Etudes Semantiques et Cognitives, Geneva, 1979.
56. T. Winograd, Frame Representations and the Declarative/Procedural Controversy, in D. G. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 185-208, 1975.
57. P. N. Johnson-Laird, "Procedural semantics," Cognition 5, 189-214 (September 1977); Jerry Fodor's reply is in Ref. 58.
58. J. Fodor, "Tom Swift and his procedural grandmother," Cognition 6, 229-247 (September 1978); a reply to this is in Ref. 57.
59. P. J. Hayes, In Defence of Logic, Proceedings of the Fifth IJCAI, Cambridge, MA, 1977, pp. 559-565.
60. R. A. Kowalski, Predicate Logic as a Programming Language, Proceedings of the IFIP, North-Holland, 1974.
61. W. F. Clocksin and C. S. Mellish, Programming in Prolog, 2nd ed., Springer-Verlag, New York, 1984.
62. D. E. Smith, Inference Control, Ph.D. Thesis, Computer Science Department, Stanford University, August 1985.
63. D. Warren, An Abstract Prolog Instruction Set, Technical Note 309, AI Center, SRI International, Menlo Park, CA, 1983.
64. D. P. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982.
65. E. Organick, Computer System Organization: The B5700/B6700 Series, Academic Press, New York, 1973.
66. C. G. Bell, J. C. Mudge, and J. E. McNamara, Computer Engineering: A DEC View of Hardware Systems Design, Digital Press, Bedford, MA, 1978.
67. G. Radin, "The 801 minicomputer," IBM J. Res. Dev. 27, 237-246 (May 1983).
68. A. Bawden, R. Greenblatt, J. Holloway, T. Knight, D. Moon, and D. Weinreb, Lisp Machine Progress Report, Memo AIM-444, MIT AI Laboratory, August 1977.
69. K. Taki et al., Hardware Design and Implementation of the Personal Sequential Inference Machine (PSI), Proceedings of the International Conference on Fifth Generation Computer Systems, ICOT, Tokyo, Japan, 1984, pp. 398-409.
70. R. Onai, M. Aso, H. Shimizu, K. Masuda, and A. Matsumoto, "Architecture of a reduction-based parallel inference machine: PIM-R," New Generat. Comput. 3, 197-228 (1985).
71. V. Singh and M. R. Genesereth, A Variable Supply Model for Distributing Deductions, Proceedings of the Ninth IJCAI, Los Angeles, CA, 1985, pp. 39-45.
72. E. Shapiro, A Subset of Concurrent Prolog and its Implementation, Report TR-003, ICOT, Tokyo, Japan.
73. L. P. Deutsch and A. M. Schiffman, Efficient Implementation of the Smalltalk-80 System, Proceedings of the Eleventh ACM SIGACT-SIGPLAN Symposium on the Principles of Programming Languages, Salt Lake City, UT, January 1984.
74. D. Ungar, R. Blau, P. Foley, D. Samples, and D. Patterson, Architecture of SOAR: Smalltalk on a RISC, Eleventh Symposium on Computer Architecture, Ann Arbor, MI, June 1984, pp. 188-197.
75. J. Batali, E. Goodhue, C. Hanson, H. Shrobe, R. M. Stallman, and G. J. Sussman, The Scheme-81 Architecture: System and Chip, Proceedings of the 1982 MIT Conference on Advanced Research in VLSI, Cambridge, MA, 1982, pp. 69-77.
76. D. W. Clark and C. C. Green, "An empirical study of list structure in LISP," CACM 20, 78-87 (February 1977).
77. H. Lieberman and C. Hewitt, "A real-time garbage collector based on the lifetimes of objects," CACM 26, 419-429 (June 1983).
78. D. A. Moon, Garbage Collection in a Large Lisp System, 1984 ACM Lisp and Functional Programming Conference, Austin, TX, August 1984, pp. 235-246.
79. K. E. Batcher, "Design of a massively parallel processor," IEEE Trans. Comput. C-29(9), 836-840 (September 1980).
80. E. Arnould, H. T. Kung, O. Menzilcioglu, and K. Sarocky, A Systolic Array Computer, Proceedings of the 1985 IEEE International Conference on Acoustics, Speech, and Signal Processing, March 1985.
81. W. D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985.
82. A. L. Davis and S. V. Robison, The Architecture of the FAIM-I Symbolic Multiprocessing System, Proceedings of the Ninth IJCAI, Los Angeles, CA, 1985, pp. 32-38.
83. D. E. Shaw, The NON-VON Supercomputer, Technical Report, Department of Computer Science, Columbia University, August 1982.
84. D. E. Shaw, SIMD and MSIMD Variants of the NON-VON Supercomputer, Proceedings of the COMPCON, Spring 1984, San Francisco, February 1984.
85. BBN, Development of a Butterfly Multiprocessor Test Bed, Report 5872, Quarterly Technical Report No. 1, Bolt Beranek and Newman, Cambridge, MA, 1985.
86. C. Rieger, R. Trigg, and R. Bane, ZMOB: A New Computing Engine for AI, Proceedings of the Seventh IJCAI, Vancouver, B.C., August 1981, pp. 955-960.
87. M. Weiser, S. Kogge, M. McElvany, R. Pierson, R. Post, and A. Thareja, Status and Performance of the Zmob Parallel Processing System, IEEE CompCon Conference, San Francisco, CA, February 1985.
88. C. L. Seitz, "The cosmic cube," CACM 28, 22-33 (January 1985).
89. S. Fahlman, NETL: A System for Representing and Using Real-World Knowledge, MIT Press, Cambridge, MA, 1979.

P. AGRE
MIT
COROUTINES

The word "coroutine" is attributed to Conway (1), who describes coroutines as a set of autonomous programs communicating with adjacent modules as if they were input and output routines. Coroutines (sometimes called mutual subroutines) can be considered as subroutines at the same level. Each of them acts as the main program, although there is no main program. The best known example of coroutines is the interaction between a parser (see Parsing) and a lexical analyzer (see Morphology). The parser calls the lexical analyzer each time it needs a token; the lexical analyzer calls the parser to dispose of tokens extracted from the input sequence. Figure 1 illustrates a typical control flow through coroutines. However, a number of differing views on coroutining have emerged and have been implemented in experimental or practical programming languages. They illustrate relationships with backtracking (qv), multipass algorithms, network processes, lazy evaluation, concurrent programming languages, and object-oriented programming languages. Applications of coroutining can be found in the domains of business data processing, text processing, simulation, operating systems, and AI.

Concepts

Coroutines: Explicit versus Implicit Sequencing. In the simplest and most conservative view of coroutining the sequencing is explicitly controlled by the programmer (2). It has the following characteristics: the control is explicitly passed from one coroutine to another; the execution of a coroutine is resumed from the point of deliberate suspension of control (resume); only one coroutine is executing at any given time; and the values of the local data of a coroutine persist between successive reactivations (own variables). Consider the two coroutines A and B (Fig. 1), which cooperate in such a way that A sends items to B: B runs until it needs an item from A.
At this point the control is passed to A, while B suspends. After supplying the item B asked for, A suspends and B resumes and continues from the point where it suspended, and so on. Coroutining is appropriate whenever the algorithm to be
Figure 1. Control flow through coroutines.

implemented can be divided in two or more conceptually distinct subalgorithms that have to be executed alternatively and where it is difficult to impose a hierarchy between the different subalgorithms. The technique of coroutining can be used to split up the different concepts related to an algorithm in different modules. This modularity and the locality of data can facilitate the verification and the debugging of programs. Communication between coroutines is possible either via global data, accessible to both, or by also passing data when control is passed. A more liberal (and not generally accepted) view considers coroutines as a means of simulating parallelism on a single-processor machine. This generalization of coroutining has primarily been used in AI domains to express an implicit sequencing of conceptually parallel programs. One good example of this approach is the principle of lazy evaluation in LISP (qv) (3,4); the LISP evaluation system consists of coroutines, each evaluates part of the program, and they are synchronized by the mutual need for data, that is, the call by need. This is further generalized in the logic programming paradigm: control knowledge is added to the program to enable the system to decide which of its coroutines has to be executed (5,6). These concepts have finally led to the definition of concurrent logic programming languages (7,8). This implicit sequencing of conceptually parallel programs is a language feature in which the flow of control is not explicitly specified by the programmer but is dynamically determined by data dependencies.

Relation with Procedures. Procedures can be considered as a special kind of coroutine with restrictions on their behavior: on completion of their tasks, procedures always return control to their caller, procedures usually start with a fresh set of local data, and execution starts at the first statement.
When calling another procedure, their execution is suspended, to resume with the same local data when the called procedure returns control. The difference between coroutines and procedures is characterized by the difference in control flow. With procedures the caller decides on the return address, the instruction immediately following the call; this address is saved and always used to resume execution when the callee finishes. With coroutines the callee decides on the control flow (eventually it calls another coroutine); the caller saves the return address, that is, the address where it resumes execution when, at a
later stage, it regains control either from the callee or from another coroutine. This address is called the reactivation point of the coroutine.
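The producer/consumer pattern of Figure 1 maps directly onto Python generators (an anachronistic but convenient sketch; the names `coroutine_a` and `coroutine_b` are invented for illustration): each `yield` records a dynamic reactivation point, and the next resumption continues from there with the local data intact.

```python
# Coroutine A supplies items to coroutine B, as in Figure 1.

def coroutine_a(items):
    """A: supplies one item each time it is resumed, then suspends."""
    for item in items:
        yield item                 # suspend; control passes back to B

def coroutine_b(a):
    """B: runs until it needs an item from A, then resumes A."""
    results = []
    for item in a:                 # each iteration resumes A at its yield
        results.append(item * item)   # B's own processing of the item
    return results

a = coroutine_a([1, 2, 3])
print(coroutine_b(a))   # [1, 4, 9]
```

Note that neither routine is subordinate to the other: B asks A for items, and A decides when to hand control back, which is the symmetric relationship the text describes.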
Reactivation Point. Some definitions of coroutining imply a fixed reactivation point: each time the coroutine is passed control, it resumes execution starting from a fixed instruction (like a called procedure). Other (more conventional) definitions have a dynamic reactivation point: execution is resumed at the point where control left on invoking another coroutine (coroutines that behave as read or write modules). A dynamic reactivation point asks for a different implementation than a static reactivation point. Indeed, not only the values of the local variables have to be retained between the different reactivations of the coroutine but also the address where execution of the coroutine has to be resumed when it regains control.

Symmetric versus Semisymmetric. In symmetric coroutining each coroutine acts as a main program; it can pass control to any other coroutine. The routine in control takes over the role of main program and thus has complete freedom to pass control to any existing coroutine without being tied to the caller. In semisymmetric coroutining a monitor module controls the flow through the coroutines. On return of a coroutine, control is returned to the monitor module, which decides to pass it eventually to another coroutine. However, each coroutine can be a monitor for some other coroutines, thus establishing a hierarchy.

Multipass Algorithms and Pipelines. Many multipass algorithms lend themselves naturally to coroutining. In a classical approach the first pass of a multipass algorithm is applied to the input of the algorithm, the results are saved in intermediate storage and supplied to the next pass, and so on. In a coroutining approach the different passes are interconnected as coroutines: the first pass executes until it has a token available to transmit to the next pass, execution is suspended, and control is passed to the next coroutine, which implements the next pass in the algorithm. This routine, on terminating its execution, in turn calls the next pass, and so on. On delivery of the final result, the last pass gives control back to the first pass, which resumes execution. This mechanism continues until the input sequence is completely processed. Note that not all multipass algorithms can be implemented in this way. Algorithms that need input tokens that appear after the currently processed input token in order to produce output tokens obviously do not lend themselves to coroutining, for example, if one step is a compiler for a language that allows forward references.

This coroutining of multipass algorithms has given rise to the concept of pipelining. In this approach the different passes of the algorithm are implemented as separate processes interconnected via FIFO queues. The first pass of the algorithm produces successive tokens that are put on the output queue to the next pass without its execution being interrupted; the next pass consumes tokens from its input queue, processes them, puts result tokens on its output queue, and so on until the last pass, which produces the final results. The processes run in parallel, and control is not passed explicitly; processes are triggered by tokens arriving on their input queues. This data-driven process (see Processing, bottom-up and top-down) needs some synchronization mechanism to prevent a process from reading an empty queue or pushing an item into a full buffer and to wake up processes waiting for input. This kind of parallelism has been generalized in different implementations of concurrent programming languages and object-oriented languages. Because of the strong relationship with coroutining, this particular kind of parallelism is much easier to program than the more general mechanisms of concurrent processes proposed and implemented elsewhere. In this special case the synchronization between processes is handled by the system and is invisible to the user insofar as the processes do not access some common data. In the latter case the user has to synchronize the processes using the primitives offered by the programming language. The "pipes" of the operating system Unix are a well-known example of pipelines between processes.
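In a modern notation, the coroutining of passes can be sketched with generators. The following Python fragment is an illustrative sketch, not part of the original article (the stage names are invented for the example); each yield suspends a pass and hands its token to the next stage on demand:

```python
def tokens(text):
    # First pass: split the raw input into word tokens.
    for word in text.split():
        yield word

def normalize(stream):
    # Second pass: consume tokens one at a time, emit lowercased tokens.
    for tok in stream:
        yield tok.lower()

def numbered(stream):
    # Final pass: attach a sequence number to each token.
    for i, tok in enumerate(stream, start=1):
        yield (i, tok)

# Each stage suspends after producing one token; control flows to the
# consumer, which pulls the next token only when it needs it.
pipeline = numbered(normalize(tokens("One Two Three")))
print(list(pipeline))  # [(1, 'one'), (2, 'two'), (3, 'three')]
```

No intermediate storage for whole pass results is needed: a token is transmitted as soon as it is produced, exactly as in the coroutining approach described above.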
Implementation of Coroutining

Semantic Differences. The different implementations of coroutining in different programming languages attribute a different semantic meaning to the concept of coroutines. Some definitions of coroutines separate the creation, referencing, and execution of coroutines into explicit steps, whereas other definitions provide only one mechanism that groups the three conceptual steps. Some implementations automatically suspend the execution of the coroutine that passes control to another coroutine, whereas others provide an explicit suspend command and thus allow parallel execution of coroutines until all but one are explicitly suspended. Coroutines never terminate in some implementations; others, where coroutines can terminate, have different specifications for the target of control when a coroutine ceases to exist: the main program, the creator of the coroutine, or the activator of the coroutine. Some language definitions consider coroutines as distinct from procedures, and they must be declared as such. Others treat procedures as having the potential to become an instance of a coroutine when referenced properly at run time, without any explicit declaration.

The Spaghetti Stack. When implementing a coroutining mechanism, the problem arises that some information concerning the suspended coroutine has to be retained between subsequent invocations of the coroutine: the values of the local data structures and the address of the reactivation point. The lifetime of this information is not tied to the control flow through the different coroutines.
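A Python generator object makes this retained information concrete. The sketch below is illustrative, not from the article: the generator's frame keeps both a local variable and the reactivation point alive across suspensions, which is precisely the information a coroutine implementation must preserve.

```python
def running_total():
    # 'total' is local data that must outlive each suspension.
    total = 0
    while True:
        value = yield total   # reactivation point: execution resumes here
        total += value

acc = running_total()
next(acc)            # prime the coroutine; it suspends at the yield
print(acc.send(5))   # resumes at the yield; prints 5
print(acc.send(7))   # local state was retained; prints 12
```

Between the two send calls the coroutine is dormant, yet its frame (local variables plus resume address) persists, independent of the caller's control flow.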
The information must remain available during the whole lifetime of the coroutine. This retentive control can be implemented with a heap: activation records are allocated in a storage pool and contain the local data of the coroutine, including sufficient space for temporaries, and the return address. This is a very simple scheme but has the drawback of wasted space due to the preallocation of storage for temporaries, the fragmentation of the storage, and the overhead of maintaining a free-space list; these disadvantages are incurred even when a program does not need the retentive control. Another well-known implementation of retentive control is called the spaghetti stack. It was proposed by Bobrow and Wegbreit (9) and can also support other control structures where the lifetime of frames is not associated with the nesting of control, such as backtracking and multitasking. The spaghetti stack is a generalized stack where
gaps can exist between different stack frames. The main advantage of the spaghetti stack lies in its ability to operate as a normal stack whenever retentive control is not needed, so that no overhead is paid for programs not using coroutining. The spaghetti stack stores activation records that include the storage for local data structures, the necessary control and access links, and sometimes storage space for temporaries. When control leaves a module, the activation record is reclaimed only if the module cannot be reactivated in the future. Frames retained on the stack can block the growth of the temporary storage (which is a substack) of the activation frame of the routine where control resides. In this case the complete activation record has to be copied to another part of the stack, leaving a gap that can be used for the substack of the temporaries of the frame immediately below it. This scheme is rather complex, and there is an overhead due to multiple reference counts and copying. Because the original technique was also designed to cover backtracking and dynamic scoping, apart from coroutines, it is sometimes too general. Different techniques have since been implemented that have different advantages and disadvantages, among others the Berry heap (10), which includes two heaps and one stack, and the stack heap, which has one stack and one heap. A comparative study and an in-depth analysis of different techniques is given in Ref. 11.

Depending on the definition of coroutines used to define the programming language, the implementation possibly provides mechanisms for parallel execution of coroutines. One problem related to parallel execution of modules is the synchronization between the different processes executing concurrently. There are two main techniques to synchronize processes: the first is based on shared variables and the second on message passing.

Relation with Backtracking. One of the methods to exhaustively search a problem space is backtracking (qv).
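As an illustrative sketch (in Python, not from the article), such a depth-first search with choice points can be written as a recursive generator: each activation retains its partial solution and loop position while control is elsewhere, and an exhausted loop corresponds to popping a choice point.

```python
def backtrack(partial, n):
    # Each recursive activation is a choice point; its local state
    # (partial solution, loop position) is retained until every
    # alternative below it has been explored.
    if len(partial) == n:
        yield list(partial)
        return
    for choice in range(n):
        if choice not in partial:          # prune dead ends
            partial.append(choice)
            yield from backtrack(partial, n)
            partial.pop()                  # backtrack: undo the choice

# All permutations of 0..2, enumerated depth first.
print(list(backtrack([], 3)))
```

The generator frames here play the role of the retained activation records described below: the most recently suspended frame is resumed first, giving exactly the stackwise choice-point discipline.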
It is a strategy that explores all possible paths of the search space in a depth-first way (see Search, depth-first). Whenever a path is a dead end, the algorithm goes back to the latest encountered choice point and explores another possible path. This process is continued until a solution for the problem is found or the search space is exhausted. To implement this search technique, a special kind of retentive control is needed; each choice point has to be retained on the run time structures to facilitate the backtracking. Moreover, these choice points are stored in a stackwise manner; the most recent choice point is retrieved first. When all related paths are explored, the system backtracks to the previous choice point, and so on. The spaghetti stack could be used to implement this retentive control, though this mechanism is too general for the restricted retentive control needed by backtracking. A normal stack that stores activation records for the currently executing procedure is sufficient; activation frames from modules that express a choice point are retained on the stack. These frames contain return addresses and local values needed for resuming execution on backtracking. On backtracking the frame on top of the stack is revisited. When all choices are exhausted, the frame is popped from the stack and the previous stack frame is considered.

A backtracking search can also be expressed with coroutines. Consider, for example, an algorithm consisting of two parts A and B to find a solution to a given problem. Suppose A has n and B has m possible solutions. The backtracking search consists of combining the n solutions of A with the m solutions of B. All combinations of a solution of A and a solution of B that are compatible are solutions to the overall problem. The coroutining approach consists of writing the algorithm to find all solutions of A. After each solution control is passed to coroutine B, which checks the consistency with all solutions to B. Then A regains control to find the next solution, and so on until the search space is exhausted or until one solution to the global problem is found.

Applications of Coroutines

In Ref. 11 a survey was made of current literature to get a representative sample of coroutine usage. Examples were selected from various application fields, a brief description was given, and the characteristics of the algorithms using coroutining were analyzed. Given below is a summary of some typical application domains for coroutining. In Ref. 2 one can find short descriptions of general-purpose languages that include features for coroutining, like COROUTINE PASCAL, an extension to FORTRAN, CLU, EXTENDED ALGOL, and so on.

Compilers. Compilers are usually multipass algorithms that often, as already stated, lend themselves to coroutining. In the parsing step of a compiler coroutines can be used in yet another way. Usually, part of the specification consists of the grammar of the language to be parsed. When using a top-down parsing strategy, mechanisms have to be provided to cope with the possible alternatives in the grammar rules. Backtracking is an appropriate technique in such a case, although coroutines can be used too. A coroutine containing the appropriate strategy is defined for each alternative in the grammar. This top-down parsing technique can be extended to include "nonforgetful" backtracking (12).

Receiver-Sender Communication. Receiver-sender communication problems are a typical application for coroutining. A "sender" coroutine retrieves information until a buffer is full or a meaningful token has been produced. Then a "receiver" coroutine is activated that processes the input data and, upon completion, reactivates the sender. Sometimes the sender transforms the input sequence before putting it in the buffer. The same applies to the receiver. An example can be found in Ref. 19.

Operating Systems. A process scheduler of a multitasking operating system can be seen as semisymmetric coroutining. The purpose is to assign scarce, nonsharable resources like processor time to competing processes that have to run on a single-processor machine. The scheduler acts as the monitor module and passes control to the different processes for a limited amount of resources. On exhaustion of the resources control comes back, and the scheduler decides on the next process to become active depending on dynamic priorities, waiting time, and so on. On the other hand, system services like input-output operations are typical examples of symmetric coroutines. The user considers these services as subprocedures of his program. However, for the operating system the reverse is true. Although on the same level, they act as mutual input and output routines and retain values for their local variables and thus obey the definition of coroutines. Examples of this view on operating systems can be found in Ref. 9. A description of BLISS, a systems-programming language including facilities for coroutining, can be found in Ref. 14.
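The semisymmetric scheduling just described can be sketched with generators. This Python fragment is an illustrative sketch, not from the article (the process names and step counts are invented): each "process" yields after one unit of work, and the monitor decides who runs next.

```python
def process(name, steps):
    # A "process" yields after each unit of work, returning control
    # to the monitor (the scheduler).
    for i in range(steps):
        yield f"{name}:{i}"

def scheduler(processes):
    # Semisymmetric coroutining: the monitor passes control to each
    # coroutine in turn and regains it after every step.
    trace = []
    while processes:
        proc = processes.pop(0)
        try:
            trace.append(next(proc))
            processes.append(proc)   # requeue: simple round-robin policy
        except StopIteration:
            pass                     # process terminated; drop it
    return trace

trace = scheduler([process("A", 2), process("B", 1)])
print(trace)  # ['A:0', 'B:0', 'A:1']
```

Control always returns to the scheduler between steps; the processes never transfer control to one another directly, which is exactly what distinguishes the semisymmetric from the symmetric discipline.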
Simulation. Simulation of systems involving different objects in interaction with each other, be they mechanical, software, economic, or other real-time systems, requires, on a sequential machine, a simulation of the parallel processes going on in the real system. This can be expressed with coroutines. Depending on the application, the external influences are represented by random generators or by sequences of typical examples of external impacts on the system. The different objects in the system are implemented by coroutines, and the interactions between the objects and the external world are coded as mutual invocations. See Ref. 15 for some examples. The best-known language for simulation that includes coroutine features is SIMULA (qv). A short description of the concepts of this language and some further references can be found in Ref. 2.

Artificial Intelligence

Lazy Evaluation. The concepts of lazy evaluation were initially investigated in the context of a purely applicative language, that is, a language without assignment. The intuitive idea behind this nonstandard method of evaluation is to perform an evaluation step only when it is absolutely necessary and never to perform any evaluation twice. The concept of lazy evaluation is a generalization of the call-by-need mechanism of Wadsworth (16) and the delay rule of Vuillemin (17), in the sense that list structures are evaluated incrementally; that is, an element of a list in the context of a functional language like LISP is not evaluated until and unless it is selected and examined by some later operation. This method allows a significantly different style of programming. For the comparison of two lists a lazy evaluation system evaluates pairs of elements of the two lists, compares them, and proceeds to the next elements. On encountering an inequality, the lists are different and the tails of the two lists are not constructed. A conventional evaluation would compute the two lists before starting to compare them. Lazy evaluation allows the efficient processing of huge and even infinite structures. It combines the advantages of call by value and call by name: expressions are evaluated only when referenced, and the evaluation is done only once or even not at all. The expressions are treated symbolically (by name) as long as they are not referenced. When needed, the expressions are evaluated. In the case of structures or lists this evaluation is only partial, and the unevaluated parts are kept in symbolic form.

The relation with coroutines is best exemplified by considering a producer-consumer problem. In the normal evaluation mode of a LISP-like language, the producer would generate its output first, possibly a huge list of elements. Then this list is fed as input to the consumer, which is evaluated next. Applying lazy evaluation, the first element of the output list would only be produced on demand of the consumer (call by need). At this point control passes to the evaluation of the producer, which supplies the first element. The consumer resumes and processes this first element and eventually proceeds to reference the second; this triggers the evaluation of the producer to deliver the second element; and so on. This is indeed a kind of coroutining behavior. The evaluation mechanism is changed from a recursive procedural approach to a coroutining approach. In the conventional approach the evaluator is called recursively to evaluate subexpressions; these are evaluated completely before control returns. The lazy evaluation system acts as a coroutine for each subexpression and delivers results incrementally on subsequent invocations. More details about lazy evaluation in functional languages and some useful references are given in Refs. 3 and 4.

The same result can be achieved in the logic programming (qv) paradigm. In Clark's IC-PROLOG (5) PROLOG has been extended with mechanisms to describe a different order of evaluation than the standard, strictly left-to-right one. The concepts of a lazy producer and an eager consumer are defined for respective input or output variables of the calls in a PROLOG program. These language features allow a lazy evaluation of programs in much the same sense as LISP-like lazy evaluation, but also an eager evaluation (which starts execution with the producer, where lazy evaluation starts with the consumer, but further contains essentially the same data transmissions and requests) and a mixed-mode evaluation. Besides the work on lazy evaluation, the paper of Kahn and MacQueen (18) served as an important source for the ideas developed in IC-PROLOG. The language described in this paper provides concise and flexible means for creating complex networks of parallel processes that may evolve during execution. Channels, which are built as FIFO queues, interconnect and buffer the communication between the processes, which thus behave as coroutines in the pipelining approach.

Concurrent Languages. Generalizations of coroutining have been included in AI languages to be able to describe the execution of conceptually concurrent processes on sequential machines. Logic programming has been an important subject for this research since the language is inherently oriented toward parallel execution due to the logic-based formalism and semantics, which allow both or- and and-parallelism. The most popular concurrent logic programming languages are CONCURRENT PROLOG (8) and PARLOG (7). Both build on the coroutining approach of IC-PROLOG. Each call in the program creates a new instantiation of a coroutine; the variables shared between the calls denote a communication channel between the concurrently running processes, similar to the pipelines between the different passes of a multipass algorithm as described before. The scheduling of the processes is implicitly data driven. The definition of a concurrent logic programming language handles concurrency by the and-parallelism, indeterminacy by the or-parallelism, and communication and synchronization by the shared variables. Moreover, guarded clauses and read-only variables are added to the languages, which serve as a modification of the cut operation and the standard unification of sequential PROLOG, respectively. These features allow a better control of the execution of the program.

Object-Oriented Programming. The building blocks of object-oriented languages are objects that are grouped in hierarchical classes. Objects can have properties, and these are objects themselves and can be shared between different objects. The user interacts with the objects by sending messages, and objects respond by changing their internal state and by sending messages to other objects. An object can be thought of as a process with an internal state. It becomes active when it
receives a message (compare with pipeline coroutining); the internal state (i.e., the own variables of the coroutine) can only be changed by the object on receipt of a message, which specifies the operation to be done; the object can send messages to other objects (invoke other coroutines); and any number of instances can be generated from a definition of an object (multiple instantiations of the same coroutine).

Although there exist some purely object-oriented programming languages, of which SMALLTALK (19) is the best known, some argue that object-oriented programming is more a question of style than of language; see, for example, Ref. 20, where CONCURRENT PROLOG is used in an object-oriented style. Others have added an object-oriented layer (21,22) to existing languages like PROLOG or LISP.

BIBLIOGRAPHY

1. M. E. Conway, "Design of a separable transition-diagram compiler," CACM 6(7), 396-408 (1963).
2. C. E. Marlin, Coroutines: A Programming Methodology, a Language Design and an Implementation, Springer-Verlag, New York, 1980.
3. D. P. Friedman and D. S. Wise, Cons Should Not Evaluate Its Arguments, in Michaelson and Milner (eds.), Automata, Languages and Programming, Edinburgh University Press, pp. 257-284, 1976.
4. P. Henderson and J. H. Morris, A Lazy Evaluator, in Conference Record of the 3rd ACM Symposium on Principles of Programming Languages, pp. 95-103, 1976.
5. K. L. Clark and F. G. McCabe, The Control Facilities of IC-PROLOG, in D. Michie (ed.), Expert Systems in the Micro-Electronic Age, University of Edinburgh, UK, pp. 122-152, 1979.
6. H. Gallaire and C. Lasserre, Metalevel Control for Logic Programs, in Clark and Tärnlund (eds.), Logic Programming, pp. 173-188, 1982.
7. K. Clark and S. Gregory, Notes on Systems Programming in PARLOG, in Proceedings of the International Conference on Fifth Generation Computer Systems, pp. 299-306, 1984.
8. E. Shapiro, A Subset of CONCURRENT PROLOG and Its Interpreter, Technical Report, Weizmann Institute of Science, Rehovot, Israel, 1983.
9. D. G. Bobrow and B. Wegbreit, "A model and stack implementation of multiple environments," CACM 16(10), 591-603 (1973).
10. D. Berry, L. Chirica, J. Johnston, D. Martin, and A. Sorkin, "Time required for reference count management in retention block-structured languages, Part 1," Int. J. Comput. Inf. Sci. 7(1), 91-119 (1975).
11. W. Pauli and M. L. Soffa, "Coroutine behaviour and implementation," Softw. Pract. Exper. 10, 189-204 (1980).
12. G. Lindstrom, Non-forgetful Backtracking: An Advanced Coroutine Application, Technical Report, Department of Computer Science, University of Pittsburgh, 1976.
13. P. Brinch Hansen, "The programming language CONCURRENT PASCAL," IEEE Trans. Softw. Eng. SE-1(2), 199-207 (1975).
14. W. Wulf, D. B. Russell, and A. N. Habermann, "BLISS: A language for systems programming," CACM 14(12), 780-790 (1971).
15. G. Birtwistle, O.-J. Dahl, B. Myhrhaug, and K. Nygaard, SIMULA Begin, Wiley, New York, 1974.
16. C. Wadsworth, Semantics and Pragmatics of the Lambda-Calculus, Ph.D. Thesis, Oxford, 1971.
17. J. Vuillemin, "Correct and optimal implementations of recursion in a simple programming language," J. Comput. Sys. Sci. 9(3), 332-354 (1974).
18. G. Kahn and D. B. MacQueen, "Coroutines and networks of parallel processing," Proc. IFIP 77, 993-998 (1977).
19. A. Goldberg and D. Robson, SMALLTALK-80: The Language and Its Implementation, Addison-Wesley, Reading, MA, 1983.
20. E. Shapiro and A. Takeuchi, "Object oriented programming in CONCURRENT PROLOG," New Gen. Comput. 1, 25-48 (1983).
21. D. Weinreb and D. Moon, LISP Machine Manual, Symbolics Inc., Cambridge, MA, 1981.
22. C. Zaniolo, Object-Oriented Programming in PROLOG, in Proceedings of the International Symposium on Logic Programming, Atlantic City, pp. 265-270, 1984.

M. Bruynooghe
Catholic University Leuven
and
R. Venken
BIM

CREATIVITY

A creative act may be defined as one that is viewed both as valuable and as novel and one that, in addition, reflects well on the cognitive abilities of the actor (whether human, animal, or machine). There are a number of conflicting views as to the nature of creative acts. Currently, the most attractive view is that championed by Simon and his co-workers (1,2). According to this view, creative acts are problem-solving acts of a special sort. First, they are problem-solving acts that meet criteria such as those above; that is, they are seen as novel and valuable, and they reflect credit on the cognitive abilities of the problem solver. Second, they often, though not always, involve ill-defined problems, that is, problems that cannot be solved unless the problem solver makes decisions or adds information of his or her own. Ill-defined problems occur frequently, for example, in architecture, where the client typically specifies a few of the properties of a building to be designed but the architect must supply many more before the design problem can be solved. For a more complete discussion of ill-defined problems see Refs. 3-6. Simon and his co-workers interpret problem solving broadly so that it includes not only the sciences but the arts and humanities as well.

A corollary of the Simon view is that there is no special creative mental process to be found in creative acts, that is, no process that is not also found in more mundane problem-solving acts. This view is consistent with observations of Patrick on poets (7) and on painters (8) (see also Art, AI in) and of Reitman (3) and of Simon and Sumner (9) on musical composition (see also Music, AI in). These authors examined creative performances carefully and failed to find any process that was not also a part of everyday problem solving (qv). Some have claimed that creative acts are in principle unanalyzable. The philosopher of science Popper holds such a view about the invention of scientific theories. In The Logic of Scientific Discovery (10) Popper says:

The initial stage, the act of conceiving or inventing a theory, seems to me neither to call for logical analysis nor to be susceptible of it. . . . My view of the matter, for what it is worth, is
that there is no such thing as a logical method of having new ideas, or a logical reconstruction of this process. My view may be expressed by saying that every discovery contains "an irrational element," or "a creative intuition," in Bergson's sense.

In their book, Scientific Discovery: An Account of the Creative Processes, Langley, Simon, Bradshaw, and Zytkow (11) present a position directly challenging Popper's view. These authors argue that it is indeed possible to account for scientific discovery in terms of well-specified heuristic procedures and that vague terms such as "inspiration" or "creative intuition" are unnecessary. In particular, they hold that discoveries are achieved when the scientist applies sensible heuristic procedures in drawing inferences from data. They argue quite convincingly for the adequacy of this view by incorporating such heuristics (qv) in computer programs, for example, BACON (12), and showing that these programs can induce well-known scientific laws, such as Kepler's laws of planetary motion, from data. Lenat had demonstrated earlier (13) that a well-specified set of heuristics, incorporated in his program AM, could make interesting discoveries in mathematics. For example, AM discovered de Morgan's laws, the unique factorization of numbers into primes, and Goldbach's conjecture.

An early but still quite influential attempt to characterize creative processes is that of Wallas (14). Wallas analyzed the testimony of creative individuals (notably that of the mathematician Poincaré) and proposed that the creative process could be described as a sequence of four stages:

Preparation: a stage in which the creator works intensively, acquiring information and attempting to understand the problem.

Incubation: a stage in which the creator is not attending to the problem but during which progress toward solution occurs nonetheless.

Illumination: a stage in which important insights about the problem occur to the creator suddenly and unexpectedly.
Verification: the final stage in which the creator works out the implications of the insights gained during illumination.

Of these four stages, the first is the least controversial. Most commentators agree that creative acts involve a great deal of work. Pasteur's famous statement, "chance favors only the prepared mind" (15), represents these views well. Work by Hayes (16) has extended the notion of preparation, making it a necessary stage not just for individual creative acts but also in the careers of creative individuals. Hayes studied 76 outstanding composers (including Mozart and Mendelssohn) and 131 famous painters. He found that the careers of these individuals typically include a 6-10-year period of preparation before they begin to produce world-class work. Among the composers, only three composed outstanding works earlier than the tenth year of their careers, and these three were produced in years 8 and 9. These results parallel the observations of Simon and Chase (17) on chess masters (see Computer chess methods). Together, the results suggest strongly that creators have to acquire very large amounts of knowledge before important creative activity can occur.

Wallas's second stage, incubation, is considerably more controversial than the first. Cook (18,19) and Ericksen (20) doubted that incubation occurred because many attempts to demonstrate it experimentally had failed. More recently,
though, Fulgosi and Guilford (21), Murray and Denny (22), and Silviera (23) have succeeded in demonstrating the phenomenon experimentally. They have shown that subjects who are interrupted for a period of time late in the course of solving a problem solve it in less total time on the problem than subjects who are not interrupted. Although the controversy over the existence of incubation appears to have been resolved, the nature of the processes underlying incubation remains controversial. One view is that humans have two processors, each of which is capable of solving problems: the familiar conscious one and another, unconscious one that can carry out the problem-solving work when the conscious processor is distracted. This dual-processor position is generally not supported by other observations of cognitive processes in problem solving. Most work is consistent with the view that human problem solving is accomplished with a single serial processor. As an alternative to the dual-processor view, Simon (1) has proposed that the progress that results from incubation can be attributed to forgetting. He holds that during the interruption period inefficient plans are forgotten. When problem solving is resumed, new, more effective plans, based on knowledge of the problem gained during the earlier solution attempts, are formed and lead to faster solution. The following additional alternative is plausible and not inconsistent with Simon's proposal: in the course of problem solving the solver may establish a number of search goals, that is, goals to find facts, relations, and/or operators that might be useful in solving the problem. In effect, the solver sets up "watchers" for relevant information. If these watchers continue to be active during the interruption period, they could discover information that is useful for solution and would speed solution of the "unattended" problem.

Wallas (14) suggested that his four stages are characteristic of creative acts generally.
However, a reanalysis of his data (5) reveals many instances in which creative acts proceeded from beginning to end without any pause that would allow for incubation, without any evidence of illumination, and thus without any opportunity for verification. It appears, then, that although some creative acts do exhibit Wallas's four stages, many, and perhaps most, do not.

It is often assumed that creativity is closely related to IQ. Indeed, both Roe (24), studying eminent physicists, biologists, and social scientists, and McKinnon (25), studying distinguished research scientists, mathematicians, and architects, found that the creative individuals they studied had IQs ranging from 120 to 177, well above the general average. However, these higher than average IQs cannot be taken as an explanation of the observed creativity and indeed may be unrelated to it. Several studies indicate that highly creative individuals in a field do not have higher IQs than matched individuals in their field who are not judged to be creative. Harmon (26) rated 504 physical and biological scientists for research productivity and found no relation between creativity and either IQ or school grades. Bloom (27) studied two samples of chemists and mathematicians. One sample consisted of individuals judged outstandingly productive by colleagues. The other consisted of scientists who were matched in age, education, and experience to the first sample but who were not judged outstandingly productive. Although the first group outpublished the second at a rate of 8:1, there was no difference between
them in IQ. In a similar study McKinnon (25) compared scientists, mathematicians, and architects who had made distinguished contributions to their fields with a matched group who had not made distinguished contributions. There was no difference between the two groups in either IQ or school grades.

It may appear puzzling that creative scientists and architects have higher than average IQs when IQ does not predict which of two professionals will be the more productive. One explanation for this paradox may be that, in many fields, obtaining the opportunity to display creativity depends on getting through college or graduate school. Since school performance is well predicted by IQ, it may be that one's opportunity to be, say, a biologist depends on IQ because of the degree requirement. Once one is certified as a biologist, whether one will be creative is unrelated to IQ or school grades.

Although IQ and school grades do not predict creativity, neither do pencil-and-paper "creativity" tests. The Westinghouse Science Talent Search is the only organization that has demonstrated the ability to predict creativity. It has selected 40 high school students each year since 1942 on the basis of projects rather than written tests. In the group of 1520 students selected between 1942 and 1979, there are 5 Nobel prize winners, 5 winners of MacArthur Fellowships, and 2 winners of the Fields Medal in Mathematics. It is interesting to ask whether the success of the Westinghouse Science Talent Search in identifying creative individuals depends on its use of performance measures (e.g., projects) rather than pencil-and-paper tests.
BIBLIOGRAPHY

1. H. A. Simon, Scientific Discovery and the Psychology of Problem Solving, in R. G. Colodny (ed.), Mind and Cosmos: Essays in Contemporary Science and Philosophy, Vol. 3, University of Pittsburgh Press, Pittsburgh, PA, 1966, pp. 22-40.
2. A. Newell, J. C. Shaw, and H. A. Simon, The Process of Creative Thinking, in H. Gruber, G. Terrell, and M. Wertheimer (eds.), Contemporary Approaches to Creative Thinking, Atherton, New York, 1962, pp. 63-119.
3. W. R. Reitman, Cognition and Thought, Wiley, New York, 1965.
4. H. A. Simon, "The structure of ill-structured problems," Artif. Intell. 4, 181-201 (1973).
5. J. R. Hayes, Cognitive Psychology: Thinking and Creating, Dorsey Press, Homewood, IL, 1978.
6. J. R. Hayes, The Complete Problem Solver, Franklin Institute Press, Philadelphia, 1980.
7. C. Patrick, "Creative thought in poets," Arch. Psychol. 26, 1-74 (1935).
8. C. Patrick, "Creative thought in artists," J. Psychol. 4, 35-73 (1937).
9. H. A. Simon and R. K. Sumner, Pattern in Music, in B. Kleinmuntz (ed.), Formal Representation of Human Judgment, Wiley, New York, pp. 219-250, 1968.
10. K. R. Popper, The Logic of Scientific Discovery, Hutchinson, London, pp. 31-32, 1959.
11. P. W. Langley, H. A. Simon, G. L. Bradshaw, and J. M. Zytkow, Scientific Discovery: An Account of the Creative Process, MIT Press, Cambridge, MA, 1980.
12. H. A. Simon, P. W. Langley, and G. L. Bradshaw, Synthese 47, 1 (1981).
13. D. Lenat, AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search, SAIL AIM-286, Artificial Intelligence Laboratory, Stanford University, July 1976.
14. G. Wallas, The Art of Thought, Harcourt, Brace, New York, 1926.
15. R. Vallery-Radot, The Life of Pasteur, Doubleday Page, Garden City, NY, 1923, p. 79.
16. J. R. Hayes, Three Problems in Teaching Problem Solving Skills, in S. Chipman, J. Segal, and R. Glaser (eds.), Thinking and Learning Skills, Vol. 2: Research and Open Questions, Erlbaum, Hillsdale, NJ, 1985, pp. 391-406.
17. H. A. Simon and W. Chase, "Skill in chess," Am. Sci. 61, 394-403 (1973).
18. T. W. Cook, "Massed and distributed practice in puzzle solving," Psychol. Rev. 41, 330-335 (1934).
19. T. W. Cook, "Distribution of practice and size of maze pattern," Br. J. Psychol. 27, 303-312 (1937).
20. S. C. Ericksen, "Variability of attack in massed and spaced practice," J. Exper. Psychol. 31, 339-345 (1942).
21. A. Fulgosi and J. P. Guilford, "Short term incubation in divergent production," Am. J. Psychol. 7, 1016-1023 (1968).
22. H. G. Murray and J. P. Denny, "Interaction of ability level and interpolated activity (opportunity for incubation) in human problem solving," Psychol. Rep. 24, 271-276 (1968).
23. J. M. Silviera, "Incubation: The effects of interruption timing and length on problem solution and quality of problem processing" (Doctoral dissertation, University of Oregon, 1971), Diss. Ab. Int. 32, 5500B (1972).
24. A. Roe, The Making of a Scientist, Dodd Mead, New York, 1953.
25. D. W. McKinnon, Selecting Students with Creative Potential, in P. Heist (ed.), The Creative College Student: An Unmet Challenge, Jossey-Bass, San Francisco, 1968, pp. 104-110.
26. L. R. Harmon, The Development of a Criterion of Scientific Competence, in C. W. Taylor and F. Barron (eds.), Scientific Creativity: Its Recognition and Development, Wiley, New York, pp. 44-52, 1963.
27. B. S. Bloom, Report on Creativity Research by the Examiner's Office of the University of Chicago, in C. W. Taylor and F. Barron (eds.), Scientific Creativity: Its Recognition and Development, Wiley, New York, 1963.

CYBERNETICS

The phrase "control and communication in the animal and the machine" can serve as a definition of cybernetics. Although this term was used by André Marie Ampère about 150 years ago (1) and its concepts were used by Heron of Alexandria more than 1500 years ago (2), it was the mathematician Wiener who, in 1948, with the publication of Cybernetics (3), gave name and meaning to this notion in the modern context. The name cybernetics is derived from the Greek word for steersman, κυβερνήτης, which in Latin became gubernator, governor in English. The concept associated with this term was to characterize a mode of behavior that is fundamentally distinct from the customary perception of the operations of machines with their one-to-one correspondence of cause-effect, stimulus-response, input-output, and so on. The distinction arises from the presence of sensors whose report on the state of the effectors of the system acts on the operation of that system. Specifically, if this is an inhibitory action that reduces the discrepancy between the reported state of the effectors and an internal state of the system, the system displays goal-oriented behavior (4), that is, if perturbed by any outside means, it will
return to some representation of this internal state, the goal. Although this scheme does not specify the physical nature of the states alluded to, nor of the signals reporting about these states (whether they are electric currents, mechanical or chemical agents, abstract symbols, or whatever), the biological flavor of the language used is apparent. This is no accident; in the formative years of this concept the close cooperation of Wiener with the neurophysiologist Rosenblueth created a physiological context. Moreover, this cooperation stimulated the philosophical inclination of these two men, and together with Bigelow they set the stage for still ongoing epistemological inquiries with the publication in 1943 of "Behavior, Purpose and Teleology" (5). Another fruitful ménage à trois of philosophy, physiology, and mathematics was the collaboration of McCulloch, philosopher, logician, neurophysiologist, or "experimental epistemologist," as he liked to call himself, with a young, brilliant mathematician, Pitts; together they published two papers of profound influence on this emerging mode of thinking. The titles of these papers almost give away their content: "A Logical Calculus of the Ideas Immanent in Nervous Activity" (6), written in 1943, and "How We Know Universals: The Perception of Auditory and Visual Forms" (7), published in 1947. Then von Neumann's fascination with seeing a parallelism of the logical organization of computations in nervous tissue and in constructed artifacts (8) brought him close to McCulloch (9) and the people around him. The underlying logic of these various ideas and concepts was the topic for 10 seminal conferences between 1946 and 1953, bringing together mathematicians, biologists, anthropologists, neurophysiologists, logicians, and so on, who saw the significance of the notions that were spelled out in the title of the conferences: "Circular Causal and Feedback Mechanisms in Biological and Social Systems" (10).
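The logical calculus of McCulloch and Pitts treated the neuron as an all-or-none threshold element over binary inputs. A minimal sketch of such a unit follows; the particular weights and thresholds are illustrative choices for this sketch, not values taken from the 1943 paper:

```python
# Minimal McCulloch-Pitts-style threshold unit: it fires (1) when the
# weighted sum of binary inputs reaches the threshold; an inhibitory
# input carries negative weight. Weights/thresholds are illustrative.
def mp_neuron(inputs, weights, threshold):
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

# Logical connectives realized as single threshold units
AND = lambda x, y: mp_neuron([x, y], [1, 1], 2)
OR = lambda x, y: mp_neuron([x, y], [1, 1], 1)
# NOT: a constant excitatory input (1) plus an inhibitory input x
NOT = lambda x: mp_neuron([1, x], [1, -1], 1)
```

Networks of such units suffice to compute any finite logical function, which is the sense in which the paper found "ideas immanent in nervous activity."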
The participants became the catalysts for the dissemination of cybernetic concepts into the everyday vernacular (e.g., "feedback"), for epistemological inquiries regarding mentality, and of course "mentality in machines" (11). Should one name one central concept, a first principle, of cybernetics, it would be circularity: circularity as it appears in the circular flow of signals in organizationally closed systems, or in "circular causality," that is, in processes in which ultimately a state reproduces itself, or in systems with reflexive logic, as in self-reference or self-organization, and so on. Today "recursiveness" may be substituted for "circularity," and the theory of recursive functions (see Recursion), calculi of self-reference (qv) (12), and the logic of autology (13), that is, concepts that can be applied to themselves, may be taken as the appropriate formalisms.

Mechanisms

Consider again systems with a functional organization whose operations diminish the discrepancy between a specific state and a perturbation. The system's tendency to approach this specific state, the "goal," the "end," in Greek τέλος (hence "teleology"), may be interpreted as the system "having a purpose" (14). The purpose of invoking the notion of "purpose" is to emphasize the irrelevance of the trajectories traced by such a system en route from an arbitrary initial state to its goal. In a synthesized system whose inner workings are known, this irrelevance has no significance. This irrelevance becomes highly significant, however, when the analytic problem (the machine identification problem) cannot be solved, because, for instance, it is transcomputational (15) in the sense that with
known algorithms the number of elementary computations exceeds the age of the universe expressed in nanoseconds. Hence, the notion of purpose can become effective when dealing with living organisms whose goals may be known but whose behavioral trajectories are indeterminable. Aristotle juxtaposes the "efficient cause," that is, when "because" is used to explain the flow of things, with the "final cause," that is, when "in order to" is used for justifying actions. In the early enthusiastic stages of cybernetics, language appropriate for living things (desires, wants, ethics, thought, information, mind, and so on) was sometimes used in talking about synthesized behavior. Traces of this are found today in terms like "computer memory," "processing of information," "artificial intelligence," and so on. The fascination with "bio-mimesis," that is, "imitating life," keeps the present-day followers of Aristotle searching for a synthesis of aspects of mentation by using the powers of the large mainframe computers. On the other hand, the analytic problems "what is mind?" and "whence ideas?" in the Platonic sense keep cyberneticians searching for principles of computation and logic underlying sensorimotor competence, thought, and language. Although in the early phases of this search the notion of purpose appeared in many studies of these processes, it is significant that a completely purpose-free language can be developed for the same type of systems by paying attention to the recursive nature of the processes involved. Of interest are circumstances in which the dynamics of a system transforms certain states into these very states, where the domain of states may be numerical values, arrangements (arrays, vectors, configurations, etc.), functions (polynomials, algebraic functions, etc.), functionals, behaviors, and so on (16).
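The goal-directed mechanism described above, an inhibitory action that reduces the discrepancy between the sensed state of the effectors and an internal goal state, can be sketched as a simple loop. The gain and the numeric states below are illustrative assumptions, not part of any particular cybernetic model:

```python
# Negative-feedback loop: at each step the effector state is corrected
# by a fraction (the gain) of the sensed discrepancy from the internal
# goal, so an outside perturbation decays back toward the goal.
def settle(state, goal, gain=0.5, steps=30):
    for _ in range(steps):
        error = goal - state           # sensor reports the discrepancy
        state = state + gain * error   # inhibitory correction
    return state

# Perturb the system far from its goal; it returns to (near) the goal,
# and the particular trajectory taken is irrelevant to the outcome.
final = settle(state=10.0, goal=2.0)
```

The goal here is exactly a fixed point of the update rule: a state that the dynamics transforms into itself, in the spirit of the "eigenbehaviors" discussed below.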
Depending on domain and context, these states are in theoretical studies referred to as "fixed points," "eigenbehaviors," "eigenoperators," and lately also as "attractors," a terminology reintroducing teleology in modern dress. Pragmatically, they correspond to the computation of invariants, be they object constancy, perceptual universals, cognitive invariants, identifications, namings, and so on. Of course, the classical cases of ultrastability and homeostasis should be mentioned here (17).

Epistemology

In thermodynamically open systems a significant extension of circularity is closure, either in the sense of organizational closure as, for example, in the self-organizing system, or in the sense of inclusion as, for example, in the participant observer. Self-organizing systems are characterized by their intrinsic, nonlinear operators (i.e., the properties of their constituent elements: macromolecules, spores of the slime mold, bees, etc.), which generate macroscopically (meta-)stable patterns maintained by the perpetual flux of their constituents (18). A special case of self-organization is autopoiesis (19). It is that organization which is its own eigenstate: the outcomes of the productive interactions of the components of the system are those very components. It is the organization of the living and, at the same time, the organization of autonomy (20). The notion of "organization" carries with it that of order and then, of course, of disorder, complexity, and so on. It is clear that these notions are observer dependent, hence the extension of cybernetics from observed to observing systems and with this to the cybernetics of language (21). Here language is thought to be precisely that communication system that can talk about
itself: a language must have "language" in its lexicon. Autology is the logic of concepts that can be applied to themselves (13). Among these are consciousness and conscience: their corollaries, epistemology and ethics, are the crop of cybernetics.
BIBLIOGRAPHY

1. M. Zeleny, "Cybernetics and general systems: A unitary science?" Kybernetes 8(1), 17-23 (1979).
2. O. Mayr, The Origins of Feedback Control, MIT Press, Cambridge, MA, 1969.
3. N. Wiener, Cybernetics: Or Control and Communication in the Animal and the Machine, Wiley, New York, 1948.
4. R. Conant (ed.), Mechanisms of Intelligence: Ross Ashby's Writings on Cybernetics, Intersystems Publications, Seaside, 1981.
5. A. Rosenblueth, N. Wiener, and J. Bigelow, "Behavior, purpose and teleology," Philos. Sci. 10, 18-24 (1943).
6. W. S. McCulloch and W. H. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bull. Math. Biophys. 5, 115-133 (1943).
7. W. Pitts and W. S. McCulloch, "How we know universals: The perception of auditory and visual forms," Bull. Math. Biophys. 9, 127-147 (1947).
8. J. von Neumann, The Computer and the Brain, Yale University Press, New Haven, CT, 1958.
9. J. von Neumann, The General and Logical Theory of Automata, in L. A. Jeffress (ed.), Cerebral Mechanisms in Behavior, the Hixon Symposium, Wiley, New York, pp. 1-41, 1951.
10. H. Von Foerster et al., Cybernetics: Circular Causal and Feedback Mechanisms in Biological and Social Systems, Proceedings of the Sixth, Seventh, Eighth, Ninth, and Tenth Conferences, 5 vols., The Josiah Macy Jr. Foundation, New York, 1950-1955.
11. D. M. MacKay, Mentality in Machines, in Proceedings of the Aristotelian Society, Supplement, pp. 61-86, 1952.
12. F. J. Varela, "A calculus for self-reference," Int. J. Gen. Syst. 2, 5-24 (1975).
13. L. Löfgren, Autology for Second Order Cybernetics, in Fundamentals of Cybernetics, Proceedings of the Tenth International Congress on Cybernetics, Association Internationale de Cybernétique, Namur, pp. 17-23, 1983.
14. G. Pask, The Meaning of Cybernetics in the Behavioral Sciences (The Cybernetics of Behavior and Cognition: Extending the Meaning of "Goal"), in J. Rose (ed.), Progress of Cybernetics, Vol. 1, Gordon and Breach, New York, pp. 15-44, 1969.
15. H. J. Bremermann, Algorithms, Complexity, Transcomputability, and the Analysis of Systems, in W. D. Keidel, W. Haendler, and M. Spreng (eds.), Cybernetics and Bionics, R. Oldenbourg, Muenchen, pp. 250-263, 1974.
16. H. Ulrich and G. J. B. Probst (eds.), Self-Organization and Management of Social Systems, Springer, New York, 1984.
17. W. Ross Ashby, An Introduction to Cybernetics, Chapman & Hall, London, 1956.
18. P. Livingston (ed.), Disorder and Order, Stanford Literature Studies 1, Anma Libri, Stanford, 1984.
19. H. R. Maturana and F. J. Varela, Autopoiesis and Cognition, D. Reidel, Boston, 1980.
20. F. J. Varela, Principles of Biological Autonomy, Elsevier North-Holland, New York, 1979.
21. H. R. Maturana, Biology of Language: The Epistemology of Reality, in Psychology and Biology of Language and Thought, Academic Press, New York, 1978.
General References

K. Gunderson, Cybernetics, in The Encyclopedia of Philosophy, Vol. 2, Macmillan, New York, pp. 280-284, 1972.
B. P. Keeney, Aesthetics of Change, Guilford, New York, 1983.
W. S. McCulloch, Embodiments of Mind, MIT Press, Cambridge, MA, 1965.
W. T. Powers, Behavior: The Control of Perception, Aldine, Chicago, 1973.

H. VON FOERSTER
University of Illinois
DADO: A PARALLEL COMPUTER FOR ARTIFICIAL INTELLIGENCE

A considerable amount of interest has been generated recently in specialized machine architectures designed for the very rapid execution of AI software. The Japanese fifth-generation machine project (see Fifth-generation computing), for example, promises to deliver a device capable of computing solutions of PROLOG programs at execution rates on the order of many thousands of logical inferences per second. Such a device will require high-speed hardware executing a large number of primitive symbol manipulation tasks many times faster than today's fastest computers. This rather ambitious goal has led some researchers to suspect that a fundamentally different computer organization is necessary to achieve this performance. Thus, parallel processing has assumed an important position in current AI research. This entry outlines the development of a specific parallel machine architecture that has come to be called DADO (1,2). DADO is a binary tree-structured multiprocessor architecture incorporating thousands of moderately powerful processing elements (PEs). Each PE consists of a fully programmable microcomputer with a modest amount of local memory.

Architecture of DADO

DADO distinguishes itself from other parallel architectures in several ways. First, although DADO is designed as a massively parallel system, the granularity (storage capacity and functionality) of each PE remains an open theoretical issue.
Studying real-world applications executed on a DADO prototype will shed more light on the granularity of a production version of the machine. Second, DADO is designed for a specialized set of applications implemented in production system (PS) and logic programming (qv) form. Third, the execution modes of a DADO PE are rather unique. Each PE may operate in slave mode, whereby instructions are executed as broadcast by some ancestor PE in the tree. Alternatively, a PE may operate in master mode by executing instructions from its local RAM. This rather simple architectural principle allows DADO to be fully partitioned into a number of distinct "sub-DADOs," each executing a distinct task. Finally, DADO has been designed around commercially available, state-of-the-art technology rather than designing everything from scratch.

A 15-PE prototype DADO1 machine constructed from Intel 8751 microprocessor chips has been operational at Columbia University since April 1983. A 1023-PE DADO2 prototype was completed in December 1985. DADO2 is not viewed as a performance machine but rather as a laboratory vehicle to investigate fine-grain processors. Although DADO2 is expected to achieve significant performance improvements in AI software (indeed, DADO2 will deliver over 570 × 10⁶ instructions per second), more important, it will provide a test bed for the next-generation machine.

The performance of an R1-like rule system running on DADO2 has been studied. Analytical projections indicate that DADO2 can achieve 85 cycles (rule firings) per second using the Intel 8751-based PE design. Present statistics for R1 implemented in a variant of OPS5 executed on a DEC VAX-11/780 indicate that 30-50 cycles per second can already be achieved. Thus, DADO2 performs 50% better than the projected performance of a serial machine much larger and more complex. If a 32-bit PE design were used, DADO2 could be expected to achieve a factor of 16 better performance, or nearly 1360 cycles per second!

Granularity Issues
Many issues have arisen while studying the granularity question. For example, when the amount of RAM increases, the number of distinct PEs decreases for a fixed-size machine, thus reducing the potential parallel execution of code. However, decreasing the RAM affects the size and resultant complexity of code that may operate at an individual PE, thus restricting the scope of applicability of the architecture.

A simple illustration using the R1 expert system may clarify matters. A PS consists of a number of rules that are matched against a database of facts called working memory (WM). As the size of RAM is increased, more rules and WM elements may be stored and processed by an individual PE. However, since fewer PEs are available, less work may be performed in parallel. Conversely, by reducing the size of RAM, fewer rules and WM elements may be located at a PE, but the additional PEs may be able to perform more operations in parallel (see Rule-based systems).

Recent statistics reported for R1 indicate that of a total of 2000 rules and several hundred WM elements, on average 30-50 rules need to be matched on each cycle of operation. Thus, even if 2000 finer grain PEs were available to process the rules, only 30-50 PEs would perform useful work. Instead, if, say, 30-50 coarser grain processors were used, each storing many more rules, all of the inherent production matching parallelism would be captured, making more effective use of the hardware.

The advantages of processing WM in parallel have been ignored, however. In a manner analogous to partitioning rules to a set of PEs, WM elements may also be distributed to a set of independent PEs distinct from those storing rules (3,4). The grain size of a PE may then directly affect the number of WM elements that may be processed concurrently. Thus, with a larger number of smaller PEs, WM may be operated upon more efficiently than with a smaller number of larger PEs. It follows that a "tug of war" between production-level and WM-level parallelism provides an interesting theoretical arena to study the trade-offs involved between parallel processors of varying granularity.

Language for Parallelism

However, the reported statistics for R1 are based on a problem-solving formalism that has been fine tuned for fast execution on serial processors, namely OPS5. Thus, the inherent parallelism in R1 may bear little resemblance to the inherent parallelism in the problem R1 solves but rather may be an artifact of current OPS5 production system programming on serial machines. An alternative approach is to provide other formalisms that allow one to explore and implement much more parallelism than OPS5 encodes or encourages. Toward that end, the development of HerbAl (named in honor of Herbert Simon and Allen Newell) has been undertaken. HerbAl is a production system language upward compatible with OPS5 but providing constructs to manipulate WM in parallel and execute multiple rules in parallel. HerbAl thus provides additional constructs that make more effective use of the underlying DADO architecture, potentially producing more dramatic speedup of AI computation than may be possible with OPS5 or specialized OPS5 processors. The development of a logic-based programming formalism, called LPS, for logic programming system, has also been undertaken. LPS is somewhat similar to HerbAl but provides a more powerful logical unification pattern matching (qv) operation as in PROLOG.

BIBLIOGRAPHY

1. S. J. Stolfo and D. P. Miranker, DADO: A Parallel Processor for Expert Systems, in Proceedings of the 1984 International Parallel Processing Conference, IEEE, Ann Arbor, MI, 1984.
2. S. J. Stolfo and D. E. Shaw, DADO: A Tree-Structured Machine Architecture for Production Systems, in Proceedings of the Second National Conference on Artificial Intelligence, Carnegie-Mellon University, Pittsburgh, PA, August 1982.
3. D. P. Miranker, Performance Estimates for the DADO Machine: A Comparison of TREAT and RETE, in Proceedings of the International Conference on Fifth Generation Computer Systems, Institute for New Generation Computing, Tokyo, Japan, November 1984.
4. S. J. Stolfo, Five Parallel Algorithms for Production System Execution on the DADO Machine, in Proceedings of the Fourth National Conference on Artificial Intelligence, Austin, TX, August 1984.

S. J. STOLFO
Columbia University

DATA DRIVEN PROCESSING. See Processing, bottom-up and top-down.

DECISION AIDS. See Military, Applications in.
DECISION THEORY
Decision theory provides a formal, prescriptive framework for making logical choices in the face of uncertainty. Although its origins can be traced back to the eighteenth century in the writings of Jakob Bernoulli, it was not axiomatically developed until the mid-twentieth century, by von Neumann, Borel, Morgenstern, Luce, Raiffa, and Savage. Continued research has focused around three main streams: descriptive (the use of decision theory to describe behavior), normative (the use of the axiomatic theory to select actions), and prescriptive (the use of axiomatic systems or corrective techniques to improve decision making). Conflict continues to be provoked by discrepancies between observed, potentially erroneous, human behavior and what would be predicted if decision makers were acting on the basis of a consistent set of axioms. Diverse data about the world are often combined using Bayes's rule as a mechanism of inference (see Bayesian decision methods); hence, the field is sometimes called Bayesian decision theory. These techniques have been applied to such diverse fields as business, engineering design, medicine, military strategy, public health, public policy, and resource management. A variety of expert computer programs have used this basically probabilistic mechanism of reasoning. Relatively few programs have successfully combined it with categorical approaches employing frame- or rule-based inference (see Frame theory; Rule-based systems).

Formal Decision Analysis

Formal decision analysis involves five basic steps.

Step 1. First, the decision maker must structure the problem at hand, generating a list of possible actions, events, and attributes/states of the world to consider. Although decision analysis provides methods to manipulate this list, its generation is largely a creative process. A convenient representation for this structure is a decision tree. Three types of data elements or nodes appear in such trees: decision nodes (corresponding to actions over which the decision maker has control), chance nodes (corresponding to events that can be described in terms of probabilities that are beyond control or states of the world that are unknown to the decision maker), and terminal nodes or outcome states that provide summary descriptions of the present and future world (prognosis), beyond the time horizon of the decision tree but conditioned on each path through the tree.

Step 2. Once a decision problem has been structured, probabilities (either point estimates or distributions) are associated with the branches of each chance node. Because objective data are fundamentally descriptive of past events and the decision model uses its probabilities to predict future events, objective data can only serve as anchor points for the required subjective estimates. For example, the probability of disease in a given patient must be modified to reflect the other diagnostic information already obtained, and prognostic data must reflect the presence of other diseases (comorbidities).

Step 3. The next step in the decision analysis is to assign a consistent set of cardinal values to each of the outcome states. Frequently, outcomes are described in terms of multiple attributes that are condensed into a single scale, but alternative techniques allow analysis with disaggregated attributes (e.g., cost-effectiveness analysis). The outcome scales can reflect objective measures (e.g., survival) or can reflect the preferences of the decision maker (or the client or even the patient). If the outcome metric is preferential, a variety of techniques can be used to assess the attitudes of the decision maker, but all depend on the principle of substitution, whereby a decision model with many outcomes is reduced to a preferentially equivalent model with only two outcomes. The purpose of such a reduction is obvious: the decision rule can then become "choose the strategy with the highest chance of producing the better outcome." The most theoretically straightforward assessment technique is the lottery or standard gamble. The decision maker puts the outcomes in an ordinal scale and creates a standard two-state lottery with probability p of getting the best outcome and probability 1 - p of getting the worst. Each intermediate outcome is then considered, and the decision maker decides the value of p for which that intermediate outcome is preferentially equivalent to the standard gamble. The utility of the intermediate outcome is then proportional to the indifference value of p. Utilities can reflect not only preference for outcome (value) but usually also the attitude of the decision maker toward risk and even regret about poor outcomes.

Step 4. Once probability and utility values have been assigned, the decision tree is evaluated by calculating the expectation of the utility at each chance node and by applying the maximization operator at each decision node. Evaluation begins at the distal end of the tree and proceeds by backward induction, averaging out and folding back until the root node is reached. The branch of that node with the highest expected utility corresponds to the optimal course of action.

Step 5. The final and perhaps most important step of decision analysis is to perform sensitivity analyses by varying the assumptions of the model in a systematic fashion to explore what the optimal choice would be under different conditions and to determine whether the best choice is robust or sensitive to reasonable variations. Such sensitivity analyses are often performed on computer systems and are expressed using a variety of standard graphical formats.

A Medical Problem

As an example, consider a simple generic medical problem (see also Medical advice systems), represented as a simple tree in Figure 1, corresponding to the problem of choosing between treating (action 1), performing a diagnostic test (action 2) (gathering additional information), and withholding treatment (action 3) in a patient who may or may not have a given disease, where the test is imperfect and the treatment is associated with both risk and benefit. Decision nodes are represented as squares, chance nodes as circles, and terminal nodes as rectangles. If treatment is given or summarily withheld, prognosis is determined by the probability of disease. If the test is performed, it may provide either correct or incorrect results, but those results will determine whether treatment is given. The probability of a positive test result in the presence of disease is called the sensitivity of the test; the probability of a negative test result in the absence of disease is called the specificity. The selection of the optimal action (among these three) depends on five factors: the probability of disease (p), the sensitivity of the test, the specificity of the test, the benefit of treating patients with disease (U_treat,dis - U_notreat,dis), and the
Figure 1. Decision analysis.
risk of treating patients without disease (U_notreat,nodis - U_treat,nodis). The expected utility of empiric treatment equals p · U_treat,dis + (1 - p) · U_treat,nodis. Although sensitivity analyses may be performed on any parameter or combination of parameters, in this case the "softest" datum is usually the probability of disease. Thus, it can be useful to divide the domain of p into three regions: If the probability of disease is high, treatment is best; if p is low, withholding treatment is best; and if p is intermediate, testing is the optimal action. The values of p that delineate the transitions from treating to testing and from testing to withholding treatment are called thresholds. These values can be found by a variety of techniques. Simple algebraic solutions to this generic tree are available and have been applied to a broad variety of medical problems (see Uncertainty and probability in AI, representation of).

General References

T. Bayes, "An essay towards solving a problem in the doctrine of chances," Philos. Trans. Roy. Soc. (Lond.) 53, 370-375 (1763).
R. L. Keeney and H. Raiffa, Decisions with Multiple Objectives: Preferences and Value Tradeoffs, Wiley, New York, 1976.
R. D. Luce and H. Raiffa, Games and Decisions, Wiley, New York, 1957.
S. G. Pauker and J. P. Kassirer, "The threshold approach to clinical decision making," N. Engl. J. Med. 302, 1109-1117 (1980).
H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory, MIT Press, Cambridge, MA, 1968.
L. J. Savage, The Foundations of Statistics, Wiley, New York, 1954.
P. Szolovits and S. G. Pauker, "Categorical and probabilistic reasoning in medical diagnosis," Artif. Intell. 11, 115-144 (1978).
J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ, 1944.
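The fold-back and threshold analysis of the generic treat/test/no-treat tree can be sketched numerically. The utilities, test sensitivity and specificity, and the small disutility charged for testing below are illustrative assumptions for this sketch, not values from the entry; the test strategy treats on a positive result and withholds on a negative one:

```python
# Expected-utility fold-back for the generic treat/test/no-treat tree.
# All numeric values are illustrative. U_TD = utility of (treat, disease
# present), U_TN = (treat, absent), U_ND = (no treat, present), U_NN =
# (no treat, absent); a disutility of 0.01 is charged for testing.
U_TD, U_TN = 0.90, 0.95
U_ND, U_NN = 0.20, 1.00
SENS, SPEC, TEST_COST = 0.95, 0.90, 0.01

def eu_treat(p):
    return p * U_TD + (1 - p) * U_TN

def eu_notreat(p):
    return p * U_ND + (1 - p) * U_NN

def eu_test(p):
    # Averaging out at the chance nodes: treat on a positive result,
    # withhold treatment on a negative result.
    if_diseased = SENS * U_TD + (1 - SENS) * U_ND
    if_healthy = (1 - SPEC) * U_TN + SPEC * U_NN
    return p * if_diseased + (1 - p) * if_healthy - TEST_COST

def best_action(p):
    """Folding back: maximize expected utility at the decision node."""
    options = {"treat": eu_treat(p), "test": eu_test(p),
               "no treat": eu_notreat(p)}
    return max(options, key=options.get)

# With these values, three regions of p emerge:
# best_action(0.01) -> "no treat"; best_action(0.2) -> "test";
# best_action(0.6) -> "treat".
```

Scanning p from 0 to 1 locates the two thresholds as the points where the preferred action changes; varying one of the utility or accuracy parameters and repeating the scan is exactly the sensitivity analysis of Step 5.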
DEEPSTRUCTURE Deep structure is central to the description of natural-language syntax within the framework of transformational grammar (qv) (I,2) (see Grammar, transformational). It plays two key roles: to relate the words of a sentenceto the meaning and to help express generalizations about grammatical structure. What ls Deep Structure? The motivation for deep structure is the fact that the surface order of words in a sentenceis only a partial indication of its relation to other sentences:Pairs of sentencesthat look alike are sometimes unrelated, and pairs of sentencesthat look different can be closely related. An example is a pair of active and passive sentences,such as 1 and 2 below, that are different in form but are quite similar in meaning. 1. John saw Mary. 2. Mary was seen by John. In these two sentencesthe predicate-argument relations expressedare the same:There is an act of seeingdescribed.John did the seeing,and Mary was the person seen.Yet, the order of words and the gTammatical structure of the two sentencesare different. John is the subject of the first sentence;Mary is the subject of the second.The structural difference between the two sentences is best expressed in terms of tree diagrams (Figs. 1 and 2).
M. C. Weinstein, H. V. Fineberg, B. J. McNeil et al., Clinical Decision Analysis, Saunders, Philadelphia, 1980.
S. Pauker and J. Hollenberg
New England Medical Center
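The threshold idea in the article above can be illustrated numerically. The utility values below are hypothetical, invented for this sketch, and only the crossing between empiric treatment and withholding treatment is computed; the full model with a test option yields two thresholds, as the article notes.

```python
# Hypothetical utilities U[action, disease state] on a 0-1 scale.
# These numbers are invented for illustration, not taken from the article.
U_treat_dis, U_treat_nodis = 0.90, 0.85
U_notreat_dis, U_notreat_nodis = 0.40, 1.00

def eu_treat(p):
    """Expected utility of empiric treatment at disease probability p."""
    return p * U_treat_dis + (1 - p) * U_treat_nodis

def eu_withhold(p):
    """Expected utility of withholding treatment."""
    return p * U_notreat_dis + (1 - p) * U_notreat_nodis

# The threshold is the p at which the two expected utilities cross;
# setting eu_treat(p) = eu_withhold(p) and solving algebraically:
p_star = (U_notreat_nodis - U_treat_nodis) / (
    (U_notreat_nodis - U_treat_nodis) + (U_treat_dis - U_notreat_dis))

print(round(p_star, 3))  # -> 0.231
```

With these numbers, treatment has the higher expected utility for p above roughly 0.23 and withholding for p below it, which is the "simple algebraic solution" the article alludes to for one of the two thresholds.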
Figure 1. Phrase structure tree for active sentence (S: NP "John", V "saw", NP "Mary").
This last approach is closest to current transformational grammar known as Government-Binding Theory (2), where the input to semantics is through a kind of annotated surface structure, which includes indications of the transformations that have applied and therefore what the deep structure would be.
Figure 2. Phrase structure tree for passive sentence.

This relationship between the active and passive versions of the same sentence does not depend on the particular words; it is not a fact about John or Mary or see. Rather there is a general, systematic relation between active and passive sentences: namely, the subject of a passive sentence plays the same role as the object of an active sentence. In transformational grammar these facts are described by saying that each sentence is associated with two distinct syntactic structures. One is the surface structure description shown in Figures 1 and 2, and the other a deep structure, which is related to the surface structure by a set of transformations, or tree-to-tree mappings. In the active-passive example both sentences have the same deep structure, a structure similar to the surface structure of the active form: The passive and active sentences have the same deep subject and object. A passive transformation maps the deep structure onto the surface structure by moving the deep subject into a by phrase and moving the deep object into subject position. In the "standard theory" of transformational grammar (1) the syntactic component consists of a context-free base, which generates deep structures, and a set of transformations, which map these deep structures onto the surface structures (see Ref. 3, Chapter 4, for an introduction). More recent versions of transformational grammar (2), where the power and variety of transformations has been severely limited, would assign to active and passive sentences deep structures differing in certain respects. In particular, the sites from and to which objects are moved are noted in the structure.
Nevertheless, the deep subject and deep object would still be the same for active and passive sentences. In addition to the active-passive relationship, transformations of similar deep structures can relate questions to statements, and various subordinate clauses where arguments are missing on the surface to a deep structure where the arguments are present.

Natural-Language Processing

The notion of deep structure has been used in natural-language processing systems in three ways.
Meaning

Deep structure is where predicate-argument relationships are expressed. The idea that deep structure can be extended to capture all aspects of meaning was pursued in the 1970s under the label generative semantics, but it is now considered unsuccessful. As an example of the problems in treating deep structure as the sole input to semantics, note that in sentences 1 and 2, the paraphrase relation holds if John and Mary are simply people; but if Mary is a doctor, and see is being used in the sense of "consulted with," then "John saw a doctor" is not equivalent to "A doctor was seen by John." Similar inexact paraphrases can be easily found, for example, when quantifiers (each, all, a, etc.) (see Logic, predicate) are introduced in the noun phrases. Such problems make the notion that deep structure expresses all aspects of meaning difficult to hold. The regularities of language expressed by deep structure can be expressed in other ways. For example, case frames (see Grammar, case) are expressions of the meaning of predicates and arguments; they need not be seen as deep structures. Thus, the key claim of deep structure is not simply that there are relationships not evident on the surface but also that these are to be expressed in the same terms as the surface description (i.e., as phrase structure trees). The term "deep structure" is sometimes used metaphorically, not related to any theory of transformational grammar, to describe systematic structures that are not directly obvious and that relate more closely to meaning.
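The passive transformation discussed earlier can be sketched as a toy tree-to-tree mapping. This is a simplification for illustration (trees as nested tuples, a one-word lexicon), not any particular transformational formalism:

```python
# Toy sketch of the passive transformation as a tree-to-tree mapping.
# Trees are tuples: (label, child, ...); leaves are bare word strings.

deep = ("S",
        ("NP", "John"),                         # deep subject
        ("VP", ("V", "saw"), ("NP", "Mary")))   # deep object inside the VP

def passivize(tree):
    """Map a deep structure onto a passive surface structure: the deep
    object moves to subject position, the deep subject into a by phrase."""
    _, subj, (_, verb, obj) = tree
    participle = {"saw": "seen"}.get(verb[1], verb[1])  # tiny toy lexicon
    return ("S",
            obj,
            ("Aux", "was"),
            ("VP", ("V", participle),
                   ("PP", ("P", "by"), subj)))

def leaves(tree):
    """Read the surface word string off a tree."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]

print(" ".join(leaves(passivize(deep))))  # -> Mary was seen by John
```

Both trees share the same deep subject and object, which is exactly the regularity the deep-structure analysis is meant to capture.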
1. A parser may attempt to directly implement a transformational grammar, analyzing a sentence by in effect running transformations in reverse (4).
2. A parser may produce a deep-structure representation (e.g., as input to the semantics) without directly implementing transformations. For example, the LUNAR understanding system used an augmented-transition network (qv) to build a deep-structure representation that the semantic component then interpreted (5).
3. The ideas of transformations and deep structure may be assumed in a parser that produces a phrase structure description annotated with indications of where elements must have been in the deep structure. The PARSIFAL parser (6) follows this approach.

BIBLIOGRAPHY

1. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
2. N. Chomsky, Lectures on Government and Binding, Foris Publications, Dordrecht, The Netherlands, 1981.
3. T. Winograd, Language as a Cognitive Process, Addison-Wesley, Reading, MA, 1983.
4. S. R. Petrick, Transformational Analysis, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, pp. 27-41, 1973.
5. W. A. Woods, R. M. Kaplan, and B. Nash-Webber, The Lunar Sciences Natural Language Information System: Final Report, BBN Report No. 2378, Bolt, Beranek and Newman, Cambridge, MA, 1972.
6. M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.

D. Hindle
AT&T Bell Laboratories

DEDUCTION. See Inference; Logic; Reasoning.

DEFAULT LOGIC. See Reasoning, default.
DEMONS
A demon is a separate, autonomous process that runs in parallel with other processes (demons) and may interact with them. The idea was introduced by Selfridge (1) (see Fig. 1) in a model called Pandemonium. Pandemonium was a model designed to perform automatic recognition of hand-sent Morse code by means of a large number of demons. The demons were, essentially, detectors for particular properties of the input, and the more evidence a demon accumulated for the unit it represented, the louder it would "shout" to other demons. Through the ensuing Pandemonium, some demons became more strongly activated than others; at the top level of the system a "decision demon" chose the most strongly activated letter demon and output the corresponding letter. The essential idea of the Pandemonium model, namely to distribute an information-processing task to a large number of communicating parallel processors, lives on today in a number of different guises.

One class of models that bears some relation to Pandemonium are models like Hewitt's actor model (2), in which separate, autonomous actors communicate via messages of arbitrary complexity and carry out arbitrary computations on these messages.

Parallel Distributed Processing

Recently, there has been a strong resurgence of interest in the idea of distribution of processing to large numbers of simple processing units of highly restricted complexity. These models are generally called connectionist (3) or parallel-distributed processing models (4,5). In this class of models the individual computational units are very simple processors indeed. Generally, each of these units takes on an activation that is some monotonic function of its net input from other units and sends out a (possibly graded) output signal based on its activation. The net input to the unit is generally computed simply as the weighted sum of the outputs of other units; the weights may be excitatory (positive) or inhibitory (negative).

Parallel-distributed models of this type have been developed for a number of different purposes. Computer scientists have found that they provide attractive alternatives to traditional sequential symbol-processing approaches to a number of problems, particularly problems that can easily be described as relaxation or constraint-satisfaction searches. One version of this type of model is the Boltzmann machine (6). Cognitive scientists have found these models attractive because they provide a natural framework for accounting for the human ability to exploit large numbers of simultaneous constraints (7). They have also recently begun to enjoy some popularity among neuroscientists who seek ways of capturing in explicit form the computational properties of real neural nets (8).

Learning

Currently the central issue facing the development of computational models of this sort is the learning problem. Hand tuning of such networks is difficult for complex systems, and it is desirable to be able to allow the system to find its own set of connection strengths. The perceptron convergence procedure of Rosenblatt (9) is adequate for networks consisting of only one layer of modifiable connections, but most interesting computational problems require more than one layer of modifiable connections. Four general schemes exist for training multilayer networks. These are called competitive learning (10-13), the Boltzmann machine (qv) learning algorithm (6), the reinforcement learning scheme of Barto et al. (14), and the backpropagation learning algorithm (15).

BIBLIOGRAPHY

1. O. G. Selfridge, Pandemonium: A Paradigm for Learning, in The Mechanization of Thought Processes, Her Majesty's Stationery Office, London, 1959; Figure 1 is also in U. Neisser (ed.), Cognitive Psychology, Appleton-Century-Crofts, Norwalk, CT, 1967, p. 75.
Figure 1. Parallel processing in Selfridge's (1) "Pandemonium" model (data or image demons, computational demons, cognitive demons).
2. C. Hewitt, Viewing Control Structures as Patterns of Passing Messages, AI Memo 410, MIT AI Laboratory, Cambridge, MA, 1976.
3. J. A. Feldman and D. H. Ballard, "Connectionist models and their properties," Cog. Sci. 6, 205-254 (1982).
4. D. E. Rumelhart, J. L. McClelland, and the PDP research group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Bradford Books, Cambridge, MA, 1986.
5. J. L. McClelland, D. E. Rumelhart, and the PDP research group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2, Bradford Books, Cambridge, MA, 1986.
6. D. Ackley, G. Hinton, and T. Sejnowski, "Boltzmann machines: Constraint satisfaction networks that learn," Cog. Sci. 9, 113-147 (1985).
7. J. L. McClelland, D. E. Rumelhart, and G. E. Hinton, The Appeal of Parallel Distributed Processing, in D. E. Rumelhart, J. L. McClelland, and the PDP research group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Bradford Books, Cambridge, MA, pp. 3-44, 1986.
8. E. T. Rolls, Information Representation, Processing, and Storage in the Brain: Analysis at the Single Neuron Level, in J.-P. Changeux and M. Konishi (eds.), Neural and Molecular Mechanisms of Learning, Springer-Verlag, Berlin, 1986.
9. F. Rosenblatt, Principles of Neurodynamics, Spartan Books, Washington, DC, 1962.
10. C. von der Malsburg, "Self-organization of orientation sensitive cells in the striate cortex," Kybernetik 14, 85-100 (1973).
11. S. Grossberg, "Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors," Biol. Cybernet. 23, 121-134 (1976).
12. T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, New York, 1984.
13. D. E. Rumelhart and D. Zipser, "Competitive learning," Cog. Sci. 9, 75-112 (1985).
14. A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst. Man Cybernet. SMC-13, 834-846 (1983).
15. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, ICS Report 8506, University of California, San Diego, Institute for Cognitive Science, La Jolla, CA, 1985.

J. McClelland
Carnegie-Mellon University
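The shouting-demons architecture described in the DEMONS article can be sketched as follows; the letters, features, and evidence weights are invented for illustration, and each demon's "shout" is simply a weighted sum of the evidence it detects.

```python
# Toy Pandemonium: feature demons supply evidence, letter demons "shout"
# with loudness equal to a weighted sum of the evidence they detect, and
# a decision demon picks the loudest. All letters, features, and weights
# here are invented for illustration.
LETTER_DEMONS = {
    "E": {"horizontal_bar": 3, "vertical_bar": 1},
    "T": {"horizontal_bar": 1, "vertical_bar": 2},
    "O": {"closed_loop": 3},
}

def decision_demon(input_features):
    """Return the letter whose demon shouts loudest for these features."""
    shouts = {letter: sum(w for feat, w in listens.items()
                          if feat in input_features)
              for letter, listens in LETTER_DEMONS.items()}
    return max(shouts, key=shouts.get)

print(decision_demon({"horizontal_bar", "vertical_bar"}))  # -> E
```

The demons here run sequentially, of course; the point of the sketch is only the organization of evidence accumulation and the top-level decision demon, not the parallelism itself.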
DENDRAL

DENDRAL is a rule-based system that identifies candidate molecular structures from mass spectral and nuclear magnetic resonance data, written in 1965 by Buchanan, Lederberg, Sutherland, and Feigenbaum at the Stanford Heuristic Programming Project. DENDRAL uses generate and test for its problem solving, and it surpasses all human experts at its task, which has changed the roles of humans and machines in chemical research; see B. G. Buchanan and E. A. Feigenbaum, "DENDRAL and Meta-DENDRAL: Their Applications Dimension," Artif. Intell., 11, 5-24 (1978).

M. Tern
SUNY at Buffalo
DEPTH MAPS. See Vision.

DETERMINISTIC PROGRAMS. See Logic; Reasoning, Plausible.

DIAGNOSIS SYSTEMS. See Expert Systems; Medical Advice Systems.
DISCOURSE UNDERSTANDING

The term discourse understanding refers to all processes of natural-language understanding (qv) that attempt to understand a text or dialogue. For such processes, the sentences of natural language are elements whose significance resides in the contribution they make to the development of a larger whole, rather than being independent, isolated units of meaning. To understand discourse, one must track the structure of an unfolding text or dialogue, and interpret every new utterance with respect to the proper context, taking into account the real-world setting of the utterance as well as the linguistic context built up by the utterances preceding it (see also Speech understanding). The problems of discourse understanding are thus closely related to those dealt with in the linguistic discipline of pragmatics, which studies the context dependence of utterance meanings. Research on natural-language understanding systems has often focused on the problem of analyzing the structure and meaning of isolated sentences. To deal with discourse instead, a system must have all the capabilities necessary for sentence understanding, as well as be able to apply rules of discourse structure that specify how sentences may be combined to form texts or dialogues. Even with such discourse-level extensions, however, a purely linguistic approach can only construct the meaning of a text insofar as it follows from the meaning of its constituent utterances and the explicitly stated relations between them. In AI one tends to take a broader perspective, which emphasizes the role of world knowledge in discourse understanding.
By taking into account common-sense knowledge about the world, a system may derive semantic relations between constituents of the text that are not stated explicitly but that may be plausibly assumed. By invoking scripts (qv) and frames (see Frame theory), a system may analyze a text against the background of default assumptions about "normal" situations and "normal" courses of events, thereby filling in information left implicit in the text, and also noticing when something deviates from the usual pattern and is therefore worthy of special attention. In this way, a more complete understanding of the intended meaning of the text may be created. A discourse-understanding system worthy of that name should not only deal correctly with what is true or false in the world according to its input text, but should, at the same time, be able to distinguish between more and less important information, between what is crucial and what is mere background. With this capacity, a system would be able to generate adequate summaries of its input texts. A further level of understanding would involve the ability to infer what the "point" of a story or description is, to discover the more abstract, culturally relevant message which is instantiated by the text.

Much of the AI research on discourse understanding is oriented towards the development of systems to exhibit reasonable and cooperative behavior in a goal-directed interaction with a human dialogue partner. Such systems would do more than understand the literal meanings of the utterances of their interlocutor; they would have to be able to assess, to some extent, the intentions and purposes behind these utterances. Methods to achieve this are usually based on the theory of speech acts (qv): the system recognizes the goals which are conventionally associated with various types of utterances, such as assertions, questions, commands, and requests. Understanding an utterance at a deeper level is then viewed as establishing what goal the speaker wanted to achieve by performing the speech act, and what role the speech act plays in achieving that goal. Often the goal can be seen as a subgoal that plays a role in achieving a higher level goal, and so on. By invoking plausible hypotheses about the goals the speaker may have, and about the methods he or she may employ to achieve them, a system may infer the intention behind a speech act.

Empirical studies of human discourse usually deal with real-time oral communication or with written texts. Discourse-understanding computer programs, however, will usually employ a video display terminal to communicate with their users in real time. They will thus use a new natural-language interaction mode that did not exist before. It is therefore of some interest to study how the properties of discourse depend on the interaction mode, e.g., on the amount of shared environment between the participants and on the sensory modality of the communication medium.

Each of the main topics mentioned above is discussed below in some detail: the structure of discourse, the semantics of discourse, speech acts and pragmatics, and different I/O modalities.
The Structure of Discourse

Introduction. To understand a text or dialogue, one must understand the relations between its parts. Clearly, these parts are not just the individual sentences; sentences are joined together to form larger units, which in their turn may be the building blocks of yet larger units. It is important to discern these units correctly, because a discourse may assert specific semantic relations between the meanings of its constituent units: the content of one discourse segment may, for instance, present a more detailed version, a justification, or a series of consequences of an adjacent discourse segment. The structure of a discourse also affects the interpretation of the individual sentences: it defines the semantic contexts that must be invoked in order to determine the interpretation of a pronoun, a definite description, or, in fact, any descriptive term.

The formal description of the structure of spontaneous spoken discourse is even more complex than the formal description of the structure of written text. Everyday spoken discourse is characterized by interruptions, resumptions, backtracking, and jumping ahead of oneself. Somehow, despite the apparent "disfluency" of everyday discourse, speakers and hearers manage to follow what is going on and to produce responses to one another which are situationally appropriate and which demonstrate an understanding of all of the "underspecified" items of meaning which are found in sentences. Faced with the transcripts of a natural interaction, it is surprisingly difficult to identify the "descriptions", "explanations", "stories", "plans", or other structural units which one may know have been there when the interaction was happening. With the move to the analysis phase, structural units become lost in all the talk.

The problem of locating a coherent discourse semantic unit in natural talk is illustrated by the following example from a corpus of spatial-planning dialogues. There are five people involved: two primary speakers, A and B, who are jointly playing a game which involves a journey in Europe; C and D, who are researchers; and E, a secretary who came by.

A. We are in Spain, O.K. So, let's go to France next. I love France anyway. We had a great time there last year. And then Italy; did I tell you about the little restaurant we went to in Florence?
B. Yeah. I think you did. It was better than the place in Rome we ate at before we took the plane. But, anyway, no. Let's go to Belgium next. Then
C. Could you move closer to the camera, please.
D. You're out of range.
A. O.K. yeah. But not if we have to go through Antwerp.
B. Then Holland.
A. When do we do Italy then? We can't miss it?
B. On the way back to
E. Sorry. I was looking for Dave.
C. He's not here. We're running an experiment, I'll talk to you later. You are still out of camera range, by the way.
A. Good.
B. Anyway. I saw the tulips last year. What about Italy?
A. On the way back to Spain. You taking a vacation this year? Or loafing at work as usual?
B. Haven't decided. You?
A. Might go to Spain again. Then Germany's next, right?

Competent language users would intuitively segment this discourse into sections in which A and B are planning (actually developing their plan) and other sections where they are commenting on places they have been, making small talk, or conversing with the researchers. In one exchange, neither A nor B is talking at all; they are listening in while C exchanges some quick words with the secretary, who is looking for someone who is not there. In order to make it somewhat easier to find the "planning", one may arrange the text graphically as an outline, showing the "planning talk" in leftmost position and moving further to the right to represent the embedded or secondary status of the comments and other interruptions to the development of the plan. It should be noted that when "other types of talk" are completed, A and B return to developing the plan, which remains their central concern throughout this excerpt.

A. We are in Spain, O.K. So, let's go to France next.
        I love France anyway. We had a great time there last year.
   And then Italy;
        did I tell you about the little restaurant we went to in Florence?
B.      Yeah. I think you did. It was better than the place in Rome we ate at before we took the plane.
   (But, anyway, no.) Let's go to Belgium next. Then
        C. Could you move closer to the camera, please.
        D. You're out of range.
A. O.K. yeah. But not if we have to go through Antwerp.
B. Then Holland.
A. When do we do Italy then? We can't miss it?
B. On the way back to
        E. Sorry. I was looking for Dave.
        C. He's not here. We're running an experiment. I'll talk to you later. You are still out of camera range, by the way.
B. (Anyway.) I saw the tulips last year. What about Italy?
A. On the way back to Spain.
        You taking a vacation this year? Or loafing at work as usual?
        B. Haven't decided. You?
        A. Might go to Spain for a few days.
A. Then Germany's next, right?
Once the correct structural relations between the sentences in the discourse are established, it is possible to determine the semantic interpretation of the individual sentences, and of the discourse segments built up out of these sentences. Methods for determining discourse structure and for building up semantic representations are discussed in more detail below. Many important phenomena which demonstrate the influence of discourse structure on semantic interpretation are illustrated by the example discourse above:

Appropriate material must be available to resolve ellipsis. ("Did I tell you about the little restaurant we went to in Florence?" "Yeah. I think you did.")

Appropriate candidate referents must be available to resolve anaphora. ("Did I tell you about the little restaurant we went to in Florence?" "Yeah. I think you did. It was better than the place in Rome we ate at before we took the plane. . .")

Temporal reference points must be maintained and, if necessary, updated (to understand when events are asserted to take place).
Spatial reference points must be maintained and, if necessary, updated (to understand the speaker's orientation in conceptual space).

The identity of the speaker and hearer must be available (to recover the intended referents of "I" and "you").

The specific "world" in which events are asserted to take place must be known. (In the example discourse above one must distinguish between the "game" world and the "real" world: A is planning to vacation in Spain "this year" in the "real" world and had a great time in France "last year" in the "real" world. In the "game" world A and B are in Spain and planning a trip from Spain to France, Belgium, etc.)

In addition, it must be pointed out that correctly interpreting this discourse involves understanding the form and function of a number of linguistic and rhetorical structures, including:

Narrative syntax mechanisms encoding update of temporal and spatial reference points

Sentential syntax and semantics

Question/answer sequences

Discourse "operators" such as "O.K.", "yes", "no", "well", "anyway", which do not add independent information but which either (1) affirm or deny information available elsewhere or (2) indicate a digression or a "return" to another topic

Joking conventions (such as insulting a hard worker by accusing him of "loafing on the job")

Discourse embedding and return conventions

Recent Directions in Modeling Discourse Structure. Recent advances in understanding the structure of natural-language discourse make it possible to segment complex talk and recover the integrity of "discourse units" despite the complexity of the actual talk in which they occur. An important research focus within the past five years has been to capture the semantic, or "coherence", relations among the clauses and text segments which function together to communicate a set of mutually interconnected ideas (1-5).
A second research focus has been to understand the structural relations obtaining even in discourses which are not coherent but which are characterized by interruptions, resumptions, hesitations, and other complex phenomena arising from the social and processing constraints on actual talk (6-9). Some discussions of coherence relations in discourse are reviewed below. The following sections discuss some frameworks that attempt to characterize the structure of discourse, accounting for coherence and also allowing for digressions and interruptions.

Discourse Coherence. It has been observed many times that not every sequence of sentences makes up a "text". In a well-formed text, the sentences are perceived as working together to build up a unified whole, by expressing propositions which are related to each other in a limited number of specific ways. A number of coherence relations which may obtain among the constituents of a well-formed text have been identified, for instance, by Hobbs (2,3). He describes how a semantic structure for a whole discourse may be built up recursively by recognizing coherence relations obtaining
between adjacentsegmentsof a text. He addresseshimself ing in its structure what was going on in the joint endeavor. initially to why it is that one finds discoursescoherent at What was surprisirg, and most significant, however, was all-what are the sourcesof discoursecoherence?Not surthat the choiceof possiblereferents for pronouns in the text prisingly, the ultimate sourceof discoursecoherencelies in reflected the structure of the task as well. In discussing a the coherenceof the world or object described.One can find part of the object involved in the task at hand, one could a text coherent if it talks about a set of objectsor states or refer to it with a pronoun; similarly one could refer to the events which one knows to be coherent.Thus even a gasped entire higher level unit with a pronoun, or even to the out, jumbled narrative of a disaster may appear "coherent" compressoras a whole. It was not possibleto use a pronoun and be "understandable" when one brings to the text the to refer to the objectsand subtasks involved in a part of the belief that the disaster formed a coherent set of events, task which had already been completed.In the tree of the related causally to one another and affecting in various discoursetaskisubtask elements one was blocked from reways the people, objects, and situations described. Disferring to a task element in a branch to the left of the course coherencein the usual, more narrow sense of the branch currently being developed.Grosz' discovery, thereword refers to conventional semantic relation obtaining before, was that discoursehas a structure in which the placetween adjacent discourse segments. For instance, a sement and semantic relations obtainitrg among the clauses quence of two sentences,two stories, or, generally speakmaking up the discourseplays a decisive role in the interitg, two discourseconstituents are found to be coherently pretation of given elements in that discourse. 
related to one another if one gives more detail about the other, offers an explanation, or otherwise gives more information about the proposition expressed by the other.

Hobbs provides a method for allowing the coherence relations in a discourse to emerge. He suggests segmenting the discourse in an intuitive way and then labelling the various naturally occurring segments with the coherence relation(s) which tie them to immediately preceding constituents. There are two types of relations: coordination and subordination relations. Coordinate coherence relations include parallel constructions and elaborations, in which one discovers a common proposition as the assertion of the composite segment. Subordination relations obtain when one constituent provides background or explanatory information with respect to another. Hobbs' ideas of "coherence" allow one to see how even moves in a conversation which may appear incoherent to an outside observer may be appropriate conversational moves for the participants, entirely coherent and describable with the rhetorical relations which he has outlined (10,11).

Mann and Thompson's work on rhetorical relations focuses exclusively on the relations which obtain within a coherent text (4). They assign a phrase-structure analysis to texts, in which two subsequent constituents can be related through each of a number of specific relations. Their inventory of coherence relations is more detailed than that provided by Hobbs. It lists solutionhood, evidence, justification, motivation, reason, sequence, enablement, elaboration, restatement, condition, circumstance, cause, concession, background, and thesis-antithesis.

Discourse Structure and Pronoun Resolution. In early work on the structure of Task Oriented Dialogues, Grosz (12) provided an important demonstration of the hierarchical structure of natural texts. In the analysis of talk between an apprentice and an expert repairing an air compressor, she showed that the discourse could be represented as a tree or outline in which the relationships among the various clauses could be chunked in a way which replicated the goal/subgoal structure of the original task. Perhaps not surprisingly, in taking apart one part of the compressor, the talk would focus on that operation; when the apprentice had finished dealing with that aspect of the job and moved on to the next subtask, the talk moved along as well.

Sidner (13) has shown that a structurally analogous account of anaphora resolution also applies at a linguistic level of discourse structure which is independent of task structure. In her model the candidates for anaphoric reference are stored in a stack. An incoming discourse constituent which is treated as subordinated PUSHes new focused elements onto this stack, while the resumption of a suspended discourse constituent POPs the intervening focus elements off the stack.

The following sections present brief overviews of three frameworks which build on this seminal work and which try to provide more comprehensive accounts of the issues involved in understanding both "coherent" and "interrupted" discourse: Reichman's Context Space Theory (6), the Discourse Structures Theory developed by Grosz and Sidner (8), and Polanyi and Scha's Dynamic Discourse Model (5,7, and 9).

Context Space Theory. Reichman's context space theory deals with the structure of conversation (6). It associates with each topic of discussion a context space, a schematic structure with a number of slots. These slots hold the following information:

a propositional representation of the set of functionally related utterances said to lie in this context space;
the communicative function served by the utterances in this context space;
a marker reflecting the foreground-background status of this context space at any given point in the conversation;
focus level assignments to the discourse elements in this context space;
links to preceding context spaces in relation to which this context space was developed; and
a specification of the relations involved.

The utterances that constitute the discourse are analyzed as "conversational moves" that affect the content of the various context spaces. Reichman has paid special attention to the conversational structures involved in arguments. Among the conversational moves she identifies, for instance, are assertion of a claim, explanation, illustration, support, challenge, interruption, and further development.
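Sidner's stack discipline for anaphoric candidates can be sketched in a few lines of Python. This is an illustrative toy, not code from the literature; the class name, method names, and example entities are all invented for this sketch:

```python
class FocusStack:
    """Toy sketch of a Sidner-style focus stack for anaphora resolution.

    A subordinated discourse constituent PUSHes its focused entities;
    resuming a suspended constituent POPs the intervening foci, so that
    pronouns once again find their candidates in the resumed segment.
    """

    def __init__(self):
        self._stack = []          # each entry: list of focused entities

    def push_segment(self, focused_entities):
        """Enter a subordinated constituent (e.g., a digression)."""
        self._stack.append(list(focused_entities))

    def pop_segment(self):
        """Resume the suspended constituent, discarding intervening foci."""
        return self._stack.pop()

    def candidates(self):
        """Anaphoric candidates, most recent (most salient) first."""
        return [e for seg in reversed(self._stack) for e in seg]


stack = FocusStack()
stack.push_segment(["the compressor"])    # main task talk
stack.push_segment(["the wheelpuller"])   # subordinated segment
assert stack.candidates()[0] == "the wheelpuller"
stack.pop_segment()                       # POP back to the main talk
assert stack.candidates()[0] == "the compressor"
```

After the POP, "the compressor" is again the most salient candidate, which is the behavior that makes a purely structural account of resumption possible.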
DISCOURSE UNDERSTANDING 237

An important and influential part of Reichman's theory is her treatment of clue-words, devices which speakers use to indicate when their discourse shifts from one structural level to another. Clue-words are commonly divided into PUSH-markers and POP-markers. PUSH-markers are linguistic signals that indicate the initiation of a new embedded discourse constituent. Examples are "like", "by the way", and "for instance". POP-markers have the complementary function. They close off the currently active embedded unit, and signal a return to a higher level of structure. Examples are "Well", "so", "anyway", and "OK".
An extensive study of clue words in spoken French is presented by Guelich (14). Schiffrin (15) presents an extensive study for English. Merritt (16) discusses the use of "OK" in service encounters. Cohen (17) studies clue words from a computational perspective. She draws two important conclusions: clue words decrease the amount of processing needed to understand coherent discourse; and clue words allow the understanding of discourse that would otherwise be incomprehensible.

Although Reichman's work provided much important insight into the functioning of discourse, her Context Space formalism fails to distinguish between those cases in which one can return to a previous topic by use of a simple POP, and those cases in which such a simple, purely structural return is not possible and one must reintroduce the topic in order to continue talking about it. Reichman's Context Spaces are never "closed off" and inaccessible, because one can always say anything one wishes, and continuing to talk about a matter dropped earlier always remains possible. Discourse structural relations, in her account, are thus finally obscured by discourse semantic relations obtaining among the topics of talk in the various units. The work of both Grosz and Sidner (8) and Polanyi and Scha (5,7, and 9) incorporates elements of Reichman's work, particularly her treatment of clue words, and separates structural and semantic relations between clauses. This separation allows for a treatment of "interruptions" and "resumptions" that is based on structural properties of the discourse rather than being dependent on semantic relationships among topics of talk. These two frameworks generalize upon Grosz' early work by providing an account of discourse structure which is not task dependent.

The Discourse Structures Theory. In the view of Grosz and Sidner (8), the structure of a discourse results from three interacting components: a linguistic structure, an intentional structure, and an attentional state.
These three components deal with different aspects of the utterances in a discourse. Grosz and Sidner have particularly focused on the intentional and the attentional aspects of discourse. The intentional structure is a hierarchical structure which describes relations between the purpose of the discourse and the purposes of discourse segments. These purposes (such as "Intend that a particular agent perform a particular task" or "Intend that a particular agent believe a particular fact") are linked by relations of dominance (between a goal and a subgoal) or ordering (between two goals which must be achieved in a specific order).

The attentional state is an abstraction of the participants' focus of attention as their discourse unfolds. The attentional state is a property of the discourse, not of the discourse participants. It is inherently dynamic, recording the objects, properties, and relations that are salient at each point in the discourse. The attentional state is represented by a stack of focus spaces. Changes in attentional state are modeled by a set of transition rules that specify the conditions for adding and deleting spaces. A focus space is associated with each discourse segment; this space contains those entities that are salient, either because they have been mentioned explicitly in the segment or because they became salient in the process of producing or comprehending the utterances in the segment (as in Grosz' original work on focusing (18)). The focus space also includes the discourse segment purpose; this reflects the fact that the discourse participants are focused not only on what they are talking about but also on why they are talking about it.

Discourse Structures Theory provides a unified account of both the intentional and attentional dimensions of discourse understanding and makes explicit important links between the two. The Dynamic Discourse Model, on the other hand, is more limited in its scope. It provides an account of the discourse segmentation process on an utterance-by-utterance basis and is thus a more developed theory of the strictly linguistic aspects of the discourse understanding process.

The Dynamic Discourse Model. The Dynamic Discourse Model (DDM) (5,7, and 9) is a formal theory of discourse syntactic and semantic structure which accounts for how a semantic and pragmatic interpretation of a discourse may be incrementally built up from its constituent clauses. The DDM is presented as a discourse parser.
The parser segments the discourse into linguistically and socially relevant units on a clause-by-clause basis by proceeding through the discourse, examining the syntactic encoding of each clause, its propositional content, and its situation of utterance. The Model consists of a set of recursive rules of discourse formation, which specifies how units may be built up of smaller units, and a set of semantic interpretation rules, which assigns a semantic and pragmatic interpretation to each clause, to each discourse unit, and to the discourse as a whole.

Each discourse is viewed as composed of discourse units which can be of many different types: jokes, stories, plans, question/answer sequences, lists, narratives (temporally ordered lists), and Speech Events (socially situated occasions of talk such as doctor/patient interactions and everyday conversations; see Speech Events, below). In the DDM every discourse unit type is associated with its own grammar, which specifies its characteristic constituent structure, and is interpreted according to specific rules of semantic interpretation.

The basic unit of discourse formation is the discourse constituent unit (dcu). For the purpose of joining with other clauses to create a complex discourse, each clause is considered an elementary dcu. Dcus are of three types: list structures (including narratives, which are sequentially ordered lists of events); expansion structures, in which one unit gives more detail of some sort about some aspect of a preceding unit; and binary structures such as "if/then", "and", "or", and "but" relations, in which there is a logical connective connecting the constituents.

Discourse Units (DUs) such as stories, descriptions, arguments, and plans are composed of dcus which encode the propositions which, taken together and properly interpreted, communicate elaborate semantic structures. Dcus and DUs, in their turn, are the means of realization of the information exchange which is so basic in Speech Events, which are constituents of Interactions.

The DDM provides an account of the coherence relations in texts by means of an explicit mechanism for computing the semantic congruence and structural appropriateness of strings of clauses (5 and 9). Simultaneously, it provides an account of the complexities of interrupted or highly attenuated discourse by providing a uniform treatment of all phenomena which can interrupt the completion of an ongoing DU: elaborations on a point just made, digressions to discuss something else, interruptions of one Speech Event by another or of one ongoing Interaction by another. All of these phenomena are treated as subordinated or embedded relative to activities which continue the development of an ongoing unit, whether it be a list of some sort, a story, or a Speech Event or Interaction. The structure which results from the recursive embedding and sequencing of discourse units with respect to one another has the form of a tree. This Discourse History Parse Tree contains, at any moment in the discourse, a record of which units of what types have been completed, and which, having been interrupted before completion, remain to be completed.
To determine at which level of the Discourse Parse Tree an incoming clause is to be added as a subordinated or coordinated constituent, first a logical expression representing the meaning of the clause is constructed (note that this expression may still contain semantically undetermined, anaphoric elements). On the basis of this expression, it can be computed whether the preconditions for attachment at any given level are fulfilled. Attachment at the lowest level is tried first: the system investigates the plausibility of a meaningful subordination or coordination relation between the incoming clause and the previous clause; then, relations at successively higher levels in the tree are considered. If no meaningful relation can be established at any level, the incoming clause is attached at the lowest level as a semantically unrelated interruption. If PUSH- or POP-markers occur, the discourse-parsing process takes them into account in the appropriate way.

Interruptions are accommodated in the tree as discourse embeddings in a way not dissimilar to their treatment in the Discourse Structures Theory. However, in order to accommodate the fact that what may be an interruption to one participant, or from the point of view of one Interaction, may be the ongoing discourse from another perspective, each participant in a discourse is associated with a unique Discourse Parse Tree representing that individual's incremental analysis of the discourse. The degree to which participants' trees are identical determines their ability to understand each other's references to underdetermined elements in the discourse such as pronominals, deictics, or definite noun phrases.
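The attachment procedure just described can be sketched schematically in Python. The right edge of the Discourse Parse Tree is represented here simply as a list of open units, lowest level first, and the invented predicate `plausible_relation` stands in for the DDM's actual semantic-congruence computations:

```python
def attach_clause(right_edge, clause, plausible_relation):
    """Attach an incoming clause to a discourse parse tree (toy sketch).

    right_edge: the open discourse units, lowest (most recent) first.
    plausible_relation(unit, clause) -> "subordination", "coordination",
        or None; a stand-in for the DDM's semantic tests.

    Attachment is tried at the lowest level first, then at successively
    higher levels; if no meaningful relation is found at any level, the
    clause is attached at the lowest level as an unrelated interruption.
    """
    for level, unit in enumerate(right_edge):
        relation = plausible_relation(unit, clause)
        if relation is not None:
            return level, relation
    return 0, "interruption"


# Toy test: an elaboration is only plausible against the story unit.
edge = ["current list", "story", "conversation"]
rel = lambda unit, clause: "subordination" if unit == "story" else None
assert attach_clause(edge, "an elaboration", rel) == (1, "subordination")
assert attach_clause(edge, "huh?", lambda u, c: None) == (0, "interruption")
```

A fuller treatment would also consult PUSH- and POP-markers before running the semantic tests, as the text notes.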
The structural aspects of the DDM just discussed are related to the enterprise of developing an adequate discourse semantics, one which would allow the meaning of a discourse to be built up on a left-to-right basis along with the structural analysis of the discourse. Developing such a compositional semantics for discourse presupposes adequate ways of representing the semantics of both sentences and discourse, as well as effective ways of dealing with the context dependence of utterance meanings.

The Meanings of the Text

Truth Conditions for Sentence and Text. Semantic studies in philosophic logic have focused on one important aspect of the meaning of indicative sentences: the truth conditions of the sentence, i.e., a characterization of what must be the case in the world for the sentence to be seen as true rather than false. The truth conditions of a sentence can be mathematically described as a function from states of affairs to truth values. Logical languages, such as First-Order Predicate Calculus or Intensional Logic, provide formulas for expressing such functions. (In an extensional logic, states of affairs are represented by "models" of the logical language; in an intensional logic, they are represented by elementary entities, called "possible worlds".) This logical perspective on sentence meaning has had considerable influence in linguistics and AI. Many theories and systems account for the way in which the truth conditions of a sentence depend on its surface form by providing a definition or procedure which translates a sentence into a formula of a logical language.

The same paradigm can be applied to texts consisting of more than one sentence, since a report or description may also be said to be "understood" (though in a limited sense) by someone who knows what state of affairs in the world would make it "true". Carrying over the logical perspective on meaning from the sentence level to the text level raises the question of how to build up a logical representation for the truth conditions of a text out of the logical representations of the truth conditions of its constituent utterances.
To do this, a text-understanding program must be able to recognize the structure of a text, and to apply semantic operations which build meanings at the levels above the sentence. It must also deal correctly with the sentence-level text constituents: instead of analyzing the meaning of isolated, independent sentences, it must determine the meaning of particular utterances of sentences, taking into account the context which has been set up by the previous discourse. Processing an individual utterance in a discourse thus entails three distinct operations: determining the utterance meaning in the applicable context; integrating the utterance meaning with the meaning of the text as processed so far; and updating the context setting which will be used to interpret the next utterance.

The context-dependence of utterance interpretation is shown by several difficult phenomena. For instance, temporal, locative, or conditional interpretive frameworks may be introduced in the first sentence of a discourse segment and have scope over all other constituents of that segment. The reference time in a narrative moves on as the narrative proceeds (7, 19, and 20). Anaphoric expressions may refer from a subordinate constituent to entities introduced by its superordinate constituent, or from a constituent of a coordinate paragraph to certain entities introduced by an earlier constituent of that same paragraph.
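The three operations just listed can be made concrete in a small Python sketch. The context here is a toy data structure (speaker, reference time), and the rule that advances the reference time after a narrative event is a deliberately simplified stand-in for the phenomena described above:

```python
def process_utterance(utterance, context, text_meaning):
    """One step of discourse processing, as three operations (toy sketch):
    (1) interpret the utterance in the current context,
    (2) integrate its meaning with the text meaning so far,
    (3) update the context for the next utterance.
    All representations here are invented stand-ins.
    """
    # (1) contextual interpretation: resolve indexicals against context
    meaning = {
        "speaker": context["speaker"],
        "time": context["reference_time"],
        "content": utterance["predicate"],
    }
    # (2) integration: conjoin with the meaning built up so far
    text_meaning = text_meaning + [meaning]
    # (3) context update: a narrative event moves the reference time on
    if utterance.get("narrative_event"):
        context = dict(context, reference_time=context["reference_time"] + 1)
    return text_meaning, context


ctx = {"speaker": "A", "reference_time": 0}
text, ctx = process_utterance(
    {"predicate": "enter", "narrative_event": True}, ctx, [])
text, ctx = process_utterance({"predicate": "sit"}, ctx, text)
assert [m["time"] for m in text] == [0, 1]   # reference time moved on
```

The point of the sketch is only the division of labor: interpretation consumes the context, integration grows the text meaning, and the update step produces the context against which the next utterance will be evaluated.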
Consequences for Logical Formalisms. Context-dependence. The context-dependence of utterance meanings in discourse can be dealt with by translating a sentence not directly into a proposition, but into a function from contexts to propositions, where by "context" one means a data structure that contains all the relevant information that may influence sentence interpretation: speaker, addressee, speech time, speech location, reference time, candidates for anaphoric reference, topic, etc. Formally, contexts are very similar to indices as employed in Montague's systems (21,22). The meaning of a particular utterance of a sentence is then constructed by evaluating the sentence meaning with respect to the proper context. In processing an utterance, a discourse-understanding system must therefore determine what its proper context is, and also how this utterance may create a new context, or modify existing ones, for the interpretation of subsequent utterances.

Polanyi and Scha (7) propose to use Woods' (23) Augmented Transition Network (qv) formalism to formulate a recursive definition of discourse constituent structure which is coupled with semantic rules that build up meaning representations for discourse constituent units; the register mechanism of the ATNs is used to keep track of the correct contexts in this process (see Grammar, augmented-transition-network).

Discourse Anaphora. Beyond adopting a Montague-style context mechanism, some other departures from standard logical practice may be necessary to build up meaning representations for texts from meaning representations for sentences. Observations on anaphoric reference in discourse have motivated some proposals for significant innovations in representational formalisms, especially concerning the representation of the denotation of indefinite noun phrases. Several authors (including Karttunen (24)) have argued that indefinite noun phrases should be translated into "indefinite entities" of some sort, as opposed to existential quantifiers. For instance,

"John loves a woman."

would not be represented as

Ex: Woman (x) and Love (J, x)

but rather as

Woman (u) and Love (J, u)

where u is a Skolem constant, a constant whose denotation is undetermined, therefore behaving, for all practical purposes, like a variable which is implicitly existentially quantified. Leaving the existential quantifier implicit has an advantage when one deals with discourse anaphora.

"John loves a woman. Her name is Mary."

can be treated simply by conjoining the formula for "Her name is Mary." with the one for "John loves a woman.", while resolving the pronoun "her" to corefer with the constant for "a woman":

(Woman (u) and Love (J, u)) and name (u) = "Mary".

This procedure does not work if indefinite noun phrases are represented by existential quantifiers:

(Ex: Woman (x) and Love (J, x)) and name (x) = "Mary"

is infelicitous because a variable is used outside the scope of its defining occurrence.

The perspective just sketched has been pushed furthest in a formalism devised by Hans Kamp (19). The formulas used in this formalism are called Discourse Representation Structures (DRSs). They serve the role of logical formulas, representing the meaning of the text so far, as well as the role of contexts which set up the right reference times and anaphoric reference candidates for the interpretation of the next utterances. DRSs differ from ordinary logical formulas in the way variables are used (see Semantic networks). A DRS is defined to be true if it is embeddable in a model which corresponds to the actual world. Embeddability of DRSs is recursively defined on the structure of the formulas. An alternative approach to the problem of discourse anaphora is described by Webber, where the representation of sentence meanings is separated from the representation of "evoked entities" (25).

Background Knowledge and Plausible Inferences. Understanding a text involves much more than understanding the literal meanings of its constituent utterances and their explicitly stated relations. The message of a text is rarely completely explicit: the author relies on the fact that the hearer/reader will integrate the meanings of the utterances with an independently given set of background assumptions about the domain and about the author. All implications which follow in a simple and direct way from the combination of the explicit utterances and the presupposed background knowledge are considered to be implicit in the text.

For a system to be capable of discourse understanding in this more extended sense, its mechanisms must be augmented with a representation of the required background knowledge, and with a system that performs inferences (qv) on the basis of explicit text meanings and background knowledge, generating representations of information that was implicit in the text. Different kinds of background information play a role. Ideally, a discourse understanding system should have a rather rich, encyclopedic knowledge base, or at least a knowledge base comparable to the user's for the pertinent domain; and it should have particularly good coverage in knowledge which people consider "common sense". How to model commonsense domains has therefore become a research area in itself (26,27) (see Reasoning, Commonsense).

An important set of background assumptions which has received a lot of attention concerns the characters in stories: unless told otherwise, story recipients must assume the characters to be "normal", rational, purposeful people, and they must bring these assumptions to bear on the text in order to make sense of it. Various systems have been built which embody some knowledge of this sort and bring it to bear on the discourse-understanding process.

SAM (qv) (18, 28, and 29), for instance, is a system for understanding narratives which is based on the notion of a script (qv). A script is a knowledge structure which represents a stereotypical sequence of events, such as taking a bus, going to a movie theatre, or going to a restaurant for dinner. SAM's representation of a script consists of a set of simple actions described as conceptual dependency structures, together with the causal connections between those actions. The actions in a script are further organized into a sequence of scenes, which in the case of the restaurant script includes entering the restaurant, ordering food, eating, paying, and leaving. Each script also has a set of roles and props characterizing the people and objects that are expected to appear in the sequence of events.
In processing a narrative about eating in a restaurant, SAM first has to recognize that the restaurant script is the relevant context for interpreting the narrative. Once the script is chosen, SAM will try to interpret each new sentence as part of that script. It does this by matching the conceptual representation of the new sentence against the actions represented in the script. When it finds a match, it incorporates the sentence meaning into its representation of the narrative. It also fills in the script actions preceding the one matched. By this process, SAM infers actions that are implicit in the narrative it is reading. Thus, when it reads the narrative:

John went to the Fisherman's Grotto for dinner. He ordered lobster. The bill was outrageous.

it includes in its representation that John actually ate his lobster, that he received a large bill, and that he paid it.

A later system, FRUMP (qv) (30,31), pushes the idea of expectation-driven understanding a little further and dispenses with script-independent meaning representations altogether; it parses its input text directly into script slots, and anything which does not fit is ignored. (FRUMP is presented as a model of human text skimming.) IPP (62,72), in its turn, modifies the FRUMP approach by mixing script-based text skimming with a somewhat more careful semantic analysis of selected parts of the text. Its meaning representations contain not only scripts with filled-in slots, but also representations of "unexpected events".

In a realistic application of the script approach, the scripts to be invoked must be selected from thousands of candidates; SAM chose from only three or four candidates. Furthermore, one will have to drop SAM's assumption that each script contains one event that is always explicitly mentioned in the text in order to invoke the script. The task of finding which of the many candidate scripts matches the input sequence best thus presents computational problems which deserve further study.
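A drastically simplified version of this matching process can be written down as follows. This is not SAM's conceptual-dependency machinery, only an illustration of how matching input events against an ordered script lets the implicit intervening actions be filled in (the script and event names are invented for the sketch):

```python
RESTAURANT_SCRIPT = ["enter", "order", "eat", "pay", "leave"]


def understand(events, script=RESTAURANT_SCRIPT):
    """Toy script application in the spirit of SAM: match each input
    event against the script in order, filling in the skipped
    (implicit) script actions. Events outside the script are kept
    as-is rather than ignored."""
    story, pos = [], 0
    for event in events:
        if event in script[pos:]:
            i = script.index(event, pos)
            story.extend(script[pos:i + 1])   # includes implicit actions
            pos = i + 1
        else:
            story.append(event)               # not part of the script
    story.extend(script[pos:])                # assume the script completed
    return story


# "John went to the Grotto for dinner. He ordered lobster. The bill
# was outrageous." -- mentioned: enter, order, pay; inferred: eat, leave.
assert understand(["enter", "order", "pay"]) == \
    ["enter", "order", "eat", "pay", "leave"]
```

The hard problems noted above, selecting the right script among thousands and invoking it without a guaranteed explicit trigger event, are exactly the parts this sketch assumes away.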
The idea of a script is usually associated with the description of predefined sequences of events which constitute the "building blocks" of everyday life. Almost by definition, scripts are not sufficient to understand interesting stories. Real stories tend to involve somewhat more complex plots, arising from conflicts between the perceptions, ideas, and goals of the different characters. A program that interprets its input reports in terms of the goals and subgoals of the protagonist is PAM (qv) (Plan Applier Mechanism), designed by Wilensky (32).

Later work derives plot structure from "interacting plans", that is, plans involving two or more participants in cooperative or competitive interaction. Such plans differ from single-participant plans in several ways (33), the most significant being that they are produced, interpreted, and executed in a belief context, i.e., what participants believe about the interaction is significant, rather than any putative objective account of the events. Thus, for example, in order for a system to make sense of a children's story such as "Hansel and Gretel", it must monitor the evolution of the children's, the parents', and the witch's beliefs about events as well as the events themselves (34). When the parents tell the children that the family is going to "fetch wood", the system must note that the actions the parents subsequently take are designed to be interpretable by the children as simple wood fetching, but are simultaneously effecting the abandonment of Hansel and Gretel. Moreover, it must be able to compute embedded beliefs; e.g., the parents do not know that Hansel has overheard their plan and hence that he believes that they intend him to believe the actions contribute to wood fetching, but in fact are intended to lead to his and Gretel's death. Central to this belief monitoring is the computation of mutual belief (17, 34, and 35), i.e., those beliefs fully shared and known to be shared among the participants (see Belief systems). Mechanisms for interacting-plans calculations have been outlined in some detail (34), but not fully implemented in any current systems. Analyses in terms of interacting plans have proved useful in studies of conversations (33), classroom interactions, skits (36), and written stories (34, 37, and 38).

Summarizing Stories. Understanding a story as a communicative object requires more than dealing with its explicit content and the associated plausible inferences. When someone tells a story, not all the information reported is equally important. Truly understanding the story would mean, among other things, being able to see the distinctions between more important and less important information. Evidence of this kind of understanding would be a system's capability to generate adequate summaries of input texts. Many approaches to the story summarization problem have been proposed. Four of them are discussed below; they are based, respectively, on surface text phenomena, on plot structure, on affective dynamics, and on the author-reader relationship.

The first approach implements the ideas formulated by Polanyi concerning the way in which human storytellers encode their information. She maintains that people explicitly mark the relative salience of different pieces of information in a text; they make sure that an important piece of information "stands out" against the surrounding information.
They do this by means of various evaluative devices: meta-comments, explicit markers, repetition, and the use of encoding forms which deviate from the "local norm" in the text (long vs. short sentences, direct discourse vs. narrated events, colloquial vs. formal register, etc.) (39). Based on these ideas, a system was developed that simply counts the number of evaluative devices used to highlight each proposition in a story, and then puts the most highly evaluated states and the most highly evaluated events together in a summary of the input story. The system thus manages to construct a reasonable summary on the basis of the surface appearance of the story, without understanding it in any sense; it shows that one must be careful in ascribing "understanding" capabilities to a system which performs a specific task.

The relevant work on plot structure originates with Propp (40) and Rumelhart (41). Lehnert (42-45) developed a summarization algorithm based on the causal relations between the events and states reported in a story. By inspecting the network of causal connections, it concludes that certain events play a crucial role in the development of the narrative, by moving the plot from one place to an essentially different place.

Closely related to Lehnert's work is Dyer's (46,47) system, called BORIS (qv), which attempts "in-depth understanding" of narratives. Such understanding should include being able to summarize the point or moral that the author intended the narrative to represent. This work moves beyond earlier work on plan-based understanding, such as Wilensky's (32), by abstracting the communicative intent.
BORIS embodies thematic patterns, called Thematic Abstraction Units (TAUs). For example, TAU-DIRE-STRAITS encodes the pattern: x has a crisis goal; x can't resolve the crisis alone; x seeks a friend y to help out. TAUs arise from errors in planning or plan execution. They refer to a plan used, its intended effect, why it failed, and what can be done about the failure. As such, they allow BORIS to organize the narratives at an intentional level, which leads naturally to an appropriate summarization or even the drawing of a moral.

A contrasting approach is that of Brewer and Lichtenstein (48,49). They argue that stories are a subclass of narratives whose purpose is to entertain. Thus, plan-based analyses ultimately miss the point of a story if they are not augmented by an affective component, one that shows how structural elements of the text influence the reader. For example, suspense is created when the author reveals that a negative outcome is in store for a central character and that the character is unaware of his or her fate. Thus, relations among the author's, the reader's, and the characters' belief states become essential to understanding, or being affected by, the story.

In the line of the Brewer and Lichtenstein approach, Bruce (38) outlines a central model of the author-reader relationship. The model makes explicit not only the author and the reader as participants in the communicative act, but also a constellation of other implied participants. For instance, in an ironic text, the author establishes an apparent speaker whose beliefs and intentions conflict in some respects with the author's. It is noteworthy that to date attempts such as those of Brewer, Lichtenstein, and Bruce have been purely theoretical; no working system addresses the interactions of author's and reader's goals at that level.

Plan Recognition

The Pragmatic Perspective on Discourse. Language, especially written language, is often viewed as a code for packaging and transmitting information from one individual to another. Under this view, a linguistic message is fully represented by the words and sentences it comprises; texts are thus objects that can be studied in isolation. By taking such a stance, one is led naturally, for instance, to regard words as referring back to other words. Concepts like coherence, relevance, and topic are then regarded as properties of texts, leading researchers to confine their search for these properties to words and sentences.

A contrasting view, proposed by Strawson (50), Austin (51), Searle (52), and others, is that speakers or writers use words to do things, for instance to refer to things, or to get a hearer or reader to believe or do something. Utterances are produced by a person who is attempting to use them to produce certain effects on an audience (perhaps an imagined audience). According to this view, utterances are tools used in social interaction and should be studied in that light. Morgan and Sellner (53) suggest that properties like coherence, relevance, and text structure are likely to be obtained from a theory of plans and goals appropriately extended to linguistic actions. Properties like "relevance" would be epiphenomenal byproducts of the appropriate structuring of actions.

Pragmatics is the study of communication as it is situated relative to a particular set of communication demands, speakers, hearers, times, places, joint surroundings, linguistic conventions, and cultural practices. Including language in a theory of action suggests that "pragmatics" is just the application to verbal problems of general abilities for interpreting the everyday world (see Morgan (54) for fuller discussion). People tend to interpret the behavior of other humans in terms of the situation and the actor's intentions and beliefs. Much of what has been discussed under the rubric "pragmatics" is most reasonably seen as the interpretation of linguistic behavior in similar terms.

The pragmatic perspective on language has three important implications for discourse-understanding research. The first is that the meaning of a linguistic message is only partly represented by its content; its meaning for a hearer also depends on the hearer's construal of the purpose that the speaker had for producing it. The second is that the attribution of intentions to a speaker must be an integral component of the listener's comprehension process. The third is that a theory of language comprehension should determine the extent to which the same strategies people use to arrive at satisfactory explanations of the physical behavior of others can be employed in their comprehension of speech acts.

The way the meaning of a message is shaped by its producer's goals and beliefs is most obvious in a case such as propaganda, but it is no less critical for apparently straightforward utterances. For example, a colleague at the office might say, "I brought two egg salad sandwiches today." Although the referential meaning of this statement might be simple to compute, its full meaning depends on whether the speaker's intention was, for example, to offer one of the sandwiches, to decline a luncheon invitation, or to explain why the office smelled bad. Whatever the speaker's goals, the meaning conveyed by the statement depends on the hearer's correctly inferring what they are (55). Thus, understanding discourse requires inferring the intentions and beliefs that led the speaker to produce the observed behavior.

But as Grice (56) points out, simply recognizing an actor's plan, as an unseen observer might do (see Refs. 32 and 57), is insufficient as a basis for communication. Instead, hearers should attribute to speakers intentions that the speakers intend for them to infer. To ensure successful communication, speakers attempt to maximize the likelihood that hearers will make the inferences they were supposed to make by relying on what Lewis (58) terms "conventions". Conventions are solutions to coordination problems, where any participant's actions depend on the actions of others, and themselves rely on "mutual knowledge" held amongst the parties involved. Mutual knowledge (see also Ref. 59) occurs when two people know that a proposition P holds, that the other person knows as well that P holds, that the second knows that the first knows that P holds, and so on. In ordinary conversation, participants make assumptions about mutual knowledge, signal their assumptions through the pragmatic presuppositions (60) of their utterances, and negotiate misunderstandings of the developing mutual knowledge.

Speech Acts. From a pragmatic perspective, the goal of discourse understanding should not be merely to assess the truth conditions of one's interlocutor's utterances. Instead, one should be concerned with the goal which is being pursued through these utterances, and with the way in which every utterance contributes to that goal. From this perspective, every language utterance is viewed as a social act: it changes, be it perhaps on a small scale, the social relation between the speaker and his interlocutor. A simple assertion puts me under the obligation to defend it if challenged. A question creates
242
DISCOURSE UNDERSTANDING
for my interlocutor the obligation to answer it, or to be prepared to justify his lack of an inclination to do so. And vows, promises, and threats clearly extend beyond the micro-sociology of the interactional situation, creating commitments in the social world at large. The social acts performed by means of linguistic utterances are called Speech Acts (qv) (52). The speech-act types which play a role in current experimental dialogue systems are as follows: requests, typically formulated as questions of the form "Could you do X?"; commands, directly expressed as imperative sentences ("Do X."). (Notice that for most programs, which slavishly try to satisfy every whim of their human dialogue partner, there is no distinction between a request and a command. The program takes no responsibility for its actions.); assertions, directly expressed as indicative sentences (assertions are usually interpreted as commands to store and/or evaluate the asserted information); and questions, directly expressed as interrogative sentences. (A question is usually interpreted as a command to provide the answer.)

Plan Recognition. If a system analyzes its input utterances as speech acts and has at its disposal a repertoire of plausible goals that its dialogue partner may pursue, it may be able to understand the purpose behind its input utterances by using a method which is reminiscent of the way in which a system like PAM (32) understands reports about goal-oriented behavior: it tries to guess the more encompassing goal that the speaker may be trying to accomplish by executing a plan which has the surface speech act as one of its constituent actions. A system that tries to derive the deeper intentions behind surface speech acts in exactly this way was developed by Allen (35). His system exploits knowledge about what constitutes a rational plan, as well as beliefs about what goals the speaker is likely to have. Allen specifies the plan inference process as a set of inference rules and a control strategy.
Rules are all of the form "If agent S believes agent A has a goal X, then agent S may infer that agent A has a goal Y." Examples of such rules are: If S believes A has a goal of executing action ACT, and ACT has an effect E, then S may believe that A has a goal of achieving E; and If S believes A has a goal of knowing whether a proposition P is true, then S may believe that A has a goal of achieving P. Of course, given the conditions in the second rule, S might alternatively infer that A has a goal of achieving not P; this is treated as a separate rule. Which of these rules applies in a given setting is determined by control heuristics (qv), as follows. The plan-inference process can be viewed as a search through a set of partial plans. Each partial plan consists of two parts: one part is constructed using the plan inference rules from the observed action, and the other is constructed using the plan construction rules on an expected goal. When mutually exclusive rules can be applied to one of these partial plans, the plan is copied and one rule is applied in each copy. Each of
these partial plans is then rated as to how probable it is to be the correct plan. The highest rated partial plan is always selected for further expansion using the inference rules. The rating is determined using a set of heuristics that fall into two classes: those that evaluate how well formed the plan is in the given context and those that evaluate how well the plan fits the expectations. An example of a heuristic is: Decrease the rating of a partial plan if it contains a goal that is already true in the present context. Allen argues that whenever the intended plan can be derived from mutual knowledge, i.e., from knowledge which is knowingly shared between speaker and hearer, the hearer is assumed to perceive the intended plan and is expected to react to that plan, rather than to the surface speech act. The paradigm examples of such situations are known as indirect speech acts (61); sentences like: "Can you pass the salt?"
"Is the salt near you?" uttered at the dinner table, where the simple answer "Yes" without an accompanying action would be experienced as a joke or an insult. The idea also applies, however, to cases that are normally not classified as indirect speech acts. For instance, when at the information counter of a train station someone asks: "Does the 4:20 train go to Toronto?" the answer "No" is less helpful than the answer "No, but the 5:10 train does," which responds to the speaker's perceived goal of going to Toronto. Allen's plan-recognition paradigm has been developed in work by Sidner (13,62,63). Pollack (64) has refined it to deal with situations where speaker and hearer have conflicting ideas about how certain goals may be achieved. Litman (65,66) has introduced meta-plans which allow for clarification subdialogues and plan corrections; she also integrates an awareness of the surface structure of discourse, as discussed above, into the plan-recognition process.

Speech Events. An "unframed" interaction between "uninterpreted" people is a rare event. People use a refined system of subcategorization to classify the social situations they engage in. These subcategories, called Speech Event types (67,68), often assign a specific purpose to the interaction, specify roles for the participants, constrain discourse topics and conversational registers, and, in many cases, specify a conventional sequence of component activities. An awareness of what kind of Speech Event one is engaged in thus helps the plan-recognition process: the overall goals of the interaction, and often the steps to achieve them, are shared knowledge among the participants. The most precisely circumscribed kinds of Speech Events are formal rituals. Speech-Event types characterized by grammars which are less explicit and less detailed include service encounters (16), doctor-patient interactions (69), and casual conversations. Schegloff (70) has shown that the process of
terminating a telephone conversation is a jointly constructed ending sequence unit with a predictable course of development. The structure of talk which is exchanged in order to perform a task may follow the structure of some goal/subgoal analysis of this task (35). In Speech Event types which involve a more or less fixed goal, this often leads to a fixed grammar of subsequent steps taken to attain it. For instance, as described by Polanyi and Scha (7), transcripts of the activities in Dutch butcher shops consistently display the following sequential structure in the interaction between the butcher and a customer:

1. It is established that it is this customer's turn.
2. The first desired item is ordered, and the order is dealt with; . . . ; the nth desired item is ordered, and the order is dealt with.
3. It is established that the sequence of orders is finished.
4. The bill is processed.
5. The interaction is concluded.

Each of these steps is filled in in a large variety of ways: either of the parties may take the initiative at each step; question/answer sequences about the available meat, the right way to prepare it, or the exact wishes of the customer may all be embedded in the stage 2 steps; and clarification dialogues of various sorts may occur.

An important Speech-Event type with characteristics slightly different from the types mentioned so far is the casual conversation. In a casual conversation, all participants have the same role: to be "equals"; no purposes are preestablished; and the range of possible topics is open-ended, although conventionally constrained.

Dialogue Systems. Many dialogue systems have been designed to partake in specific types of speech events, in which the computer system and its human interlocutor each play a well-defined role.
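The fixed sequential structure that Polanyi and Scha describe for the butcher-shop Speech Event can be sketched as a small grammar over interaction stages. This is a minimal illustration, not code from any cited system; the stage names are invented for the example.

```python
import re

# Stages of the butcher-shop Speech Event described above. Stage 2
# (an item is ordered and the order is dealt with) repeats once per item.
STAGE_CODES = {
    "establish_turn": "T",   # 1. it is this customer's turn
    "order_item": "O",       # 2. an item is ordered and dealt with (repeatable)
    "close_orders": "C",     # 3. the sequence of orders is finished
    "process_bill": "B",     # 4. the bill is processed
    "conclude": "E",         # 5. the interaction is concluded
}

# The top-level grammar of the Speech Event: T O+ C B E.
SCRIPT = re.compile(r"TO+CBE")

def conforms_to_script(stages):
    """True if an observed stage sequence fits the butcher-shop script."""
    return SCRIPT.fullmatch("".join(STAGE_CODES[s] for s in stages)) is not None
```

Embedded question/answer sequences and clarification dialogues would be handled below the stage level, so they do not disturb this top-level grammar.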
The assumption that every dialogue must fall within the patterns allowed by the speech event type makes it possible to resolve ambiguities in its input (anaphora, ellipsis) and to react to the intentions behind it, even when these are not explicitly stated. Most systems of this sort play the role of the "professional" in a consultation interaction of some sort, e.g., a system that teaches an assembly task (12), an information system at a train station (35), or a travel budget manager (75). Such speech event types involve the participants cooperating towards a common goal. In doing this, they decompose the common task into subtasks and, eventually, into elementary subtasks that can be executed by one or both of the participants without requiring further dialogue. For instance, as discussed above in Recent Directions in Modeling Discourse Structure, Grosz's original investigation of dialogues between a human instructor and an apprentice who was being told how to repair an air compressor showed that the structure of such dialogues corresponds closely to the structure of the task
(12).
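This task/dialogue correspondence can be sketched as a task tree whose internal nodes list subtasks; a dialogue that follows the task decomposes into segments mirroring the tree. The tree below is a hypothetical miniature of the compressor-repair domain, not Grosz's actual task model, and it already accommodates subtasks that must all be done but may be done in any order.

```python
from itertools import permutations

def valid_orders(task):
    """Yield every valid sequence of elementary subtasks for a task node.
    Leaves are strings; internal nodes say whether subtask order is fixed."""
    if isinstance(task, str):
        yield (task,)
        return

    def expand(seq):
        # Concatenate the valid orders of each subtask, in the given order.
        if not seq:
            yield ()
            return
        for head in valid_orders(seq[0]):
            for tail in expand(seq[1:]):
                yield head + tail

    if task["ordered"]:
        arrangements = [task["subtasks"]]
    else:
        arrangements = permutations(task["subtasks"])
    for arrangement in arrangements:
        yield from expand(list(arrangement))

# Hypothetical repair-task fragment: the two middle subtasks must both be
# done, but may be done in either order.
repair = {"ordered": True, "subtasks": [
    "remove_pump",
    {"ordered": False, "subtasks": ["replace_belt", "clean_filter"]},
    "reattach_pump",
]}
```

Each tuple yielded by `valid_orders(repair)` corresponds to one admissible course the instruction dialogue could take.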
One should notice, however, that the description of the task structure does not predict one fixed tree structure (12). A task may involve subtasks that must all be done but can be done in any order. It is not difficult to imagine further complexities: alternatives, preconditions, etc. When a task does specify one fixed sequence of subtasks, the task structure degenerates into a script (see Background Knowledge and Plausible Inferences above).

Modes of Natural Language

One tends to think of language in two forms: oral and written. Thus, AI research on Discourse Understanding is conveniently divided between research on understanding text and research on participating in interactive dialogues, which, although most often written rather than spoken, are thought of as analogous to oral conversations. That this division is inadequate and at times misleading is shown by Rubin (71), who postulates eight dimensions of variation among "language experiences": (1) oral vs. written modality, (2) interactiveness, (3) spatial commonality, (4) temporal commonality, (5) possibility of para-linguistic communication, (6) concreteness of referents, (7) audience specificity, and (8) separability of participants. These dimensions define a range of communication modalities out of which AI research has focused on only a few, albeit significant, ones. From the perspective of this dimensional analysis, the research directed at the implementation of interactive computer programs that display reasonable behavior in conducting a dialogue with a person amounts to the development of a new mode of natural language, rather than the analysis of an existing one: real-time alphanumeric interaction, usually without shared awareness of physical context. Most AI research (notable exceptions being speech-understanding (qv) work and some efforts at modeling real conversations (6,9-11,72)) has focused on written language and is thus clustered on one pole of Rubin's first dimension.
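Rubin's eight dimensions amount to a feature vector for classifying a "language experience." The sketch below encodes them as a simple record; the two example characterizations (face-to-face conversation and keyboard dialogue with a program) are illustrative readings of the discussion here, not values taken from Rubin.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class LanguageExperience:
    oral: bool                # 1. oral vs. written modality
    interactive: bool         # 2. interactiveness
    shared_space: bool        # 3. spatial commonality
    shared_time: bool         # 4. temporal commonality
    paralinguistic: bool      # 5. para-linguistic communication possible
    concrete_referents: bool  # 6. concreteness of referents
    specific_audience: bool   # 7. audience specificity
    separable: bool           # 8. separability of participants

face_to_face = LanguageExperience(True, True, True, True, True, True, True, False)

# Real-time alphanumeric interaction with a program: written, interactive,
# temporally but not spatially shared, no gestures or shared physical context.
keyboard_dialogue = LanguageExperience(False, True, False, True, False, False,
                                       True, False)

def differing_dimensions(a, b):
    """The dimensions on which two language experiences differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
```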
What distinguishes the AI dialogue work from the AI text work, then, is that the former is interactive and usually implies spatial and temporal commonality. On the other hand, neither of the two modes of language use includes para-linguistic communication, such as gestures, facial expressions, or body-position cues. In some of the dialogue work, but not the text work, there are concrete referents, in the sense that objects are perceptually present to the user and the machine. The same holds for audience specificity; some of the dialogue work assumes fairly detailed speaker models of the hearer. Neither of the modalities typically allows separability of participants. Indeed, most of the communication is one to one. Other AI research has focused on text understanding, usually assuming a nonspecific audience. (In contrast, note the many existing forms of text understanding, such as dealing with letters, memos, persuasive essays, etc., which do assume specific audience beliefs and plans.)

Some studies (17,73,74) have been devoted to the linguistic consequences of the use of different communication media. Cohen (17), for example, used a plan-based model of communication to analyze dialogues in five modalities: face-to-face, telephone, linked CRTs, (noninteractive) audio tape, and (noninteractive) written text. He found that speakers in the face-to-face situation, for example, attempted to achieve more detailed goals in giving instructions than did users of keyboards. More specifically, requests that the hearer identify the
referent of a noun phrase dominated spoken instruction-giving discourse but were rare in the keyboard dialogues.

These studies suggest that it is important to understand the constraints of the communication system as well as the texts per se when an AI system is being designed. Moreover, they imply a need for caution in interpreting results of AI research. Any form of language use is valid to examine and can be illuminating in a general way, but specifics of language processing must be interpreted in light of the communication modality in which they arise.

BIBLIOGRAPHY

1. M. Halliday and R. Hasan, Cohesion in English, English Language Series, Title No. 9, Longman, London, 1977.
2. J. R. Hobbs, "Coherence and Co-references," Cognitive Science 3(1), 67-82 (1979).
3. J. R. Hobbs, On the Coherence and Structure of Discourse, Technical Report No. CSLI-85-37, Center for the Study of Language and Information, Stanford, CA, October 1985.
4. W. C. Mann and S. A. Thompson, Relational Propositions in Discourse, Technical Report RR-83-115, Information Sciences Institute, Marina del Rey, CA, November 1983.
5. L. Polanyi, "A Theory of Discourse Structure and Discourse Coherence," in 21st Regional Meeting of the Chicago Linguistic Society, pp. 306-322, Chicago Linguistic Society, University of Chicago, April 1985.
6. R. Reichman, Plain-speaking: A Theory and Grammar of Spontaneous Discourse, Ph.D. thesis, Department of Computer Science, Harvard University, 1981; also BBN Report No. 4681, Bolt Beranek and Newman Inc., Cambridge, MA.
7. L. Polanyi and R. Scha, "A Syntactic Approach to Discourse Semantics," in Proceedings of the International Conference on Computational Linguistics, pp. 413-419, Stanford University, Stanford, CA, 1984.
8. B. J. Grosz and C. L. Sidner, "Attention, Intentions, and the Structure of Discourse," Computational Linguistics 12(3), 175-204 (1986).
9. E. Hinrichs and L. Polanyi, "Pointing the Way: A Unified Account of Referential Gesture in Interactive Discourse," in Papers from the Parasession on Pragmatics and Grammatical Theory, pp. 298-314, Chicago Linguistic Society, Chicago, 1986.
10. J. R. Hobbs and D. Evans, "Conversation as Planned Behavior," Cognitive Science 4(4), 349-377 (1980).
11. J. R. Hobbs and M. H. Agar, "The Coherence of Incoherent Discourse," Language and Social Psychology 4(3 and 4), 213-231 (1985).
12. B. Grosz [Deutsch], "The Structure of Task Oriented Dialogs," in IEEE Symposium on Speech Recognition: Contributed Papers, pp. 250-253, IEEE, Carnegie-Mellon University Computer Science Dept., Pittsburgh, PA, 1974; reprinted in L. Polanyi (ed.), The Structure of Discourse, Advances in Discourse Processing Series, Ablex, Norwood, NJ, 1986.
13. C. L. Sidner, "What the Speaker Means: The Recognition of Speakers' Plans in Discourse," International Journal of Computers and Mathematics, Special Issue in Computational Linguistics 9(1), 71-82 (1983).
14. E. Guelich, Makrosyntax der Gliederungssignale im Gesprochenen Franzoesisch, Wilhelm Fink Verlag, Munich, 1970.
15. D. Schiffrin, Discourse Markers: Semantic Resource for the Construction of Conversation, unpublished Ph.D. dissertation, University of Pennsylvania, 1982.
16. M. Merritt, On the Use of O.K. in Service Encounters, Working Papers in Sociolinguistics 42, Southwest Educational Development Lab., Austin, TX, 1978.
17. P. R. Cohen, "The Pragmatics of Referring and the Modality of Communication," Computational Linguistics 10, 97-146 (1984).
18. R. E. Cullingford, "SAM," in R. C. Schank and C. K. Riesbeck (eds.), Inside Computer Understanding: Five Programs Plus Miniatures, pp. 75-119, Erlbaum, Hillsdale, NJ, 1981.
19. H. Kamp, "Events, Instants and Temporal Reference," in U. Egli and A. van Stechow (eds.), Semantics from a Multiple Point of View, pp. 376-471, de Gruyter, Berlin, 1979.
20. E. Hinrichs, "Temporal Anaphora in Discourses of English," Linguistics and Philosophy 9(1), 63-82 (1986).
21. R. Montague, "Pragmatics," in R. Klibansky (ed.), Contemporary Philosophy: A Survey, pp. 102-122, La Nuova Italia Editrice, Florence, Italy, 1968.
22. M. Bennett, "Demonstratives and Indexicals in Montague Grammar," Synthese 39, 1-80 (1978).
23. W. A. Woods, "Transition Network Grammars for Natural Language Analysis," CACM 13(10), 591-606 (October 1970).
24. L. Karttunen, "Discourse Referents," in J. McCawley (ed.), Syntax and Semantics, Vol. 7, Academic Press, New York, 1976.
25. B. L. Webber, "So What Can We Talk About Now?," in M. Brady (ed.), Computational Approaches to Discourse, MIT Press, Cambridge, MA, 1982.
26. E. Charniak, "A Framed PAINTING: The Representation of a Commonsense Knowledge Fragment," Cognitive Science 1(4), 355-394 (1977).
27. J. R. Hobbs and R. C. Moore, Formal Theories of the Commonsense World, Ablex, Norwood, NJ, 1985.
28. R. E. Cullingford, Script Application: Computer Understanding of Newspaper Stories, unpublished doctoral dissertation, Yale University, 1978.
29. R. C. Schank and R. Abelson, Scripts, Plans, Goals, and Understanding, Lawrence Erlbaum Associates, Hillsdale, NJ, 1977.
30. G. F. DeJong, "Prediction and Substantiation: A New Approach to Natural Language Processing," Cognitive Science 3, 251-273 (1979).
31. G. F. DeJong, Skimming Stories in Real Time: An Experiment in Integrated Understanding, unpublished doctoral dissertation, Yale University, New Haven, CT, 1979.
32. R. Wilensky, "PAM," in R. C. Schank and C. K. Riesbeck (eds.), Inside Computer Understanding: Five Programs Plus Miniatures, pp. 136-179, Erlbaum, Hillsdale, NJ, 1981.
33. B. C. Bruce, "Robot Plans and Human Plans: Implications for Models of Communication," in I. Gopnik and M. Gopnik (eds.), From Models to Modules: Studies in Cognitive Science from the McGill Workshops, pp. 97-114, Ablex, Norwood, NJ, 1986.
34. B. C. Bruce and D. Newman, "Interacting Plans," Cognitive Science 2, 195-233 (1978).
35. J. F. Allen, A Plan-based Approach to Speech Act Recognition, Technical Report 131, Department of Computer Science, University of Toronto, Toronto, Canada, January 1979.
36. D. Newman and B. C. Bruce, "Interpretation and Manipulation in Human Plans," Discourse Processes 9, 167-195 (1986).
37. B. C. Bruce, "Analysis of Interacting Plans as a Guide to the Understanding of Story Structure," Poetics 9, 295-311 (1980).
38. B. C. Bruce, "Plans and Social Actions," in R. Spiro, B. C. Bruce, and W. Brewer (eds.), Theoretical Issues in Reading Comprehension, pp. 367-384, Erlbaum, Hillsdale, NJ, 1980.
39. L. Polanyi, Telling the American Story, Ablex Publishing, Norwood, NJ, 1985.
40. V. Propp, Morphology of the Folktale, University of Texas Press, Austin, 1968.
41. D. E. Rumelhart, "Notes on a Schema for Stories," in D. G. Bobrow and A. Collins (eds.), Representation and Understanding, pp. 211-236, Academic Press, New York, 1975.
42. W. G. Lehnert, "Plot Units and Narrative Summarization," Cognitive Science 5(4), 293-331 (1981).
43. W. G. Lehnert, J. B. Black, and B. J. Reiser, "Summarizing Narratives," in Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., pp. 184-189, 1981.
44. W. G. Lehnert, "An In-depth Understander of Narratives," Artificial Intelligence 20(1), 15-62 (1983).
45. W. Lehnert and C. Loiselle, "Plot Unit Recognition for Narratives," in G. Tonfoni (ed.), Artificial Intelligence and Text-Understanding: Plot Units and Summarization Procedures, pp. 9-47, Ed. Zara, Parma, Italy, 1985.
46. M. G. Dyer, "The Role of TAUs in Narratives," in Proceedings of the Third Annual Conference of the Cognitive Science Society, pp. 225-227, Cognitive Science Society, Berkeley, CA, 1981.
47. M. G. Dyer, In-depth Understanding: A Computer Model of Integrated Processing and Memory for Narrative Comprehension, MIT Press, Cambridge, MA, 1983.
48. W. F. Brewer and E. H. Lichtenstein, "Event Schemas, Story Schemas, and Story Grammars," in J. D. Long and A. D. Baddeley (eds.), Attention and Performance IX, pp. 363-379, Erlbaum, Hillsdale, NJ, 1981.
49. W. F. Brewer and E. H. Lichtenstein, "Stories Are to Entertain: A Structural-Affect Theory of Stories," Journal of Pragmatics 6, 473-486 (1982).
50. P. F. Strawson, "On Referring," Mind 59, 320-344 (1950).
51. J. L. Austin, How to Do Things with Words, Oxford University Press, London, 1962.
52. J. R. Searle, Speech Acts: An Essay in the Philosophy of Language, Cambridge University Press, Cambridge, U.K., 1969.
53. J. L. Morgan and M. Sellner, "Discourse and Linguistic Theory," in R. J. Spiro, B. C. Bruce, and W. F. Brewer (eds.), Theoretical Issues in Reading Comprehension, pp. 165-200, Erlbaum, Hillsdale, NJ, 1980.
54. J. L. Morgan, "Two Types of Convention in Indirect Speech Acts," in P. Cole (ed.), Syntax and Semantics, Volume 9: Pragmatics, pp. 261-280, Academic Press, New York, 1978.
55. M. J. Adams and B. C. Bruce, "Background Knowledge and Reading Comprehension," in J. Langer and M. T. Smith-Burke (eds.), Reader Meets Author/Bridging the Gap: A Psycholinguistic and Sociolinguistic Perspective, pp. 2-25, International Reading Association, Newark, DE, 1982.
56. H. P. Grice, "Meaning," Philosophical Review 66, 377-388 (1957).
57. C. F. Schmidt, N. S. Sridharan, and J. L. Goodson, "The Plan Recognition Problem: An Intersection of Artificial Intelligence and Psychology," Artificial Intelligence 10, 45-83 (1979).
58. D. K. Lewis, Convention: A Philosophical Study, Harvard University Press, Cambridge, MA, 1969.
59. S. Schiffer, Meaning, Oxford University Press, London, 1972.
60. R. C. Stalnaker, "Pragmatic Presuppositions," in M. K. Munitz and P. K. Unger (eds.), Semantics and Philosophy, pp. 197-213, New York University Press, New York, 1974.
61. C. R. Perrault and J. F. Allen, "A Plan-based Analysis of Indirect Speech Acts," American Journal of Computational Linguistics 6(3), 167-182 (1980).
62. C. L. Sidner and D. J. Israel, "Recognizing Intended Meaning and Speakers' Plans," in Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., August 1981, pp. 203-208.
63. C. L. Sidner, "Plan Parsing for Intended Response Recognition in Discourse," Computational Intelligence 1(1), 1-10 (February 1985).
64. M. E. Pollack, "A Model of Plan Inference That Distinguishes Between the Beliefs of Actors and Observers," in 24th Annual Meeting of the Association for Computational Linguistics, pp. 207-214, New York, June 1986.
65. D. J. Litman and J. F. Allen, A Plan Recognition Model for Subdialogues in Conversations, Technical Report TR 141, Department of Computer Science, University of Rochester, November 1984.
66. D. J. Litman, "Linguistic Coherence: A Plan-Based Alternative," in 24th Annual Meeting of the Association for Computational Linguistics, pp. 215-223, New York, 1986.
67. D. Hymes, "Models of the Interaction of Language and Social Setting," Journal of Social Issues 23(2), 8-28 (1967).
68. D. Hymes, "Models of the Interaction of Language and Social Life," in J. Gumperz and D. Hymes (eds.), Directions in Sociolinguistics, pp. 35-71, Holt, Rinehart and Winston, New York, 1972.
69. P. S. Byrne and B. E. L. Long, Doctors Talking to Patients, Her Majesty's Stationery Office, London, 1976.
70. E. Schegloff and H. Sacks, "Opening up Closings," Semiotica VIII(4), 289-327 (1973).
71. A. D. Rubin, "A Theoretical Taxonomy of the Differences Between Oral and Written Language," in R. J. Spiro, B. C. Bruce, and W. F. Brewer (eds.), Theoretical Issues in Reading Comprehension, pp. 411-438, Erlbaum, Hillsdale, NJ, 1980.
72. J. A. Levin and J. A. Moore, "Dialogue Games: Metacommunication Structures for Natural Language Interaction," Cognitive Science 1(4), 395-420 (October 1977).
73. P. R. Cohen, S. Fertig, and K. Starr, "Dependencies of Discourse Structure on the Modality of Communication: Telephone vs. Teletype," in Proceedings of the 20th Annual Meeting of the Association for Computational Linguistics, pp. 28-35, June 1982.
74. R. J. Tierney, J. LaZansky, T. Raphael, and P. R. Cohen, "Authors' Intentions and Readers' Interpretations," in R. J. Tierney, P. Anders, and J. N. Mitchell (eds.), Understanding Readers' Understandings, Lawrence Erlbaum Associates, Hillsdale, NJ, 1983.
75. B. C. Bruce, "Discourse Models and Language Comprehension," AJCL 35, 19-35 (1975).

R. J. H. Scha, B. C. Bruce, and L. Polanyi
BBN Laboratories Inc.

This research was partially supported by the National Institute of Education under Contract No. 400-81-0030, and by the Advanced Research Projects Agency of the Department of Defense under Contract No. N0014-85-C-0079.

DISTRIBUTED PROBLEM SOLVING
Distributed problem solving combines the research interests of AI and distributed processing. Distributed problem-solving networks are broadly defined as loosely coupled distributed networks of semiautonomous problem-solving nodes (processing elements) that are capable of sophisticated problem solving and cooperatively interact with other nodes to solve a single problem. Each node is a complex problem-solving system that can modify its behavior as circumstances change and plan its own communication and cooperation strategies with other nodes (see Problem solving). Although distributed problem solving borrows ideas from both AI and distributed processing, it differs significantly from each in the problems being attacked and the methods used to solve these problems.

Distributed problem-solving networks differ from distributed-processing systems in both the style of distribution and the type of problems addressed. These differences are most apparent when the interactions among nodes in each of the networks are studied. A distributed-processing network typically has multiple, disparate tasks executing concurrently in the network. Shared access to physical or informational resources is the main reason for interaction among tasks. The goal is to preserve the illusion that each task is executing alone on a dedicated system by having the network-operating system hide the resource-sharing interactions and conflicts among tasks in the network. In contrast, the problem-solving procedures in distributed problem-solving networks are explicitly aware of the distribution of the network components and can make informed interaction decisions based on that information. This difference in emphasis is, in part, due to the characteristics of the applications being tackled by conventional distributed-processing methodologies. Traditional distributed-processing applications use task decompositions in which a node rarely needs the assistance of another node in carrying out its problem-solving function. Thus, most of the research, as well as the paradigms, of distributed processing does not directly address the issues of cooperative interactions of tasks to solve a single problem. As discussed below, highly cooperative task interaction is a requirement for many problems that seem naturally suited to a distributed network.

Distributed problem solving in turn differs from much of the work in AI because of its emphasis on representing problem solving in terms of asynchronous, loosely coupled process networks that operate in parallel with limited interprocess communication. Networks of cooperating nodes are not new to AI. However, the relative autonomy and enhancement of the problem-solving nodes, a direct consequence of limited communication, sets distributed problem-solving networks apart from Hewitt's work on the actor formalism (qv) (1), Feldman's connectionist approach (see Connectionism) (2), Kornfeld's ETHER language (3), and Lenat's BEINGS system (4). In these latter systems knowledge is compartmentalized so that each actor or "expert" is a specialist in one particular aspect of the overall problem-solving task.
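The style of coupling at issue can be made concrete with a small sketch: each node forms a partial solution from its own local data, and cooperation consists only of exchanging partial solutions, with no central coordinator. This is an illustrative toy, not the design of any of the cited systems.

```python
class Node:
    """A semiautonomous problem-solving node in a toy sensor network.
    Each node tracks a vehicle from its own local readings and can merge
    partial tracks received from peers; no central coordinator exists."""

    def __init__(self, name, readings):
        self.name = name
        self.readings = readings        # locally sensed (time, position) pairs
        self.partial_track = []

    def solve_locally(self):
        # A node needs no assistance to formulate its own partial solution.
        self.partial_track = sorted(self.readings)

    def incorporate(self, other_track):
        # Loose coupling: cooperation is the exchange of partial solutions.
        self.partial_track = sorted(set(self.partial_track) | set(other_track))

# Two nodes, each sensing a different region of the vehicle's path.
a = Node("A", [(1, 10), (2, 20)])
b = Node("B", [(3, 30), (4, 40)])
for node in (a, b):
    node.solve_locally()
a.incorporate(b.partial_track)  # node A now holds the complete track
```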
The advanced behavior exhibited by these systems stems from predefined interactions between tightly coupled, simple processing elements. Each expert has little or no knowledge of the problem-solving task as a whole or of general techniques for communication and cooperation. As a result, an expert cannot function outside the context of the other experts in the system nor outside specific communication and cooperation protocols specified in advance by the system designer. In contrast, each node in a distributed problem-solving network possesses sufficient overall problem-solving knowledge that its particular expertise (resulting from a unique perspective of the problem-solving situation) can be applied and communicated without assistance from other nodes in the network. This does not imply that a node functions as well alone as when cooperating with other nodes (internode cooperation is often the only way of developing an acceptable solution), but every node can at least formulate a partial solution using only its own knowledge. Each node in the distributed network also possesses significant expertise in communication and control. This knowledge frees the network from the bounds of designed protocols and places its nodes in the situation of developing their own communication and cooperation strategies.

Distributed problem solving is an important research area for several reasons. First, hardware technology has advanced to the point where the construction of large distributed problem-solving networks is not only possible but also economically feasible. Although the first networks may consist of only a small number of nodes, distributed problem-solving networks may eventually contain hundreds or thousands of individual
nodes. A situation of exciting hardware possibilities is near, unaccompanied by the problem-solving technology required for their effective utilization. Second, there are AI applications that are inherently spatially distributed. A distributed architecture that matches their spatial distribution offers many advantages over a centralized approach. Third, understanding the process of cooperative problem solving is an important goal in its own right. No matter whether the underlying system is societal, managerial, biological, or mechanical, competition is better understood than cooperation. It is possible that the development of distributed problem-solving networks may serve the same validating role for theories in sociology, management, organizational theory, and biology as the development of AI systems has served for theories of problem solving and intelligence in linguistics, psychology, and philosophy.

Uses of Distributed Problem Solving. There are four general application areas that seem well suited to distributed problem-solving technology:

Distributed Interpretation. Distributed interpretation applications require the integration and analysis of distributed data to generate a (potentially distributed) model of the data. Example application domains include distributed sensor networks and network-fault diagnosis.

Distributed Planning and Control. Distributed planning (qv) and control applications involve developing and coordinating the actions of a number of distributed effector nodes to perform some desired task. Example application domains include distributed air-traffic control, groups of cooperating robots, remotely piloted vehicles, distributed process control in manufacturing, and resource allocation in transportation and/or delivery systems. Distributed planning and control applications often involve distributed interpretation to determine appropriate node actions.

Coordination Networks.
Coordination-network applications involve the coordination of a number of individuals in the performance of some task. Example domains include intelligent command and control systems, multiuser project coordination, and cooperative work-station environments where work is shared between work stations.

Cooperative Interaction among Expert Systems. One means of applying expert-system technology to larger problem domains is to develop cooperative interaction mechanisms that allow multiple expert systems to work together toward solving a common problem. Example situations include bringing together a number of specialized medical-diagnosis systems (see Medical-advice systems) on a particularly troublesome case or negotiation among expert systems (qv) of two corporations to decide price and/or delivery time on a major purchase.

Initial work in distributed problem solving has focused on three application domains: distributed sensor networks, distributed air-traffic control, and distributed robot systems (see Robotics) (5-7). All of these applications need to solve in some form the tasks of distributed interpretation and distributed planning and/or control. Planning in this context refers not only to determining what actions to take (such as changing the course of an airplane) but also to deciding how to use the resources of the network to effectively carry out the interpretation and planning task. This latter form of planning encompasses the classic focus-of-attention problem in AI.

In addition to the commonality in terms of the generic tasks
DISTRIBUTED PROBLEM SOLVING 247
being solved, these application domains are characterized by a natural spatial distribution of sensors and effectors and by the fact that the subproblems of both the local interpretation of sensory data and the planning of effector actions are interdependent in time and space. For example, in a distributed sensor network tracking vehicle movements, a vehicle detected in one part of the sensed area implies that a vehicle of similar type and velocity will be sensed a short time later in an adjacent area. Likewise, a plan for guiding an airplane must be coordinated with the plans of other nearby airplanes in order to avoid collision. Interdependence also arises from redundancy in sensory data. Often different nodes sense the same event due to overlaps in the range of sensors and the use of different types of sensors that sense the same event in different ways. Exploiting these redundant and alternative views and the interdependencies among subproblems requires nodes to cooperate in order to interpret and plan effectively. This cooperation leads to viewing network problem solving (qv) in terms of a single problem rather than a set of independent subproblems.

The Key Issues

The development of a distributed problem-solving architecture that can exploit the characteristics of these applications to limit internode communication, to achieve real-time response, and to provide high reliability represents a difficult task. Nodes must cooperate to exploit and coordinate their answers to interdependent subproblems but must do so with limited interprocessor communication. This requires the development of new paradigms that permit the distributed system to deal effectively with environmental uncertainty (not having an accurate view of the number and location of processors, effectors, sensors, and communication channels), data uncertainty (not having complete and consistent local data at a node), and control uncertainty (not having a completely accurate model of activities in other nodes). The development of these paradigms has required (and will continue to require) research on the three interacting issues discussed below.

Task Decomposition. How a particular task is decomposed for a distributed problem-solving solution can be influenced by how the distributed network is viewed. From a reductionist perspective, a distributed network is viewed as a system that is decomposed over a number of nodes, each of which is a part in the overall network. From a constructionist perspective, however, a distributed network is a society of nodes, where each node is an individual system. Although both perspectives view the same reality, the reductionist viewpoint tends to encourage a search for appropriate ways of pulling apart existing centralized systems. The constructionist viewpoint tends to encourage a search for ways of organizing individually complete systems into a society of cooperating nodes. From both perspectives there are several dimensions of task decomposition:

Functional versus Spatial. In a functional decomposition each node is an "expert" at some part of the basic problem-solving expertise; problem solving is routed to the appropriate expert as that expertise is required. In a spatial decomposition each node possesses all the problem-solving expertise and applies all its expertise to the portion of the overall problem that is spatially "nearby." (Each node is an "expert" at what is happening in its spatial neighborhood.) The problem could also be decomposed along a mixture of functional and spatial lines.

Hierarchical versus Heterarchical. The node-interaction structure is another important dimension of task decomposition. Hierarchical structures work well when control or results need to be concentrated at one point in the network, but they are sensitive to the loss of a high-level node in the hierarchy. Heterarchical structures can be more robust to the loss of nodes but can exhibit increased communication and control problems. A particular problem may be best decomposed into a combination of hierarchical and heterarchical substructures.

Redundant versus Disjoint Activities. Redundant activities consume network resources, and efficiency considerations suggest that redundant activities should be minimized. However, the lack of redundant activities can leave the network open to severely degraded performance if a crucial activity is lost to node failure. A more robust approach would have crucial activities redundantly performed as insurance against node failure.

Dealing with Incomplete and Inconsistent Information. In many applications communication delay makes it impractical for the network to be structured so that each node has all the relevant information needed for its local computations and control decisions. Another way of viewing this problem is that the spatial decomposition of information among the nodes is ill suited to a functionally distributed solution. Each node may possess the information necessary to perform a portion of each function, but there is insufficient information to perform any function completely. Thus, a second major issue in distributed problem solving is designing a network to deal cooperatively with possibly incomplete and inconsistent data and control information.

Obtaining Global Coherence with Decentralized Control. Another major issue in cooperative distributed problem solving is developing network-coordination policies that provide sufficient global coherence for effective cooperation. Coherent network problem solving requires the achievement of the following conditions:

Coverage. Any necessary portion of the overall problem must be included in the activities of at least one node.

Connectivity. Nodes must interact in a manner that permits the covering activities to be developed and integrated into an overall solution.

Capability. Coverage and connectivity must be achievable within the communication and computation resource limitations of the network.

Achieving coherence is difficult because the use of a global "controller" node is not an option. Such a node is precluded by two considerations:

Internode communication is limited, restricting the view of each node (including the proposed controller) of network problem-solving activities. A global controller node would become a severe communication and computational bottleneck.
Network reliability criteria require that the network's performance degrade gracefully if a portion of the network fails. However, if the proposed controller node fails, the resulting network collapse would be anything but graceful.

In the absence of a global controller node, each node must be able to direct its own activities in concert with other nodes based on incomplete, inaccurate, and inconsistent information. Research on these three issues will draw heavily on the work in knowledge-based AI systems and will, simultaneously, make contributions to AI. As Nilsson has noted (8), the challenges posed by distributed AI will contribute to (and may even be a prerequisite for) progress in "ordinary" AI.

The Key Ideas

Contract Networks. One approach to the coordination problem is the work of Smith and Davis on the contract-net formalism (9), which incorporates two major ideas. The first idea is the use of negotiation between willing entities as a means of obtaining coherent behavior. Negotiation involves a multidirectional exchange of information between the interested parties, an evaluation of the information by each member from its own perspective, and final agreement by mutual selection. Negotiation differs from voting in that dissident members are free to exit the negotiation rather than being bound by the decision of the majority. The second idea is the use of negotiation to establish a network of contracting control relationships between nodes in the distributed problem-solving network. In the contract-net formalism nodes coordinate their activities through contracts to accomplish specific goals. These contracts are elaborated in a top-down manner; at each stage a manager node decomposes its contracts into subcontracts to be accomplished by other contractor nodes.
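As a concrete illustration, one round of this manager-contractor negotiation can be sketched in Python. This is a minimal sketch, not the protocol itself: the node names, the task representation, and the numeric suitability rating used as a bid are all invented for the example.

```python
# Hypothetical sketch of one announce-bid-award round of contract-net
# negotiation. The bid metric (a self-assessed suitability rating) is an
# illustrative assumption; real task announcements and bids carry richer
# information.

class Contractor:
    def __init__(self, name, expertise):
        self.name = name
        self.expertise = expertise  # task type -> suitability rating

    def bid(self, task):
        """Return a suitability rating for the task, or None to decline."""
        # A node that declines simply exits the negotiation; it is not
        # bound by the outcome (unlike voting).
        return self.expertise.get(task["type"])


class Manager:
    def announce(self, task, contractors):
        """Broadcast a task announcement and collect the bids received."""
        bids = {}
        for contractor in contractors:
            rating = contractor.bid(task)
            if rating is not None:
                bids[contractor] = rating
        return bids

    def award(self, task, contractors):
        """Award the subcontract to the highest-rated bidder."""
        bids = self.announce(task, contractors)
        if not bids:
            return None  # no willing contractor: re-announce or decompose further
        return max(bids, key=bids.get)


nodes = [Contractor("n1", {"track": 0.4}),
         Contractor("n2", {"track": 0.9, "classify": 0.5}),
         Contractor("n3", {"classify": 0.8})]
winner = Manager().award({"type": "track", "region": "sector-7"}, nodes)
print(winner.name)  # n2, the strongest tracker, wins the subcontract
```

A contractor that wins a subcontract can itself act as a manager for further decomposition, which is how the top-down elaboration into a network of manager-contractor relationships arises.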
This process involves a bidding protocol based on a two-way transfer of information to establish the nature of the subcontracts and which node will perform a particular subcontract. The elaboration procedure continues until a node can complete a contract without assistance. The result of the contract-elaboration process is a network of manager-contractor relationships distributed throughout the network.

Smith and Davis have used a model of distributed problem solving in which the network passes through three phases as it solves a problem. The first phase is problem decomposition. The problem-solving task is recursively partitioned into increasingly smaller subtasks until atomic (nondecomposable) tasks remain. Part of this decomposition process is assignment of the subtasks to individual nodes; Smith calls this the connection problem. Node assignment is particularly intertwined with problem decomposition: different assignments may be best suited to different possible decompositions, and vice versa. This node-assignment aspect of problem decomposition was made explicit by the inclusion of a distinct phase, subproblem distribution, in a later report by Davis and Smith (10). The second phase in their model is the coordinated solution of the individual subproblems. Potential interactions with other nodes during the subproblem-solution phase are specified by the elaborating nodes. The third phase is answer synthesis, using the results produced by the second phase. Part of the
answer-synthesis phase is assignment of synthesis activity to particular nodes. It should be noted that more than one node can have a solution to a particular subproblem and that not all such solutions are equally good. If the best subproblem solutions are to be used in the answer-synthesis phase, the synthesizing nodes must locate and acquire these superior solutions. Therefore, the inclusion of another phase, solution collection, appears appropriate given the inclusion of a distinct subproblem-distribution phase.

Functionally Accurate, Cooperative Networks. Lesser and Corkill (11) have approached distributed problem solving by developing nodes that are able to cooperate among themselves so that the network as a whole can function effectively even though the nodes have inconsistent and incomplete views of the information used in their computations. They call this type of distributed problem solving functionally accurate, cooperative (FA/C). In the FA/C approach the distributed network is structured so that each node can perform useful processing with incomplete input data, while simultaneously exchanging partial, tentative, high-level results of its processing with other nodes to construct a complete solution cooperatively. The intent is that the amount of communication required to exchange these results is much less than the amount of communicated raw data and results that would be required by a conventional distributed-processing approach. In addition, the synchronization required among nodes can also be reduced, resulting in increased node parallelism and network robustness.

Coordination Using Organizational Structuring. Network coordination is difficult in a cooperative distributed problem-solving network because limited internode communication restricts each node's view of network problem-solving activity.
Furthermore, it is important that network-coordination policies not consume more processing and communication resources than the benefits derived from the increased problem-solving coherence. Corkill and Lesser (12) suggest that even in networks composed of a modest number of nodes, a complete analysis to determine the detailed activities at each node is impractical; the computation and communication costs of determining the optimal set and allocation of activities far outweigh the improvement in problem-solving performance. Instead, they argue that coordination in distributed problem-solving networks must sacrifice some potential improvement for a less complex coordination problem. What is desired is a balance between problem solving and coordination so that the combined cost of both is acceptable. The emphasis is shifted from optimizing the activities in the network to achieving an acceptable performance level of the network as a whole. These policies must also have enough flexibility to provide sufficient system robustness and reliability to respond to a changing task and hardware environment. In order for network control to satisfy these requirements, it must be able to tolerate control information that is out of date, incomplete, or incorrect due to delays in the receipt of information, the high cost of acquisition and processing of the information, and errors in communication and processing hardware.

Corkill and Lesser view the balance between local node control and networkwide control as a crucial aspect of the design of such decentralized network-control policies. They
suggest it is unrealistic to expect that network-control policies can be developed that are sufficiently flexible and efficient and require limited communication while simultaneously making all the control decisions for each node in the network. Instead, a node needs a complex form of local control that permits it to plan sequences of activities and to adapt its plan based on its problem-solving role in the network, on the status and role of other nodes in the network, and on self-awareness of its activities.

An organizational structure is used to provide each node with a high-level view of problem solving in the network. It specifies a general set of node responsibilities and node-interaction patterns that is available to all nodes. Included in the organizational structure are control decisions that are not quickly outdated and that pertain to a large number of nodes. The advanced local-control component of each node is responsible for elaborating these relationships into precise activities to be performed by the node. In this way they have split the network-coordination problem into two concurrent activities (12):
construction and maintenance of a networkwide organizational structure and

continuous local elaboration of this structure into precise activities using the local-control capabilities of each node.

The organizational structure provides a control framework that reduces the amount of control uncertainty present in a node (due to incomplete or errorful local-control information) and increases the likelihood that the nodes will be coherent in their behavior by providing a general and global strategy for network problem solving. The organizational-structuring approach to limiting control uncertainty still preserves a certain level of control flexibility for a node to adapt its local control to changing task and environmental conditions.

Organizational structuring requires expertise in selecting an organization that is appropriate for the particular distributed problem-solving situation. Malone and Smith (13) have analyzed generic organizational classes mathematically to determine their performance strengths and weaknesses with respect to processing, communication, coherence, and flexibility. Their analysis has shown that different organizational classes are appropriate given different problem situations and performance requirements.

Kornfeld and Hewitt (14) have proposed that distributed problem solving can be organized analogously to the structure of scientific research. In their scientific-community metaphor for problem solving, nodes would posit either "questions" (goals) or "answers" (results) into a mutually accessible archive. The presence of this information allows a node to draw on work already performed by other nodes. They also propose using the economics of funding as the basis for controlling activity in the network. Although the metaphor is an interesting way of viewing distributed problem-solving networks, there remains significant research on effectively implementing the archival and funding mechanisms in a distributed environment.

Reasoning about Beliefs and Concurrency. In part, the very disparate research directions that have characterized early research in distributed problem solving have come out of researchers attempting to understand the implications of distributing different problem-solving architectures. The work on contract networks and FA/C networks, for instance, has come out of distributing the HEARSAY-II (qv) cooperating knowledge-source model of problem solving. The work discussed in this section has come out of distributed problem-solving architectures that have a formal-logic underpinning. Much of this work has been done on the task domain of multiagent planning, where a group of robots work together.

Two important extensions to these formal systems are needed in order for them to work in a distributed system. The first extension is that these systems must be able to represent and reason about the concurrent activities of multiple agents; the work of Corkill (15) on distributed NOAH and of Georgeff (16) addresses issues of how agents can synchronize their plans with other agents in order to avoid resource conflicts. The second extension is that these systems must deal with situations where agents have incomplete knowledge or limited computational resources. Both cases lead to the possibility of generating incorrect inferences, which in turn may result in agents having inconsistent beliefs about the world. The work of Hewitt (17) on open systems tries to deal with this same situation, but he argues that formal-logic systems are inadequate for solving this problem.

Researchers are following a number of different approaches to extending logical formalisms for use in a distributed problem-solving environment. Konolige (18,19) has developed the deductive belief model, in which an agent's beliefs are described as a set of sentences in a formal language together with a deductive process for deriving the consequences of those beliefs. This approach can account for the effect of resource limitations on the derivation of the consequences of beliefs. Appelt (20) has used a possible-worlds formalism to represent and reason about belief. Rosenschein (21) is working on a more general theory of multiagent planning that allows for the existence of other agents and their mental states as part of the environment within which plans can be constructed. Additional work by Halpern and Moses (22), though not directly formulated in the planning domain, is relevant to this topic (see Belief systems; Reasoning, plausible).

The work on multiagent planning is closely associated with that of dialogue comprehension in natural-language processing (23,24). In both research topics it is necessary to reason about multiple agents with distinct and possibly contradictory mental states; mental states include not only facts or knowledge but also beliefs and goals. The reasoning required in both domains is necessary for interpreting an agent's communication (this includes understanding what the communication implies about the agent's mental state), for altering another agent's mental state through appropriate communication, and for taking into account the potential actions of other agents that might help or hinder communication.

Another research approach toward developing a formal theory for understanding the nature of cooperation among multiple agents is that of Rosenschein and Genesereth (25). They have based their model on game-theory techniques and have shown the ability of communication to resolve conflicts among agents having disparate goals.
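The flavor of a resource-limited belief model can be conveyed with a small sketch. This is an illustration of the general idea only, not Konolige's actual formalism: beliefs are a set of base sentences closed under a bounded number of applications of simple inference rules, so a resource-limited agent need not believe every logical consequence of its beliefs.

```python
# Sketch of bounded belief derivation: rules are (premises, conclusion)
# pairs, and the agent forward-chains for at most max_steps rounds.
# The rule format and the depth bound are illustrative assumptions.

def derive(base, rules, max_steps):
    """Return the belief set reachable within max_steps rounds of inference."""
    beliefs = set(base)
    for _ in range(max_steps):
        new = {concl for prems, concl in rules
               if all(p in beliefs for p in prems) and concl not in beliefs}
        if not new:
            break  # deductive closure reached early
        beliefs |= new
    return beliefs

chain = [(("a",), "b"), (("b",), "c"), (("c",), "d")]
# With only two rounds of inference the agent believes a, b, and c,
# but not yet d: its beliefs are consistent yet deductively incomplete.
print(derive({"a"}, chain, max_steps=2))
```

Two such agents given different depth bounds, or different base sentences, end up with divergent belief sets, which is precisely the situation a distributed planner must reason about.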
Empirical Investigations. Because this research area is still quite new and because of the difficulties in engineering a distributed problem-solving system for a real-world application, there are few empirical results on the performance of distributed problem-solving systems. The results that are available have come solely from simulations.

The earliest empirical results are those by Lesser and Erman (26) on the task of distributed interpretation. They simulated a three-node network in which each node was a complete HEARSAY-II speech-understanding system. In their experiments each node received a fragment of the acoustic speech data so as to simulate a spatially distributed collection of sensors. Nodes cooperated with one another by communicating only high-level abstract hypotheses of what each had observed based on the processing of its limited sensor data and hypotheses received from other nodes. This limited communication reflected the limited communication bandwidth in a real system. Their experiments showed that through a cooperative exchange of high-level data, the nodes could effectively deal with limited sensor information, duplicating the performance of the centralized HEARSAY-II speech-understanding system. They also explored the implications of a noisy communication channel. In these experiments it was shown that even though important hypotheses were lost in communication, the system had sufficient robustness to recover. Recovery was due to the partial overlapping of sensor data among the nodes and the ability of a local node to pursue an alternative path to the solution if the current path could not be extended. Their experiments represented the first empirical validation of the concept of distributed FA/C networks.

The work by Malone et al. (27) on the Enterprise system explored the use of the contract-net protocol for allocation of tasks in a network of personal computers. They used the bidding process of this protocol to implement the metaphor of a marketplace in which the bids represented estimates by the node of when it could complete the processing of the specified task. Bids reflected the processing capacity of the node and what files were currently loaded on its local disk.
They showed by simulation that this approach to allocation resulted in quite good performance with relatively low communication. However, their simulation results do not directly apply to scheduling activity in distributed problem-solving systems, since the tasks they scheduled were independent.

Another set of empirical results has been generated by Lesser and Corkill (28) using the Distributed Vehicle Monitoring Testbed. The testbed was designed to be highly parameterized so that a wide range of issues in distributed problem-solving system design could be empirically explored. The testbed simulates a network of nodes attempting to identify, locate, and track patterns of vehicles moving through a two-dimensional space using signals detected by acoustic sensors. Each node is an architecturally complete HEARSAY-II system with knowledge sources and levels of abstraction appropriate for this task. The basic HEARSAY-II architecture has been extended to include more sophisticated local control and the capability of communicating hypotheses and goals among nodes. Goals indicate the node's intention to abstract and extend hypotheses on the data blackboard. Each node has a planner that determines the local problem-solving activities of the node based on its potential processing activities (represented by goals created from local problem-solving activity) and on externally directed requests from other nodes (communicated goals). They have used the testbed to empirically explore the issue of global coherence. Recent results have indicated the crucial role that sophisticated local node control plays in achieving effective coherence (29).

Researchers at RAND (30) have been exploring the task of
distributed air-traffic control. In this task each plane can sense its local environment and plan its trajectory through the airspace. In their simulations they have explored the issues involved in resolving conflicts among planes whose current courses would result in a near miss or midair collision. Their approach to this conflict resolution, which they call task centralization, is for the planes to dynamically form coalitions to resolve conflicts. Within a coalition one plane is given the responsibility for resolving the conflict by modifying its plans. They have explored a number of interesting strategies, called "least constrained" and "most knowledgeable," for how planes negotiate with one another to decide which one in the coalition is in control.
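A toy version of this coalition-forming step might look as follows. The distance-based conflict test and the "fewest conflicts" stand-in for the least-constrained strategy are simplifying assumptions for illustration, not the RAND group's actual algorithms.

```python
# Sketch of task centralization: find pairs of planes whose next
# waypoints violate a minimum separation, group them into a coalition,
# and give one member responsibility for replanning.

def conflicts(plans, min_sep=2.0):
    """Return pairs of plane names whose planned positions are too close."""
    names = sorted(plans)
    return [(a, b)
            for i, a in enumerate(names) for b in names[i + 1:]
            if ((plans[a][0] - plans[b][0]) ** 2 +
                (plans[a][1] - plans[b][1]) ** 2) ** 0.5 < min_sep]

def form_coalition(plans):
    """Group conflicting planes; pick the least-constrained one to replan."""
    pairs = conflicts(plans)
    if not pairs:
        return None  # no conflicts, so no coalition is needed
    members = sorted({p for pair in pairs for p in pair})
    # Approximate "least constrained" by fewest conflicts (ties broken by name).
    load = {p: sum(p in pair for pair in pairs) for p in members}
    replanner = min(members, key=lambda p: (load[p], p))
    return members, replanner

plans = {"AA1": (0.0, 0.0), "UA2": (1.0, 0.5), "DL3": (9.0, 9.0)}
print(form_coalition(plans))  # (['AA1', 'UA2'], 'AA1')
```

The point of the exercise is that the decision of who replans is made inside the coalition by negotiation over local information, not by a global controller.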
Summary

Distributed problem solving is a very new research area in which there are few concrete examples and little empirical data. However, it holds much promise in shedding light on how to design complex AI systems as well as how to exploit the coming generation of parallel and distributed hardware architectures. Early research in this field has already provided a good understanding of the issues that must be faced in the design of distributed problem-solving systems and of approaches that may prove fruitful in solving these problems.
BIBLIOGRAPHY

1. C. Hewitt, "Viewing control structures as patterns of passing messages," Artif. Intell. 8(3), 323-364 (Fall 1977).

2. J. A. Feldman and D. H. Ballard, "Connectionist models and their properties," Cog. Sci. 6(3), 205-254 (July-September 1982).

3. W. A. Kornfeld, ETHER: A Parallel Problem Solving System, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, August 1979, pp. 490-492.

4. D. B. Lenat, Beings: Knowledge as Interacting Experts, Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Stanford, CA, August 1975, pp. 126-133.

5. R. Davis, "Report on the workshop on Distributed AI," SIGART Newslett. 73, 42-52 (October 1980).

6. R. Davis, "Report on the second workshop on Distributed AI," SIGART Newslett. 80, 13-23 (April 1982).

7. M. Fehling and L. Erman, "Report on the third annual workshop on distributed artificial intelligence," SIGART Newslett. 84, 3-12 (April 1983).

8. N. J. Nilsson, "Two heads are better than one," SIGART Newslett. 73, 43 (October 1980).

9. R. G. Smith and R. Davis, "Frameworks for cooperation in distributed problem solving," IEEE Trans. Sys. Man Cybernet. SMC-11(1), 61-70 (January 1981).

10. R. Davis and R. G. Smith, Negotiation as a Metaphor for Distributed Problem Solving, AI Memo 624, AI Laboratory, MIT, Cambridge, MA, May 1981.

11. V. R. Lesser and D. D. Corkill, "Functionally accurate, cooperative distributed systems," IEEE Trans. Sys. Man Cybernet. SMC-11(1), 81-96 (January 1981).

12. D. D. Corkill and V. R. Lesser, The Use of Meta-Level Control for Coordination in a Distributed Problem Solving Network, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, August 1983, pp. 748-756. Also see B. W. Wah and G.-J. Li (eds.), Computer Architectures for Artificial Intelligence Applications, IEEE Computer Society, 1986, pp. 507-515.
13. T. W. Malone and S. A. Smith, Tradeoffs in Designing Organizations: Implications for New Forms of Human Organizations and Computer Systems, Working Paper CISR WP 112 (Sloan WP 1541-84), Center for Information Systems Research, MIT, Cambridge, MA, March 1984.

14. W. A. Kornfeld and C. E. Hewitt, "The scientific community metaphor," IEEE Trans. Sys. Man Cybernet. SMC-11(1), 24-33 (January 1981).

15. D. D. Corkill, Hierarchical Planning in a Distributed Environment, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, August 1979, pp. 168-175.

16. M. Georgeff, A Theory of Action for Multiagent Planning, Proceedings of the Fourth National Conference on Artificial Intelligence, Austin, TX, August 1984, pp. 121-125.

17. C. Hewitt and P. de Jong, Analyzing the Roles of Descriptions and Actions in Open Systems, Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, August 1983, pp. 162-166.

18. K. Konolige, Circumscriptive Ignorance, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, August 1982, pp. 202-204.

19. K. Konolige, A Deductive Model of Belief, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, August 1983, pp. 377-381.

20. D. E. Appelt, Planning Natural Language Utterances to Satisfy Multiple Goals, Technical Note 259, SRI International, Menlo Park, CA, 1982.

21. S. Rosenschein, "Reasoning about distributed action," SIGART Newslett. 84, 7 (April 1983).

22. J. Y. Halpern and Y. Moses, Knowledge and Common Knowledge in a Distributed Environment, IBM Research Report RJ 4421, IBM, 1984.

23. P. R. Cohen, On Knowing What to Say: Planning Speech Acts, Ph.D. Thesis, University of Toronto, January 1978. Also as Technical Report 118, Department of Computer Science, University of Toronto, Toronto, Ontario, January 1978.

24. J. F. Allen, A Plan-Based Approach to Speech Act Recognition, Ph.D.
Thesis, University of Toronto, February 1979. Also as Technical Report 131/79, Department of Computer Science, University of Toronto, Toronto, Ontario, February 1979.

25. J. S. Rosenschein and M. R. Genesereth, Deals Among Rational Agents, Technical Report HPP-84-44, Stanford Heuristic Programming Project, Computer Science Department, Stanford University, Stanford, CA, December 1984.

26. V. R. Lesser and L. D. Erman, "Distributed interpretation: A model and experiment," IEEE Trans. Comput. C-29(12), 1144-1163 (December 1980).

27. T. W. Malone, R. E. Fikes, and M. T. Howard, Enterprise: A Market-like Task Scheduler for Distributed Computing Environments, Working Paper CISR WP 111 (Sloan WP 1537-84), Center for Information Systems Research, MIT, Cambridge, MA, 1983.

28. V. R. Lesser and D. D. Corkill, "The distributed vehicle monitoring testbed: A tool for investigating distributed problem solving networks," AI Mag. 4(3), 15-33 (Fall 1983).

29. E. H. Durfee, V. R. Lesser, and D. D. Corkill, Towards Coherent Cooperation in a Distributed Problem Solving Network, in M. N. Huhns (ed.), Distributed Artificial Intelligence, Pitman, London, 1987.

30. S. Cammarata, D. McArthur, and R. Steeb, Strategies of Cooperation in Distributed Problem Solving, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, August 1983, pp. 767-770.
V. Lesser and D. Corkill
University of Massachusetts
DOMAIN KNOWLEDGE

Domain knowledge is the collection of problem-specific facts, goals, and procedures that a knowledge-based system needs in order to solve problems. Domain knowledge also includes the concepts, attributes, and relations that make up these facts, goals, and procedures. It contrasts with domain-independent knowledge, such as general heuristics (qv) and strategies of problem solving, and theories that cover many different domains and types of problems. Both kinds of knowledge can include declarative and procedural components. In an expert system the knowledge base usually contains the domain knowledge, whereas the inference engine is domain independent.

For example, in a system that diagnoses computer malfunctions, the domain knowledge would at the very least include typical breakdown patterns, underlying causes, and the relations between the two, in other words, the empirical expertise needed to track down a malfunction. More detailed domain knowledge would include descriptions of the structure and function of the particular machine being analyzed together with procedures for interpreting or understanding these descriptions. Domain-independent knowledge would consist of the general diagnostic heuristics and reasoning strategies for identifying the causes of malfunctions in any kind of computer or even any machine.

The boundary between domain knowledge and domain-independent knowledge depends on the goals defined for the knowledge-based system rather than on any inherent properties of the knowledge itself. In an expert system for diagnosing problems in any kind of personal computer, the domain knowledge would have to cover enough specifics for all machines falling into this very broad category.
It is more effective to design knowledge bases for smaller and more circumscribed problems, such as the diagnosis of malfunctions in one specific computer model or at most a single family of personal computers that share the same architecture and operating system and hence similar modes of malfunction.

Domain knowledge was distinctively used as a term only after AI systems moved away from general problem-solving paradigms, such as heuristic search, and started to develop methods of symbolic reasoning that relied heavily on specific, qualitative knowledge of a problem or class of problems. An early example is the MACSYMA (1) system for symbolic integration, which has extensive domain knowledge of calculus, the rules of integration, and procedures for simplifying formulas. Another is heuristic DENDRAL (2), where domain knowledge of molecular structures, the process of mass spectrometry, and the heuristics used by chemists in interpreting mass spectra is used to elucidate the structure of a molecule. These and other early knowledge-based systems solved very specialized kinds of problems, and they did not result in any generalized representations for domain knowledge that could be shared by a wide variety of problems in different domains. Research on natural-language understanding (qv) and problem solving (qv) gave rise to useful general representations of domain knowledge, semantic networks (qv) and production systems (see Rule-based systems), respectively. The former were particularly good for representing concepts and their relations in a declarative fashion, whereas the latter served to express the facts needed to solve a problem in terms of modular, easy-to-manipulate chunks of knowledge. Other representations, such as frames (see Frame theory and scripts
(qv)), were developed to describe more complex, structured relations among concepts, objects, and events in a domain.

Consultation Programs

It was research on expert reasoning in consultation programs such as MYCIN (3), CASNET (4), INTERNIST (5), PIP (6), and PROSPECTOR (7) that led to the clear division between domain knowledge and domain-independent reasoning procedures. In trying to represent human expertise in ways that could be practically reproduced on the computer, their developers naturally tended to abstract those methods and heuristics of symbolic reasoning that were shared by a broad class of real-life problems. They used various declarative representations for domain knowledge (such as special types of semantic nets for encoding causal and hierarchical relations, frames for grouping these relations, and production rules for representing the rules of expertise). General strategies of symbolic reasoning (whether goal driven, event driven, or hypothesis driven) were found to be applicable for wide varieties of diagnostic, therapy selection, and advice-giving interpretation problems, giving rise to the notion that the knowledge base could contain the domain-specific knowledge, whereas a separate inference engine could serve to capture the domain-independent strategies.

At about the same time research on speech understanding (qv) [the HEARSAY system (8)] showed that domain knowledge is often naturally grouped according to distinct sources or levels of understanding, such as those referring to the signal, syntax, semantics, and pragmatics involved in recognizing a segment of speech. This gave rise to the notion of grouping knowledge in terms of different knowledge sources (which can contain groups of rules and networks of relations among concepts used by these rules). They communicate through a blackboard, which serves as the short-term memory for storing partial interpretations.
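The knowledge-source/blackboard arrangement described above can be sketched in a few lines. This is a minimal illustration, not the HEARSAY implementation: the class and function names (`Blackboard`, `syllable_source`, `word_source`) and the toy "levels" are invented for the example; real systems schedule knowledge sources opportunistically based on what is on the blackboard rather than in a fixed order.

```python
class Blackboard:
    """Shared short-term memory holding hypotheses grouped by level."""
    def __init__(self):
        self.levels = {}          # level name -> list of hypotheses

    def post(self, level, hypothesis):
        self.levels.setdefault(level, []).append(hypothesis)

    def get(self, level):
        return self.levels.get(level, [])

def syllable_source(bb):
    """Toy knowledge source: promotes signal segments to syllable hypotheses."""
    for seg in bb.get("signal"):
        bb.post("syllable", seg.upper())

def word_source(bb):
    """Toy knowledge source: joins syllable hypotheses into a word hypothesis."""
    syllables = bb.get("syllable")
    if syllables:
        bb.post("word", "".join(syllables))

bb = Blackboard()
for segment in ["hear", "say"]:
    bb.post("signal", segment)

# A trivially simple control loop; each source reads and writes only the
# blackboard, never another source, which is the point of the architecture.
for source in (syllable_source, word_source):
    source(bb)

print(bb.get("word"))   # ['HEARSAY']
```

The design choice to illustrate is that the sources share no state except the blackboard, so new levels of interpretation can be added without changing existing sources.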
First-Generation Expert Systems

From the experience with the first generation of expert systems, rule-based representations of domain knowledge proved to be the most versatile and effective way of directly encoding expertise for problem solving. Rules can capture inferences about hypotheses from patterns of evidence, relations among goals and subgoals, or inferences about hypotheses (see Table 1). In some situations it is valuable to build a discrimination net among hypotheses to explicitly map out the flow of reasoning (as in PROSPECTOR), but it is more usual to let the inference engine select the production rules according to its strategies. Several general schemes for representing rule-based knowledge were developed as the result of experience with the first generation of expert systems: EMYCIN (9), EXPERT (10), KAS (11), and ROSIE (12). Other general schemes implemented the blackboard type of representation: HEARSAY-III (13) and AGE (14). They all provide the user with a way of encoding domain knowledge in the form of rules that are then interpreted by an inference engine with a fixed, though general, repertoire of reasoning strategies. These systems can be viewed as knowledge engineering tool kits.

Languages and Environments

A different approach was taken by developers of rule-based languages such as OPS (15) and RLL (16), which permit the
Table 1. Examples Contrasting Different Degrees of Domain Dependency in Knowledge within a Rule-Based System

1. Example of domain knowledge for thyroid disease diagnosis:
   a. Define primitive reasoning components: Rapid heart beat is a symptom; fine finger tremor is a symptom.
   b. Define diagnostic statements: Hyperthyroidism is a diagnosis; hypothyroidism is a diagnosis.
   c. Define reasoning rules relating them: If rapid heart beat is observed in a patient, suspect the possibility of hyperthyroidism with a confidence of 0.5. If both rapid heart beat and fine finger tremor are observed in a patient, suspect the possibility of hypothyroidism with a confidence of 0.7 and proceed to ask for lab tests.
   This information is only valid for problems involving thyroid disease.

2. Example of a diagnostic heuristic (only partially domain-dependent): If no diagnosis has been assigned a level of confidence sufficient for treatment to be prescribed, and no life-threatening situation is present, continue accumulating data before reaching a diagnosis.
   This heuristic is reasonable for any medical domain, but might not be applicable for problems of diagnosing machine failures.

3. Example of a completely domain-independent rule: If two diagnostic rules confirm the same conclusion with different degrees of confidence, a conservative strategy is to ascribe to the diagnosis the lower degree of confidence.
   This heuristic is applicable to any inference problem of the diagnostic type regardless of domain.
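The separation in Table 1 between domain-specific rules and a domain-independent engine can be sketched as follows. This is an illustrative toy, not any of the systems named in the text: the names `RULES` and `infer` are invented, the rules paraphrase Table 1's thyroid examples, and the engine also applies the conservative lower-confidence strategy of Table 1's rule 3 when two rules confirm the same conclusion.

```python
# Domain knowledge: thyroid-disease rules, loosely following Table 1.
# Each rule has a set of antecedent findings, a conclusion, and a confidence.
RULES = [
    {"if": {"rapid heart beat"},
     "then": "hyperthyroidism", "cf": 0.5},
    {"if": {"rapid heart beat", "fine finger tremor"},
     "then": "hypothyroidism", "cf": 0.7},
]

def infer(rules, findings):
    """Domain-independent engine: fire every rule whose antecedents are all
    present among the findings; if several rules confirm the same diagnosis,
    conservatively keep the lowest confidence (Table 1, rule 3)."""
    diagnoses = {}
    for rule in rules:
        if rule["if"] <= findings:          # antecedents satisfied?
            d, cf = rule["then"], rule["cf"]
            diagnoses[d] = min(diagnoses.get(d, 1.0), cf)
    return diagnoses

print(infer(RULES, {"rapid heart beat", "fine finger tremor"}))
# {'hyperthyroidism': 0.5, 'hypothyroidism': 0.7}
```

Swapping in a different `RULES` list changes the domain without touching `infer`, which is the division between knowledge base and inference engine that the surrounding text describes.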
user to specify reasoning strategies as part of the domain knowledge. The control of reasoning in these languages is very general (the recognize-act cycle in OPS is a simple loop in which rules with satisfied antecedents are detected, certain criteria are used to select one of them, and the actions specified in its consequent are executed). The user here has the burden of specifying the structure of goals and methods required to solve a problem rather than being able to choose from a more fixed representation. This is necessary if the class of problems being solved does not fit one of the more traditional categories, such as diagnostic classification or advice giving, for which the expert systems representations were mostly devised. A related approach is to use logic programming (qv) languages such as PROLOG (17) to represent domain knowledge in the form of clauses and rules for expert reasoning. The inference engine then consists of a domain-independent theorem prover specialized to certain kinds of clauses and inference procedures for efficiency. It has the advantage of making reasoning models easier to check for logical consistency but at the cost of restrictions on the class of inference procedures. This method has been adopted by the Japanese Fifth Generation Computer project (see Fifth-generation computing) to develop specialized hardware that will carry out symbolic reasoning tasks very rapidly. In PROLOG, as in OPS and other language systems, domain-specific reasoning strategies must be implemented by the user.

For more advanced knowledge-based expert systems the representational power of rules can be usefully augmented by introducing ways of explicitly describing the objects and predicates that enter into the antecedents and consequents of a rule. There are also situations where procedures are a more natural representation than rules, and it is important to provide the means of integrating the two. Graphical display of knowledge structures is also essential in managing large and complex knowledge bases. Several object-oriented languages (qv) and environments (see Programming environments) that incorporate these features have been developed to facilitate the building of knowledge bases [e.g., LOOPS (18) and STROBE (19)]. These are primarily tool kits for representing and manipulating knowledge. Special expert system building packages are sometimes built using many of the same components [e.g., the KEE system (20)]. Alternatively, a first-order resolution theorem prover can be used for reasoning with a frame-based description language, as in KRYPTON (21). This approach combines logical clarity with descriptive power, allowing a more precise definition of the semantics of the knowledge, so that the user can know what questions the system is capable of answering. All these hybrid reasoning systems add an extra degree of flexibility and complexity in helping structure domain knowledge since they usually come with built-in mechanisms for the inheritance of properties for hierarchically related objects as well as facilities for defining classes of objects and the reasoning elements that relate them. In so doing, it often happens that domain knowledge is strongly intertwined with domain-independent reasoning methods and heuristics since there is no clear boundary between them. Only as experience accumulates with large numbers of complex knowledge bases will the generality of certain heuristics and reasoning schemas be recognized and the types of specific domain-dependent knowledge more clearly defined.

BIBLIOGRAPHY

1. J. Moses, A MACSYMA Primer, Mathlab Memo No. 2, Computer Science Laboratory, Massachusetts Institute of Technology, 1975.
2. B. G. Buchanan and E. A. Feigenbaum, "DENDRAL and Meta-DENDRAL: Their applications dimension," J. Artif. Intell. 11, 5-24 (1978).
3. E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, 1976.
4. S. M. Weiss, C. A. Kulikowski, S. Amarel, and A. Safir, "A model-based method for computer-aided medical decision-making," J. Artif. Intell. 11, 145-172 (1978).
5. H. Pople, Heuristic Methods for Imposing Structure on Ill-Structured Problems: The Structuring of Medical Diagnostics, in P. Szolovits (ed.), Artificial Intelligence in Medicine, Boulder, CO, pp. 119-190 (1982).
6. S. G. Pauker, G. A. Gorry, J. P. Kassirer, and W. B. Schwartz, "Towards the simulation of clinical cognition: Taking a present illness by computer," Am. J. Med. 60, 981-996 (1976).
7. R. Duda, J. Gaschnig, and P. E. Hart, Model Design in the PROSPECTOR Consultant System for Mineral Exploration, in D. Michie (ed.), Expert Systems in the Micro-Electronic Age, Edinburgh University Press, Edinburgh, pp. 153-167, 1979.
8. L. D. Erman, F. Hayes-Roth, and D. R. Reddy, "The HEARSAY-II speech understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12(2), 213-253 (1980).
9. W. Van Melle, A Domain-Independent Production-Rule System for Consultation Programs, Proc. of the Sixth IJCAI, Tokyo, Japan, pp. 923-925, 1979.
10. S. M. Weiss and C. A. Kulikowski, EXPERT: A System for Developing Consultation Models, Proc. of the Sixth IJCAI, Tokyo, Japan, pp. 942-950, 1979.
11. R. Reboh, Knowledge Engineering Techniques and Tools in the PROSPECTOR Environment, SRI Technical Note No. 243, SRI, Menlo Park, CA, 1980.
12. J. Fain, F. Hayes-Roth, H. Sowizral, and D. Waterman, Programming in ROSIE: An Introduction by Means of Examples, RAND Technical Report N-1647-ARPA, Rand Corp., Santa Monica, CA, 1982.
13. L. D. Erman, P. E. London, and S. F. Fickas, The Design and an Example Use of HEARSAY-III, Proc. of the Seventh IJCAI, Vancouver, British Columbia, pp. 409-415, 1981.
14. H. P. Nii and N. Aiello, AGE (Attempt to Generalize): A Knowledge-Based Program for Building Knowledge-Based Programs, Proc. of the Sixth IJCAI, Tokyo, Japan, pp. 645-655, 1979.
15. C. Forgy and J. McDermott, OPS: A Domain-Independent Production System Language, Proc. of the Fifth IJCAI, Cambridge, MA, pp. 933-939, 1977.
16. R. Greiner and D. Lenat, A Representation Language Language, Proc. of the First AAAI, Stanford, CA, pp. 165-169, 1980.
17. A. Colmerauer, H. Kanoui, and M. Van Caneghem, Prolog, Theoretical Principles and Current Trends, in Technology and Science of Informatics, Vol. 2, No. 4, North Oxford Academic, 1983.
18. M. J. Stefik, D. G. Bobrow, S. Mittal, and L. Conway, "Knowledge programming in LOOPS," AI Mag. 4(3), 41-54 (1983).
19. R. G. Smith, Strobe: Support for Structured Object Knowledge Representation, Proc. of the Eighth IJCAI, Karlsruhe, FRG, pp. 855-859, 1983.
20. T. P. Kehler and G. D. Clemenson, "KEE: The knowledge engineering environment for industry," Syst. Softwr. 34, 212-224 (1984).
21. R. J. Brachman, V. P. Gilbert, and H. J. Levesque, An Essential Hybrid Reasoning System: Knowledge and Symbol Level Accounts of KRYPTON, Proc. of the Ninth IJCAI, Los Angeles, CA, pp. 533-539, 1985.

C. Kulikowski
Rutgers University

DOT-PATTERN ANALYSIS

Visual perception involves making inferences about the three-dimensional world from images. Among the mechanisms for making such inferences early in visual processing is the process of perceptual grouping. The goal of grouping is to put items seen in the visual field together or to "organize" image data such that the detected image organization captures three-dimensional scene organization or structure. The items, or tokens, grouped may be blobs, edge segments, or geometric features of image regions. Grouping may reveal organization in the image at different scales, described in terms of such image entities as regions and curves. The rules for the organization may be completely stated in terms of intrinsic properties of tokens being grouped and their image plane relationships.

As an example of the relationship between image and scene organizations, consider an image that contains two parallel lines. Then the corresponding lines in the three-dimensional space must also be parallel unless the viewpoint is carefully chosen and is unstable in the sense that a slight change in it will result in a drastically different image (e.g., one containing nonparallel lines or more than two lines). Since the viewpoint can be assumed to be general for general image interpretation, a rule can be made that if two lines in the
image plane are parallel, they are also parallel in three-dimensional space. This eliminates the process of first obtaining the three-dimensional description of the lines to discover that they are parallel in three-dimensional space, and it makes the detection of parallel structures in the image important.

Gestalt psychologists undertook the first detailed study of the grouping phenomenon in the first part of this century and proposed certain rules and criteria to explain the particular way the human visual system groups tokens together. These include proximity, similarity, the factor of common fate, prägnanz (figural goodness or stability), einstellung (the factor of objective set), good continuity, and closure. For any given stimulus, one or more of these rules might be at work in defining the perceived grouping. In the latter case, the rules might cooperate or compete (1). If there are conflicts among the results of applying different rules, they must be resolved. The Gestalt psychologists raised such questions; however, they did not propose any theory for the reason these rules worked and how general they were, nor did they give any reasons for the basic need for the grouping process at all. They did try to explain some of the grouping phenomena in terms of certain neurological theories that were known at the time and drew parallels to such physical phenomena as electromagnetic fields. Such explanations, however, were at the level of possible implementation mechanisms in humans, and they did not deal with the reasons for the need for grouping as a functional entity in perception.
Recent Work in Perceptual Organization

Not much happened in grouping for a long time after the original insights of the Gestalt psychologists. Recently, however, the process of extracting structural information from the visual stimulus has started getting fresh attention. In the psychology community Rock (2) argues the inferential nature of perception. He shows that most of the phenomena in perception such as organization, figure-ground perception, and form perception are the result of a process of inference in which the solution picked by the human perceptual system is the one that is not explained by accidental alignments or accidental viewpoints. Similar ideas are proposed by Witkin and Tenenbaum in a recent paper (3). They regard the process of extracting perceptual structure as "a source of semantic precursors." They state that the perceptual structures detected by the human visual system capture the underlying causal relationships among the tokens in the image without the benefit of the semantic information, and these structures turn out to be meaningful. Another version of the same ideas is given in Lowe's work (4). He examines the use of perceptual grouping to construct three-dimensional models. Marr mentions the use of grouping to obtain the full primal sketch from the raw primal sketch by finding perceptual structures such as collinearity, clustering, and so on, among the tokens in the raw primal sketch (5). Zucker has examined the extraction of orientation information by the human visual system (6,7). He suggests that there are two fundamentally different processes at work in the extraction of orientation selection: type I and type II processes. Type I processes are for detecting boundaries of surfaces that are well defined and specific. Type II processes are for obtaining surface-specific information such as textures; they are two-dimensional in nature.

Grouping in Dot Patterns

To detect perceptual organization in images, certain structural components or tokens must first be detected to serve as primitives of organization. These tokens are assigned properties such as position, shape, size, orientation, color, brightness, and the termination (end) points. The roles of some of these properties in grouping are easier to understand than others. Considering the complexity of the interaction of these properties, the first step toward understanding the grouping phenomenon may be to understand the impact of some of the relatively simple properties on the resulting groupings. One way of accomplishing this is to eliminate all but one property at a time and study the effects of that one property on the perception of the stimulus.

Dot patterns provide a means for studying the effect of token positions on their grouping. With dots as tokens, the role of nonpositional properties is minimized since dots are without size, orientation, color, and shape. Further, dot patterns can be constructed artificially, thus enabling one to have very fine control over the spatial properties of the stimulus. This is helpful both in the psychological experiments for studying the response of the human visual system and also for the generation of closely controlled data to feed a computational model. Of course, generating dot patterns implies direct availability of the stimulus without the need of extracting it from images through steps of early visual processing such as edge detection (qv), blob detection, and so on.

The initial grouping of dots based only on their positions may be called the lowest level grouping. The perceptual segments defined by the lowest level grouping may further group hierarchically to give groupings at different levels. The higher levels represent the organization at larger scales. The tokens at higher levels have spatial extent and hence properties such as orientation, shape, and size. These groups act as tokens to be further grouped. See Figure 1 for an example.

Figure 1. Dot pattern with groupings of dots perceived as curves. The lowest level groupings further group hierarchically to define a circular structure.

Therefore, by working with dot patterns and examining the hierarchy of groupings possible, one can study not only the effect of the positional properties of tokens on grouping but also the effect of the other properties.

Work in Psychology

Research in experimental psychology has been concerned with the perception of structure in both static and moving dot patterns including detection of dotted lines in noisy background, perception of bilateral symmetry, and the perception of flow patterns. Uttal et al. studied the detection of dotted lines in a noisy background (8) and found that the detection suffered as the dot spacing along the line increased. Recently, Vistnes (9) obtained similar results. Vistnes also found that as the jaggedness of a dotted line or curve segment increased, it became harder to detect. The detection of bilateral symmetry in random dot patterns has been studied by Jenkins (10) and Barlow and Reeves (11). Both studies found that only a fraction of the statistical information available in the stimulus is used in the detection of symmetry. Jenkins found that the symmetry information utilized by the human visual system fell within a strip about one degree wide around the central axis of symmetry. Barlow and Reeves found that only about 25% of the available statistical information is used in symmetry detection and that the orientation of the symmetry axis is not important in the perception of symmetry. Glass (12) has studied the perception of Moire patterns. Moire patterns are obtained by superimposing a transformed (e.g., dilated, rotated, etc.) version of a random dot pattern onto the original pattern. Glass and Perez (13) have observed that if only a small region is seen in such patterns, the correlation of dots disappears. Glass and Switkes (14) have found that there are limits to the amount of transformation beyond which the perception of the Moire effect disappears. Borjesson and von Hofsten (15,16) have studied moving two- and three-dot patterns and have identified the properties of the motion that give rise to perception of depth. Such psychological experiments as described in this section provide important data about the behavior of human visual processing. However, there has been a distinct lack of attempts by the experimenters to explain the observations. The work in computational vision aims at developing computational models of visual processing.

Work in Clustering

In computational vision, grouping has been an implicit part of the various efforts, and it is only in recent years that the grouping processes have been investigated as a separate issue. The bulk of the previous research in grouping has been in the field of clustering. This section briefly reviews the work on clustering. Given a set of points, P, clustering is the partition of P into "natural" subsets or classes that maximizes the similarity among members of the same subset as well as dissimilarity across classes (17). These points act as tokens to be grouped, and the resulting clusters contain groupings of tokens. The tokens usually have vector attributes and are viewed as points in a multidimensional feature space. The issue here is not perceptual organization of dots in two-dimensional dot patterns. The spatial nature of the representation is merely an artifact with no direct perceptual relevance. The success of
clustering is determined by the power of the measures that define the homogeneity over a cluster. On the other hand, perceptual organization addresses the partitioning or clustering of dots in the original planar or sometimes three- or four-dimensional space of the visual stimulus, and the measures for partitioning that lead to perceptually significant clusters. Despite this fact, however, the work done in clustering is very relevant to certain problems in vision.

To define specific approaches to clustering, several issues must be addressed. First, the idea of a "similarity" measure within a cluster must be defined in order for the clustering algorithms to work. This usually depends on the particular application and what is considered a natural partition in that particular domain. Second, since the similarity measure can only be based on relative positions of these points, the concepts of "neighbors" and "neighborhood" of a point become crucial in the definition of the similarity measure. Third, the algorithm that uses this information to actually perform clustering is also very important.

The concept of the neighbors of a dot has been defined in many ways. Going from simple to complex, the different definitions include a circular neighborhood and the dots that fall into this neighborhood (18), k-nearest neighbors (19,20), O'Callaghan's definition, which, in addition to distances of points, also includes angles between points and whether a dot is hidden from another dot (21), the minimum spanning tree used by Zahn (22), in which two dots are neighbors if they are connected by an edge in the minimum spanning tree of the points, the relative neighborhood graph and the Gabriel graph used by Urquhart (23) and Toussaint (24), and finally the Voronoi tessellation and its dual, the Delaunay graph, discussed by Ahuja (25).
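The simplest of the neighbor definitions listed above, k-nearest neighbors, can be sketched in a few lines of pure Python. This is an illustrative helper, not code from any of the cited systems; the function name `k_nearest` and the sample coordinates are invented for the example.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] under Euclidean
    distance, excluding the point itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

dots = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(k_nearest(dots, 0, 2))   # [1, 2]: the two dots at unit distance
```

Note that the relation is not symmetric (an outlier's nearest neighbor need not count the outlier among its own), which is one reason the graph-based definitions discussed above are often preferred for perceptual grouping.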
Although the first two definitions have been used as ad hoc definitions of neighbors for clustering, the remaining graph-based definitions are motivated by perceptual considerations, especially the last definition in terms of the Voronoi tessellation. A sound approach to extracting global, perceptual organization must have a sound definition of local structure, namely, a perceptually significant notion of neighbors. A detailed discussion of the advantages and disadvantages of the various notions of neighbors mentioned above can be found in Ref. 25.

Given a definition of neighbor, the clustering algorithms perform partitioning using two criteria: (a) a measure of similarity indicating if given tokens belong to a single cluster and (b) a criterion to decide when a given clustering is a good fit to the given data. The different measures of similarity used in these algorithms may be based on the distance between dots, or they may be defined as the inner products of feature vectors associated with dots, depending on what is appropriate for a given domain. The different criterion functions for deciding when a particular partition is a good fit to data include sum of squared errors, minimum variance criteria, different scattering matrices (within-cluster, between-cluster scatter matrices), and various scalar measures computed from them. A clustering algorithm typically performs some sort of iterative optimization on the set of data using the above-mentioned criteria. A review of such clustering criteria and techniques can be found in Ref. 17.

Other clustering techniques do not use the standard optimization procedures. Two major classes of such algorithms consist of the hierarchical clustering algorithms and graph-theoretic clustering algorithms. The hierarchical algorithms are
DOT.PATTERNANATYSIS
usually implemented in one of two ways: agglomerative algorithms, which start with the individual samples as singleton sets and combine the clusters recursively to get larger sets, which, if repeated indefinitely, results in one cluster, and divisive algorithms, which start with the entire sample set as one cluster and successively divide each cluster into smaller clusters, which, if repeated indefinitely, results in each sample point being put in a separate cluster. Of course, the recursive splitting or merging may stop at any stage when a "stable" clustering has been achieved. Graph-theoretic algorithms start with a certain graph structure defined on the data set and eliminate certain of the edges, thus splitting the set of points into subsets. In this sense the graph-theoretic clustering algorithms are similar to the divisive hierarchical clustering algorithms. Examples of the applications of these can be seen in Refs. 22 and 23.

Zucker and Hummel (20) describe an approach to the perceptual segmentation of a dot pattern by identifying the different roles a dot can play in a segment, namely, whether it lies on the border or in the interior. Perception of shapes of contours in dot patterns and perception of subparts of figures resulting from such contours has been studied by Fairfield (26). He proposes the use of the Blum transform (27) and a fuzzy measure based on the angle ranges between the extreme points of a segment of the Blum transform. When thresholded at different levels, this measure results in the generation of various perceptual contours for the dot pattern that are closely related to the human perception of subparts in such a figure. A perceptually significant definition of local organization is captured in the definition of the Voronoi neighborhood of a dot proposed by Ahuja (25), which can be also used to infer global perceptual structure.
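The graph-theoretic approach described above can be sketched in the spirit of Zahn's minimum-spanning-tree method (22): build the MST of the dots, delete edges that are "inconsistent" (here, simply longer than a fixed threshold, whereas Zahn compares each edge to its local context), and report the resulting connected components as clusters. The function names and threshold rule are this sketch's own simplifications, not Zahn's actual criterion.

```python
import math

def mst_edges(points):
    """Minimum spanning tree over Euclidean distances (Prim's algorithm)."""
    n = len(points)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Cheapest edge crossing from the tree to a point outside it.
        i, j = min(((a, b) for a in in_tree for b in range(n)
                    if b not in in_tree),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
        in_tree.add(j)
        edges.append((i, j))
    return edges

def clusters(points, threshold):
    """Delete long MST edges; remaining connected components are clusters."""
    keep = [e for e in mst_edges(points)
            if math.dist(points[e[0]], points[e[1]]) <= threshold]
    parent = list(range(len(points)))      # union-find over kept edges
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for i, j in keep:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

dots = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(clusters(dots, threshold=2.0))   # [[0, 1, 2], [3, 4]]
```

Because only edge deletion is involved, the method is divisive in exactly the sense of the comparison drawn in the text: lowering the threshold splits clusters further without ever merging them.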
BIBLIOGRAPHY

1. M. Wertheimer, Untersuchungen zur Lehre von der Gestalt, in W. D. Ellis (ed.), A Source Book of Gestalt Psychology, Harcourt, Brace, New York, 1938.
2. I. Rock, The Logic of Perception, MIT Press, Cambridge, MA, 1983.
3. A. P. Witkin and J. M. Tenenbaum, On the Role of Structure in Vision, in A. Rosenfeld (ed.), Human and Machine Vision, Academic Press, New York, 1983.
4. D. G. Lowe, Perceptual Organization and Visual Recognition, Ph.D. Thesis, Stanford University, September 1984.
5. D. Marr, Vision, W. H. Freeman, San Francisco, CA, 1982.
6. S. W. Zucker, Early Orientation Selection and Grouping: Type I and Type II Processes, McGill University Technical Report 82-6, Montreal, Quebec, 1982.
7. S. W. Zucker, "Early orientation selection: Tangent fields and the dimensionality of their support," Comput. Vis. Graph. Im. Proc. 32, 74-103 (1985).
8. W. R. Uttal, L. M. Bunnell, and S. Corwin, "On the detectability of straight lines in the visual noise: An extension of French's paradigm into the millisecond domain," Percep. Psychophys. 8, 385-388 (1970).
9. R. Vistnes, Detecting Structure in Random-Dot Patterns, Proceedings of DARPA Workshop on Image Understanding, December 1985, pp. 350-362.
10. B. Jenkins, "Redundancy in the perception of bilateral symmetry in dot textures," Percep. Psychophys. 32, 171-177 (1982).
11. H. B. Barlow and B. C. Reeves, "The versatility and absolute efficiency of detecting mirror symmetry in random dot displays," Vis. Res. 19, 783-793 (1979).
12. L. Glass, "Moire effect from random dots," Nature 223, 578-580 (1969).
13. L. Glass and R. Perez, "Perception of random dot interference patterns," Nature 246, 360-362 (1973).
14. L. Glass and E. Switkes, "Pattern recognition in humans: Correlations which cannot be perceived," Perception 5, 67-72 (1976).
15. E. Borjesson and C. von Hofsten, "Spatial determinants of depth perception in two-dot motion patterns," Percep. Psychophys. 11, 263-268 (1972).
16. E. Borjesson and C. von Hofsten, "Visual perception of motion in depth: Application of a vector model to three-dot motion patterns," Percep. Psychophys. 13, 169-179 (1973).
17. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
18. E. A. Patrick and L. Shen, "Interactive use of problem knowledge for clustering and decision making," IEEE Trans. Comput. C-20, 216-222 (February 1971).
19. F. R. Dias Velasco, "A method for the analysis of Gaussian-like clusters," Patt. Recog. 12, 381-393 (1980).
20. S. W. Zucker and R. A. Hummel, "Toward a low-level description of dot clusters: Labeling edge, interior and noise points," Comput. Graph. Im. Proc. 9, 213-233 (1979).
21. J. F. O'Callaghan, "An alternative definition for 'neighborhood of a point'," IEEE Trans. Comput. C-24, 1121-1125 (1975).
22. C. T. Zahn, "Graph theoretical methods for detecting and describing Gestalt clusters," IEEE Trans. Comput. C-20, 68-86 (1971).
23. R. B. Urquhart, "Graph theoretical clustering based on limited neighborhood sets," Patt. Recog. 15, 173-187 (1982).
24. G. T. Toussaint, "The relative neighborhood graph of a finite planar set," Patt. Recog. 12, 261-268 (1980).
25. N. Ahuja, "Dot pattern processing using Voronoi neighborhoods," IEEE Trans. Patt. Anal. Mach. Intell. 4, 336-343 (May 1982).
26. J. Fairfield, Contoured Shape Generation: Forms that People See in Dot Patterns, Proceedings IEEE Conference on Systems, Man, and Cybernetics, Denver, CO, pp. 60-64, 1979.
27. H. Blum, "Biological shape and visual science (Part I)," J. Theoret. Biol. 38, 205-287 (1973).

N. Ahuja and M. Tuceryan
University of Illinois
This work was supported by the Air Force Office of Scientific Research under Contract AFOSR 82-0317.
EDGE DETECTION
For both biological systems and machines, vision (qv) begins with a large and unwieldy array of measurements of the amount of light reflected from surfaces in the environment. The goal of vision is to recover physical properties of objects in the scene, such as the location of object boundaries and the structure, color, and texture of object surfaces, from the two-dimensional image that is projected onto the eye or camera. This goal is not achieved in a single step; vision proceeds in stages, with each stage producing increasingly more useful descriptions of the image and then the scene. The first clues about the physical properties of the scene are provided by the changes of intensity in the image. For example, in Figure 1 the boundaries of the sculpture, the markings and bright highlights on its surface, and the shadows that the trees cast on the snow all give rise to spatial changes in light intensity. The geometric structure, sharpness, and contrast of these intensity changes convey information about the physical edges in the scene. The importance of intensity changes and edges in early visual processing has led to extensive research on their detection, description, and use, both in computer and biological vision systems.

The process of edge detection can be divided into two stages: First, intensity changes in the image are detected and described; second, physical properties of edges in the scene are inferred from this image description. The first section of this entry concentrates on the first stage, about which more is known at this time. The last section briefly describes some areas of vision research that address the second stage. Some of these areas are discussed further in other entries of this encyclopedia (see, e.g., Feature extraction, Scene analysis, Stereo vision, Texture analysis, Motion analysis, and Optical flow). This entry mainly reviews some of the theory that underlies the detection of edges and the methods used to carry out this analysis. There is also some reference to studies of early processing in biological vision systems.

The Detection of Intensity Changes

The most commonly used methods for detecting intensity changes incorporate three essential operations. First, the image intensities are either smoothed or approximated locally by a smooth analytic function. Second, the smoothed intensities are differentiated, using either a first- or second-derivative operation. Third, simple features in the result of this differentiation stage, such as peaks (positive and negative extrema) or zero crossings (transitions between positive and negative values), are detected and described. This section first describes briefly the role of these operations in the detection of intensity changes and then presents in more detail some of the methods used to carry out these operations.

The smoothing operation serves two purposes. First, it reduces the effect of noise on the detection of intensity changes. Second, it sets the resolution or scale at which intensity changes are detected. The sampling and transduction of light by the eye or camera introduces spurious changes of light intensity that do not correspond to significant physical changes in the scene. Smoothing of the intensities can remove these minor fluctuations due to noise. Figure 2a shows a one-dimensional intensity profile that is shown smoothed by a small amount in Figure 2b. Small variations of intensity, due in part to noise in the digitizing camera, do not appear in the smoothed intensities. Approximation of the intensity function by a smooth analytic function can serve the same purpose as a smoothing operation.

Significant changes in the image can also occur at multiple resolutions. Consider, for example, a leopard's coat. At a fine resolution rapid fluctuations of intensity might delineate the individual hairs of the coat, whereas at a coarser resolution the intensity changes might delineate only the leopard's spots. Changes at different resolutions can often be detected by smoothing the image intensities by different amounts. Figure 2c illustrates a more extensive smoothing of the intensity profile of Figure 2a, which preserves only the gross changes of intensity.

The differentiation operation accentuates intensity changes and transforms the image into a representation from which properties of these changes can be extracted more easily. A significant intensity change gives rise to a peak in the first derivative or a zero crossing in the second derivative of the smoothed intensities, as illustrated in Figures 2d and e, respectively. These peaks, or zero crossings, can be detected straightforwardly, and properties such as the position, sharpness, and height of the peaks capture the location, sharpness, and contrast of the intensity changes in the image. The detection and description of these features in the smoothed and differentiated image provides a compact representation that captures meaningful information in the image. Marr (1) called this representation the "primal sketch" of the image. Later processes, such as binocular stereo, motion measurement, and texture analysis, whose goal is to recover the physical properties of the scene, may then operate directly on this description of image features.

One-Dimensional Detection of Intensity Changes. The theory that underlies the detection of intensity changes in two-dimensional images is based heavily on the analysis of one-dimensional signals. This section discusses three topics that have been addressed in this analysis: the design of optimal operators for performing smoothing and differentiation, the information content of the description of signal features such as zero crossings, and the relationship between features that are detected at multiple resolutions. Studies of these issues have used a variety of theoretical approaches that appear to yield similar conclusions.

Some of the early methods for detecting intensity changes incorporated only limited smoothing of the intensities and performed the differentiation by taking first or second differences between neighboring image elements (examples of this early work can be found in Refs. 2-8). In one dimension this is equivalent to performing a convolution of the intensity profile with operators of the type shown on the left in Figures 3b and c. Additional smoothing can be performed by increasing the spatial extent of these operators.

The operators in Figures 3b and c contain steplike changes.
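The three operations just described (smoothing, differentiation, and detection of peaks or zero crossings) can be sketched in a few lines of code. The following is a minimal illustrative sketch, not any published algorithm: the 3-sigma truncation of the sampled Gaussian and the small tolerance used to discard numerically insignificant sign changes are assumptions introduced here.

```python
import math

def gaussian_kernel(sigma, truncate=3.0):
    """Sampled Gaussian, cut off at +/- truncate*sigma, normalized to sum to 1."""
    radius = int(math.ceil(truncate * sigma))
    g = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(g)
    return [v / s for v in g]

def convolve_valid(signal, kernel):
    """1D convolution, 'valid' positions only (no padding); kernel is symmetric here."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(n - k + 1)]

def detect_edges_1d(signal, sigma=2.0, tol=1e-6):
    """Smooth, take discrete second differences, and return zero-crossing indices."""
    smoothed = convolve_valid(signal, gaussian_kernel(sigma))
    second = [smoothed[i - 1] - 2.0 * smoothed[i] + smoothed[i + 1]
              for i in range(1, len(smoothed) - 1)]
    # a sign change between adjacent samples marks a zero crossing; the
    # tolerance discards sign flips caused only by floating-point noise
    return [i for i in range(len(second) - 1)
            if second[i] * second[i + 1] < 0 and abs(second[i] - second[i + 1]) > tol]

# An ideal step edge: the detector should report a single zero crossing near it.
profile = [10.0] * 40 + [60.0] * 40
edges = detect_edges_1d(profile)
print(edges)  # one index, near the center of the (smoothed, shifted) step
```

On real imagery the tolerance would have to be set relative to the contrast of the edges one wants to keep; choosing such thresholds well is one of the practical issues that the operator-design studies reviewed in this section formalize.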
Figure 1. A natural image, exhibiting intensity changes due to many physical factors.
Other studies have employed Gaussian smoothing of the image intensities (e.g., Refs. 9-13). Combined with the first- and second-derivative operations, Gaussian smoothing yields convolution operators of the type shown in Figures 3d and e. Several arguments have been put forth in support of the use of Gaussian smoothing. Marr and Hildreth (11,12) argued that the smoothing function should have both limited support in space and limited bandwidth in frequency. In general terms, a limited support in space is important because the physical edges to be detected are spatially localized. A limited bandwidth in frequency provides a means of restricting the range of scales over which intensity changes are detected, which is sometimes important in applications of edge detection. The Gaussian function minimizes the product of bandwidths in space and frequency. The use of smoothing functions that do not have limited bandwidths in space and frequency can sometimes lead to poorer performance, reflected in a greater sensitivity to noise, the false detection of edges that do not exist, or a poor ability to localize the position of edges (see, e.g., Refs. 11 and 14). Shanmugam, Dickey, and Green (15) derived an optimal frequency domain filter for detecting intensity changes using the criteria that the filter yields maximum energy in the vicinity of an edge in the image, has limited frequency bandwidth, yields a small output when the input is constant or slowly varying, and is an even function in space. For the special case of detecting step changes of intensity, the optimal frequency domain filter corresponds to a spatial operator that is approximately the second derivative of a Gaussian (for a given bandwidth) shown in Figure 3e.
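Because convolution commutes with differentiation, the smoothing and derivative stages can be folded into a single operator: convolving with the sampled first or second derivative of a Gaussian is equivalent to smoothing with the Gaussian and then differentiating. The following sketch tabulates such operators (the shapes of Figures 3d and 3e); the 3-sigma truncation is an implementation choice of this illustration, not part of the theory.

```python
import math

def gaussian_derivative_kernels(sigma, truncate=3.0):
    """Sampled G'(x) and G''(x) for a Gaussian with standard deviation sigma."""
    radius = int(math.ceil(truncate * sigma))
    xs = range(-radius, radius + 1)
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    gauss = [norm * math.exp(-x * x / (2.0 * sigma ** 2)) for x in xs]
    g1 = [-x / sigma ** 2 * g for x, g in zip(xs, gauss)]                 # first derivative
    g2 = [((x * x) / sigma ** 2 - 1.0) / sigma ** 2 * g
          for x, g in zip(xs, gauss)]                                     # second derivative
    return g1, g2

g1, g2 = gaussian_derivative_kernels(sigma=1.5)
# G' is antisymmetric and G'' is symmetric with a negative central dip
# (the 1D "Mexican hat"); both give near-zero response on constant regions.
```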
In a later study Canny (14) used the following criteria to derive an optimal operator: good detection ability, that is, there should be low probabilities of failing to detect real edges and falsely detecting edges that do not exist; good localization ability, that is, the position of the detected edge should be as close as possible to the true position of the edge; and uniqueness of detection, that is, a given edge should be detected only once. The first two criteria are related by an uncertainty principle; as detection ability increases, localization ability decreases, and vice versa. The analysis also assumed that extrema in the output of the operator indicate the presence of an edge. For the particular case in which an "edge" is defined as a step change of intensity, the operator that optimally satisfies these criteria is a linear combination of four exponentials, which can be approximated closely by the first derivative of a Gaussian shown in Figure 3d.

Poggio, Voorhees, and Yuille (16) and Torre and Poggio (17) derived an optimal smoothing operator using the tools of regularization theory from mathematical physics. They began with the observation that numerical differentiation of the image is a mathematically ill-posed problem (18) because its solution does not depend continuously on the input intensities (this is equivalent to saying that the solution is not robust against noise). The smoothing operation serves to regularize the image, making the differentiation operation mathematically well posed. In the case where the image intensities are assumed to contain noise, the following method was used to regularize the image. First, let I(x) denote the continuous intensity function, which is sampled at a set of discrete locations x_k, 1 <= k <= N, and let S(x) denote the smoothed intensity function to be computed. It was assumed that S(x) should both fit the sampled intensities as closely as possible and be as smooth as possible. Using the tools of regularization theory, this was formulated as the computation of the function S(x) that minimizes the following expression:

\sum_{k=1}^{N} [I(x_k) - S(x_k)]^2 + \lambda \int [S''(x)]^2 \, dx

The first term measures how well S(x) fits the sampled intensities, and the second term measures the smoothness of S(x). The constant \lambda controls the trade-off between these two measures. Poggio, Voorhees, and Yuille showed that the solution to this minimization problem is equivalent to the convolution of the image intensities with a cubic spline that is very similar to the Gaussian. Torre and Poggio (17) further expanded on the theoretical properties of a broad range of smoothing filters from the perspective of regularizing the image intensities for differentiation.

Another approach to the smoothing stage is to find an analytic function that best models or approximates the local intensity pattern. An early representative of this approach was the Hueckel operator (5,7). Surface-fitting methods used a variety of basis functions to perform the approximation, including planar functions (19) and quadratic functions (20). More recently Haralick (21,22) used the discrete Chebyshev polynomials to approximate the image intensities. In these methods a differentiation operation is then performed analytically on the polynomial approximation of the intensity function. The method of approximation used by Haralick (21,22) is roughly equivalent to smoothing the image by convolution with spatial operators such as those derived by Canny (14) and Poggio, Voorhees, and Yuille (16). A rigorous comparison between the performance of surface-fitting versus direct smoothing methods has not yet been made.

A second issue that bears on the choice of operator for the smoothing and differentiation stages is the information content of the subsequent description of image features. That is, to what extent does a representation of only the significant changes of intensity capture all of the important information in an image? This question led to a number of theoretical studies of the reconstruction of a signal from features such as
Figure 2. Detecting intensity changes. (a) One-dimensional intensity profile; the intensities along a horizontal scan line in an image are represented as a graph. (b) The result of smoothing the profile in (a). (c) The result of additional smoothing of (a). (d, e) The first and second derivatives, respectively, of the smoothed profile shown in (c). The vertical dashed lines indicate the peaks in the first derivative and zero crossings in the second derivative that correspond to two significant intensity changes.
its zero crossings. Although the goal of vision is not to reconstruct the visual image, these results are important because they suggest that an image can be transformed into a compact representation of its features with little loss of information. An early study by Logan (23) that interested many vision researchers addressed the information content of the zero crossings of a signal. Logan proved that if a signal has a frequency bandwidth of less than one octave and no zeros in common with its Hilbert transform, the signal can be entirely reconstructed from the positions of its zero crossings up to a multiplicative constant. The second condition is almost always satisfied for physical signals. This result has also been extended to two dimensions (1). This analysis is interesting because it shows that the zero crossings of a signal are very rich in information. Its direct relevance to vision is limited, however, because the initial smoothing and differentiation of an image is typically performed by operators that are not one-octave bandpass in frequency.

Other studies have addressed the information content of features of signals that are more relevant to visual processing. For example, Yuille and Poggio (24) proved some interesting results regarding the zero crossings (or, more generally, the level crossings) of an image that is convolved with the second derivative of a Gaussian over a continuous range of scales. (The level crossings of a signal are the points at which a value u is crossed by the signal, where u may be nonzero.) Before stating the results, the scale space representation of zero crossings used by Witkin (25), illustrated in Figure 4, is introduced. First, let the one-dimensional Gaussian function be defined as follows (where \sigma is the standard deviation of the Gaussian):

G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2/2\sigma^2}

The second derivative of the Gaussian function is then given by the expression

G''(x) = \frac{1}{\sqrt{2\pi}\,\sigma^3} \left( \frac{x^2}{\sigma^2} - 1 \right) e^{-x^2/2\sigma^2}

Suppose that a one-dimensional signal I(x) is convolved with G''(x) for a continuous range of standard deviations \sigma and the positions of the zero crossings are marked for each size or scale. Figure 4 shows an intensity profile (Fig. 4a) that is convolved with a G''(x) function with large \sigma (Fig. 4b). The positions of the zero crossings are marked with heavy dots. In the scale space representation of Figure 4c the vertical dimension represents the value of \sigma and the horizontal dimension represents position in the signal. For each value of \sigma the positions of the zero crossings of I(x)*G''(x) are plotted as points along a horizontal line in this diagram. For example, points along the dashed line at \sigma = \sigma_1 indicate the positions of the zero crossings of the signal in Figure 4b. The scale space representation of zero crossings illustrates the behavior of these features across scales. For small \sigma the zero crossings capture all of the changes in the original intensity function. At coarser scales (larger \sigma) the positions of the zero crossings capture only the gross changes of intensity.

The scale space representation is visually suggestive of a fingerprint. In fact, in much the same way that a fingerprint uniquely identifies a person, the scale space representation may uniquely identify an image. Yuille and Poggio (24) proved that for almost all one-dimensional signals, the scale space map of the zero crossings of the signal convolved with G''(x) over a continuum of scales determines the signal uniquely, up to a multiplicative constant and an additional harmonic function. The proof provides a method for reconstructing a signal I(x) from knowledge of how the zero crossings of I(x)*G''(x) change across scales. The use of Gaussian smoothing is critical to the completeness of the subsequent feature representation, but the basic theorem applies to zero crossings and level crossings of the result of applying any linear differential operator to the Gaussian-filtered signal. Yuille and Poggio also derived a two-dimensional extension to this result.

Careful observation of the contours in the scale space representation of Figure 4c reveals that the contours either begin at the smallest scale and continue as a single, isolated contour through larger scales (Fig. 4d, A) or they form closed, inverted bowllike shapes (Fig. 4d, B). Additional zero crossings are never created as scale increases; that is, there are no contours in the scale space representation of the type shown in Figure 4d (C and D). This observation has been supported by a number of theoretical studies (26-28), which have also shown that the Gaussian function is the only smoothing function that yields this behavior of subsequent features across scale. This observation applies to zero crossings and level crossings of the result of applying any linear differential operator to the Gaussian-smoothed signal. This behavior of features across scale has been exploited successfully in the qualitative analysis of one-dimensional signals (25).

To summarize, the analysis of one-dimensional signals has been important for developing a solid theoretical foundation on which to base methods for detecting intensity changes in an image. Several theoretical studies attempted to derive an optimal operator for detecting intensity changes using a variety of criteria for evaluating the performance of the operator. All of these operators essentially perform a smoothing and differentiation of the image intensities. Furthermore, the one-dimensional analyses all point to operators whose spatial shape is roughly the first or second derivative of a Gaussian function. Mathematical studies also addressed the information content of representations of image features and the behavior of these features across multiple scales. These latter studies also stressed the importance of Gaussian smoothing. (It should be noted again that some edge detection methods that perform an analytic approximation of the intensity function may be equivalent to those performing a direct smoothing operation with a Gaussian function.) Interestingly, the initial filters in the human visual system also appear to perform a spatial convolution of the image with a function that is closely approximated by the second derivative of a Gaussian (29). It is also well known that the human visual system initially analyzes the retinal image through a number of spatial filters that differ in the amount of smoothing that is performed in space and in time (29).

Two-Dimensional Detection of Intensity Changes. The problems addressed in the one-dimensional analysis of intensity signals also arise for the detection of intensity changes in two-dimensional images, although their solution is more complex. The design of optimal operators for performing the smoothing and differentiation stages, for example, is complicated by a larger selection of possible derivative operations that can be performed in two dimensions. Many of the mathematical results regarding the information content of image features and behavior of features across scale have been extended to
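Witkin's scale space construction can be imitated numerically: convolve a profile with sampled G''(x) kernels of increasing sigma and record where the result changes sign. The sketch below is only a discrete approximation of the continuous theory; the sampled kernels, the boundary replication, and the contrast tolerance `tol` are all assumptions of this illustration, so the "no new crossings" property holds only approximately.

```python
import math

def g2_kernel(sigma, truncate=3.0):
    """Sampled second derivative of a Gaussian."""
    radius = int(math.ceil(truncate * sigma))
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma ** 3)
    return [norm * ((x * x) / sigma ** 2 - 1.0) * math.exp(-x * x / (2.0 * sigma ** 2))
            for x in range(-radius, radius + 1)]

def convolve_same(signal, kernel):
    """Same-size convolution with edge replication at the boundaries."""
    r = len(kernel) // 2
    padded = [signal[0]] * r + list(signal) + [signal[-1]] * r
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(signal))]

def zero_crossing_positions(values, tol=1e-3):
    """Sign changes between adjacent samples, ignoring negligibly weak ones."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0 and abs(values[i] - values[i + 1]) > tol]

# Fine ripple on the left half, one large step in the middle.
profile = [20.0 + 5.0 * math.sin(i / 2.0) for i in range(60)] + [80.0] * 60
scale_space = {s: zero_crossing_positions(convolve_same(profile, g2_kernel(s)))
               for s in (1.0, 2.0, 4.0, 8.0)}
for s in sorted(scale_space):
    print(s, len(scale_space[s]))
# small sigma: many crossings (ripple and step); large sigma: mainly the step survives
```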
Figure 4. The scale-space representation. (a) An extended one-dimensional intensity profile. (b) The result of convolving the profile in (a) with a G''(x) operator with large \sigma. The zero crossings are marked with heavy dots. (c) The scale space representation of the positions of the zero crossings over a continuous range of scales (sizes of \sigma). The zero crossings of (b) are plotted along the dashed horizontal line at \sigma = \sigma_1. (d) Contours of the type labeled A and B are commonly found in the scale space representation, whereas those of the type labeled C and D are never found.
two dimensions, but the algorithms for extracting and describing these features in the image are also more complex than their one-dimensional counterparts. This section reviews some of the techniques used to detect and describe intensity changes in two-dimensional images.

Early work on edge detection primarily used directional first- and second-derivative operators for performing the two-dimensional differentiation (2-10,19,20,30-32). A change of intensity that is extended along some orientation in the image gives rise to a peak in the first derivative of intensity taken in the direction perpendicular to the orientation of the intensity change, or a zero crossing in the second directional derivative. The simplest directional operators are formed by extending one-dimensional cross sections such as those shown in Figure 3 along some two-dimensional direction in the image. Directional operators have differed in the shape of their cross sections both perpendicular to and along their primary orientations. Macleod (9) and Marr and Poggio (10), for example, used directional derivatives that embodied Gaussian smoothing. In principle, the computation of the derivatives in two directions, such as the horizontal and vertical directions, is sufficient to detect intensity changes at all orientations in the image. Several algorithms, however, use directional operators at a large number of discrete orientations (e.g., see Refs. 4, 7, 8, 14, and 32). A given intensity change is detected by a number of directional operators in this case, and the output of the directional operator that yields the largest response is typically used to describe the local intensity change. Two examples of algorithms of this type are those of Nevatia and Babu (32) and Canny (14). An example of the results of Canny's algorithm is shown in Figure 5. The contours of Figure 5b represent only the positions of the significant intensity changes in Figure 5a.

Figure 5. Canny's edge detection algorithm. (a) A natural image. (b) The positions of the intensity changes detected by Canny's algorithm. (Courtesy of J. F. Canny.)

Other related differential operators that are used in two dimensions are the first and second derivatives in the direction of the gradient of intensity (14,17,22). The intensity gradient, defined as

\nabla I = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right)

is a vector that indicates the direction and magnitude of steepest increase in the two-dimensional intensity function. Let n denote the unit vector in the direction of the gradient. The differential operators \partial/\partial n and \partial^2/\partial n^2 are nondirectional operators in the sense that their value does not change when the image is rotated. They are also nonlinear operators and, unlike the linear differential operators, cannot be combined with the smoothing function in a single filtering step. Methods such as those of Nevatia and Babu (32) and Canny (14) essentially use the directional derivative along the gradient for extracting features.

A second nondirectional operator that is used for detecting intensity changes is the Laplacian operator \nabla^2 (1,5,11-13,15,33):

\nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}

Combined with a two-dimensional Gaussian smoothing function,

G(r) = \frac{1}{2\pi\sigma^2} e^{-r^2/2\sigma^2}

the Laplacian yields the function \nabla^2 G given by the expression

\nabla^2 G = \frac{1}{\pi\sigma^4} \left( \frac{r^2}{2\sigma^2} - 1 \right) e^{-r^2/2\sigma^2}

where r denotes the distance from the center of the operator and \sigma is the standard deviation of the two-dimensional Gaussian. The \nabla^2 G function is shaped something like a Mexican hat in two dimensions. Figure 6 shows an example of the convolution of an image (Fig. 6a) with a \nabla^2 G operator (Fig. 6b). The Laplacian is a nondirectional second-derivative operation; the elements in the output of the Laplacian that correspond to the location of intensity changes in the image are therefore the zero crossings. The zero-crossing contours derived from Figure 6b are shown in Figure 6c. In this case the zero-crossing contours were located by detecting the transitions between positive and negative values in the filtered image by scanning in the horizontal and vertical directions. (The design of robust methods for detecting zero crossings remains an open area of
Figure 6. Detecting intensity changes with the \nabla^2 G operator. (a) A natural image. (b) The result of convolving the image with a \nabla^2 G operator. The most positive values are shown in white and the most negative values in black. (c) The zero crossings of the convolution output shown in (b).
research in edge detection.) A single convolution of the image with the nondirectional \nabla^2 G operator allows the detection of intensity changes at all orientations for a given scale. The two-dimensional orientation of a local portion of the zero-crossing contour can be computed from the gradient of the filtered image (12).

It is not yet clear whether directional or nondirectional operators are most appropriate for detecting intensity changes. Both have advantages and disadvantages. The use of the Laplacian is simpler and requires less computation than the use of either directional derivatives or derivatives in the direction of the gradient. The directional operators, however, yield somewhat better localization of the position of intensity changes (14,22), particularly in areas where the orientation of an edge is changing rapidly in the image (34,35). Features such as the zero-crossing contours, when derived with nondirectional operators, generally form smooth, closed contours, whereas features obtained with directional operators generally do not have such special geometric properties (17). Marr and Hildreth (11) showed that if the intensity function along the orientation of an intensity change varies at most linearly, the zero crossings of the Laplacian exactly coincide with the zero crossings of a directional operator taken in the direction perpendicular to the orientation of the intensity change. Torre and Poggio (17) characterized more formally the relationship between the zeros of the Laplacian and those of the second derivative in the direction of the gradient in terms of the geometry of the two-dimensional intensity surface. With regard to the use of directional versus nondirectional derivative operators, physiological studies reveal that the retina analyzes the visual image through a circularly symmetric filter whose spatial shape is given by the difference of two Gaussian functions (see, e.g., Refs. 36 and 37), which is closely approximated by the \nabla^2 G function.
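The \nabla^2 G convolution and the row-and-column scan for sign changes described above can be sketched directly. In this toy example the synthetic image, the 3-sigma cutoff, the zero-sum correction of the sampled mask, and the small contrast tolerance are all assumptions of the sketch, not part of any published algorithm.

```python
import math

def log_mask(sigma, truncate=3.0):
    """Sampled nabla^2 G mask: (1/(pi s^4)) (r^2/(2 s^2) - 1) exp(-r^2/(2 s^2))."""
    radius = int(math.ceil(truncate * sigma))
    size = 2 * radius + 1
    k = [[(1.0 / (math.pi * sigma ** 4))
          * ((x * x + y * y) / (2.0 * sigma ** 2) - 1.0)
          * math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    # force a zero-sum mask so constant regions give a numerically zero response
    mean = sum(map(sum, k)) / (size * size)
    return [[v - mean for v in row] for row in k]

def convolve2d_valid(img, mask):
    m = len(mask)
    h, w = len(img) - m + 1, len(img[0]) - m + 1
    return [[sum(img[i + u][j + v] * mask[u][v] for u in range(m) for v in range(m))
             for j in range(w)] for i in range(h)]

def zero_crossings_2d(f, tol=1e-3):
    """Mark sign changes found by scanning rows and columns, as for Figure 6c."""
    h, w = len(f), len(f[0])
    marks = set()
    for i in range(h):
        for j in range(w):
            if j + 1 < w and f[i][j] * f[i][j + 1] < 0 and abs(f[i][j] - f[i][j + 1]) > tol:
                marks.add((i, j))
            if i + 1 < h and f[i][j] * f[i + 1][j] < 0 and abs(f[i][j] - f[i + 1][j]) > tol:
                marks.add((i, j))
    return marks

# A bright 20x20 square on a dark background.
img = [[100.0 if 10 <= i < 30 and 10 <= j < 30 else 10.0 for j in range(40)]
       for i in range(40)]
zc = zero_crossings_2d(convolve2d_valid(img, log_mask(2.0)))
print(len(zc))  # the marked pixels trace a closed contour around the square
```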
Mathematical results regarding the information content and behavior across scales of image features have some bearing on the choice of differential operators. For example, Yuille and Poggio (28) showed that in two dimensions the combination of Gaussian smoothing with any linear differential operator yields zero crossings or level crossings that behave well with increasing scale in that no features are created as the size of the Gaussian is increased. In the case of the second derivative along the gradient Yuille and Poggio proved that there is
no smoothing function that avoids the creation of zero crossings with increasing scale. The completeness of the scale space representation of zero crossings or level crossings in two dimensions also requires the use of linear differential operators (24).

The analysis of intensity changes across multiple scales is a difficult problem that has not yet found a satisfactory solution. There is a clear need to detect intensity changes at multiple resolutions (2). Important physical changes in the scene take place at different scales. Spatial filters that allow the description of fine detail in the intensity function generally miss coarser structures in the image, and those that allow the extraction of coarser features generally smooth out important detail. At all resolutions some of the detected features may not correspond to real physical changes in the scene. For example, at the finest resolutions some of the detected intensity changes may be a consequence of noise in the sensing process. At coarser resolutions spurious image features might arise as a consequence of smoothing together nearby intensity changes. The problems of sorting out the relevant changes at each resolution and combining them into a representation that can be used effectively by later processes are difficult and unsolved problems. Some of the research that has attempted to address these problems is mentioned in the next four paragraphs.

Marr and Hildreth (11) explored the combination of zero-crossing descriptions that arise from convolving an image with \nabla^2 G operators of different size. An example of these descriptions is illustrated in Figure 7. The zero crossings from the smaller \nabla^2 G operator primarily detect the bumpy texture on the surface of the leaf, whereas the zero-crossing contours from the larger operator also outline some of the highlights on the leaf surface that are due to changing illumination (the arrows point to one example).
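A minimal numeric illustration of combining descriptions from two operator sizes, in one dimension: zero crossings are computed at a fine and a coarse scale, and only fine-scale crossings that are also found (within a small window) at the coarse scale are kept. The kernels, tolerances, and the two-sample matching window are assumptions of this sketch, not values from the original work.

```python
import math

def g2_kernel(sigma, truncate=3.0):
    """Sampled second derivative of a Gaussian."""
    radius = int(math.ceil(truncate * sigma))
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma ** 3)
    return [norm * ((x * x) / sigma ** 2 - 1.0) * math.exp(-x * x / (2.0 * sigma ** 2))
            for x in range(-radius, radius + 1)]

def crossings(signal, sigma, tol=1e-3):
    """Zero crossings of the G''-filtered signal (edge-replicated boundaries)."""
    k = g2_kernel(sigma)
    r = len(k) // 2
    padded = [signal[0]] * r + list(signal) + [signal[-1]] * r
    f = [sum(padded[i + j] * k[j] for j in range(len(k))) for i in range(len(signal))]
    return [i for i in range(len(f) - 1)
            if f[i] * f[i + 1] < 0 and abs(f[i] - f[i + 1]) > tol]

# Texture-like ripple everywhere, plus one strong step (an "object boundary").
signal = [30.0 + 4.0 * math.sin(i / 1.5) for i in range(50)] + \
         [90.0 + 4.0 * math.sin(i / 1.5) for i in range(50, 100)]
fine = crossings(signal, sigma=1.0)
coarse = crossings(signal, sigma=6.0)
# spatial coincidence: keep fine-scale crossings confirmed at the coarse scale
confirmed = [p for p in fine if any(abs(p - q) <= 2 for q in coarse)]
print(confirmed)  # the strong step's crossing (near index 49) is among the survivors
```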
Marr and Hildreth suggested the use of spatial coincidence of zero crossings across scale as a means of indicating the presence of a real edge in the scene. Strong edges such as object boundaries often give rise to sharp intensity changes in the image that are detected across a range of scales and in roughly the same location in the image. In the one-dimensional scale space representation these edges give rise to roughly vertical lines. (The scale space representation can be extended to two dimensions, in which the positions of the zero crossings on the x-y plane are represented across multiple operator sizes.) The existence of contours in the scale
space representation that are roughly vertical and extend across a range of scales could be used to infer the presence of a significant physical change at the corresponding location in the scene.

Witkin (25) developed a method for constructing qualitative descriptions of one-dimensional signals that uses the scale space representation. The method embodied two basic assumptions: the identity assumption, that zero crossings detected at different scales, which lie on a common contour in the scale
Figure 7 (Continued)
Figure 7. Multiple operator sizes. (a) A natural image. (b, c) The zero crossings that result from convolving the image with ∇²G operators whose central positive region has a diameter of 6 and 12 image elements, respectively. The arrows in (a) and (c) indicate a highlight in the image that is detected by the larger operator.
space description, arise from a single physical event, and the localization assumption, that the true location of a physical event that gives rise to a contour in the scale-space description is the contour's position as σ tends to zero. Coarser scales were used to identify important events in the signal, and finer scales were used to localize their position. Events that persisted over large changes in scale also had special significance. Witkin's method, called scale-space filtering, begins with the scale-space description and collapses it into a discrete tree structure that represents the qualitative behavior of the signal. Some of the heuristics embodied in this analysis may be useful for analyzing two-dimensional images.

Canny (14) used a different approach to combining descriptions of intensity changes across multiple scales. Features were first detected at a set of discrete scales. The finest scale description was then used to predict the results at the next larger scale, assuming that the filter used to derive the larger scale description performs additional smoothing of the image. In a particular area of the image, if there was a substantial difference between the actual description at the larger scale and that predicted by the smaller scale, it was assumed that there is an important change taking place at the larger scale that is not detected at the finer scale. In this case features detected at the larger scale were then added to the final feature representation. Empirically, Canny found that most features were detected at the finest scale, and relatively few were added from coarser scales.

Poggio, Voorhees, and Yuille (16) have also begun to explore the issue of detecting intensity changes across scales using the methods of regularization theory. Recall that their approach was to find a smoothed intensity function S(x), given the sampled intensities I(x), that minimizes the following expression:
$$\sum_{i=1}^{N} \left[ I(x_i) - S(x_i) \right]^2 + \lambda \int \left[ S'(x) \right]^2 \, dx$$
The parameter λ controls the scale at which intensity changes are detected. That is, if λ is small, S(x) closely approximates I(x), and as λ increases, S(x) becomes increasingly more smooth. Regularization theory may suggest methods for choosing the optimal λ for a given set of data, which may be useful for analyzing changes across multiple scales (16).

To summarize, there has been considerable progress on the detection and description of intensity changes in two-dimensional images, but there still exist many open questions. A large body of theoretical and empirical work has addressed the question of what operators are most appropriate for performing the smoothing and differentiation stages. Emerging from this work is a better understanding of the advantages and disadvantages of various operators and the relationships between alternative approaches. It is unlikely that a single method will be most appropriate for all tasks. The choice of operators depends in part on the application, the nature of the later processes that use the description of image features, and the available computational resources. Some interesting work has begun to address the problem of detecting and integrating intensity changes across multiple scales, but a satisfactory solution to this problem still eludes vision researchers. A problem that was not discussed here is the computation of properties such as contrast and sharpness of the intensity changes. There has been some work on this problem, but it has not yet received a rigorous analytic treatment.
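A discrete version of the regularized smoothing above can be sketched as follows. This is only an illustration of the trade-off controlled by λ, not the method of Ref. 16: the integral is replaced by a sum of squared first differences, so that minimizing the expression reduces to solving a small linear system, and the test signal is invented.

```python
# Sketch: minimize sum_i (I_i - S_i)^2 + lam * sum_i (S_{i+1} - S_i)^2,
# whose minimizer satisfies (Id + lam * D^T D) S = I for the
# first-difference matrix D.
import numpy as np

def regularized_smooth(intensities, lam):
    n = len(intensities)
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)  # first differences
    A = np.eye(n) + lam * D.T @ D
    return np.linalg.solve(A, intensities)

# A noisy step edge: small lambda tracks the data closely,
# large lambda produces a much smoother S.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(50), np.ones(50)])
noisy = signal + 0.1 * rng.standard_normal(100)

s_small = regularized_smooth(noisy, lam=0.1)
s_large = regularized_smooth(noisy, lam=100.0)
```

Comparing `s_small` and `s_large` shows the behavior described in the text: the small-λ solution stays close to I(x), while the large-λ solution is smoother but fits the samples less closely.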
Recovering Properties of the Physical World
In the opening paragraph it was noted that the goal of vision is to recover the physical properties of objects in the scene, such as the locations of object boundaries and the structure, color, and texture of object surfaces, from the two-dimensional image that is projected onto the eye or camera. The detection of intensity changes in the image represents only a first, meager step toward achieving this goal. This section briefly mentions some of the areas of vision that address the recovery of physical properties of edges in the scene.

The property of edges that is perhaps most important and most studied is their three-dimensional structure. The structure of edges is conveyed through many sources. For example, the relative locations of corresponding edges in left and right stereo views convey information about the location of the edges in three-dimensional space (see Stereo vision). The relative movement between edges in the image can be used to assess their relative position in space (see Motion analysis and Optical flow). Three-dimensional structure can also be inferred from the shape of the two-dimensional projection of edge contours, the way in which edges intersect in the image, and variations in surface texture. These latter cues are essential in the interpretation of structure from a single, static photograph. Many algorithms that analyze these sources are feature based in that the initial inferences regarding three-dimensional structure are made at the locations of features such as significant intensity changes in the image. Discussion of some of these processes for recovering three-dimensional structure can be found, for example, in Refs. 1, 5, 7, 10, 13, 27, 30, 31, and 38-40.

Another important property of edges is the type of physical change from which they arise. For example, edges might be the consequence of object boundaries, changes in surface orientation, shadows, highlights or light sources, surface markings, changes in surface reflectance or material composition, and so on. Ultimately, it is necessary to determine the physical source of each edge in the scene. Although some interesting work has been done in these areas, there remain many open problems (examples can be found in Refs. 1, 5, 7, 13, 30, 31, and 38-41). The recovery of these physical properties of edges is likely to be a main focus of future research on edge detection.

BIBLIOGRAPHY

1. D. Marr, Vision, W. H. Freeman, San Francisco, CA, 1982.
2. A. Rosenfeld and M. Thurston, "Edge and curve detection for visual scene analysis," IEEE Trans. Comput. C-20, 562-569 (1971).
3. L. Davis, "A survey of edge detection techniques," Comput. Graph. Im. Proc. 4, 248-270 (1975).
4. E. Persoon, "A new edge detection algorithm and its applications," Comput. Graph. Im. Proc. 5, 425-446 (1976).
5. A. Rosenfeld and A. Kak, Digital Picture Processing, Academic Press, New York, 1976.
6. M. J. Brooks, "Rationalizing edge detectors," Comput. Graph. Im. Proc. 8, 277-285 (1978).
7. W. Pratt, Digital Image Processing, Wiley, New York, 1978.
8. H. Weschler and K. S. Fu, "Image processing algorithms applied to rib boundary detection in chest radiographs," Comput. Graph. Im. Proc. 7, 375-390 (1978).
9. I. D. G. Macleod, "Comments on techniques for edge detection," Proc. IEEE 60, 344 (1972).
10. D. Marr and T. Poggio, "A theory of human stereo vision," Proc. Roy. Soc. Lond. B 204, 301-328 (1979).
11. D. Marr and E. C. Hildreth, "Theory of edge detection," Proc. Roy. Soc. Lond. B 207, 187-217 (1980).
12. E. C. Hildreth, "The detection of intensity changes by computer and biological vision systems," Comput. Vis. Graph. Im. Proc. 22, 1-27 (1983).
13. B. K. P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1985.
14. J. F. Canny, Finding Edges and Lines in Images, MIT Artificial Intelligence Laboratory Technical Report 720, 1983.
15. K. S. Shanmugam, F. M. Dickey, and J. A. Green, "An optimal frequency domain filter for edge detection in digital pictures," IEEE Trans. Patt. Anal. Machine Intell. PAMI-1, 37-49 (1979).
16. T. Poggio, H. Voorhees, and A. L. Yuille, A Regularized Solution to Edge Detection, MIT Artificial Intelligence Laboratory Memo 773, 1984.
17. V. Torre and T. Poggio, "On edge detection," IEEE Trans. Patt. Anal. Machine Intell. PAMI-8, 147-163 (1986).
18. J. Hadamard, Lectures on the Cauchy Problem in Linear Partial Differential Equations, Yale University Press, New Haven, CT, 1923.
19. R. M. Haralick, "Edge and region analysis for digital image data," Comput. Graph. Im. Proc. 12, 60-73 (1980).
20. J. M. S. Prewitt, "Object Enhancement and Extraction," in B. Lipkin and A. Rosenfeld (eds.), Picture Processing and Psychopictorics, Academic Press, New York, pp. 75-149, 1970.
21. R. M. Haralick, L. T. Watson, and T. J. Laffey, "The topographic primal sketch," Int. J. Robot. Res. 2, 50-72 (1983).
22. R. M. Haralick, "Digital step edges from zero crossing of second directional derivatives," IEEE Trans. Patt. Anal. Machine Intell. PAMI-6, 58-68 (1984).
23. B. F. Logan, "Information in the zero-crossings of bandpass signals," Bell Syst. Tech. J. 56, 487-510 (1977).
24. A. L. Yuille and T. Poggio, "Fingerprints Theorems for Zero-Crossings," Proceedings of the American Association for Artificial Intelligence Conference, Austin, TX, 1984. Also appears as MIT Artificial Intelligence Laboratory Memo 730, 1984.
25. A. P. Witkin, "Scale-Space Filtering," in Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 1019-1022, 1983.
26. J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, "Uniqueness of the Gaussian Kernel for Scale-Space Filtering," IEEE Trans. Patt. Anal. Machine Intell. PAMI-8, 26-33 (1986).
27. J. J. Koenderink, "The structure of images," Biol. Cybern. 50, 363-370 (1984).
28. A. L. Yuille and T. Poggio, "Scaling Theorems for Zero-Crossings," IEEE Trans. Patt. Anal. Machine Intell. PAMI-8, 15-25 (1986).
29. H. R. Wilson and J. R. Bergen, "A four mechanism model for threshold spatial vision," Vis. Res. 19, 19-32 (1979).
30. T. O. Binford, "Inferring surfaces from images," Artif. Intell. 17, 205-244 (1981).
31. T. O. Binford, "Survey of model-based image analysis systems," Int. J. Robot. Res. 1, 18-64 (1982).
32. R. Nevatia and R. Babu, "Linear feature extraction and description," Comput. Graph. Im. Proc. 13, 257-269 (1980).
33. J. W. Modestino and R. W. Fries, "Edge detection in noisy images using recursive digital filtering," Comput. Graph. Im. Proc. 6, 409-433 (1977).
34. V. Berzins, "Accuracy of Laplacian edge detectors," Comput. Graph. Im. Proc. 27, 195-210 (1984).
35. A. Huertas and G. Medioni, "Edge Detection with Subpixel Precision," Proceedings of the IEEE Workshop on Computer Vision: Representation and Control, IEEE Computer Society Press, Bellaire, MI, October 1985.
36. R. W. Rodieck and J. Stone, "Analysis of receptive fields of cat retinal ganglion cells," J. Neurophysiol. 28, 833-849 (1965).
37. F. M. de Monasterio, "Properties of concentrically organized X and Y ganglion cells of macaque retina," J. Neurophysiol. 41, 1394-1417 (1978).
38. M. Brady (ed.), Computer Vision, North-Holland, Amsterdam, 1981.
39. D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
40. K. M. Mutch and W. B. Thompson, "Analysis of accretion and deletion at boundaries in dynamic scenes," IEEE Trans. Patt. Anal. Machine Intell. PAMI-7, 133-138 (1985).
41. S. A. Shafer, Shadows and Silhouettes in Computer Vision, Kluwer Academic Publishers, Boston, 1985.

E. Hildreth
MIT
The author is supported by the Artificial Intelligence Laboratory and the Center for Biological Information Processing at the Massachusetts Institute of Technology. Support for the Artificial Intelligence Laboratory's research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-80-C-0505. The Center's support is provided in part by the Sloan Foundation and in part by the Office of Naval Research.
EDUCATION APPLICATIONS

What Is Intelligent Computer-Based Instruction?

Even before computers were available, mechanical devices were being used for delivery of instruction. The programmed
instruction efforts (1) that arose from the behavioral movement were the basis for a variety of presentation machines (see Computer-aided instruction, intelligent). These machines would characteristically present a display in a window, and the student would make a multiple-choice response by pressing one of four or five buttons. The machines held a paper roll on which the displays were placed. Holes were punched in the paper to indicate which display, or frame, should be presented next as a function of which button was pressed. When computers became available, the paper technology was moved to the computer. What was gained was ease of duplicating and recombining items: it was easier to combine program segments than to cut and tape the paper roll. There was also a loss: teletypes did not have the graphic capabilities of the artist with pen on paper.

One major strand in the development of computer-based instructional materials has been the effort to make computerized programmed instruction as good as the paper roll version. In 1972 John Kemeny criticized computer-based instruction as merely using the computer as "a very expensive substitute for a book" (2). This criticism would have applied equally well a decade later. Although major systems such as the PLATO system of Control Data Corporation have gone far beyond the initial methodology, allowing considerable complexity of both display forms and rules for analyzing student responses, much of current computer-based instruction still is frame based. A display is presented, a response is made, it is analyzed, and a next frame is selected on the basis of the analysis. All of what such systems can do is preprogrammed into them as a fixed algorithmic specification.

A somewhat different approach is to build a game or simulation. Many such instructional programs exist, but they, too, tend to be very rigid. Because their functions are completely preprogrammed, they can adapt only to individual differences that the designer fully anticipated.
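The frame-based scheme just described (a display is shown, a button is pressed, and the response selects the next frame) amounts to a lookup table. A minimal sketch, with frames and branching invented purely for illustration:

```python
# Sketch of a frame-based instruction machine: each frame holds a
# display and a table mapping the pressed button to the next frame.
frames = {
    "q1": {"display": "2 + 2 = ?  (a) 3  (b) 4",
           "next": {"a": "remediate", "b": "done"}},
    "remediate": {"display": "Count: 1, 2, 3, 4.  Try again.",
                  "next": {"a": "q1", "b": "q1"}},
    "done": {"display": "Done!", "next": {}},
}

def run(responses, start="q1"):
    """Replay a fixed sequence of button presses through the frames."""
    frame, shown = start, []
    for button in responses:
        shown.append(frames[frame]["display"])
        frame = frames[frame]["next"].get(button, frame)
    shown.append(frames[frame]["display"])
    return shown

# A student who errs once: q1 -> remediate -> q1 -> done.
trace = run(["a", "b", "b"])
```

Everything such a system can do is fixed in the `frames` table in advance, which is exactly the rigidity the text describes.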
Over a decade ago the concept of AI for computer-based instruction was advanced (3) as a means of providing more adaptive behavior of the instructional system. Much of the most advanced work today is concentrated on producing intelligent instructional systems. The particular locus of intelligence that has been considered the longest is in diagnosing what the student does and does not know (4), but there are indeed several loci for application of AI principles to instruction: student knowledge, expert knowledge, and instructional principles. In each area there are complex issues of both knowledge representation (qv) and information-processing methods that are at the forefront of AI research. Several reviews of AI applications to education have already appeared (5-7). Although each presents a particular viewpoint, all three are worthy of examination because they are more extensive than the present summary. It is also important to note that progress in intelligent computer-based instruction rests on continued efforts in cognitive psychology (qv) and in technologies for task analysis and instruction as well as in direct AI efforts.
Components of an Intelligent Instructional System

There are many architectures for intelligent tutoring systems, and it is not clear that any one of them is suited to all instructional systems. However, there are several components that seem to be present in all intelligent computer-assisted instruction (ICAI) systems. These are shown in Figure 1. Moving from the bottom up, there are (a) an interaction between some sort of instruction environment and the student; (b) a layer of expert modules that receive data about the instructional interactions or control them; and (c) a tutoring issue generator that compares the student's performance in the learning environment with an ideal or expert model, recognizes departures from the ideal, and suggests instructional issues to which the instructional intervention unit should direct its attention. Each of these modules can be intelligent, as is discussed below. Further, the components shown in Figure 1, although necessary for fully intelligent tutoring, need not be separate units of program code. Work is beginning on a tutoring system architecture in which the basic units are structured objects. The objects in this new work are more than the sorts of objects popularized by SMALLTALK (8) (see Languages, object-oriented). They correspond to nodes in a lattice structure representing the goal structure or prerequisite relationships in a curriculum. That is, each object corresponds to a lesson in a course or a curriculum subgoal. In this approach (9) the methods associated with a given lesson object correspond to the components shown in Figure 1, but they are distributed throughout the object network.

Figure 1. Basic components of intelligent instructional systems. For description of parts (a)-(c), see text.

The Learning Environment. Central to an intelligent instructional system is the interaction of a student with a learning environment in which the student can be taught or coached to teach herself. Certainly there are some instructional systems that simply lecture the student, but the most exciting developments have involved the use of the computer to simulate a device or a task environment, to permit students to assemble and test their knowledge in some kind of discovery environment, or to provide practice of a skill that is motivated by a game.

Simulations. Many learning tasks are made difficult by the abstraction needed to present knowledge in a textbook or lecture. If it were possible to demonstrate certain principles rather than merely talk about them, it is likely that they would be easier to learn. As devices have become more complex and as school budgets for laboratories and the supply of trained science teachers have dwindled, schooling has become more oriented toward lectures and rote learning. The computer offers the possibility for overcoming this trend. Complex devices can be simulated readily on the computer screen, and viewpoints can be offered that might not even be possible in the real world (e.g., how would this engine run if there were zero friction? what would happen to a block sliding down a ramp if F = mv instead of F = ma?). In this section we discuss simulation systems, in which an instructional designer specifies what should be done to imitate a phenomenon that a student is trying to understand. In the next section we discuss discovery environments, in which the student learns by building simulations as well as by watching them.

There are many ways in which intelligence can be incorporated into simulation systems. One important concern is that the environment be reactive (7). That is, the ways in which it responds to manipulation by the student should both motivate further exploration and guide the student in that exploration. Rule-based systems (qv) can permit reactivity that is principled and adaptive to student actions that might not be wholly predictable in advance.

Another important concern is that the system be articulate. Simulation systems, like many expert systems, need to be able to explain themselves, and many of the issues that apply in the design of explanation facilities for expert advisor systems also apply to instructional simulations. Clancey has demonstrated quite clearly that the expertise that might drive an expert system will not necessarily be organized appropriately to provide explanations of that system's behavior (10,5). For example, MYCIN, an expert diagnoser of infectious disease, was organized as a backward-chaining system, one that works backward from goals, through subgoals, toward the given conditions. However, if, after it makes a diagnosis, a student asks why that diagnosis is appropriate, the proper explanations generally involve going forward from causes to effects. Indeed, Clancey had to redesign the MYCIN knowledge base completely and add additional knowledge in order to add a useful instructional capability. The rules in diagnostic manuals that give rise to MYCIN did not contain any account of pathophysiology or the origins of the disease.

More broadly, the kinds of principles that drive simulations must be different when instruction is a goal. For example, there are some very nice electronic circuit analysis programs that can be quite useful to engineers but are useless for instruction; they solve engineering problems using efficient quantitative equations, whereas understanding of the principles they embody rests on qualitative knowledge. Considerable work is being done on approaches to the representation of qualitative knowledge about complex systems (11-15), and this work is likely to have a heavy impact on the development of intelligent simulations. In particular a distinction is being drawn between physical fidelity of a simulation to its referent and cognitive fidelity, i.e., the ability to make important aspects of real-world function salient and understandable in a simulation. One variant of this effort (13) specifies a system as a set of constraints on the devices of which it is composed and on how those devices are interconnected. Each device within a system is represented as a series of qualitative constraint
EDUCATION APPLICATIONS 269 equations.Connectionsbetween devicesimply that their constraints can be satisfied simultaneously. Much of the power of a simulation environment will depend on how the other componentswork. For example, consider a simulation environment in which the job of the student is to find the fault in an electronic device such as a power supply (16). Even if no specialcoachingis supplied,I simulated practice environment does permit practice to occur more quickly (long delays that might occur in the real world can be compressed),more safely, with better recordsof what the student did, and with more flexibility in problem selection.In order for the simulation to do more than merely respond correctly to requests for meter readings and other tests, it should also be allied with some sot of coaching or advising system. Representing a systemas a set of qualitative constraints helps make it possible to explain system activity and responseif there is appropriately organi zedconceptualknowledgeassociatedwith each constraint equation. In addition, it is necessaryto develop techniques for conveying the knowledge that is representedby the qualitative constraints,so that system-wideperformance, rather than merely the input-output behavior of individual components,is explained (13,16). DiscoveryEnvironmenfs.Closely allied with simulation systems are discovery environments, which are essentially programming languages that permit the student to build simulations. Perhaps the best known of these languages is LOGO (qv), which combinesa parenthesis-freeversion of LISP with a set of commands for moving a virtual pen on the terminal screen (commands such as forward, turn, penup, and pendown). LOGO is often used to teach programming concepts such as planning, iteration, recursion,etc., and there has been at least one intelligent tutor built that is able to critique student programs and provide advice toward improving them (1 7 ) . 
LOGO has been used as the basis for a substantial curriculum for the geometric aspects of mathematics (18). The approach is based on the notion of a procedural geometry, in which the fundamental domain for theorems is the set of procedures for guiding the LOGO turtle or virtual pen in space. Not only does this approach provide a natural basis for explaining most of the concepts of high school geometry but it is also extended as far as the special theory of relativity, which may slightly strain the limits of simple LOGO displays. Of course, from another point of view, a course such as geometry deals with proofs and not so much with the domain to which the proofs apply. A very different approach (19) has been taken in a tutoring system developed by John Anderson that allows students to interactively program proofs displayed on the video screen as paths between premises and conclusions; the links in a path correspond to statements of a proof.

Another issue in the development of programmable microworlds is the difficulty people have in expressing algorithms as formal programs. Recently, Jeffrey Bonar (20) has developed a programming tutor that allows the student to develop a natural-language plan for his program first and then coaches him through the steps of converting that plan to a formal program. The natural-language planning is done via a menu system in which the choices in the menu are based on extensive analysis of actual verbal descriptions of various algorithms by programming students. The intelligence in the program includes knowledge of how to follow up a general selection from the top-level menu, the (possibly several) formal plan components that
the student may have in mind when making a particular choice, and other diagnosis and teaching knowledge.

Fundamentally, a discovery environment allows a student to specify a process and then to see a simulation of that process being carried out. There are, of course, strikingly different ways of specifying a process. In addition to procedural approaches such as those discussed above, there are such important alternatives as declarative programming (as in PROLOG) and object-oriented programming (as in LOOPS or SMALLTALK). In the previous section the utility of qualitative constraint representations for simulation was discussed. Presumably, what is good for the instructional designer is also useful for the student. There is at least one extant system (21) that allows the student to specify the constraints on various devices being described and to specify how those devices are interconnected into a system. Given the constraint knowledge, the discovery environment can then simulate the system that the student has implicitly specified. This allows the student to learn qualitatively (13,14) about a domain.

A fundamental problem with discovery environments is that students often fail to do the specific experiments that might teach them something. This is an important reason for supplementing such environments with intelligent coaches that assess the student's behavior and make suggestions for activity that is more likely to be instructive. Such systems must be able to model the student and must have the knowledge needed to go from a model of student performance to a plan for improving the rate of learning.

Games. Games can be used to motivate the substantial amounts of practice needed to automate and refine basic mental operations. A striking early example of an intelligent game environment for practicing arithmetic is provided by Burton and Brown's WEST system (22).
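WEST's mechanics are described in more detail below; in brief, the student combines three spinner numbers into an arithmetic expression, and the expert model simply enumerates and ranks the possibilities. A sketch of that enumeration step, using left-to-right grouping, three operators, and raw expression value as a stand-in for the game's real move criterion:

```python
# Sketch: enumerate candidate arithmetic expressions for three
# spinner numbers, as WEST's brute-force expert does.  Scoring by
# raw value is only a placeholder for the game-specific criterion.
from itertools import permutations, product

def candidate_moves(a, b, c):
    """Map each distinct result to one expression that produces it."""
    ops = {"+": lambda x, y: x + y,
           "-": lambda x, y: x - y,
           "*": lambda x, y: x * y}
    moves = {}
    for x, y, z in permutations((a, b, c)):
        for o1, o2 in product(ops, repeat=2):
            # Left-to-right grouping only: (x o1 y) o2 z.
            value = ops[o2](ops[o1](x, y), z)
            moves.setdefault(value, f"({x} {o1} {y}) {o2} {z}")
    return moves

spins = candidate_moves(1, 2, 3)
best = max(spins)  # the expert's pick under this placeholder criterion
```

Ranking the alternatives by their effect in the game, rather than by raw value, is what turns this enumeration into the differential expert model described under "The Expert Model" below.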
The primary purpose of WEST is to provide practice in the arithmetic activities involved in building and evaluating arithmetic expressions. A game environment is provided that is similar to Chutes and Ladders (the game is called Snakes and Ladders in the U.K.), except that the number of spaces to move is generated by arithmetic operations rather than by rolling the dice. Three dials with numbers on them appear on the screen. On each of them a needle spins, stopping randomly on one of the numbers. The student must combine the three numbers in an arithmetic expression to determine his move. Other recent game environments have involved issues of planning and strategy, rapid retrieval of number facts, etc. The key use of game environments is to provide palatable drill, but they also can be used as discovery environments or laboratories for metacognitive skills. Doing well in a good game requires planning and carrying out relatively complex strategies. To the extent that these strategies can be specified rigorously (via production systems or other algorithmic notations), an intelligent system can be created to coach the use of strategy in gaming, in arithmetic issues, and in more general issues of learning and problem solving (qv).

The Expert Model. The key to tutoring performance in a learning environment is to know what the goal of the tutoring is and where the student stands relative to reaching that goal. The goal is generally for the student to perform like an expert. Therefore, the intelligent instructional system will generally contain an expert model that can tell what the expert would have done in the situation the student is now facing. This
expert performance can then be compared to the student's performance. Such comparisons or differential models are essential in diagnosing a student's capabilities; specific errors of omission and commission are the most diagnostic information available for evaluating student knowledge and competence.

For some learning environments the expert model is very straightforward. For example, the WEST tutor's expert (22) simply generates every arithmetic expression that is possible for a given set of spinner numbers. It then ranks these alternatives according to a criterion of their effect in the game. For example, it might compute how far ahead of the computer the student would be if the student used a particular expression. Whichever expression came out best according to a given criterion would be deemed the expert's performance. Note, however, that no human expert would ever work that way. Somehow humans, more vulnerable to problems of combinatorial explosion, are more likely to employ heuristics (qv) that minimize such problems. If the intelligent tutoring system cannot take account of such capability, its usefulness may be limited. Further, human problem solving and reasoning are extremely flexible and not entirely captured by the brittle sorts of expert systems that have initially appeared (23). To be maximally useful, an expert model must capture some of the human aspects of reasoning and must be able to explain why it acted as it did (24). It must be an "articulate expert" (25). It should also be noted that human performance is the final product of thinking activity that includes considerable planning. To be completely useful, the expert model must simulate strategic processes such as planning as well as the final performance (26-28).

The Student Model. Given an expert model, it is often possible to assess student performance by noting which aspects of expert capability are not present in the student.
The result is an overlay model (29), which is simply a checklist showing which of the procedural components of the expert model have been verified in the student. If, for example, the expert model is expressed as a production system, the student model is simply a subset of the expert productions. Of course, the productions of the expert may be more specialized than those of the student, so it will not generally be the case that the student can be modeled as a simple subset of the expert model. Further, there are several problems. First, there is some evidence that students sometimes possess "mal-rules" (30), pieces of procedure that are wrong rather than merely incomplete. For example, many students taking physics in high school start with the mal-rule that force is directly related to velocity, when in fact the correct relationship is with acceleration. Second, there is also evidence that students do not merely grind to a halt when their procedures are inadequate to a task they face. Rather, they systematically invent some temporary way around the problem, a repair (31).

In spite of the problems mentioned, it has been possible to build at least a few student-modeling facilities in intelligent tutoring systems. For example, WEST models the student as discussed above (22). In the area of subtraction skill, Burton (32) has created a facility that is often able to analyze student performance on a series of subtraction problems and detect gaps and mal-rules, or bugs, in the student's knowledge. A somewhat different approach to student modeling is to directly analyze the student's problem solving rather than model his
answers to problems. A system for analyzing students' explanations of complex phenomena (33) seems to have some of this character. A more critical problem is that there is no single criterion for deciding that a student knows a particular subskill or fact. For example, in studying the errors of students who learn LISP, it was found (19) that 60% of the errors involved situations in which a subskill had been demonstrated successfully but could not reliably be combined with other subskills into an integrated higher-level component. For example, a student might be able to write a procedure that processed the elements of a list one after another but, when called upon to solve a problem involving that subskill, might not reliably execute it while simultaneously thinking about other problem issues. Thus, failure to display a piece of knowledge when it is needed may not mean that the knowledge is absent, and successfully demonstrating a subskill in vacuo may not mean that it can be used in complex situations where it is appropriate. It is critical, therefore, that any student-modeling system be able to represent student knowledge from multiple viewpoints and with appropriate degrees of tentativeness that reflect the variable reliability of knowledge at different stages of practice and the situation specificity of many skills. Systems that can do this are just beginning to be developed. It would be inappropriate to end this section without at least mentioning that a major thrust of current cognitive science (qv) research involves understanding the kinds of mental models that both experts and novices bring to intellectually demanding tasks. A recent book (34) provides a very good introduction to a number of important efforts in this direction.

The Tutoring Issue Generator. Once there is a student model, the next step is for an intelligent component of the instructional system to analyze that model to determine how to set the course of instruction.
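The overlay idea and its mal-rule complication, described above, can be sketched as set operations over production names. This is a simplified illustration; the rule names are hypothetical, chosen only to echo the subtraction example.

```python
def diagnose(expert_rules, student_rules):
    """Overlay-style differential model (after ref. 29, simplified):
    an expert production is 'verified' if observed in the student's
    behavior; anything the student does that lies outside the expert
    model is a candidate mal-rule (ref. 30)."""
    return {
        "verified": expert_rules & student_rules,
        "missing": expert_rules - student_rules,
        "mal_rules": student_rules - expert_rules,
    }

expert = {"align-digits", "subtract-column", "borrow-from-tens"}
observed = {"align-digits", "subtract-column", "subtract-smaller-from-larger"}
model = diagnose(expert, observed)
# model["mal_rules"] == {"subtract-smaller-from-larger"}
```

Note that a pure subset checklist could represent only the "verified" and "missing" categories; the mal-rule category is exactly what the entry identifies as lying beyond the simple overlay model.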
One view of this process is to see it as the determination of which of a set of potential tutoring issues is most critical at the moment (22). An instructional intervention can then be prepared that deals with the most critical issue. In a system like WEST (22), this is done by having each issue evaluator examine the student's most current response, the student's overall responding history, and the possible moves generated by the expert model that rank higher than the student's. From this examination comes a list of issues that should be handled. A variety of rules is then used to pick the most important issue. For example, one might prefer to handle arithmetic problems (like inadequate use of parentheses) first, and one certainly wants to pick the right moment to intervene. That is, the student is unlikely to attend to advice on using arithmetic expressions with parentheses if, on the current move, use of parentheses cannot produce a significantly better result. Also, it is best not to interrupt the student's play constantly to give advice. An alternative approach to the generation of tutoring issues is to have a curriculum or syllabus (35) to guide the introduction of new topics or the elaboration of old ones. In this approach an instructional subgoal network guides the ordering of instructional interventions; the issue to be tutored is simply one of the next issues in the syllabus that are eligible for instruction because their prerequisites have been met. It is critical, in such a system, that there be some psychological validity to the prerequisite relationships that partially order
EDUCATION APPLICATIONS
the syllabus. If there is none, some of the tutoring will fail because students are not ready for it.

The Instructional Intervention. Once a decision has been made on the issue to be tutored, an appropriate instructional intervention must be accessed or constructed. These instructional interventions can take many forms, depending on the type of learning environment being used (see above). For example, if the learning environment is a game like WEST (22), the instructional intervention may take the form of coaching. The program might interrupt the game and offer advice that will result in better game performance and will also help the student learn something in the domain targeted for instruction. Three other forms of instructional intervention have received attention as well: choosing appropriate problems for the student, working with the student to solve and explain the solutions for problems, and using Socratic dialogues. The selection of appropriate problems and subproblems for the student can require considerable intelligence. In earlier approaches to computer-based instruction, getting a particular problem wrong resulted in being branched to a specific easier problem. In contrast, intelligent systems have a number of ways of tuning the decisions regarding what the student should do next. In some cases, the specific method needed to solve a piece of a problem was recently exercised in a problem that the student solved. In these cases the tutor can deal with a student impasse by discussing the earlier effort and perhaps even having the student look at his earlier solution or even rework the problem. Anderson has used this approach in his LISP and geometry tutors (19). It is also possible to plan intelligent problem selection based on the student model and the goal structure for a curriculum. Jeffrey Bonar and the author have been developing approaches of this sort in their laboratory, and at least one example has been embodied in a tutor (36).
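The syllabus-driven approach described above reduces to a simple eligibility test over a prerequisite network. The sketch below is a simplified illustration (after ref. 35); the topic names are hypothetical.

```python
def eligible_topics(syllabus, mastered):
    """A topic is eligible for tutoring when it is not yet mastered but
    every prerequisite in the instructional subgoal network is."""
    return [topic for topic, prereqs in syllabus.items()
            if topic not in mastered and prereqs <= mastered]

syllabus = {
    "function-calls": set(),
    "conditionals": {"function-calls"},
    "recursion": {"function-calls", "conditionals"},
}
eligible_topics(syllabus, mastered={"function-calls"})  # -> ["conditionals"]
```

If the prerequisite relationships lack psychological validity, this procedure will happily schedule topics students are not ready for, which is precisely the failure mode the entry warns about.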
A second approach to instructional intervention has been developed as an extension of a computer-based expert medical diagnosis system (24). Here, the instructional interventions are much more under the student's control. For example, the student can ask the system to explain why it has made the diagnosis it did. Here, tailoring to the student's current knowledge takes a somewhat different form, which involves an intelligent guess about what the student already knows and which aspects of a full explanation need to be stated explicitly in any given case. Work is currently beginning in several laboratories in which an effort is being made to better understand how expert human instructors tailor explanations and apprenticeship interactions to students' existing knowledge. For example, several researchers at the University of Pittsburgh and at Carnegie-Mellon University have begun an effort to understand interactions between a clinical instructor and a medical student or intern on patient rounds (i.e., while seeing patients in a hospital). Sometimes the human instructor will make a diagnosis, and sometimes he will ask the student to do so. Sometimes he will give the explanation for a diagnosis, and sometimes he will leave that to the student. The choice of approaches seems to be principled, and work is now underway to infer these principles from a collection of taped patient rounds. It is also possible for an instructional system to have a Socratic dialogue with a student, using the dialogue both to determine what the student knows and to provide opportunities for the student to discover new knowledge (3). One substantial effort (37) has been made to specify a rule set that is sufficient for guiding such Socratic dialogues. The rule set was developed by studying how good tutors conduct such conversations. As with the previous examples of instructional intervention, this is another case in which the expert computer tutor has been developed by studying what human experts do and making the apparent principles behind their behavior into explicit rules that can drive an intelligent computer system. One problem with this approach is that it does not necessarily capture instructional principles that have any grounding in our knowledge of learning processes, although some of the people doing the work are first-rate psychologists who bring knowledge of learning theories to their work. In other cases (19) the instructional approach is directly motivated by a theory of learning, even though the specifics of the instruction involve tuning that goes beyond the stated learning theory.

BIBLIOGRAPHY

1. B. F. Skinner, The Technology of Teaching, Appleton-Century-Crofts, New York, 1968.
2. J. G. Kemeny, Man and the Computer, Scribner's, New York, p. 74, 1972.
3. J. R. Carbonell, "AI in CAI: An artificial intelligence approach to computer-aided instruction," IEEE Trans. Man-Mach. Sys. MMS-11, 190-202 (1970).
4. J. A. Self, "Student models in computer-aided instruction," Int. J. Man-Mach. Stud. 6, 261-276 (1974).
5. W. J. Clancey, Methodology for Building an Intelligent Tutoring System, in W. Kintsch, J. R. Miller, and P. G. Polson (eds.), Methods and Tactics in Cognitive Science, Erlbaum, Hillsdale, NJ, pp. 51-83, 1984.
6. A. Barr and W. J. Clancey, Applications-Oriented AI Research: Education, in A. Barr and E. A. Feigenbaum (eds.), Handbook of Artificial Intelligence, Vol. II, William Kaufmann, Los Altos, CA, pp. 223-294, 1982.
7. J. S. Brown, Uses of Artificial Intelligence and Advanced Computer Technology in Education, in R. J. Seidel and M.
Rubin (eds.), Computers and Communications: Implications for Education, Academic Press, New York, pp. 253-281, 1977.
8. A. Kay and A. Goldberg, "Personal dynamic media," Computer 10, 31-41 (1977).
9. J. G. Bonar and A. M. Lesgold, Work in Progress on Intelligent Tutors, Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA, 1985.
10. W. J. Clancey, Transfer of Rule-Based Expertise through a Tutorial Dialogue, Doctoral Dissertation, Report STAN-CS-769, Stanford University, Stanford, CA, 1979.
11. B. Kuipers, "Commonsense reasoning about causality: Deriving behavior from structure," Artif. Intell. 24, 169-204 (1984).
12. J. de Kleer, "How circuits work," Artif. Intell. 24, 205-280 (1984).
13. J. de Kleer and J. S. Brown, "A qualitative physics based on confluences," Artif. Intell. 24, 7-84 (1984).
14. K. D. Forbus, "Qualitative process theory," Artif. Intell. 24, 85-168 (1984).
15. B. C. Williams, "Qualitative analysis of MOS circuits," Artif. Intell. 24, 281-346 (1984).
16. J. S. Brown, R. R. Burton, and J. de Kleer, Pedagogical, Natural Language and Knowledge Engineering Techniques in SOPHIE I,
II, and III, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 227-282, 1982.
17. M. L. Miller, A Structured Planning and Debugging Environment for Elementary Programming, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 119-135, 1982.
18. H. Abelson and A. diSessa, Turtle Geometry: The Computer as a Medium for Exploring Mathematics, MIT Press, Cambridge, MA, 1981.
19. J. R. Anderson, Cognitive Principles in the Design of Computer Tutors, Technical Report, Carnegie-Mellon University, Pittsburgh, PA, 1984.
20. J. G. Bonar, Bridge: An Intelligent Programming Tutor/Assistant, Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA, 1983.
21. A. Borning, THINGLAB: A Constraint Simulation Laboratory, Doctoral Dissertation, Report STAN-CS-79-746, Stanford University, Stanford, CA, 1979.
22. R. R. Burton and J. S. Brown, An Investigation of Computer Coaching for Informal Learning Activities, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 79-98, 1982. (The original game on which WEST is based was programmed at the University of Illinois on PLATO by Bonnie Seiler.)
23. H. E. Pople, Heuristic Methods for Imposing Structure on Ill-structured Problems: The Structuring of Medical Diagnostics, in P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview Press, Boulder, CO, pp. 119-190, 1982.
24. W. J. Clancey, Tutoring Rules for Guiding a Case Method Dialogue, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 201-225, 1982.
25. I. Goldstein, The Computer as Coach: An Athletic Paradigm for Intellectual Education, AI Memo 389, MIT, Cambridge, MA, 1977.
26. I. P. Goldstein, The Genetic Graph: A Representation for the Evolution of Procedural Knowledge, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 51-77, 1982.
27. M. R. Genesereth, The Role of Plans in Intelligent Teaching Systems, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 137-155, 1982.
28. M. R. Genesereth, The Role of Plans in Automated Consultation Systems, in Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 311-319, 1979.
29. B. Carr and I. Goldstein, Overlays: A Theory of Modeling for Computer-Aided Instruction, AI Memo 406, MIT, Cambridge, MA, 1977.
30. D. Sleeman, Assessing Aspects of Competence in Basic Algebra, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 185-199, 1982.
31. J. S. Brown and K. VanLehn, "Repair theory: A generative theory of bugs in procedural skills," Cog. Sci. 4, 379-426 (1980).
32. R. R. Burton, Diagnosing Bugs in a Simple Procedural Skill, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 157-183, 1982.
33. D. Sleeman and R. J. Hendley, ACE: A System which Analyzes Complex Explanations, in D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems, Academic Press, New York, pp. 99-118, 1982.
34. D. Gentner and A. Stevens, Mental Models, Erlbaum, Hillsdale, NJ, 1983.
35. I. Goldstein, "The genetic epistemology of rule systems," Int. J. Man-Mach. Stud. 11, 51-77 (1979).
36. C. L. Cosic, Enhanced Learning-by-Discovery, Master's Thesis, School of Library and Information Science, University of Pittsburgh, 1985.
37. A. Collins and A. L. Stevens, Goals and Strategies of Interactive Teachers, in R. Glaser (ed.), Advances in Instructional Psychology, Vol. 2, Lawrence Erlbaum, Hillsdale, NJ, pp. 65-119, 1982.

A. Lesgold
University of Pittsburgh

A variation of this entry appears in German in H. Mandl and H. Spada (eds.), Wissenspsychologie: Ein Lehrbuch, Urban & Schwarzenberg, Munich. It appears in English in this volume by courtesy of Urban und Schwarzenberg, which owns and reserves all rights to it.

ELI

An English-language interpreter for converting English sentences to CD forms (see Conceptual dependency), written in 1975 by C. Riesbeck at the Yale AI Project. ELI differs from other natural-language parsers in that it derives the semantic and memory structures underlying an utterance, whereas a syntactic parser (see Parsing) discovers the syntactic structural representation [see C. Riesbeck, "An Expectation-Driven Production System for Natural Language Understanding," in D. A. Waterman and F. Hayes-Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York, pp. 399-413, 1978].

M. Terp
SUNY at Buffalo

ELIZA

A program that mimics a "Rogerian" psychotherapist, ELIZA uses almost no memory and no "understanding" of inputs and creates answers by combining phrases that are stored under certain keywords with transformations of input sentences [see J. Weizenbaum, "ELIZA-A computer program for the study of natural language communication between man and machine," CACM 9(1), 36-45 (January 1966)]. Information about the domain of discourse is isolated in a "script." By supplying new scripts, an improved version of ELIZA has been adapted successfully to other domains [see J. Weizenbaum, "Contextual understanding by computers," CACM 10(8), 474-480 (August 1967); S. C. Shapiro and S. C. Kwasny, "Interactive consulting via natural language," CACM 18(8), 459-462 (August 1975)].
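The keyword-and-transformation mechanism that the ELIZA entry describes can be illustrated with a minimal sketch. The two script rules and the pronoun swaps below are invented for illustration; they are not taken from Weizenbaum's actual DOCTOR script.

```python
import re

# Hypothetical fragment of a Rogerian "script": each keyword pattern
# maps to a response template filled with a slice of the input.
SCRIPT = [
    (re.compile(r"\bI am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.I), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."

# First-person fragments are flipped before being echoed back.
SWAPS = {"my": "your", "i": "you", "am": "are"}

def transform(fragment):
    return " ".join(SWAPS.get(w.lower(), w) for w in fragment.split())

def respond(sentence):
    """Scan for the first matching keyword pattern and instantiate its
    template with the transformed remainder of the input sentence."""
    for pattern, template in SCRIPT:
        m = pattern.search(sentence)
        if m:
            return template.format(transform(m.group(1).rstrip(".!")))
    return DEFAULT

respond("I am unhappy with my job")
# -> "How long have you been unhappy with your job?"
```

Swapping in a different SCRIPT table corresponds to the "new scripts" mechanism by which later versions of ELIZA were adapted to other domains.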
J. Geller
SUNY at Buffalo
ELLIPSIS

Ellipsis and Substitution

Ellipsis is leaving something unsaid, which will, nevertheless, be understood by the listener. It relies on the intelligence of the listener to fill in what is missing, thus allowing more information to be conveyed in fewer symbols. As such, ellipsis is a form of anaphora. Ellipsis differs from other forms of anaphora in that the primary clues as to what is missing through ellipsis are to be found in the structure of the sentence.
1. Balser was looking for one big mean black bear.
2. He found two.

In 2 "two" is understood as "two big mean black bears." Part of the noun phrase has been omitted, which the listener is expected to fill in. The elliptical construction not only is more brief but it focuses attention on the difference between the two noun phrases: Only the contrast is explicitly mentioned.

Ellipsis is a special case of substitution. In both cases a phrase is replaced by a substitute. A phrase may be replaced by a substitute word such as "one." This is often called "one" anaphora. In the case of ellipsis the phrase is replaced by nothing at all.

3. Balser was expecting to find a fluffy baby bear.
4. He found a nasty full-grown one.

In 4 "one" substitutes for the omitted "bear," drawing a contrast between "a fluffy baby bear" and "a nasty full-grown bear." In 4 "bear" is being repeated (through substitution), whereas "a fluffy baby" is being replaced by "a nasty full-grown."

Ellipsis is often revealed as an incomplete structure. For example, in sentence 2 above, the head of the noun phrase has been omitted, leaving an incomplete noun phrase structure. In sentence 4 the same thing has happened, but the substitute word "one" has been inserted. In sentence 6 below,

5. Balser crawled into the cave and what do you suppose he found?
6. Two cuddly baby bears.

the fragment "two cuddly baby bears" is a structurally complete noun phrase, but the sentence structure to which it attaches has been left unsaid ("He found two cuddly baby bears"). Both of these forms of structural ellipsis can be found together as in sentence 8.

7. How many baby bears did Balser find in the cave?
8. Two.

"Two" is a fragment of a noun phrase ("two baby bears"), which is assumed to be attached to a sentence ("Balser found two baby bears").

Understanding Sentences Containing Ellipsis

The fragments omitted through ellipsis and substitution can usually be recovered by an analysis of meaning constraints, an analysis of the sentence structure of previous sentences to locate the most probable candidate phrases, and an analysis of structure to identify what is being repeated and what is being replaced. A nonsense example helps to illustrate the process.

9. Are there three brown gleeps with four glumps?
10. No, seven.

When an elliptical phrase is found, such as "seven" in sentence 10, the context must be examined to determine "seven what?"; the unknown has the meaning constraint that it is countable. Sentence 9 has two countable candidates: gleeps and glumps. Most readers will choose "seven brown gleeps with four glumps" over "three brown gleeps with seven glumps" because the structure of 9 puts greater emphasis on gleeps (it is said to be in sharper focus). Having identified the candidate phrase as "three brown gleeps with four glumps," the structure of the original phrase and the elliptical phrase are compared to determine how much of the original phrase was intended to be carried into the elliptical phrase. Did the speaker intend seven brown gleeps with four glumps, or seven gleeps (possibly some other color) with four glumps, or seven gleeps (with unknown numbers of glumps)? Since the contrast is on the number of gleeps, it is assumed that all the attributes that follow the number should be assumed to be the same. Hence, the expanded phrase is "seven brown gleeps with four glumps."

Contributions of Ellipsis to Context

One of the effects of ellipsis is to bind sentences together into a context. When one sentence relies on those around it and upon the situation described by the context, it is more clear that the sentences contribute to a coherent whole. Other forms of anaphora, such as repeated reference (e.g., pronouns and definite noun phrases), have a similar effect of binding sentences together into a coherent context. Repeated reference differs from ellipsis, however, in that the purpose of repeated reference is to refer again to a concept that already exists in the context. The purpose of ellipsis is to provide emphasis and contrast between one concept and another.

General References

M. A. K. Halliday and R. Hasan, Cohesion in English, Longman, London, 1976.
G. Hirst, Anaphora in Natural Language Understanding: A Survey, Springer-Verlag, New York, 1981.
H. Tennant, Natural Language Processing: An Introduction to an Emerging Technology, Petrocelli Books, Princeton, NJ, 1980.
B. L. Webber, A Formal Approach to Discourse Anaphora, part of the series Outstanding Dissertations in Linguistics, Garland Publishing, New York, 1979.

H. Tennant
Texas Instruments

EMOTION MODELING

The nature of human emotion and the role it plays in cognitive processes has not been studied extensively within the AI community. Many AI researchers question the necessity of emotional reactions within any system whose purpose is strictly cognitive. Emotions make people irrational: why should irrational thought processes be introduced into systems that would otherwise operate with the cool, detached superiority of unadulterated intelligence? Ironically, some would answer this question by arguing that emotional states are linked to cognitive skills at a fundamental level.

Early Simulations

The first serious attempt to develop a computational model of human emotion was undertaken by Colby (1), a psychoanalyst. Colby is best known for his work with PARRY, a system that
mimics the linguistic behavior associated with schizophrenic paranoia. PARRY attracted much attention as a system that could fool psychiatrists into believing that they were conversing with a paranoid human over a computer terminal. Unfortunately, this variant of the Turing test taught more about human gullibility than about human cognition. Colby himself concluded that a number of major problems in cognitive modeling would have to be overcome (especially the problem of natural-language processing) before the effort he originally envisioned could be attempted. Although the PARRY program is widely known, an earlier effort by Colby to simulate neurotic thought processes tackled the more general question of how belief systems (qv) and thought processes interact with repressed emotions (see Belief systems). This program relied on a set of transformations that operated on beliefs in order to reduce anxiety. The goal was to simulate a woman who defensively denied her feelings of hatred toward a father who abandoned her. Colby represented the woman's beliefs in the form of simple sentences and designed linguistic operations to perform belief transformations. If a belief were introduced that came "too close" to the truth, the system would identify all resulting conflicts within the larger belief pool and select a defensive transform to suppress the troublemaker. An excellent description of this system is presented in Ref. 2.

Narrative Inference

A different perspective on emotion modeling was pursued by Dyer (3) in his implementation of the BORIS system. BORIS was designed to understand and answer questions about narrative texts by drawing from multiple knowledge structures. One class of inference addressed by BORIS involved knowledge about affective reactions in response to goal states and interpersonal relationships on the part of the narrative's characters.
Dyer hypothesized knowledge structures called ACEs (affect as a consequence of empathy) in order to account for a variety of affective inferences. This work shows how knowledge about affective reactions can be organized and accessed during text comprehension in order to produce causally coherent memory representations for narrative text (see Episodic memory).

Memory Representation

A somewhat different role for emotion in memory has been suggested by results in cognitive psychology (qv). For example, facts learned when a subject is depressed may be best recalled when the subject is again in a depressed state (4). This suggests that emotional experiences might be nothing more than a side effect of processes that play an important role in memory access (see Memory organization packets). One computational model of memory representation that has attempted to link memory with emotions is the theory of plot units (5). Plot units are designed to facilitate the problem of summarizing narratives by creating a level of memory representation to highlight the most important and central concepts in a story. Each plot unit represents a configuration of emotional states that can be derived from chronological affect state maps constructed for each character in the narrative. Interestingly enough, the affect state maps needed require only a minimal theory of human emotion. It is enough to distinguish positive states, negative states, and neutral mental states (see Story analysis). The plot unit approach to narrative summarization does not exploit the emotional reactions of the reader. What counts is the reader's ability to infer a narrative character's emotional states. This suggests tentative evidence that a computer's ability to summarize narratives might rely on manipulations of affect-oriented memory structures.

Humor

A somewhat more speculative role for affect has been suggested by M. Minsky, who has proposed a relationship between humor and cognitive thought processes (6). Although Minsky does not make any claims about emotional states in general, he does suggest that jokes are funny to people as a conscious reflection of crucial and complex thought processes that operate unconsciously. Minsky expands an idea first set forth by Sigmund Freud concerning the possibility of cognitive "censors" who set up powerful unconscious barriers to block nonsensical and unproductive trains of thought. A joke is funny because it somehow manages to slip by the censors (at least the first time it is heard). This experience of thinking "forbidden" thoughts then creates a form of tension that is released by laughter until an appropriate censor can be created to prevent similar intrusions into mental territory that is deemed off limits. These thought censors might just be the critical component needed for the successful development of memory integration and knowledge acquisition (qv) techniques underlying commonsense reasoning (qv). Without them, one's mental reasoning process would be free to wander through endlessly expanding associations and search spaces, much like an unconstrained tree search taken to indefinite depths.

Conclusions

AI researchers interested in modeling human cognition (qv) should be concerned with the role of emotions as a possible clue to the problems of massive memory organization and learning.
However, the task of modeling emotional reactions themselves may not shed any light on the broader question of how emotions relate to intelligent thought processes. It is intriguing to consider the possibility that human emotions might provide a key to the difficult problem of commonsense reasoning and general inference processes (see Reasoning, commonsense; Inference). But for now, the role of emotion in computational models of human cognition deserves further investigation.
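The minimal affect vocabulary that plot units require can be made concrete. In this toy sketch (the event strings and the unit pattern are invented for illustration; see ref. 5 for the actual plot-unit definitions), a character's chronology is reduced to "+", "-", and "M" states and scanned for a unit-like configuration.

```python
# A hypothetical unit: negative event, neutral mental state (a plan),
# then positive event -- a "problem resolved" configuration.
PROBLEM_RESOLVED = ("-", "M", "+")

def affect_map(events, valence):
    """Map a character's chronological events to affect states using a
    caller-supplied valence function."""
    return tuple(valence(e) for e in events)

def contains_unit(states, unit):
    """Check whether the unit pattern occurs in the affect state map."""
    n = len(unit)
    return any(states[i:i + n] == unit for i in range(len(states) - n + 1))

story = ["loses job", "decides to retrain", "gets hired"]
valence = {"loses job": "-", "decides to retrain": "M", "gets hired": "+"}.get
states = affect_map(story, valence)
contains_unit(states, PROBLEM_RESOLVED)  # True for this story
```

Note how little emotional theory the representation demands: only the three-way positive/negative/neutral distinction the entry describes.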
BIBLIOGRAPHY

1. K. Colby, Simulations of Belief Systems, in R. Schank and K. Colby (eds.), Computer Models of Thought and Language, Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 251-286, 1973.
2. M. Boden, Artificial Intelligence and Natural Man, Basic Books, New York, pp. 21-63, 1977.
3. M. Dyer, "The role of affect in narratives," Cognit. Sci. 7(3), 211-242 (1983).
4. G. Bower, "Mood and memory," Am. Psychol. 36(2), 129-148 (1981).
5. W. Lehnert, Plot Units: A Narrative Summarization Strategy, in
MEMORY EPISODIC
275
yoke of behaviorism, and many are uncomfortable with the idea of a key distinction being made in phenomenological terms. Consequently, somewhat less subjective criteria have been proposed.According to some of the more prominent of these, episodic memory is of an event rather than a fact, is temporary rather than permanent, is related to other contents of mind in a temporal rather than a conceptual w&y, has a veracity that is arbitrated by the rememberer rather than by W. G. Lnuxpnr experts, and originates from a particular occasionrather than Universityof Massachusetts from many different occasions. Such criteria are not entirely satisfactory. As an example, the occasionof hearing the news of President Kennedy's assasEMYCIN sination might seemto constitute an excellent example of episodic memory, and yet it could be argued that it violates each A nonspecific system for constructing rule-based expert con- of these criteria, in that it may serve as a source of knowledge sultation prograhs, EMYCIN was written in L979 by van adequate for responding to factual questions,endure until the Melle at the Stanford Heuristic Programming Project (seePro- rememberer dies, be brought to mind in associationwith conduction systems). EMYCIN is abstracted from the domain- ceptually related events, be shown to be at variance with obindependentpart of MYCIN and has been used to build several jective evidence,and be thoroughly fused with memory for the other expert systems in different problem domains (seeW. J. many occasionson which the incident has been thought about van Melle, System Aids in Constructing Consultation Pro- or pictures or replays of it seen. Conversely,knowledge of the gra,ms,UMI ResearchPress, Ann Arbor, MI, 1980). assassinationof President Lincoln is unlikely to be regarded as the product of episodicmemory, and yet it is about a particM. 
Tern ular event, may have lost much of its detail during the time SUNY at Buffalo immediately following its acquisition, may be organized in a way that reflects its temporal relation to other historic events, could appropriately be judged for veracity by the knower (as AUTOMATION. See Computer Systems;Com- when the knower happens to be an authority on the subject), ENGINEERING puter-aided Design. and could conceivablybe entirely attributable to a single occasion even though the occasionas such may be beyond recollection. At a more general level disentangling episodic memory and semantic memory is complicatedby the needthat eachhas EPAM of the other. Knowledge, or at least the bulk of it, has its origin in events, and events require knowledge to be understood. A program that simulates human learning of nonsensesyllaThe strong interrelation of episodic memory and semantic bles by building a discrimination net (see Language acquisi- memory (qv) and the lack of clear objective criteria for distintion), EPAM shows effects also observable with human sub- guishing between them have helped persuade many of those jects, namely oscillation, retroactive inhibition, and forgetting wary of phenomenologythat the distinction is not of fundawithout information loss [see E. A. Feigenbaum, The Simula- mental significance. But, regardless of its theoretical status, tion of Verbal Learning Behavior, in E. A. Feigenbaum and J. the distinction remains useful as a descriptive device, and for Feldmann (eds.),Computersand Thought,McGraw-Hill, New present purposesepisodicmemory will be used in referenceto York, pp. 297-309, 1963;E. A. Feigenbaumand H. A. Simon, those experimental procedures in which, loosely speaking, "EPAM-like models of recognition and learning," Cogn. Sci., subjects respond on the basis of specific events rather than 8(4),305-336 (1984)1. their general knowledge. It is perhaps worth noting that the definition of an event is J. Gpllnn necessarily arbitrary. 
A vacation in Europe, an excursion to SUNY at Buffalo Paris during that vacation, a visit to the opera during that excursion,a particular aria in the opera, or a particular note in the aria could each be consideredan event. In most episodic memory experiments events are typically defined as presentaMEMORY EPISODIC tions of specificitems, such as numbers, words, pairs of words, Episodic memory is usually contrasted with knowledge (see sentences,or pictures. Memory for these item presentations is Semantic Memory). The distinction, which was brought to the usually referred to simply as memory for items. The conditions fore by Tulving in L972 (L,2), takes a variety of forms but is of item presentation, the number of items presented, and the perhaps most readily appreciated in a phenomenologicalway: study-to-test delay vary accordingto the purposeof the experiment. Most memory tests are of one of three forms: unaided Episodic memory refers to the recollection of a particular event and is characterized by a definite awareness that the recall, cuedrecall, or recognition.In an unaided recall test the set of to-be-recalledevents is specified,albeit usually implicevent was personally experienced,whereas an item of knowledgeis usually more abstract in the sensethat it is brought to itly, and the subjectsreport as many events as they can. This test usually requires either serial recall, in which case the mind with no recollection of the event or events from which it items have to be reported in their exact order of presentation, was derived. free recall, in which case the items can be reported in any not fully off or psychologists have shaken the Experimental
order. In a cued recall test subjects are given hints, or cues, to facilitate recall. More often than not a separate cue is presented for each item, although sometimes more than one cue is presented for each item or one cue serves for more than one item. A cue may take many forms: For the word EAGLE, it may be a fragment of the word (e.g., -AG-E), a context item (e.g., "emerald" if the presentation item had been the word pair emerald-EAGLE), or something that had not formed part of the study list (e.g., "a kind of bird" or "rhymes with beagle"). In a recognition test the to-be-remembered items are intermixed with new items, referred to as lures or distractors, and the subjects' task is to decide whether each item occurred in the study list. The proportion of items given a positive response is sometimes left up to the subjects and is sometimes specified by the experimenter. In addition, the subjects may be required to rate the confidence they have in each decision. For a comprehensive account of these and other procedures for studying memory, see Ref. 3.

In discussing the issues and findings of episodic memory research, it is useful to distinguish between primary and secondary memory. The distinction was originally formulated by James (4) in phenomenological terms. Specifically, primary memory refers to the remembering of events that have never left consciousness and that therefore belong to the psychological present. Secondary memory, or memory proper, refers to memory for events that, though represented in consciousness immediately after their presentation and again upon recollection, are not continuously maintained in consciousness between these times; they belong to the psychological past. In more contemporary usage, primary and secondary memory are generally thought of in a more conceptual way; they are often cast as separate memory stores or systems.
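The three test forms described earlier can be sketched as simple scoring rules. This is an illustrative simplification (the function names and data are invented, and real experiments score protocols in more nuanced ways):

```python
# Hypothetical scoring rules for the three memory-test forms
# (illustrative only).

def serial_recall_score(studied, reported):
    # Serial recall: credit only items reported in their exact study position.
    return sum(1 for i, item in enumerate(reported)
               if i < len(studied) and studied[i] == item)

def free_recall_score(studied, reported):
    # Free recall: credit any studied item reported, regardless of order.
    return len(set(studied) & set(reported))

def recognition_score(studied, test_items, responses):
    # Recognition: proportion of correct old/new decisions over
    # targets and lures.
    correct = sum(1 for item, said_old in zip(test_items, responses)
                  if said_old == (item in studied))
    return correct / len(test_items)

studied = ["cow", "pen", "map", "jar"]
print(serial_recall_score(studied, ["cow", "map", "pen", "jar"]))  # -> 2
print(free_recall_score(studied, ["map", "cow", "sun"]))           # -> 2
print(recognition_score(studied, ["cow", "sun"], [True, False]))   # -> 1.0
```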
In any case, primary memory refers to events that have occurred most recently, and secondary memory to events from further back in time.

Primary Memory

Most of what has been learned about primary memory concerns either its qualitative nature or its capacity, and these topics form the basis of the present discussion. The dominant form of inquiry has been objective experimentation, but as is apparent, introspection, however informal, has also played an essential role.

Nature of Primary Memory. Objective experimentation and introspection both show that primary memory takes on the character of the perceptual-motor world. It assumes an auditory, visual, or some other sensory quality or, as when a manual task is mentally rehearsed, a proprioceptive quality. Depending on whether it preserves the sensory quality of the event that gave rise to it, primary memory might be said to be direct or indirect.

A strong argument for direct auditory primary memory, or echoic memory, is given by the very fact of speech perception. Of its nature, speech is spread out over time, and in order for it to be understood, information occurring at any one instant has to be integrated with a precise record of information that occurred immediately beforehand. Introspection confirms the existence of such a record. At each successive instant that speech is being heard, memory for the immediately preceding few words has a freshness far more original than does memory for
earlier words. Not only is it plain exactly what these words were but it is almost as though they can still be heard, with such details as tone of voice, intonation, and accent clearly preserved. Memory of this sort is difficult if not impossible to sustain through further speech; subsequent words are apt to take their place in echoic memory whether or not the rememberer wishes it.

Much of the experimental research on echoic memory has involved the serial recall of spoken lists of about eight digits or words. Serial position functions, obtained by determining the probability of recall for each within-list position, show that level of recall increases sharply over the last two or three positions (5). The echoic nature of this recency effect, as it is called, is indicated by its virtual absence when the items are presented visually (6) or when they are acoustically similar to one another (7). The vulnerability of echoic memory to the effects of additional auditory information is illustrated by a sharp reduction in the recency effect when the list items are followed by an additional, nominally irrelevant item (8).

Direct visual primary memory, known as iconic memory, has a fidelity even more striking than that of echoic memory. Indeed, it is of such a quality as to create the illusion that the information is still present. The illusion occurs in watching a film. Iconic memory allows continued perception of the picture shown in a given frame during the time it takes to replace the frame by the next one, with the result that the film is seen not as a flickering sequence of still pictures but in the same smooth way that the real world is seen. The persistence of iconic memory has been measured by repeatedly flashing a visual stimulus and having subjects adjust a click to coincide first with the onset of the flash and then with its offset.
The interclick interval was found to exceed the actual duration of the flash by up to 200 ms, the discrepancy being attributed to iconic memory (9).

The utility of iconic memory has been demonstrated in a study by Sperling (10). Arrays of up to 12 digits and letters arranged in two or three rows were exposed for 50 ms. Subjects wrote down as many items as they could from either the whole array (the whole report condition) or just one of the rows, with the choice of row signaled by a tone of high, medium, or low pitch occurring at the instant the array was physically terminated (the partial report condition). Responses were appreciably more accurate in the partial report condition. For example, when the stimulus set consisted of 12 items arranged in three 4-item rows, subjects in this condition reported an average of 3.03 items; since they could not predict which of the rows would be signaled, there must have been no fewer than approximately 9.1 items from the entire array that were in a reportable state at the time the tone occurred. This number was appreciably greater than the mean for the whole report condition, which was only 4.3 items. Apparently, the subjects retained the array in iconic form after its physical termination and so were able in the partial report condition to selectively read off items from the signaled row. That iconic memory rapidly loses its utility is indicated by the finding that delaying the signal by as little as 300 ms greatly reduced the partial report advantage.

An example of indirect primary memory can be found in the verbatim retention of material just read. The material was perceived visually, but introspection reveals its conscious representation to be more auditory in nature, a sort of silent speech. Experimental confirmation of this impression comes from a study by Conrad (11), in which subjects were given
serial recall tests on short sequences of letters. To avoid extraneous difficulties, Conrad was interested in only those sequences in which subjects erred on just one letter. He found that the incorrect letter was acoustically similar to the letter that should have been reported. Given that acoustically similar letters do not tend to be visually similar, the implication is that the visually presented letters were coded in a speechlike form.

Capacity of Primary Memory. How much information can be retained in conscious mind at any one instant? What, in other words, is the capacity of primary memory? This question raises a number of unresolved issues, not the least of which is the appropriate unit of measurement. It is perhaps to minimize this particular difficulty that the question of capacity has been raised almost exclusively with respect to verbal material, for this can be broken into discrete units in a relatively objective fashion. The relevant research falls into two categories: one concerned with memory span, the other with the recency effect.

Memory span refers to the number of items for which there is an even chance of perfect reproduction after a single presentation. Roughly speaking, this turns out to be seven items. This fact suggests a model whereby primary memory is likened to a store containing seven distinct locations. Such a model may be adequate for some purposes, but it does not account for the modest variations in memory span that do occur between types of items or conditions of presentation. Thus, memory span is about an item greater for letters than for words and about an item less for letters than for digits (12) and about half an item greater with auditory presentation than with visual presentation (13). An adequate interpretation of these and other complexities has yet to be formulated.

Apart from these empirical puzzles, there are reasons to doubt that memory span is, even in principle, a valid measure of primary memory capacity. For instance, memory span has been shown to be greater for words that have a high frequency of everyday occurrence than for words that occur less often, and because this variable is generally assumed to affect memory proper but not primary memory, the implication is that memory span may include one or two items from memory proper. Also, even if memory span were entirely the product of primary memory, it may reflect more the upper limit than the capacity typically used in attending to a continuous stream of information.

This latter possibility suggests that the capacity of primary memory might be more appropriately ascertained by using a list length substantially in excess of memory span and focusing on recall of the last few items. A great many studies of this sort have been conducted, most of them using lists of 12-20 randomly selected words and a free recall test. Serial position functions reveal a recency effect spanning the last six or seven positions (14), and this is generally attributed to primary memory. Of particular interest are findings that list length, the rate at which the words are presented, the concreteness of the words (or rather of their referents), the frequency of the words in everyday usage, and many other variables have an appreciable effect on recall of prerecency items but little if any on the recall of recency items (15). Although there has been much discussion of precisely how the capacity of primary memory should be estimated from the serial position function, the area under the recency part of the function (i.e., the sum of the recall probabilities for the last few positions) can be taken as a first approximation, and this turns out to be about 3.5 items. Note that this is substantially less than the estimate given by memory span.

One or two variables have been shown to distinguish between prerecency and recency portions of the serial position function in just the opposite way; that is, they affect recency but not prerecency positions. Specifically, the recency effect is largely eliminated if subjects engage in a verbal task (such as simple arithmetic or copying down several other words) between presentation and recall (16) and is reduced slightly if the items are presented visually rather than auditorily (17). Neither of these exceptions seriously undermines a primary memory interpretation of the recency effect. Thus, it is not unreasonable to suppose that an interpolated verbal task diverts conscious mind from the recency items and that primary memory could be of slightly greater capacity when of a direct echoic form than when of an indirect phonological form. More serious are findings of a recency effect when subjects are distracted after the presentation of each individual item (18), simultaneous recency effects for more than one set of items (19), and, as is noted below, substantial recency effects over intervals spanning several weeks (20). Plainly, recency effects arise for reasons other than retention in primary memory. The implications for the measurement of primary memory capacity remain a matter of debate.

Memory Proper

Memory in the dominant sense of the term, memory for events that have passed from and have to be brought back to conscious mind, is discussed in two parts, the first dealing with the events as they are experienced and the second with their recollection.

Events As Experienced. Events vary in how long they are remembered; some are remembered only fleetingly, others for the better part of a lifetime. Yet, as obvious as this is, very little effort has been made to identify and systematize the variables that control memory persistence. For the present purposes these variables are organized into seven factors: duration, meaningfulness, emotionality, vividness, organization, distinctiveness, and recency. These factors should not, of course, be considered to be independent of one another, for more time allows for better organization, emotional events tend to be meaningful or vivid or distinctive, recency can be thought of as a form of distinctiveness, and so on. Also, it is important to keep in mind that episodic memory requires both an event and a rememberer and that each factor refers not to the events per se but to the events as experienced. One and the same event may be experienced and remembered quite differently, depending on the individual's knowledge, interests, intentions, and the like. This point will become apparent in considering the individual factors.

Duration. The effect of event duration has been demonstrated by varying rate of presentation in a word list experiment: A leisurely rate results in a higher level of recall than does a brisk rate (21). Of course, the actual time for which a given word is presented is not necessarily the same as the effective study time. A word may still be thought about, or covertly rehearsed, while later ones are being presented. Some
theorists have gone so far as to claim that effective study time is the principal factor determining memory (22).

Meaningfulness. The importance of meaningfulness can be readily appreciated by considering the effect that knowing the language has on remembering a verbal message or that having the relevant expertise has on recognizing a previously seen x-ray photograph. Compelling experimental evidence for the effect of meaningfulness comes from studies showing that a clarifying sentence or picture can sharply enhance memory for an otherwise cryptic passage (23). Closely related to meaningfulness is the concept of "depth of processing" (24), which has given rise to a large number of experiments in which the way events are attended to is systematically varied. In many experiments of this kind the subject is presented with a series of randomly selected words and engages in a task designed to draw attention to either their semantic or nonsemantic aspects. Semantic tasks, such as rating the pleasantness of the words (or, strictly, of their referents), produce higher levels of recall than do nonsemantic tasks, such as deciding whether the words contain a designated letter (25).

Emotionality. Experimental research on emotionality has been limited for practical and ethical reasons, but there can be no question that this factor can exert a powerful influence on memory. Seeing a loved one in great physical distress, being acutely embarrassed, or receiving praise from someone of high authority is likely to be long remembered.

Vividness. Graphic events tend to be more memorable than do dull or vague events; memorable talks are generally lively and rife with concrete examples. Experimental confirmation of the effect of vividness comes from the finding that objects are more likely to be remembered than are pictures of the objects, which in turn are more likely to be remembered than are their names (26).

Organization.
In one experiment that demonstrates the effect of organization, subjects were presented with a set of 112 words displayed in treelike configurations in which the words were placed either at random or in a manner designed to bring out their relation (e.g., "platinum," "silver," and "gold" were nested under "rare," which along with "common" and "alloys" was nested under "metals," which along with "stones" was nested under "minerals"). After three successive presentations subjects in the random condition failed to recall an average of 42 words, whereas those in the organized condition missed none at all (27).

Even when organization is not built in by the experimenter, the rememberers may introduce their own. Trained rememberers may use powerful mnemonic systems, but even people with no such training have been shown to impose their own idiosyncratic, or subjective, organization when required to master a list of words randomly selected from a homogeneous set. Words were presented repeatedly, each time in a new random order, and a free recall test was given after each presentation. A steady improvement in recall across successive tests was paralleled by a steady increase in the consistency in the order in which the words were recalled. The implication is that the learning of such lists is fundamentally a matter of developing an organization (28).

Distinctiveness. A day at a cricket match is more memorable if it is the only day ever spent at a cricket match than if it is merely one of many. The point is illustrated by the Von Restorff effect (29), which refers to the comparatively high level of recall of an item that stands out in some way from
among the others with which it occurred. For example, the word "Chopin" will have a much higher probability of being recalled if presented in an otherwise uniform list of color names than if presented in a list comprised entirely of famous composers. More generally, the probability of recollecting an event falls off in a steady and systematic fashion with the number of other similar events with which it occurred, as shown by the finding that recall of a given item from a study list declines as the length of the list increases (21).

Recency. Within a set of similar events, the most recent are generally the most likely to be recollected. This point may, perhaps, be appreciated by trying to recall movies seen, restaurants visited, books read, and so on (e.g., see Ref. 20).

What kinds of theories have been proposed to account for these various factors? Although many in number and diverse in form, virtually all theories conceptualize memory as being embodied in some sort of memory trace, which serves to bridge the temporal gap between the occurrence of the event and its recollection. This notion constitutes a hypothetical counterpart of the physiological approach to memory, and it holds the promise that someday the psychology of memory will be explained in physical terms. In addition, it is convenient in guiding the practice of simulating memory phenomena and in the field of AI generally. On the negative side it betrays a complexity that is often overlooked. The very idea of memory traces implies that remembering comprises three distinct stages: a trace formation stage, a trace retention stage, and a trace utilization stage. Furthermore, contemporary versions of this three-stage model are formulated in information-processing terms, and consequently the three stages are well articulated, with the traces being subjected to various "processes" at any stage.
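The three-stage trace account lends itself to a toy information-processing sketch. The numerical formation and decay rules below are invented purely for illustration and correspond to no particular published theory:

```python
# Toy three-stage trace model: formation, retention, utilization
# (entirely illustrative; the constants are arbitrary).

def form_trace(event, study_time):
    # Trace formation: longer effective study time yields a stronger trace.
    return {"event": event, "strength": 0.4 * study_time}

def retain(trace, delay):
    # Trace retention: strength decays over the retention interval.
    trace["strength"] *= 0.9 ** delay
    return trace

def utilize(trace, threshold=1.0):
    # Trace utilization: recollection succeeds only if the surviving
    # strength clears a test-dependent threshold.
    return trace["event"] if trace["strength"] >= threshold else None

trace = retain(form_trace("EAGLE", study_time=5), delay=3)
print(utilize(trace, threshold=1.0))   # -> EAGLE  (0.4 * 5 * 0.9**3 = 1.458)
print(utilize(trace, threshold=2.0))   # -> None
```

The sketch makes the article's point concrete: a duration effect could be localized equally well in `form_trace` (stronger traces), `retain` (slower decay), or `utilize` (lower threshold), and the three variants need not yield different predictions.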
By choosing the design of the overall system, the nature of the hypothesized trace, and the processes that operate within the system, each of the factors determining event memory can be interpreted in a virtually unlimited number of ways. To take the first factor as an example, longer-lasting events can be assumed to survive longer in memory because they give rise to stronger traces, to more durable traces, or to more accessible traces, thereby localizing the effect at the trace formation, trace retention, or trace utilization stage, respectively; or, in process terms, they can be assumed to allow more rehearsal, a greater depth of processing, the creation of vivid images, more organization, and so on. Unfortunately, such alternative interpretations do not always generate different predictions, and the criteria for deciding among them may be nothing more than elegance and style.

Recollection. Much of the theory and research on recollection can be summarized in terms of four models: the all-or-none model, the threshold model, the generate-recognize model, and the encoding-specificity principle.

The all-or-none model assumes that an event is either recallable or not and that forgetting occurs when a recallable event becomes unrecallable. It founders on the finding that different types of tests yield different levels of performance. An unaided recall test might indicate no memory for an item, and yet the item might be produced in a cued recall test or identified in a recognition test.

The effect of type of test is accounted for by the threshold model, according to which memories are represented by traces of variable size or strength. Recollection occurs when trace strength exceeds a threshold value that depends on the type of
MEMORY EPISODIC test. In relative terms the threshold will as a rule be high for unaided recall, intermediate for cued recall, and low for recognition. Since more traces will exceeda low threshold than a high threshold, this model neatly explains the effect of type of test on the probability of recollection. But it, too, has problems. For example, it has been found that recall of a word list is greater when the words have been selected in such a way that they conform to several distinct semantic categoriesthan when they are unrelated, whereas recognition shows no such effect (30). Such interactions cannot be accounted for by the threshold model, for the results of different tests lead to conflicting conclusionsabout how variables affect trace strength. The generate-recognize model postulates two distinct stages of recollection: In the first stage representations of potential target items within some permanent knowledge system are found or "generated"; in the secondstage each candidate item is subjectedto a "recognition" test. Neither stage is available to introspection; only those items that are generated and given a positive recognition decisionare made available to the consciousmind. Like the threshold model, the generaterecogpizemodel readily accountsfor the effect that type of test has on performance.The recognition test provides a copy of the target item, which ensures the item's generation. The free recall test provides minimal guidance for the generation stage, and the cued recall test typically provides some guidance, though not enough to guarantee generation. Thus, the effect of type of memory test is localized at the generation stage. The generate-rec ognuzemodel also provides an intuitively plausible explanation of interactions between type of test and experimental conditions. 
The effect of using semantically structured Iists, for example, would be to facilitate the generation process,which means that recall, not recognition,is affected.The model can even account for findings of free recall and recognition being affected in opposite ways. For example, under certain conditions words with a high frequency of everyday occurrence show a higher level of recall and a lower level of recogli.itionthan do rarer words (31). This would simply mean that low-frequency words are more likely to pass the recognition stage (perhapsbecausetheir presentationswere comparatively distinctive events), but this advantage is not sufficient to offset the disadvantage that these words suffer at the generation stage. Despite these accomplishments, the generate-recognize model does not adequately account for all of the evidence.In particular, it fails to account for evidencethat recognition performance varies according to the context in which the recognition test item is presented(32).According to the model, recognition performance should depend only on the decision of whether the permanent representation of the item is appropriately tagged and not on its context. The model is also undermined by findings that a word studied in the presence of a "context" word but not identified in a subsequentrecognition task may be recalled when the context word is re-presentedas a cue (33). The failure in recognition indicates that the word's permanent representation was not in a state that would support a positive recognition decision, whereas the successin cued recall indicates that it was. In light of such probleffis, some theorists have abandoned the generate-recognrzemodel. One idea offered in its place is the encoding-specificityprinciple (33),accordingto which recollection occurs when there is a sufficient match between the test situation and target event as experienced.No significant
distinction is drawn between recall and recognition. A recognition test is usually more successful than a recall test because it includes information (namely, the test item or "copy" cue) that closely matches the memory trace for the event. That a context word may under some conditions be more effective than the copy cue means merely that the context word matches the trace more closely than does the copy cue. One disadvantage of the encoding-specificity principle is that it does not generate predictions and so cannot be tested.

The difficulties in formulating an adequate interpretation of event recollection multiply when the events involve more complex material. Recollection of a lecture or conversation is likely to take the form of a summary rather than an unabridged verbatim reproduction. Moreover, it is likely to be expressed largely in the rememberer's own words. The implication is that such recollection is substantially a matter of reconstruction, of using knowledge of the world, perhaps conceptualized as schemata or scripts (qv) (34), to piece together an account faithful to the gist, or perhaps just to the tenor (35), rather than to the details of the episode.
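The two stages of the generate-recognize model discussed above can be rendered as a sketch. The miniature "knowledge system," the tagging scheme, and the rhyme-based generation rule are all invented for illustration and are not part of the model as published:

```python
# Sketch of the generate-recognize model of recollection
# (illustrative only; the knowledge system and cues are invented).

KNOWLEDGE = {"eagle", "beagle", "emerald", "chopin", "silver"}

def study(items):
    # Studying an item "tags" its permanent representation.
    return set(items)

def generate(rhyme_cue, knowledge):
    # Stage 1: find candidate items in the permanent knowledge system,
    # here via a crude rhyme cue; a copy cue would generate the target itself.
    return {w for w in knowledge if w.endswith(rhyme_cue)}

def recognize(candidates, tags):
    # Stage 2: accept only candidates whose representations are tagged.
    return {w for w in candidates if w in tags}

tags = study(["eagle", "emerald"])
print(recognize(generate("eagle", KNOWLEDGE), tags))  # cued recall -> {'eagle'}
print(recognize({"eagle"}, tags))                     # recognition  -> {'eagle'}
```

The sketch shows why the model localizes test effects at the generation stage: recognition supplies the candidate directly, whereas recall succeeds only if the cue happens to generate it.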
BIBLIOGRAPHY

1. E. Tulving, in E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, 1972.
2. E. Tulving, Elements of Episodic Memory, Clarendon, Oxford, U.K., 1983.
3. C. R. Puff (ed.), Handbook of Research Methods in Human Memory and Cognition, Academic Press, New York, 1982.
4. W. James, The Principles of Psychology, Holt, New York, 1890.
5. B. B. Murdock, Jr., "Serial order effects in short-term memory," J. Exper. Psychol. Monogr. Suppl. 76, 1-15 (1968).
6. R. Conrad and A. J. Hull, "Input modality and the serial position curve in short-term memory," Psychonom. Sci. 10, 135-136 (1968).
7. R. G. Crowder, "The sound of vowels and consonants in immediate memory," J. Verb. Learn. Verb. Behav. 10, 587-596 (1971).
8. J. Morton, R. G. Crowder, and H. A. Prussin, "Experiments with the stimulus suffix effect," J. Exper. Psychol. Monogr. Suppl. 91, 169-190 (1971).
9. R. N. Haber and L. G. Standing, "Direct estimates of the apparent duration of a flash," Can. J. Psychol. 24, 216-229 (1970).
10. G. Sperling, "The information available in brief visual presentations," Psychol. Monogr. 74(11), Whole No. 498 (1960).
11. R. Conrad, "Acoustic confusions in immediate memory," Br. J. Psychol. 55, 75-84 (1964).
12. C. W. Crannell and J. M. Parrish, "A comparison of immediate memory span for digits, letters, and words," J. Psychol. 44, 319-327 (1957).
13. A. Drewnowski and B. B. Murdock, Jr., "The role of auditory features in memory span for words," J. Exper. Psychol.: Human Learn. Mem. 6, 319-332 (1980).
14. B. B. Murdock, Jr., "The serial position effect of free recall," J. Exper. Psychol. 64, 482-488 (1962).
15. M. Glanzer, in G. H. Bower (ed.), The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 5, Academic Press, New York, pp. 129-193, 1972.
16. M. Glanzer and A. R. Cunitz, "Two storage mechanisms in free recall," J. Verb. Learn. Verb. Behav. 5, 351-360 (1966).
17. B. B. Murdock, Jr., and K. D. Walker, "Modality effects in free recall," J. Verb. Learn. Verb. Behav. 8, 665-676 (1969).
18. O. J. L. Tzeng, "Positive recency effect in a delayed free recall," J. Verb. Learn. Verb. Behav. 12, 436-439 (1973).
19. M. J. Watkins and Z. Peynircioglu, "Three recency effects at the same time," J. Verb. Learn. Verb. Behav. 22, 375-384 (1983).
20. A. D. Baddeley and G. Hitch, in S. Dornic (ed.), Attention and Performance, Vol. 6, Lawrence Erlbaum, Hillsdale, NJ, pp. 647-667, 1977.
21. B. B. Murdock, Jr., "The immediate retention of unrelated words," J. Exper. Psychol. 60, 222-234 (1960).
22. D. Rundus, "Analysis of rehearsal processes in free recall," J. Exper. Psychol. 89, 63-77 (1971).
23. J. D. Bransford and M. K. Johnson, "Contextual prerequisites for understanding: Some investigations of comprehension and recall," J. Verb. Learn. Verb. Behav. 11, 717-726 (1972).
24. F. I. M. Craik and R. S. Lockhart, "Levels of processing: A framework for memory research," J. Verb. Learn. Verb. Behav. 11, 671-684 (1972).
25. T. S. Hyde and J. J. Jenkins, "Recall for words as a function of semantic, graphic, and syntactic orienting tasks," J. Verb. Learn. Verb. Behav. 12, 471-480 (1973).
26. W. Bevan and J. A. Steger, "Free recall and abstractness of stimuli," Science 172, 597-599 (1971).
27. G. H. Bower, M. C. Clark, A. M. Lesgold, and D. Winzenz, "Hierarchical retrieval schemes in recall of categorized word lists," J. Verb. Learn. Verb. Behav. 8, 323-343 (1969).
28. E. Tulving, "Subjective organization in free recall of 'unrelated' words," Psychol. Rev. 69, 344-354 (1962).
29. H. von Restorff, "Analyse von Vorgängen im Spurenfeld. I. Über die Wirkung von Bereichsbildungen im Spurenfeld," Psychol. Forsch. 18, 299-342 (1933).
30. D. Bruce and R. L. Fagan, "More on the recognition and free recall of organized lists," J. Exper. Psychol. 85, 153-154 (1970).
31. V. Gregg, in J. Brown (ed.), Recall and Recognition, Wiley, New York, pp. 183-216, 1976.
32. D. M. Thomson, "Context effects in recognition memory," J. Verb. Learn. Verb. Behav. 11, 497-511 (1972).
33. E. Tulving and D. M. Thomson, "Encoding specificity and retrieval processes in episodic memory," Psychol. Rev. 80, 352-373 (1973).
34. G. H. Bower, J. B. Black, and T. J. Turner, "Scripts in memory for text," Cog. Psychol. 11, 177-220 (1979).
35. U. Neisser, "John Dean's memory: A case study," Cognition 9, 1-22 (1981).

M. J. Watkins
Rice University
EPISTEMOLOGY

Epistemology is the field of philosophy that deals with the nature and sources of knowledge. Key concepts include belief, perception, representation, justification, description, and evaluation. Epistemologists investigate bases on which beliefs can be singled out as knowledge. This involves two tasks. First, beliefs must be characterized, usually by being built out of more basic components related to the nature of minds and their interaction with the world. From these components, subjects form representations of states of affairs, which are candidates for beliefs (see Representation, knowledge). Characterizing beliefs is a largely descriptive task. Whatever arguments may be brought to bear, the idea is to describe beliefs, their sources, and their components accurately and usefully. Second, beliefs that qualify as knowledge must be distinguished from those that do not. Deciding which beliefs qualify as knowledge is an evaluative task. At any given time epistemic approaches tend to divide into those that focus on the first task, and hence have a descriptive flavor, and those that focus on the second, and so have a more evaluative bias.

Epistemology has traditionally drawn insights from philosophy of mind, philosophy of science, philosophy of mathematics, and such outside fields as logic (qv), psychology (see Cognitive psychology), mathematics, and the physical sciences. Some of these contributions have taken the form of raising questions about the limits of knowledge: for instance, mathematical knowledge is particularly problematic because the nature of mathematical objects seems to eliminate perception as a source of knowledge. In other cases a particular discipline is treated as a paradigm of knowledge: in continental Europe, unlike the United States and the United Kingdom, epistemology today means the theory of scientific knowledge.

Central Problems

The traditional view equates knowledge with justified true belief (see Belief systems). This view has been claimed to originate with Plato and has dominated epistemology since the Enlightenment. Viewing knowledge as belief ties it to subjects (most obviously, but not necessarily, people). Viewing knowledge as true belief ties it also to the world, since a belief that did not accurately reflect the actual state of affairs would not be true. Justification lies between the knowing subjects and the world, providing the grounds on which the particular believer can be claimed to know. Understanding the nature and bases of knowledge therefore involves investigating the ways in which minds and the rest of the world interact.

Sources of Knowledge. From ancient times, philosophers have investigated what sources of knowledge, if any, have authority.
Such a source must reside either in the knowing subject (usually reason) or in something that links the subject to the world (usually sense perception). Since classical times the authority of perception has been disputed on grounds ranging from hallucinations and dreams to modern claims that belief conditions perception. Yet if perception is rejected, what does reason have to work with? Many philosophers have been extremely hesitant to allow pure reason unaided by perception as a possible source of knowledge.

Although epistemologists and psychologists are both concerned with sources of knowledge, they are concerned with them in entirely different senses. Psychology studies (among other things) the ways in which individuals come to believe things, the kinds of evidence people find persuasive, what affects responses to proposals, and the like. The question for epistemology is what grounds form an adequate basis for the claim to knowledge, independent of whether those grounds historically contributed to belief or even whether people would actually find them convincing. Psychologists study human response and behavior; epistemologists examine what can in principle serve as grounds of knowledge for any knowing subject.

Justification. Justification traditionally involves a demonstration of truth, usually by appeal to logic. But logic can only show that an argument's conclusion is true provided that its
premises are true. Either the premises need to be justified in turn or they do not. In the first case, it seems that the argument must either become circular or go into an infinite regress. In the second case, how can the conclusion be justified if its premises are not? One way out of this dilemma is to identify a class of basic beliefs and argue that by their nature they require no external justification. Most frequently, basic beliefs are taken as absolute and certain, though it would be possible to hold this kind of view and also to hold that basic beliefs can be mistaken. Basic beliefs then provide a justificational foundation for other beliefs. This approach has come to be called foundationalism. Another way out justifies individual facts on the basis of their role within a larger system of beliefs. This view, called coherentism, is spelled out in Lehrer's Knowledge (1).

Both coherentists and foundationalists have problems with views of justification that rest on formal logical demonstration or otherwise require that justification guarantee truth. In actual situations in which people claim knowledge, the kinds of justification presented are frequently less certain and more sophisticated than logical inference (qv) from accepted facts. Even in science, which has always seemed to provide a particularly clean example of knowledge, justifications often extrapolate beyond what logical inference justifies, involve probabilistic judgments, or follow other patterns of reasoning that differ from those of formal logic (see Reasoning, commonsense; Reasoning, default; Reasoning, plausible). Also, scientific knowledge is notoriously open to change. Given these difficulties, what kinds of justification warrant belief?
Certainty. Depending on the way in which a particular view of knowledge is spelled out, knowledge may be possible without certainty. But the search for certainty has preoccupied philosophers for centuries and recurs persistently in the claim that nothing can be called knowledge that could possibly be wrong. Hence many epistemic theories have involved a search for a basis that not only allows for justified true belief but also identifies a class of basic beliefs as both true and justified with no room for doubt. Hence, in this version of foundationalism, basic beliefs provide a sort of safety net against the skeptical claim that people cannot properly be said to know anything at all. The search for epistemic certainty need not be tied to foundationalism; Wittgenstein's Über Gewissheit (2) can be viewed as an attempt to argue for certainty without basic beliefs.

Representation. Beliefs are in people's heads; the objects of those beliefs in general are not. Hence the relationship between beliefs and their objects matters greatly for epistemologists. What kinds of things are representations? How are they derived, and what links them to the things they represent? What relationships hold between the complex representations that constitute beliefs and the states of affairs they apply to? In twentieth-century philosophy, Quine focused attention on this class of issues in his classic Word and Object (3). In recent decades the question of representation has become especially central, inspiring works by thinkers as diverse as Searle (4) and Fodor (5). Artificial intelligence, computational models, cognitive psychology, and the interdisciplinary efforts in cognitive science (qv) are influencing new philosophical works, originally centered in philosophy of mind and language but increasingly involved in epistemology, which center on representation as the link between the mental and material level.

History

The Classical Period. Concerns with the nature and sources of knowledge were clearly established in ancient Greece by the time of Plato and the sophists (fifth and fourth centuries B.C.). The view that knowledge is justified true belief is traditionally attributed to Plato. In the Meno, an early dialogue, he distinguishes between true belief and knowledge, claiming that knowledge requires grounding as well as truth. In the famous parable of the cave in Book VII of The Republic, Plato makes a three-way distinction between ignorance, belief, and knowledge, with knowledge requiring correct understanding of the Forms. It should be noted here that for Plato the proper object of knowledge lay in the eternal relations among universals, not in matters of material fact. The Theaetetus provides the longest discourse on knowledge and again stresses the need for a logos grounding true belief before it can properly be called knowledge. Logos is the root word underlying the word "logic," but Plato seems to have meant the term more in its sense of order and law, hence requiring a grounding for knowledge that was solid but, for Plato, still less than clearly understood.

For Aristotle, too, knowledge meant knowledge of universals. In his Prior Analytics, Aristotle limited his logical language to propositions of the forms "A is predicated of all of B," "A is predicated of part of B," "A is not predicated of all of B," and "A is predicated of none of B" (in modern terms, "All B's are A's," "Some B's are A's," "Some B's are not A's," and "No B's are A's"). He justified this restriction on the grounds that he was providing the means of deriving knowledge, and all true knowledge is knowledge of relations among Forms. Hence, despite the usual contrasts between Aristotle and Plato, in this regard their views largely coincide. Aristotle's contribution lies in his formulation of the first formal system of proof (i.e., the development of logic).

The Enlightenment. Seventeenth-century epistemologists reinterpreted classical thought in light of the emergence of science. This epistemic development expressed itself in two trends: rationalism and empiricism. René Descartes (1596-1650) was one of the earliest major proponents of rationalism. A mathematician as well as a philosopher, Descartes set himself the goal of adapting to epistemology the rigorous system of proof used in geometry. First, basic principles must be identified whose truth is immediate and unchallengeable. From these principles all else must be proven by principles of logic. This reliance on infallible first principles, which Descartes called clear and distinct ideas, forms one of the strongest characteristics of Cartesian thought. That these principles are clear and distinct ideas is important. The rationalist view holds that certainty is to be sought in the conceptual realm, not in the material. Descartes made a sharp distinction between mind and matter. This clearly defined gap brought the relationship between concepts and their objects into sharp relief. Beliefs consisted of combinations of ideas, independent mental entities that might or might not reflect any independent object of importance. Given this view, the need to link at least some representations to their objects arose. Descartes filled in this link with his famous principle of cogito ergo sum ("I think, therefore I am"), arguing that the very presence of doubts as to the existence of anything proved that at least one thing exists, namely the doubter. From this principle, based on
orderly introspection and controlled argument, Descartes attempted to derive the rest of his metaphysics.

Gottfried Wilhelm Leibniz (1646-1716) also took rational principles as the basis of his epistemology. His primary contribution to epistemology lies in the extent to which he held that justification could be reduced to logic, which for Leibniz largely replaced introspection. His view recognized two classes of truths: those of reason and those of fact. Truths of reason, he claimed, were based solely on logic, in that their subjects strictly contained their predicates: given a complete, correct definition of the subject of a truth of reason, the truth in question could be reduced to the form "A is A" by use of logic alone. Truths of fact could not be so treated; but, he held, they could be derived jointly from the complete definitions of the terms involved and the assumption that God chose to create the best of all possible worlds.

In England empiricism arose in reaction to the rationalist tradition. Whereas rationalist views seek their basic truths in human understanding and hold that knowledge arises either out of pure understanding or out of mind acting on sensory information, empiricism in its purest form denies the role of purely mental constructs and holds that all knowledge is ultimately based in sense experience. Early empiricist views appear in the works of Roger Bacon (ca. 1215-1292) and John Locke (1632-1704), but the primary expositors of the school are Bishop Berkeley (1685-1753) and David Hume (1711-1776). Because sensory information is unreliable, empiricist views tend toward skepticism. Berkeley attempted to form an empiricist epistemology that avoided skepticism. He began by accepting a dualist approach concerning mind and matter.
On this view, only sensations are directly experienced; any link with an external reality is assumed, not perceived. But sense perception forms the only basis for knowledge: reason can elaborate on perception but cannot arrive at any knowledge that is not both constituted of sensory-derived parts and based on sense experience. By holding experience apart from any external reality and denying that perception is representative, Berkeley endorsed an idealist view under which all knowledge relates only to perception (mental constituents). In this way Berkeley claimed that he had avoided skepticism, since skepticism concerning perceptions themselves was clearly wrong, whereas the question of whether they faithfully captured the reality they were supposed to represent no longer arose.

Hume found this move unconvincing since it defined away all knowledge of interest. Under his view, all knowledge has its basis ultimately in sense experience or (in the case of mathematical and logical knowledge) necessary relations among ideas. In the first case, that of knowledge based on sense experience, Hume held that certainty was impossible in the sense that there is no way to show that such knowledge in fact reflects any real, external world. In the case of relations among ideas, he held that ideas again are based on specific sense impressions and that they are never universal except in the manner of their representation. Hence the truths of logic and mathematics can be viewed as artifacts of a manner of thinking about them. This view is a precursor of twentieth-century logical positivism, especially as espoused by Ayer in Language, Truth and Logic (6).

Kant. Immanuel Kant's (1724-1804) epistemology represents a direct reaction against Humean skepticism and empiricism. With the empiricists, Kant held that all knowledge
arises out of sense experience. That is, without input from the senses, no knowledge whatever would be possible. However, he argued in the Critique of Pure Reason, it does not follow that sense experience provides the sole basis for knowledge. Kant held that there are principles about things (as opposed to tautologies) that are not based on experience and that, for knowledge to be possible at all, these principles governing reality must be applied to sense experience to structure it for human conception. These synthetic a priori principles he called necessary preconditions for the possibility of knowledge.

In presenting his analysis, Kant provided a new perspective on subjectivity. Heretofore, philosophers had divided reality into the objective and the subjective. Objective reality always included material reality; for some philosophers it also included objective, universal principles (for instance, Platonic Forms). Truths about objective reality were viewed as independent of facts about the knowers. Subjective reality was individual and internal, constituted by the internal states of a particular subject (person). That is, although one particular person's actual emotions are real, they belong to that person only. By definition, no one else can have those particular feelings, although others might have feelings like them in interesting ways. Another way to put this is that subjectivity was viewed as radically individual and relative. Kant held that subjective but universal principles could be discovered, which belong absolutely to any subjectivity whatsoever, and that these principles were united in a real, abstract universal, which he called transcendental subjectivity. To answer his fundamental question of how knowledge is possible, Kant examined the structure of transcendental subjectivity for structuring principles which themselves transcend experience and which, when applied to experience, yield knowledge.
These principles include space, time, and the 12 principles in the table of categories: the categories of quantity (unity, plurality, totality), of quality (reality, negation, limitation), of relation (substance and accident, cause and effect, agent-patient reciprocity), and of modality (possibility/impossibility, existence/nonexistence, necessity/contingency).

As he had divided subjectivity into individual and transcendental subjectivity, Kant also divided the objective realm into phenomena and noumena. The noumenal level of things-in-themselves provides the grounding for knowledge but cannot itself be the subject of knowledge. Things-in-themselves constitute naked reality, unstructured by the principles of transcendental subjectivity. Phenomenal reality is objective reality as structured by the categories and intersubjectively available. Only phenomena can be objects of knowledge; but knowledge of phenomena becomes on this view reasonably straightforward.

The Twentieth Century. The early twentieth century saw the introduction and spread of two new approaches to epistemic thought. In Europe, philosophers influenced by resurgences of rationalism but repelled by the excesses of nineteenth-century romanticism began trying to develop systematic approaches to rationalist views. This trend started in the late nineteenth century with Gottlob Frege and was taken up by Franz Brentano and his follower Alexius Meinong. Another follower of Brentano, Edmund Husserl (1859-1938), provided probably the most powerful impetus in this direction. Although his primary concerns lay more in metaphysics and philosophy of mind than in epistemology
proper, he took a view under which hypotheses about the nature of mind functioned also as an epistemic foundation. That is, his view of the mind provided units of conception that could serve also as the fundamental structuring units of knowledge. Husserl's technique consisted of an orderly, disciplined introspection of mental contents. This set of techniques formed the basis for phenomenology (qv). Like Descartes, Husserl based knowledge and certainty fundamentally on internal experience. However, he investigated the structure of mental components, abstracting from (in his terms, bracketing off) their actual referents, contexts, and individuality. In this manner he aimed at discovering the fundamental elements of knowledge and reasoning. These elements were understood as real (no less, and perhaps more, real than tables and chairs) and as transcendentally subjective. On principles of transcendental subjectivity, Husserl based his analysis of knowledge not only of principles of reason but also of objective material reality.

In England and the United States focus also rested on founding knowledge on something fundamentally internal, accessible, and basic to understanding. The empiricist bias already present in the Anglo-American tradition guided this investigation in a direction different from that taken by Husserl, to concentrate instead on sense data as absolute units of knowledge. The sense-data view seems to have been introduced by G. E. Moore and is closely associated with Moore, H. P. Grice, H. H. Price, and C. D. Broad. Like Berkeley, these philosophers recognized sense data as ultimately mental rather than material, so that this view shares the problem, so prevalent in epistemic theories, that it never seems quite to get to the world. However, they treated sense data as incorrigible: although a person could be mistaken about the interpretation of a sense datum, they claimed that it was impossible to be mistaken about having the datum itself.
This form of certainty harkens back both to Descartes's cogito and to Berkeley's empiricism. Grice went further, arguing that sense data are also at some level incorrigible links to the external world because perception is linked causally to the object perceived. The sense-data view formed a foundation from which complex knowledge could be built on the basis of epistemic simples (sense data) and according to which a causal theory of perception could provide the required link between reality and knowing subjects. In this manner it was believed that an epistemic theory could be developed that drew its elements from introspection but that nonetheless was rooted more in objective material reality than in any form of subjectivity.

These two traditions dominated epistemic thought in the early decades of the century. Then, shortly after World War II, a new trend arose. Up to this point epistemologists had concentrated primarily on characterizing the sources from which knowledge arose and the elements of which it was made. In the second half of the century English-speaking philosophers turned instead to analyzing the concept of knowledge itself. Instead of discussing where knowledge comes from, how it is possible, or what it is made of, philosophers began to ask what it means to have knowledge. Once again this trend split into two enterprises. In England a school of thought called ordinary language philosophy was approaching philosophical concepts by looking at how terms related to them are used in ordinary nonphilosophical discourse. This approach arose in the 1930s and was particularly popular in ethics, philosophy of language, and metaphysics. In the late 1940s philosophers like Ludwig Wittgenstein, Gilbert Ryle, and J. L. Austin began
applying it to epistemology, looking at nonphilosophical contexts in which it would or would not be considered correct to say that someone knows something. In the United States the analysis of the concept of knowledge took more the form of analyzing conditions for knowledge. At about this time it became common to divide the definition of knowledge as justified true belief into three explicit clauses. Disregarding minor variations, the following definition became standard: X knows that P, just in case (1) X believes that P; and (2) X has an appropriate justification for P; and (3) P is true. Initially, this definition would be accompanied by examples showing the necessity of each clause. However, especially since the 1960s, the literature has teemed with challenges to the definition, in the form of exceptions and counterexamples, for which one or more of the clauses fails but that nonetheless would be called knowledge, or for which all three clauses hold but that seem nonetheless not to be cases of knowledge. The classic example of the latter is Gettier's "Is Justified True Belief Knowledge?" (7).

The move to analysis focused attention on justification. Prior to this century it was generally held that scientific knowledge provides the best paradigm for knowledge in that scientific knowledge is particularly clearly stated, well organized, and well justified. It was also believed that the essence of scientific justification lay in proofs by the means of mathematics or logic. But new developments in physics undermined the view that science applies principles of logic to observations of phenomena. There had always been difficulty with the status of scientific laws based on the problem of induction (see Inductive inference): from finitely many observations it is impossible to derive claims that both cover an infinite number of cases and can be assured to preserve truth.
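The tripartite definition discussed above can be written compactly as a biconditional. The operators K, B, and J in this sketch are shorthand introduced here for exposition, not notation drawn from the literature cited:

```latex
% Tripartite (JTB) analysis of knowledge:
%   K_X P : X knows that P
%   B_X P : X believes that P
%   J_X P : X has an appropriate justification for P
K_X\,P \;\leftrightarrow\; \bigl(\, B_X\,P \;\wedge\; J_X\,P \;\wedge\; P \,\bigr)
```

Gettier-style counterexamples attack the left-to-right adequacy of this biconditional: they describe cases in which all three conjuncts on the right hold but the left side intuitively fails.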
But at least prior to the turn of the century, the idea of an experiment as a pure observation made sense. By the middle of the century it had become clear that a substantial burden of theory underlay the design of scientific experiments, so that rather than deriving (justifying) theories logically from observations, scientists in fact derived the circumstances for observation from their theories. Philosophy of science became deeply concerned with the relationship between theory and observation and with the pattern of growth of scientific knowledge, and this concern spilled over into epistemology.

At the same time, new developments in logic began to raise questions about the force of logical arguments, even where available. The intuitionists, led by L. E. J. Brouwer and Arend Heyting, developed a system of logic that made sense, had an intuitive appeal, and outlawed inferences possible under the classical logic used by mathematicians. This achievement indicated that what had previously been viewed as principles of logic can successfully be denied without leading to nonsense. In doing so, the intuitionists showed that even proofs of logic involve inescapable metaphysical presuppositions about the nature of knowledge and truth. In addition, logic turned out to be less powerful than people had thought. It had generally been believed that any problem that could be given a mathematically precise statement could also be solved by applying logic. David Hilbert's program, a proposal based on that assumption, was widely viewed as a comprehensive statement of the challenge before mathematics. In 1931 Gödel (8) proved
that the assumption was false by showing that there are statements in the formal language of natural-number arithmetic that are true but cannot be proved in the first-order theory (unless that theory is inconsistent). This is the famous incompleteness theorem (see Completeness). If Gödel was right, formal logical techniques did not even suffice for answering all arithmetic questions. Taken together with the growing recognition that justification in the sciences did not follow axiomatic paradigms and dealt more with degrees than with absolutes, Gödel's theorem provided the first major attack on logic as a foundation for epistemic justification since Descartes adopted his quasi-mathematical approach.

This challenge had profound consequences. As soon as epistemologists stopped taking logic for granted, the question of justification loomed. In the sciences, justification was becoming increasingly a matter measured in degrees rather than absolutes: it seemed reasonable to view justification in general that way as well. But if all justification is a matter of degree, at what point do we say that X has sufficient justification to qualify as knowing P? Worse yet, if something must be true to be known, and if justification never sufficed to establish anything as more than probable, how could X ever know that X knew P, even supposing X did in fact know P? Taken to their natural limit, these concerns lead once again to skepticism.

Most recently another counterskeptical trend has arisen, this time in the form of what might be called a new epistemic naturalism, associated in the 1970s with Armstrong (9) and in the 1980s with Dretske (10). This view once again rests on a view of the mind, which takes it not as a single, undifferentiated black box, but as crucially layered and segmented. Different segments of the mind deliver knowledge of different kinds and at different levels.
One of the central theses holds that if all these levels were as error prone as seems to be suggested by views that hold that belief conditions perception, people would never survive. On this view, although at the "top" level one may frequently be confused or mistaken about perceptions, there are levels at which perception is not relative to beliefs or other higher mental states. That is, although some levels of perception and knowledge are prone to mistakes, there are also levels advanced enough to be called knowledge and to serve as epistemic bases for knowledge to which higher level mental states do not penetrate. This theory of impenetrable layers of the mind gives an answer to skeptics that is designed to provide both knowledge and, in some limited cases, certainty.

AI and Epistemology

By its nature, AI both raises questions for philosophy and deals with areas traditionally philosophical in nature. It should not be surprising, then, that the relationship between the fields has proven reciprocal, each contributing to the other's development. In 1981 Newell reported on a survey that found that AI researchers consider philosophy more immediately relevant to their work than they do psychology (11). From the other side, interest in issues arising from computational models has manifested itself clearly in works by Searle, Dretske, and many others. Because AI research involves so many different kinds of philosophical problems, it is often hard to isolate the interaction with epistemology from interactions
with philosophy of mind, philosophy of language, logic, and metaphysics in general. This section provides a brief glance at the most clearly epistemic interactions between philosophy and AI research.

In 1969 John McCarthy and Patrick Hayes pointed to the need to increase epistemological awareness in the AI community, especially when researchers claim that their systems not only provide adequate output but also capture some essential feature of human understanding. One issue underlying many disputes both within AI and between AI and philosophy is the question of whether AI research produces simulations of intelligent behavior, models of intelligence, or actual synthetic intelligence. The difference between these three, roughly speaking, is this. A simulation of intelligent behavior involves producing behavior that might be produced through the use of intelligence; such simulations can be fully successful, regardless of the means involved in producing the results, providing only that the actual system output simulates intelligent behavior to some reasonable degree. A model of intelligence must produce appropriate output; but, in addition, it must do so by embodying processes and information representations that mirror intelligent processes and knowledge. A synthetic intelligence is a full-fledged knowing subject, different from natural ones (people) primarily in its history, not in its status. Among other things, McCarthy and Hayes argued that AI research that wanted to produce either of the latter two kinds of system must involve itself in epistemic issues in order to show that its goals had been met.

The concern with epistemology within the AI community has grown partly because of the increasingly central role of knowledge representation (qv) in AI research.
In recent years it has become more and more apparent that appropriate representations are critical to many AI tasks, including natural-language processing (see Natural-language generation; Natural-language interfaces; Natural-language understanding), learning (qv), and planning (qv). Developing knowledge representation systems involves embracing some theory of what constitutes knowledge and beliefs. The concern with representation in AI thus leads naturally to examining work on the parallel epistemic issue. In addition, once knowledge representation schemes have been developed, questions still remain on how to interpret the material they model. Brachman's article on epistemology and semantic networks (12) presents a good example of the kinds of alternatives available and why they are important.

Interest in knowledge representation has led to concern with the relationship between acquaintance (know of) and propositional content (know that). This relationship is crucial for epistemic theories and provides an area of direct contribution from epistemology to AI and related work, and vice versa. In this century, strong links have formed between philosophy of language and epistemology, especially in the realm of semantics (qv). Beginning from such semantically motivated views, AI approaches to knowledge representation have resulted in highly articulated representational techniques, making clearer than before how internal contents can be structured to reflect meanings. Recently Fodor (5), Pylyshyn (13), and others have been building on insights from these and other sources, trying to arrive at theories of meaning that will form bases of knowledge. Although less articulated than the neonaturalist school, these views also respond in new ways to the skepticism arising from the recent trend to analysis of
knowledge, this time with an approach that, although drawing insights from empirical disciplines, is rationalist in flavor. Their models are in turn available to AI research in developing and arguing for knowledge representation schemes.

In addition to know-of and know-that, AI research must deal with know-how: competence. At a very high level the AI community has been involved in disputes over procedural versus declarative representations of information. To some extent, these disputes have their basis in issues such as efficiency; but there are also epistemic questions involved. Attempts simultaneously to represent competence and propositional knowledge have raised for AI questions of how these two kinds of knowledge are related. These are traditional philosophical questions, and some insight into them can be gained by studying the philosophical literature. In the other direction, AI research has brought the distinction between competence and propositional knowledge into focus for philosophers and has shown how deep the distinction runs, in that representational techniques provide solid, natural support for propositional knowledge and belief but support competence (as opposed to knowledge about competence) at best awkwardly, and vice versa.

Traditional epistemology by and large neglects competence, except for reasoning ability and perception. Philosophers have worried about perception for centuries; however, most of that concern has centered on establishing a link between external phenomena and the data that "get in." Now AI is raising new questions about perceptual competence that have philosophical implications.
Research into vision (qv) and speech understanding (qv) has demonstrated dramatically that internalizing pixel maps of images or oscilloscope curves of sounds barely touches the requirements for acquaintance. Going from digital representations of images to recognition of the objects in them turns out to be a huge step, even for simple, unmoving images from highly restricted angles of very simple objects from small, predetermined sets. No one has the slightest idea how to get the most powerful machines made to process stereo images (see Stereo vision) in real time under circumstances remotely like those under which human vision works. The classic exposition of a theory of vision from an AI perspective can be found in the pioneering work of the late David Marr (14). Speech recognition is similarly complicated; for a discussion of the problems involved, see the 1980 report on the Department of Defense speech understanding project (15). It seems clear that the structuring capacities needed to go from naked inputs to percepts far exceed those Kant described, in complexity if not in power.

Knowledge representation (see Representation, knowledge) and natural-language understanding (qv) have together opened the question of how the information stored in symbols in a computer can be said to have meaning (represent knowledge) because those symbols seem to have no connection with anything outside the computer. This is a version of the problem of reference for language in general. Insofar as knowledge is related to language and meaning, it is a new version of a familiar epistemic problem: how does the knowledge that is in one's head relate to the reality that lies outside it? Pursuing this issue with regard to computer understanding has led AI researchers to related philosophical literature. The attempt to develop systems that can understand natural-language texts has also reopened for computer scientists the traditional philosophical problems of referential opacity and related limitations on consequence. For instance, systems that attempt to understand written stories must realize that "John knows that Jane's beagle has fleas" and "Jane's beagle is Fido" can both be true without it being true that "John knows that Fido has fleas." In fact, "John says that Jane's beagle has fleas" can be true even when Jane has no dog at all, let alone a beagle. But "Jane's beagle has fleas" cannot be true unless Jane has a beagle; and "Jane's beagle has fleas" and "Jane's beagle is Fido" certainly together imply that Fido has fleas. Distinguishing contexts in which such inferences are justified from those in which they are not has been a problem for epistemology for some time; with advances in AI research, it has also become a practical problem for AI researchers (see Belief systems).

Artificial intelligence research has concentrated attention on areas of justification hitherto little understood. More and more, it is becoming clear that intelligence involves not only the ability to reason according to logic in situations of (at least assumed) certainty but also the ability to extend judgments reasonably, though in the technical sense unsoundly, into areas where information is known to be incomplete. These investigations into modes of reasoning that deal in degrees or that, in effect, jump to their conclusions instead of drawing them can be viewed from an epistemic point of view as attempts to formulate techniques of justification short of logic (see Reasoning, default; Reasoning, plausible). As such, they contribute to the epistemological literature.
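The referential-opacity example involving Jane's beagle can be sketched in code. The sketch below is illustrative only; the representation, names, and helper functions are invented for this example. Substitution of co-referring terms is licensed in ordinary (transparent) factual contexts but blocked inside an agent's belief context.

```python
# Hypothetical sketch (not from this entry): a story-understanding
# system must block substitution of co-referring terms inside belief
# contexts while allowing it in ordinary, transparent contexts.

facts = {("has-fleas", "janes-beagle")}              # plain assertions
beliefs = {"john": {("has-fleas", "janes-beagle")}}  # John's belief context
same = {("janes-beagle", "fido")}                    # identities known to the system

def holds(prop):
    """Transparent context: known identities license substitution."""
    if prop in facts:
        return True
    pred, term = prop
    return any(prop_sub in facts
               for a, b in same
               for prop_sub in [(pred, a), (pred, b)]
               if term in (a, b))

def believes(agent, prop):
    """Opaque context: only what the agent explicitly believes counts."""
    return prop in beliefs.get(agent, set())

# "Jane's beagle has fleas" and "Jane's beagle is Fido" together imply
# that Fido has fleas:
assert holds(("has-fleas", "fido"))
# ...but it does not follow that John knows that Fido has fleas:
assert not believes("john", ("has-fleas", "fido"))
```

The design point is that the inference engine consults the identity store only when evaluating transparent contexts, which is exactly the distinction the story-understanding systems described above must draw.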
In the other direction, the attempt to get systems to reason in contexts of uncertainty has focused attention on the traditional distinctions between knowledge and belief, and has led AI researchers both into the traditional epistemic literature on grounds for rational belief and into a specialized formal literature on alternative logics, which, although it is not strictly speaking a branch of epistemology, contains substantial epistemic content. A striking example of this literature is Hintikka's work on knowledge and belief (16).

In addition to these interchanges, epistemologists are beginning to borrow from work in AI and cognitive science (qv), which has developed models of the mind for which the notion of level is central. These models go beyond accepting a mental-physical distinction to provide explanatory power for epistemic views like those of Armstrong and Dretske (9,10). The relationship between AI and epistemology is more complex than these remarks would indicate, though. When AI researchers discuss knowledge, it is frequently unclear whether they mean knowledge in a sense an epistemologist would accept or whether they mean what philosophers would call reasonable belief. Some researchers have explicitly retreated to discussion of belief as opposed to knowledge or truth; Doyle (17) now refers to belief maintenance instead of truth maintenance, for instance, and Martins (18) has developed a belief revision system that draws heavily on technical results in epistemology and logic. However, even with that proviso, there remain substantial areas of overlap.

So the current status of epistemology again has two apparent thrusts, both influenced by research in AI and computational paradigms.
Because of the interchange of concepts in recent years, both have much to offer AI researchers, even above the traditional distinctions and analyses that have been helpful to date, in terms of presenting groundworks on which to build representation systems and from which to argue that what is represented genuinely mirrors important aspects of
knowledge and belief. The first, a naturalistic view, rests on a multilayered analysis of mind to provide an antiskeptical empiricism. The second trend also draws from analyses of mind influenced by cognitive psychology and derives its thrust from an emphasis on representation as the semantic link between knowledge and reality. It remains to be seen whether this will develop into a rationalist counterpart of the new empiricism.
BIBLIOGRAPHY

1. K. Lehrer, Knowledge, Oxford University Press, Oxford, 1974. An analysis-of-knowledge style attack on foundationalism and presentation of the coherentist position.
2. L. Wittgenstein, On Certainty (Über Gewissheit), D. Paul and G. E. M. Anscombe, trans., G. E. M. Anscombe and G. H. Von Wright, eds., Basil Blackwell, Oxford, 1969.
3. W. V. O. Quine, Word and Object, MIT Press, Cambridge, MA, 1960.
4. J. R. Searle, "Minds, brains, and programs," Behav. Brain Sci. 3, 415–457 (1980); focuses on the centrality of semantics and reference. Also, Intentionality, Cambridge University Press, Cambridge, UK, 1983; Searle's most recent work on semantics and referentiality.
5. J. A. Fodor, The Language of Thought, Harvard University Press, Cambridge, MA, 1975; The Modularity of Mind, MIT Press, Cambridge, MA, 1983.
6. A. J. Ayer, Language, Truth and Logic, Dover Publications, Inc., NY, 1946.
7. E. Gettier, "Is justified true belief knowledge?" Analysis 23, 121–123 (1963). The classic list of counterexamples to the view of knowledge as justified true belief.
8. K. Gödel, Some Metamathematical Results on Completeness and Consistency; On Formally Undecidable Propositions of Principia Mathematica and Related Systems I; and On Completeness and Consistency (three articles), in J. Van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic 1879–1931, Harvard University Press, Cambridge, MA, pp. 592–617, 1967.
9. D. M. Armstrong, Belief, Truth and Knowledge, Cambridge University Press, Cambridge, UK, 1973. Early presentation of the new naturalism.
10. F. I. Dretske, Seeing and Knowing, University of Chicago Press, Chicago, 1969. Dretske's basic presentation of his naturalist view. Knowledge and the Flow of Information, MIT Press, Cambridge, MA, 1981. Dretske's developed view.
11. A. Newell, "The knowledge level," AI Mag. 2, 1–20 (1981).
12. R. J. Brachman, On the Epistemological Status of Semantic Networks, in N. V. Findler (ed.), Associative Networks, Academic Press, New York, 1979.
13. Z. W. Pylyshyn, Computation and Cognition, MIT Press, Cambridge, MA, 1984.
14. D. Marr, Representing Visual Information, in A. R. Hanson and E. M. Riseman (eds.), Computer Vision Systems, Academic, New York, pp. 61–80, 1978. Pioneering work in vision, which makes clear the gap between pixel maps and perception with recognition.
15. A. Newell, J. Barnett, J. Forgie, C. Green, D. H. Klatt, J. C. R. Licklider, J. Munson, D. R. Reddy, and W. A. Woods, Speech Understanding Systems: Final Report of a Study Group, North-Holland, Amsterdam, 1973. Demonstrates the gap between wave reception and hearing with sound recognition.
16. J. Hintikka, Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, Ithaca, NY, 1962.
17. J. Doyle, "A truth maintenance system," Artif. Intell. 12, 231–272 (1979).
18. J. P. Martins, Reasoning in Multiple Belief Spaces, Ph.D. dissertation, Technical Report 203, State University of New York at Buffalo, 1983; J. P. Martins and S. C. Shapiro, Reasoning in Multiple Belief Spaces, 8th Intl. Joint Conf. on Artificial Intelligence, Karlsruhe, FRG, pp. 370–373, 1983.

General References

Surveys and Collections in Philosophy

H. L. Dreyfus (ed.), Husserl, Intentionality and Cognitive Science, MIT Press, Cambridge, MA, 1982.
P. Edwards (ed.), The Encyclopedia of Philosophy, Macmillan, New York, 1967. Includes surveys of knowledge and belief, the history of epistemology, perception, sensa (sense data), logic, philosophy of mind, philosophy of language, and most major philosophical thinkers.
A. P. Griffiths (ed.), Knowledge and Belief, Oxford University Press, Oxford, 1969. Ordinary language tradition; once again, the editor's introduction provides a survey.
G. S. Pappas and M. Swain (eds.), Essays on Knowledge and Justification, Cornell University Press, Ithaca, 1978. Articles in the Anglo-American tradition of analysis of knowledge; the editor's introduction provides a survey.
R. K. Shope, The Analysis of Knowing, Princeton University Press, Princeton, NJ, 1982.
R. J. Swartz (ed.), Perceiving, Sensing, and Knowing, Doubleday Anchor, Garden City, NY, 1967. Sense data approach.
The Behavioral and Brain Sciences, Vol. 6, No. 1, March 1983. Includes a précis by Dretske of his 1981 book (see Ref. 10), commentary by more than 20 researchers, and a response by Dretske.

Primary Sources from Philosophy

These are original philosophical works, which supplement the collections above. Well-known works from before the twentieth century are given without reference to edition since many appear in multiple editions, and virtually all should be available from any reasonable academic library. Entries follow rough chronological order. The reader is warned that philosophical works of previous centuries make difficult reading for nonphilosophers.

Plato, Meno, Theaetetus, Republic (usually found in the collected dialogues).
Aristotle, Prior Analytics, Metaphysics (frequently found in anthologies).
R. Descartes, Meditations; Discourse on Method.
G. W. Leibniz. Many of Leibniz's works take the form of letters and the like. A useful collection, with individual writings grouped by topic, is Leibniz Selections, P. P. Wiener (ed.), Charles Scribner's Sons, New York, 1951. A more recent edition is New Essays on Human Understanding, P. Remnant and J. Bennett (trans. and ed.), Cambridge University Press, New York, 1981. A more complete collection is the two-volume edition, Leibniz: Philosophical Papers and Letters, L. E. Loemker (trans.), University of Chicago Press, Chicago, IL, 1956.
J. Locke, An Essay Concerning Human Understanding.
G. Berkeley, New Theory of Vision; Principles of Human Knowledge; Three Dialogues Between Hylas and Philonous.
D. Hume, Enquiry Concerning Human Understanding.
I. Kant, Prolegomena to Any Future Metaphysics; Critique of Pure Reason.
E. Husserl, Logical Investigations, J. N. Findlay, trans., Routledge and Kegan Paul, London, 1970; Formal and Transcendental Logic, D. Cairns, trans., Martinus Nijhoff, The Hague, 1969.
S. P. Stich, From Folk Psychology to Cognitive Science: The Case Against Belief, MIT Press, Cambridge, MA, 1983.

Sources from Artificial Intelligence

These are original research works in artificial intelligence. Some of these works were deliberately directed at epistemological issues; others are on topics that have substantial epistemological interest. Many of these works are primary research reports, but most should be relatively accessible to outside readers.

J. F. Allen, "Towards a general theory of action and time," Artif. Intell. 23, 123–154 (1984). Development of models to reflect knowledge involving time.
M. Georgeff, A Theory of Action for MultiAgent Planning, Proceedings of the Fourth National Conference on Artificial Intelligence, AAAI-84, Austin, TX, pp. 121–125, 1984. Process model for knowledge about action.
H. J. Levesque, "Foundations of a functional approach to knowledge representation," Artif. Intell. 23, 155–212 (1984).
H. J. Levesque, A Logic of Implicit and Explicit Belief, Proceedings of the Fourth National Conference on Artificial Intelligence, AAAI-84, Austin, TX, pp. 198–202, 1984.
A. Maida and S. C. Shapiro, "Intensional concepts in propositional semantic networks," Cog. Sci. 6, 291–330 (1982).
J. McCarthy, Programs with Common Sense, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968.
J. McCarthy and P. Hayes, Some Philosophical Problems from the Standpoint of Artificial Intelligence, in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, pp. 463–502, 1969. Reprinted in Webber and Nilsson (see below).
J. McCarthy, Epistemological Problems of Artificial Intelligence, Proc. of the Fifth Intl. Joint Conf. on Artificial Intelligence, pp. 1038–1044, 1977. Reprinted in Webber and Nilsson (see below).
J. McCarthy, First Order Theories of Individual Concepts and Propositions, in J. E. Hayes, D. Michie, and L. I. Mikulich (eds.), Machine Intelligence, Vol. 9, Ellis Horwood, London, pp. 129–147.
W. J. Rapaport, Quasi-Indexical Reference in Propositional Semantic Networks, in Proc. 10th Intl. Conf. on Comp. Ling. (COLING-84), Association for Computational Linguistics, pp. 65–70, 1984. An AI representation of knowledge-related concepts based on philosophical works.
B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, 1981. This collection contains many basic articles from across the AI spectrum. The last section contains several articles explicitly related to epistemology, including the 1969 McCarthy and Hayes paper.
R. W. Weyrauch, "Prolegomena to a theory of mechanized formal reasoning," Artif. Intell. 13, 133–170 (1980). Reprinted in Webber and Nilsson (see above).
W. A. Woods, Procedural Semantics as a Theory of Meaning, in A. Joshi, B. Webber, and I. Sag (eds.), Elements of Discourse Understanding, Cambridge University Press, Cambridge, UK, pp. 300–334, 1981.
Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83), August 8–12, 1983, Karlsruhe, FRG. This conference is so rich a source that it almost deserves a section to itself. Relevant articles include the following:
J. A. Barnden, Intensions as Such: An Outline, pp. 280–286.
J. Doyle, The Ins and Outs of Reason Maintenance, pp. 349–351.
Hearne, Simulating Non-Deductive Reasoning, pp. 362–364.
J. P. Martins and S. C. Shapiro, Reasoning in Multiple Belief Spaces, pp. 370–373.
M. Nilsson, A Logical Model of Knowledge, pp. 374–376.
A. Sloman, D. McDermott, and W. A. Woods, Panel Discussion: Under What Conditions Can a Machine Attribute Meaning to Symbols?, pp. 44–48.

J. T. Nutter
Virginia Tech
EPISTLE

A text-critiquing system that checks spelling, grammar (qv), and style in business correspondence, EPISTLE implements grammar checks, which constitute the central part of the system, using an augmented phrase structure grammar (qv). Style checks are currently limited to overly complex sentences. It was written by G. Heidorn, K. Jensen, L. Miller, R. Byrd, and M. Chodorow at IBM around 1981 [see G. Heidorn et al., "The EPISTLE text-critiquing system," IBM Sys. J. 21(3), 305–326 (1982)].

K. S. Anone
SUNY at Buffalo
EURISKO

A learning (qv) program that uses heuristics (qv) to develop new heuristics, EURISKO was developed in 1981 by Douglas Lenat at Stanford University. This program, along with AM (qv), presents the strategy called learning by discovery (see R. Michalski, J. Carbonell, and T. Mitchell (eds.), Machine Learning, Vol. 1, Tioga, Palo Alto, CA, 1983).

K. S. Anone
SUNY at Buffalo
EXPERT SYSTEMS

Knowledge-based expert systems, or knowledge systems for short, employ human knowledge to solve problems that ordinarily require human intelligence (qv) (1). Knowledge systems represent and apply knowledge electronically. These capabilities ultimately will make knowledge systems vastly more powerful than the earlier technologies for storing and transmitting knowledge, books and conventional programs. Both of these technologies suffer from fundamental limitations. Although today books store the largest volume of knowledge, they merely retain symbols in a passive form. Before the knowledge stored in books can be applied, a human must retrieve it, interpret it, and decide how to exploit it for problem solving (qv). Although most computers today perform tasks according to the decision-making logic of conventional programs, these programs do not readily accommodate significant amounts of knowledge. Programs consist of two distinct parts, algorithms and data. Algorithms determine how to solve specific kinds of
problems, and data characterize parameters in the particular problem at hand. Human knowledge does not fit this model, however. Because much human knowledge consists of elementary fragments of know-how, applying a significant amount of knowledge requires new ways to organize decision-making fragments into competent wholes. Knowledge systems collect these fragments in a knowledge base and then access the knowledge base to reason about each specific problem. As a consequence, knowledge systems differ from conventional programs in the way they're organized, the way they incorporate knowledge, the way they execute, and the impression they create through their interactions. Knowledge systems simulate expert human performance, and they present a humanlike facade to the user.

Some current knowledge engineering applications are medical diagnosis, equipment repair, computer configuration, chemical data interpretation and structure elucidation, speech and image understanding, financial decision making, signal interpretation, mineral exploration, military intelligence and planning, advising about computer system use, and VLSI design. In all of these areas, system developers have worked to combine the general techniques of knowledge engineering (KE) with specialized know-how in particular domains of application. In nearly every case the demand for a KE approach arose from the limitations perceived in the alternative technologies available. The developers wanted to incorporate a large amount of fragmentary, judgmental, and heuristic knowledge; they wanted to solve problems automatically that required the machine to follow whatever lines of reasoning seemed most appropriate to the data at hand; they wanted the systems to accommodate new knowledge as it evolved; and they wanted the systems to use their knowledge to give meaningful explanations of their behaviors when requested.

This entry presents a tutorial overview of the field of knowledge engineering.
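The contrast between a fixed algorithm and a knowledge base of fragments can be made concrete with a toy sketch. Everything below is invented for illustration (the rule contents echo the flu rule quoted later in this entry): elementary fragments of know-how live in a knowledge base, and a small generic engine, rather than a problem-specific algorithm, applies whichever fragments the current situation triggers.

```python
# Illustrative sketch only: a knowledge base of condition/conclusion
# fragments, applied by a generic forward-chaining engine.

RULES = [  # each fragment: (set of conditions, conclusion)
    ({"fever", "runny-nose"}, "flu"),
    ({"flu"}, "prescribe-rest"),
]

def forward_chain(observations, rules=RULES):
    """Repeatedly apply any rule whose conditions are all known."""
    known = set(observations)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

print(forward_chain({"fever", "runny-nose"}))
# the result contains "flu" and "prescribe-rest" alongside the observations
```

Adding knowledge here means appending fragments to `RULES`; the engine itself never changes, which is the organizational difference from a conventional algorithm-plus-data program.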
It describes the major developments that have led up to the current great interest in expert systems and then presents a brief discussion of the principal scientific and engineering issues in the field. The subsequent sections describe the process of building expert systems and the role of tools in that work, how expert systems perform human-computer interface functions, and the frontiers of research and development.
A Brief History of Knowledge Engineering. Throughout the past two decades, AI researchers have been learning to appreciate the great value of domain-specific knowledge (see Domain knowledge) as a basis for solving significant problems. Most of the world's challenging mental problems do not yield to general problem-solving (qv) strategies even when augmented with general efficiency heuristics (qv). To solve problems in areas of human expertise, such as engineering, medicine, or programming, machine problem solvers need to know what human problem solvers know about that subject. Although computers have many advantages over humans, including speed and consistency, these cannot compensate for ignorance. In a nutshell, AI researchers learned that high IQ does not make a person expert, specialized know-how does (see Intelligence). To make a fast and consistent symbol processor perform as well as a human expert, someone must provide it specialized know-how comparable to what a human expert possesses. This need gives rise to knowledge engineering.

Early KE applications arose in universities and emphasized matching the performance of human experts. DENDRAL (2) and MACSYMA (3) achieved expert performance first. DENDRAL identifies the chemical molecular structure of a material from its mass spectrographic and nuclear magnetic resonance (nmr) data. MACSYMA manipulates and simplifies complex mathematical expressions. Beginning in the 1970s, AI researchers initiated several KE applications. By the end of the decade several projects had accomplished significant results: MYCIN incorporated about 400 heuristic rules written in an English-like if-then formalism to diagnose and treat infectious blood diseases, but its major impact on the field arose from its ability to explain lucidly any conclusion or question it generated (4). HEARSAY-II employed multiple, independent, cooperating expert systems that communicated through a global database called a "blackboard" to understand connected speech in a 1000-word vocabulary (5). R1 incorporated about 1000 if-then rules needed to configure orders for Digital Equipment's VAX computers and eliminated the need for DEC to hire and train many new people to perform a task that had proved difficult and that had resisted solution by conventional computer techniques (6). INTERNIST contained nearly 100,000 judgments about relationships among diseases and symptoms in internal medicine and began to approach a breadth of knowledge and problem-solving performance beyond that of most specialists in internal medicine (7).

Techniques Used in Knowledge Systems

[Figure 1 (caption: Building blocks of a knowledge system) shows three levels of building blocks: a base level of symbolic programming, propositional calculus, search, and heuristics; a second level of constraints, assertions, rules, and certainty factors; and a third level of communications, organization and control, intermediate results, explanation and justification, and optimization.]

Figure 1 illustrates the primary building blocks of a knowledge system. The base level consists of those techniques that underlie nearly all applications. These include symbolic programming, propositional calculus (see Logic, propositional), search (qv), and heuristics. At the second level of techniques Figure 1 shows the most frequently used forms of knowledge representation (qv): constraints (see Constraint satisfaction), assertions, rules (see Rule-based systems), and certainty factors (see Reasoning, plausible). Examples of constraints include "Two distinct physical objects cannot occupy the same space at the same time" and "Every beneficiary designated in the life insurance policy must have a financial interest in the health of the insured party." A knowledge system incorporates constraints to express restrictions on allowable values, states, or conclusions. In fact, some knowledge systems derive their value primarily through an ability to recognize and satisfy complex symbolic constraints. In this way KE extends the class of constraint satisfaction problems amenable to computation. Previously, computer systems focused primarily on linear constraints, whereas knowledge systems address arbitrary symbolic constraints such as requirements on spatial, temporal, or logical relationships.

Assertional databases provide means for storing and retrieving propositions. An assertion corresponds to a true proposition, a fact. Examples of assertions include "The King of Sweden visited my company to explore possible relationships
with West Coast high-technology companies," "Morgan is a dog," and "'Morgan' is my dog's name." Many simple forms of assertions lend themselves to relational database implementations, but more complicated patterns do not. In general, most knowledge systems today incorporate their own specialized assertional database subsystems.

Rules represent declarative or imperative knowledge of particular forms. To illustrate an imperative rule, consider: "If you observe a patient with fever and a runny nose, you should suspect that the patient has the flu." This rule tells a knowledge system how to behave. A related declarative rule would tell the system what it could believe but would leave how unspecified: "If a patient has the flu, the patient tends to exhibit fever and a runny nose." Most knowledge systems use one or both of these rule forms. Declarative rules, in general, describe the way things work in the world. On the other hand, imperative rules prescribe heuristic methods the knowledge system should employ in its own operations.

Certainty factors designate the level of confidence or validity a knowledge system should associate with its data, rules, or conclusions. These certainty factors may reflect any of a variety of different schemes for dealing with error and uncertainty. Some systems employ Bayesian conditional probabilities to estimate certainties (see Bayesian decision methods). Others use completely subjective systems, for example, where 1.0 implies certainty, -1.0 implies certainty of the proposition's negation, and 0.0 indicates either no opinion or no evidence. Many people devote considerable effort to the task of improving the certainty factor technology. To a large extent this may prove fruitless. First, knowledge systems need to estimate the strength of their conclusions precisely because no valid and formal alternatives exist. One cannot eliminate the subjective quality of the decision process by any amount of formalization.
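A subjective scheme of the kind just described, with factors ranging over [-1.0, 1.0], can be sketched as follows. The combination formula shown is MYCIN's well-known one for merging two rules that bear on the same conclusion; it is offered purely as an illustration of one such scheme, not as anything this entry prescribes.

```python
# Sketch of one subjective certainty-factor scheme: values in [-1, 1],
# where 1.0 is certainty, -1.0 is certainty of the negation, and 0.0 is
# no opinion or no evidence. Combination rule follows MYCIN's scheme.

def combine_cf(a, b):
    """Combine two certainty factors bearing on the same conclusion."""
    if a >= 0 and b >= 0:
        return a + b * (1 - a)                      # confirmations reinforce
    if a <= 0 and b <= 0:
        return a + b * (1 + a)                      # disconfirmations reinforce
    return (a + b) / (1 - min(abs(a), abs(b)))      # mixed evidence attenuates

print(combine_cf(0.6, 0.4))   # 0.76: two agreeing rules strengthen belief
print(combine_cf(0.6, -0.4))  # about 0.33: conflicting evidence weakens it
```

Note how the scheme matches the point made in the text: several reasonable formulas of this shape behave equivalently well in practice, because the power comes from the underlying knowledge rather than from the arithmetic.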
Second, many alternative certainty factor schemes work equivalently well. Knowledge systems do well because they can mimic human performance. Humans could not solve problems well if they needed to calculate complex mathematical formulas to determine their own certainty factors. Rather, humans perform well because their knowledge generally works well enough. It is efficient, robust, and good enough to solve important problems. Knowledge systems simply exploit that power of the human's knowledge.

At the third level of techniques, Figure 1 shows organization and control (see Control structures), intermediate results, explanation (qv) and justification, and optimization. A knowledge system organizes and controls its activity according to the architectural design principles it embodies. For example, a diagnostic expert system might reason backward (see Processing, bottom up and top down) from all potential diseases it knows, searching for sufficient confirming evidence. It might consider first the disease considered most likely a priori. Then it might ask for evidence according to the most likely and characteristic syndromes. Only when it encountered overwhelming amounts of disconfirming data might it begin to consider the next possible disease. An expert system that operated in this manner would exhibit a depth-first, backward-chaining control scheme. Each distinct control scheme may require a corresponding organization of the knowledge base and appropriately tailored inferential mechanisms (see Inference) that search it and apply knowledge. Thus, control and organization are closely linked.

Intermediate results arise in all systems. Because knowledge systems can face difficult performance requirements, they often need to make effective use of intermediate results. In a backward-chaining system, for example, several possible alternatives under consideration all may require a common bit of evidence. Collecting this evidence may require extensive amounts of computation and inference. Once the knowledge system evaluates that evidence, it has an incentive to save that result so it may reuse it later. In every organization and control scheme, comparable issues of temporary storage and reuse arise.
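A minimal sketch of such a backward-chaining scheme with cached intermediate results follows; all names and rules are invented for illustration. Evidence shared by several hypotheses is evaluated once and then reused from the cache.

```python
# Illustrative sketch: depth-first backward chaining over goal-directed
# rules, with intermediate results cached so that evidence required by
# several hypotheses is evaluated only once.

RULES = {  # conclusion -> list of alternative condition sets
    "flu":  [{"fever", "aches"}],
    "cold": [{"runny-nose", "fever"}],
}

def make_prover(observations):
    cache = {}  # intermediate results, shared across hypotheses

    def prove(goal):
        if goal in cache:
            return cache[goal]           # reuse saved intermediate result
        if goal in observations:
            result = True
        else:
            result = any(all(prove(c) for c in conds)
                         for conds in RULES.get(goal, []))
        cache[goal] = result
        return result

    return prove

prove = make_prover({"fever", "runny-nose"})
print([d for d in ("flu", "cold") if prove(d)])  # ['cold']
```

Here "fever" is the common bit of evidence: it is established while testing the flu hypothesis and comes back from the cache when the cold hypothesis needs it, mirroring the temporary-storage-and-reuse issue described above.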
Most knowledge systems today employ specialized and ad hoc methods to accomplish these functions.

Because knowledge systems generally can explain and justify their results, they have captured the interest of a wide variety of potential users. Users (consumers) in many application areas need to trust the recommendations of a knowledge system. By offering to explain how they reached their conclusions, these systems convey to the user an impression of reasonability. To construct an explanation (qv), these systems transform the expert heuristic rules and assertions into lines of reasoning. A line of reasoning shows how a starting set of assumptions and a collection of heuristic rules produce a particular conclusion. Consumers generally find these explanations as plausible as the rules themselves. The other people who interact with the knowledge system also exploit their explanation capabilities. Knowledge base maintainers, who may include experts and technicians, continually revalidate their knowledge systems by assessing their performance on test cases. They need to validate both that the system reaches the right decisions and that it does so for the right reasons.

Optimization techniques play an important role in knowledge systems. Knowledge systems, like other computer systems, must perform their tasks as quickly as needed. Many knowledge system applications today interact with users so often that they generally are waiting for input. In these cases, knowledge engineers pay considerable attention to assuring that the dialogue itself seems expert in terms of which queries, and in which order, the knowledge system generates. This requires effective ways to optimize the structure of the dialogue itself. New tools for building knowledge systems provide improved methods for specifying such imperative knowledge clearly and separating it effectively from descriptive knowledge about the problem domain. The most important area of optimization concerns the knowledge system's problem-solving performance: Does it generate and test candidate solutions in an efficient order; does it avoid redundant computation; does it compile the symbolic rules effectively; does it retrieve assertions efficiently; and does it transform the knowledge base into more appropriate organizations for specialized tasks that can exploit more efficient algorithms? In some applications, optimization of a knowledge system has reduced run times to as little as one-thousandth of one percent of initial completion times.

The capstone of knowledge system techniques is their communication capabilities. Knowledge systems communicate with knowledge engineers, experts, consumers, databases, and other computer systems. Just as humans access and interact with these various sources, a knowledge system needs to speak to each in its own appropriate language. Knowledge systems communicate with knowledge engineers through structure editors that allow them to access and modify components of the knowledge base easily. Knowledge systems communicate with experts through sample dialogues with explanations that elucidate their lines of reasoning and highlight for the expert where to make knowledge base changes. For consumers, knowledge systems may exploit natural-language processes to generate answers to questions or to interpret user responses. Some knowledge systems today use videodisks to retrieve pictures and replay instructional sequences for consumers. Beyond their interactions with people, knowledge systems also interact with other computer systems. Knowledge systems often need to formulate and execute conventional data-processing applications as a subtask. In this way several knowledge systems have evolved almost like the "new brain" of higher animals that sits piggy-back atop preexisting, powerful, lower-level "old brains." These piggy-back knowledge systems incorporate the scarce expert know-how needed to make effective use of the powerful but often exceedingly complex computer programs employed today in fields such as structural engineering and seismic analysis. Quite commonly, knowledge systems incorporate means to access and retrieve information from on-line databases. In this way knowledge systems can apply their knowledge automatically and directly to the vast stores of data that now commonly reside on-line. Frequently, a knowledge system may serve the primary goal of weaving diverse sources of knowledge that reside in different databases, reflect different formats and coding practices, and require heuristic means to produce a meaningful, integrated interpretation. These needs arise most often in complex organizations, such as the order entry and manufacturing systems of large corporations, or the intelligence analysis functions of defense departments.

Figure 2 illustrates the major components of a contemporary knowledge system and places it in its environmental context. The figure depicts the knowledge system as a computer application with distinctive development and operational environments. The people who participate during knowledge system development and extension use the tools shown in Figure 2: tools for knowledge acquisition, knowledge base maintenance, validation, and interface design. Using these tools, knowledge engineers construct knowledge systems that incorporate the three key components shown in Figure 2: a knowledge base, an inference engine, and a user interface. To do this, the knowledge engineer selects a tool for building the knowledge system whose built-in features fit the problem-solving knowledge in this domain. In complex task domains, however, that tool generally also will embody an approach to organization and control that constitutes the specific problem-solving paradigm the knowledge system will adopt. Once a knowledge system completes development, it enters operation. In that environment it ordinarily accesses databases, connects to various communication networks, transfers to or integrates with existing installed equipment, and may receive data directly from sensor systems. Later in this entry an illustrative knowledge system called the Drilling Advisor is described in some detail. At this point, that knowledge system is used to illustrate more concretely these major components and environmental systems.

The Drilling Advisor addresses problems of sticking and dragging that can occur during the process of drilling for oil. In a nutshell, a drill string may encounter tremendous sticking forces arising from friction between geological strata and the drilling pipe, stabilizers, and bit. In operation, the knowledge system needs to access an on-line database of drilling operation reports that describe key parameters. It needs to communicate with regional or central operating management to receive knowledge base updates and to transmit its own reports. It must operate in harsh on-rig environments, and this means it must run on and integrate with special hardened equipment. It can also exploit direct access to sensors that generate drilling data such as depth of the bit and pressure of the drilling mud.

The Drilling Advisor itself incorporates the knowledge representation and problem-solving paradigm of Teknowledge's S.1 expert system tool. The knowledge base for sticking includes approximately 300 heuristic rules and descriptions of approximately 50 key drilling parameters. The inference engine conducts a dialogue with the user, who is a drilling supervisor, in English or French. The dialogue mimics in its content and sequencing the manner of questioning and analysis of the
EXPERT SYSTEMS

Figure 2. Technology applied: a knowledge system and its environmental context. (The figure diagrams the knowledge system proper, consisting of a user interface, knowledge base, and inference engine; the surrounding development elements, including knowledge acquisition and explanation facilities, validation tools, user interface design, the problem-solving paradigm, knowledge representation, and programming environment; and the operational environment of programming language system, operating system, communication networks, installed equipment, and host computer.)
human expert who served as the model. The pursuit of each hypothesis follows a depth-first, back-chaining approach starting with the most likely sticking problem and proceeding to collect necessary supportive evidence. Each hypothesis, datum, and heuristic rule may reflect uncertainty. The knowledge system combines these uncertain quantities to determine a certainty factor (CF) between -1 and 1 for each potential diagnosis. For all diagnoses with CFs exceeding 0.2, the Drilling Advisor formulates a treatment plan to cure the current problem and minimize the likelihood of problem recurrence.

The Drilling Advisor was developed in a programming environment composed of S.1 and LISP (qv), and it uses LISP as its underlying programming system. It can also conduct consultations using a C implementation of S.1 operating on any standard UNIX-based system. The development environment consists of tools provided by S.1. That system provides special tools to help the knowledge engineer and expert acquire the expert's knowledge. These include both English-language and abbreviated notations for rule display and entry, knowledge base browsing, structure editors, case libraries, and automated testing facilities. During the course of the system development, the expert became proficient in using the development tools so he or she could direct late stages of knowledge acquisition and knowledge base maintenance. Each time the expert or knowledge base maintainer modifies a rule, the S.1 system automatically validates that the knowledge system still performs its test cases correctly. Finally, the knowledge system conducts its interaction with the drilling supervisor in natural language by automatically translating, collecting, and formatting appropriate fragments of text associated with the knowledge base elements that participate in a line of reasoning.
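The certainty-factor arithmetic described above can be made concrete with a short sketch. The article does not give the combination formulas, so the classic EMYCIN-style rules and the sample evidence values used below are illustrative assumptions, not details of the Drilling Advisor itself.

```python
def cf_combine(a, b):
    """Combine two certainty factors in [-1, 1].

    These are the classic EMYCIN combination rules, assumed here for
    illustration; the article does not spell out S.1's exact formulas.
    """
    if a >= 0 and b >= 0:
        return a + b * (1 - a)          # two confirming pieces of evidence
    if a < 0 and b < 0:
        return a + b * (1 + a)          # two disconfirming pieces of evidence
    return (a + b) / (1 - min(abs(a), abs(b)))   # conflicting evidence

# Two independent rules support a hypothetical sticking diagnosis:
cf = cf_combine(0.4, 0.3)               # approximately 0.58

# A diagnosis whose CF exceeds 0.2, the threshold quoted above,
# qualifies for treatment planning.
assert cf > 0.2
```

Note that the combined belief never exceeds 1 and grows monotonically as confirming evidence accumulates, which is what lets a system pursue the most likely sticking problem first and stop gathering evidence once a hypothesis is settled.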
The interface tools make it possible for the knowledge system to produce a sophisticated and intelligible dialogue using only the short, descriptive phrases associated with each drilling parameter that the expert provided. The Drilling Advisor also displays graphically several key factors, including the plausible sticking problems, the rock formations, and the drill bit and stabilizers constituting the "bottom hole assembly." Finally, using standard tools in the S.1 package, the Drilling Advisor displays dynamically its alternative lines of reasoning, its intermediate conclusions, and the heuristic rules momentarily under consideration.

Fundamentals of Knowledge Engineering

Knowledge engineering, as most engineering fields do, combines theory and practice. This section discusses the fundamentals of the engineering discipline as it exists today. The discussion makes three main points: First, because knowledge systems solve problems that ordinarily require human intelligence, they exhibit properties common to most intelligent problem-solving systems, whether natural or artificial. Second, to determine the best organization and design for any particular knowledge system, we must consider the type and complexity of the problem and the power and form of the heuristic knowledge available for solving it. Although KE has existed for only a very short time, it makes some useful prescriptions for the best way to organize a knowledge system in various situations. Third, knowledge contains a capacity for intelligent action but does not typically carry with it a means for tapping and realizing that potential. Thus, in building practical knowledge systems today, knowledge engineers always engineer knowledge; that is, they convert knowledge to applicable forms. These three facts convey in a simple manner the essential ideas that motivate the more detailed discussions that follow.

Basic Ideas. Table 1 makes five basic points, which are explained below. Knowledge in this context means those kinds of data that can improve the efficiency or effectiveness of a problem solver. Three major types of knowledge fit this description: Facts express valid propositions, beliefs express plausible propositions, and heuristics express rules of good judgment in situations where valid algorithms generally do not exist. Experts differ from others in the quality and quantity of knowledge they possess. Experts know more, and what they know makes them more efficient and effective.
Table 1. The Basic Ideas of Intelligent Problem Solving

1. Knowledge = facts + beliefs + heuristics
2. Success = finding a good enough answer with the resources available
3. Search efficiency directly affects success
4. Aids to efficiency
   (a) Applicable, correct, and discriminating knowledge
   (b) Rapid elimination of "blind alleys"
   (c) Elimination of redundant computation
   (d) Increased speed of computer operation
   (e) Multiple, cooperative sources of knowledge
   (f) Reasoning at varying levels of abstraction
5. Sources of increased problem difficulty
   (a) Erroneous data or knowledge
   (b) Dynamically changing data
   (c) The number of possibilities to evaluate
   (d) Complex procedures for ruling out possibilities

In contrast to conventional data-processing applications, most knowledge systems work in situations that do not admit optimal or "correct" solutions. Most human professionals perform tasks that require skilled, assertive, and informed judgment, and these requirements arise from the complexity, ambiguity, or uncertainty of the available data and problem-solving methods. In such cases the problem solver must balance the quality of the answer it produces against the effort it expends. An expert finds the best compromise, usually by seeing a way to find an acceptable answer with a reasonable expenditure of resources. Given such a pragmatic orientation to performance, intelligent problem solvers benefit directly from improved efficiency. In particular, improvements in speed or selectivity can produce an acceptable solution more affordably, enabling the problem solver to find better solutions in the time available or take on and solve additional problems.

How then does an intelligent problem solver improve its efficiency? Table 1 lists the six most common ways. (a) It possesses knowledge that applies often, avoids errors, and makes useful distinctions to exploit significant differences among diverse types of situations. (b) It eliminates quickly paths of investigation that ultimately will prove useless. It prunes these "blind alleys" early by advancing in time those decisions that can remove fruitless classes of possibilities from further consideration. (c) It eliminates redundancy by computing things once and then reusing the results later if needed. (d) It accelerates its computations, which in the case of knowledge systems means that it increases the quality of its compilation and employs faster hardware. (e) It takes advantage of diverse bodies of knowledge that can contribute to the problem at hand. Specifically, it uses independent bodies of expertise to reduce ambiguities and eliminate sources of noise. Or it exploits knowledge bases from complementary disciplines to find a solution using whichever techniques or heuristics work best on the given problem. (f) Lastly, it analyzes a problem in different ways, ranging from the high level and abstract to the low level and specific. Most complex problems require the problem solver to jump around in levels of abstraction, and they can reward an insightful observation at any level by obviating enormous amounts of additional analysis at the other levels. Examples of such insights at various levels include recognizing that the current problem has the same form as one previously solved; detecting that one of the problem requirements rules out all but two candidates; or noting that a figure incorporates orthogonal, horizontal, and vertical line segments, suggesting that it depicts a man-made object.

The difficulty of problem-solving tasks increases in four ways: (a) The problem solver may not possess accurate data sources or knowledge that performs without errors. These shortcomings cause it to explore many false paths. (b) When the data change dynamically, the problem solver must accelerate its reasoning, base some decisions on its expectations for the future, and revise its decisions when current data disconfirm erroneous prior assumptions. (c) Of course, the more possibilities it must consider, the harder the task. However, it is difficult in many applications to quantify the size of the search space and to find alternative formulations of the search space that simplify the problem as much as possible. (d) A problem solver that must use complex and time-consuming methods to eliminate alternatives from consideration works less efficiently than one possessing equally effective but simpler, cheaper measures.
Knowledge System Organization and Design. Unlike data-processing applications, current knowledge systems do not fit specific models, such as the typical update-master-file or input-process-output forms so common in commercial data processing. Moreover, the KE field does not yet have common schemes for characterizing its designs and systems. However, experienced knowledge engineers do adhere to some general principles when designing knowledge systems. These principles determine high-level architectural properties of knowledge systems that permit them to perform their tasks effectively. To determine an appropriate knowledge system design, these principles ask questions about the kind of problem-solving complexity the task involves and the kind of heuristic problem-solving knowledge available. Figure 3 graphically shows many of the best understood design principles. The basic factors in this diagram are explained next. The reader interested in a detailed explanation should see Ref. 1.

Figure 3 divides all knowledge system application problems into two categories characterized by small and large search spaces. It then elaborates each of these two basic categories by citing additional attributes that also may characterize the problem. For example, in the small-space problems it distinguishes three possibly overlapping subcategories based on the kinds of data the knowledge system must process. When these data seem reliable and unchanging and the system knowledge performs reliably, the figure prescribes the most typical knowledge system architecture: exhaustive search that pursues one line of reasoning at a time, such as depth-first backward-chaining. Furthermore, the prescribed system can reason monotonically: It need not initially formulate guesses that it later might need to retract.
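The prescribed architecture, exhaustive depth-first backward chaining over monotonic rules, fits in a few lines of code. The rules and facts below are invented stand-ins, not drawn from any actual system.

```python
# Each goal maps to a list of alternative rules; each rule is a list of
# conditions that must all be proved.  Rules and facts are hypothetical.
RULES = {
    "stuck_pipe":  [["high_torque", "no_rotation"]],
    "high_torque": [["torque_reading_high"]],
}
FACTS = {"torque_reading_high", "no_rotation"}

def prove(goal):
    """Depth-first backward chaining: pursue one line of reasoning at a
    time, exhaustively trying each rule for the goal in order."""
    if goal in FACTS:
        return True
    for conditions in RULES.get(goal, []):
        if all(prove(c) for c in conditions):
            return True          # monotonic: established goals never retract
    return False

assert prove("stuck_pipe")       # succeeds via high_torque and no_rotation
```

Because the data and knowledge are assumed reliable and fixed, the interpreter never needs to record justifications for later retraction, which is exactly what makes this the simplest architecture in Figure 3.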
At the other extreme the figure addresses complex problems, such as those with large factorable search spaces (the search space can be broken into smaller subspaces corresponding to independent subproblems) in which pursuing one line of reasoning does not perform consistently well, no single body of knowledge provides enough power to solve all the problems the knowledge system faces, and the initial form of knowledge representation proves too inefficient to achieve the needed level of performance. In these cases the design principles prescribe several remedies, respectively. First, the knowledge system must explore and develop several promising lines of reasoning at once until it obtains more certainty about the actual solution. Second, it should incorporate several independent subsystems, each of which should contribute to decision making on an opportunistic basis. That is, the top-level knowledge system should maintain
Figure 3. Knowledge-system application problems. (The figure pairs problem characteristics with design prescriptions. A small search space with reliable, fixed data and knowledge calls for exhaustive search, monotonic reasoning, and a single line of reasoning; unreliable data or knowledge calls for combining evidence from multiple sources using exact, probability, or fuzzy models; time-varying data call for state-triggered expectations. A big, factorable search space calls for hierarchical generate and test; no evaluator for a partial solution together with a fixed order of abstracted steps calls for an abstract search space; no fixed sequence of subproblems together with interacting subproblems calls for constraint propagation and least commitment; the need for efficient guessing calls for belief revision supporting plausible reasoning; a single line of reasoning that is too weak calls for multiple lines of reasoning; a single knowledge source that is too weak calls for heterogeneous models, opportunistic scheduling, and variable-width search; and a representation method that is too inefficient calls for tuned data structures, knowledge compilation, and cognitive economy.)
an agenda of pending subsystem actions and schedule for execution first those pending actions that promise to contribute most to the developing solution. This means the knowledge system will pursue a variable number of simultaneous, competing alternative solution paths, where the actual number at any point reflects the momentary lack of certainty regarding the "best" path. Lastly, knowledge systems can exploit several advanced techniques for improving efficiency. Generally, these require making some kind of transformation to the initial knowledge representation and inference engine. These may include adopting data structures more attuned to the types of inference the knowledge system performs; compiling the knowledge into a new structure, such as a network or tree, that facilitates rapid search; or using dynamic techniques to cache intermediate results and perhaps compile incrementally more efficient methods for frequently repeated inferences that initially require complex chains of computation.

In short, today's design principles provide high-level guidance to the knowledge system designer. Like architectural principles in housing and commercial construction, these principles suggest the broad outlines of a construction task without specifying the details. Knowledge systems built in a manner consistent with the principles in Figure 3 will prove similarly well adapted to their environments but will vary considerably in their fine structure.

Engineered Knowledge. One aspect of KE seems both obvious and subtle. What seems obvious is that knowledge engineers extract knowledge from experts and integrate it in an overall knowledge system architecture. Hence, they are engineers who construct systems out of elementary knowledge components. What is subtle is that the way a knowledge system uses knowledge to solve problems directly affects how the knowledge engineer extracts, represents, and integrates it. Knowledge does not come off the shelf, prepackaged, ready for use. On the contrary, "knowledge" is the word used to describe a variety of fragmentary bits of understanding that enable people and machines to perform otherwise demanding tasks reasonably well. As an example, an understanding of the way technology transfer generally occurs enables a technical manager to reason in many different ways for different purposes: If setting up a technology transfer program, the manager needs to shape and apply the knowledge in a manner different from what would be required if the manager were asked to review someone else's program, estimate a budget for it, forecast its likely results, or analyze its similarity to previously successful and unsuccessful programs. In short, people seem to possess a general understanding of the way things work. Today, a knowledge engineer building a knowledge system assesses what the knowledge system needs to do, evaluates the various ways it can do that, and formulates a version of an expert's know-how that allows the knowledge system to meet its goals. In summary, knowledge systems today can incorporate significant quantities of human knowledge to solve problems electronically that ordinarily require human intelligence. To do this, the knowledge systems adopt a general organization
constructed with high-level design prescriptions and then fit the problem-solving knowledge into that framework. To make an expert's knowledge fit, the knowledge engineer molds the knowledge to produce the necessary performance. In this way knowledge engineers today genuinely engineer knowledge. The actual work of building a knowledge system is described below.

Constructing Knowledge Systems

To build a knowledge system today, a knowledge engineer performs four types of functions. Figure 4 defines these as knowledge-processing tasks informally referred to as mining, molding, assembling, and refining. These terms arise in mining rare metals and seem an apt way to describe the processes involved in extracting knowledge and manufacturing knowledge systems. Knowledge, like a rare metal, lies dormant and impure, beneath the surface of consciousness. Once extracted, an element of knowledge must undergo several transformations before it can add value. These four basic processing tasks are discussed here and, in particular, the iterative and incremental role of knowledge acquisition (qv) in the evolutionary development process is emphasized. Figure 4 also provides the technical terms for each of the four primary construction activities and identifies the key products of each phase.

Knowledge acquisition involves eliciting from experts or books the basic concepts of the problem domain, that is, the terms used to describe problem situations and problem-solving heuristics. From this starting point the knowledge acquisition process continues until it elicits enough problem-solving knowledge to enable the knowledge system to achieve expert performance. Heuristic rules constitute the key product of this activity.

Knowledge system design produces a framework or architecture for the knowledge system, as discussed above. In addition, the knowledge system designer selects an appropriate scheme for representing the problem-solving knowledge.
Representation options include formal logic (qv), semantic networks (qv), hierarchical frames (see Frame theory), active objects, rules, and procedures. Each of these alternative schemes has supported at least one previous knowledge system development effort. Any representation must accommodate the available knowledge and facilitate the search and inference required to solve the problems of interest.
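One of the options listed, hierarchical frames, reduces at its simplest to slot lookup with inheritance. The frames and slots below are invented for illustration only.

```python
# Minimal frame representation: a frame holds local slots and may
# inherit slot values from a parent frame (names are hypothetical).
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        """Look a slot up locally, then climb the parent chain."""
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:
            return self.parent.get(slot)
        raise KeyError(slot)

equipment = Frame("drilling-equipment", rotates=False)
bit = Frame("drill-bit", parent=equipment, rotates=True)
stabilizer = Frame("stabilizer", parent=equipment)

print(bit.get("rotates"))         # True: local slot overrides the parent
print(stabilizer.get("rotates"))  # False: value inherited from parent
```

The attraction of the scheme is that default knowledge lives once, high in the hierarchy, while exceptions override it locally; the cost, as the surrounding text notes, is that the representation must still support the search and inference the task demands.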
Once a knowledge engineer has selected the framework and knowledge representation, knowledge programming begins. In this activity knowledge engineers transform human know-how into a knowledge base that will fuel an inference engine. Generally, people developing knowledge systems today adopt an existing knowledge engineering tool that incorporates a predefined inference engine, so knowledge programming need only produce a knowledge base.

The process of refining knowledge continues until the knowledge system achieves an adequate level of performance. Generally, a knowledge system performs poorly at the start. In transforming an inexact understanding of an expert's behavior into heuristic rules, both the expert and knowledge engineer err. They misunderstand abstract concepts, incorrectly express rules of thumb, and neglect many details needed to ensure the validity of knowledge base rules. These errors do not reflect poorly on their professionalism. On the contrary, no error-free approach exists. Experts do their tasks well because they use lots of knowledge, not because they think about or verbalize it. In fact, KE provides for most knowledge-intensive activities the first practical means for codifying and validating knowledge. Before the development of KE, experts generally could not express their know-how in any effective way, and they could not assess much of it empirically. Knowledge systems make it possible to test directly how well knowledge works. As a direct result, they also highlight the weaknesses and deficiencies. By focusing attention on these shortcomings, an expert often can improve a knowledge base rapidly. This leads to the common development pattern of an incremental, evolutionary development with performance that first approaches human levels and then generally exceeds them. Figure 5 illustrates one key aspect of knowledge acquisition, the transfer of an expert's understanding to a knowledge engineer's knowledge system.
This transfer involves two-way communication. At first, the knowledge engineer interrogates the expert to request insight into how the expert solves particular problems and how the expert thinks about the objects and relations of interest. In Figure 5 these components of understanding are labeled World and Task knowledge. The expert reveals some of this knowledge through the problem-solving task descriptions given to the knowledge engineer. The knowledge engineer listens to the expert's description to hear the problem-solving elements. Unlike a systems analyst, who formulates an algorithm to solve a client's problem,
Figure 4. Knowledge-processing tasks and KE activities used in constructing various types of knowledge systems (engineering products).

Knowledge-processing tasks   Engineering activities     Engineering products
Mining                       Knowledge acquisition      Concepts and rules
Molding                      Knowledge system design    Framework and knowledge representation
Assembling                   Knowledge programming      Knowledge base and inference engine
Refining                     Knowledge refinement       Revised concepts and rules
Figure 5. Transfer of an expert's understanding to the knowledge engineer's system. (The figure shows the expert's description of the task passing from the expert to the knowledge engineer, who builds a model of that description in the knowledge system.)
the knowledge engineer simply wants to capture the existing problem-solving method. To do this, the knowledge engineer will ordinarily adopt a KE tool and then try to fit the fragments of expertise into the structure the tool provides. This requires the knowledge engineer to create a description of the way the expert thinks about and solves problems in that domain. This description models the expertise of the expert. Once implemented as a knowledge system, this model generates problem-solving behaviors that the expert can critique and improve. Often this improves the expert's self-understanding. Figure 6 depicts the iterative, evolutionary process of knowledge system development. This figure highlights the
ways testing a knowledge system feeds back to earlier stages of construction. As this figure indicates, testing can indicate shortcomings in all earlier stages. Thus, as development progresses, there are usually changes in requirements, concepts, organizing structures, and rules.

Figure 6. Evolutionary process of knowledge system development. (The figure shows the stages of identification, conceptualization, formalization, implementation, and testing, annotated with their tasks: identify problem characteristics, design a structure to organize knowledge, and validate that the rules organize the knowledge. Feedback paths labeled reformulations and refinements return from the later stages to the earlier ones.)

Tools for Building Knowledge Systems

Many software aids exist to simplify the KE task. In fact, as discussed above, most knowledge engineers build knowledge systems by adopting an existing tool and then constructing a problem-specific knowledge base. A KE tool offers aids for knowledge acquisition, knowledge base maintenance, validation, and user interface design, as discussed previously (see Fig. 2). Such software sits atop the programming environments (qv), programming languages, and operating systems of its host computer systems. Over the past 20 years these tools have evolved, bottom up, from low-level languages to high-level KE aids.

What is a KE tool? It is more than software, or put another way, the KE tool software reflects a general KE viewpoint and a specific methodology for building knowledge systems. The tool reflects a high-level problem-solving paradigm. It may, for example, build in an assumption that solutions to diagnostic problems ought to reason from design documents and causal models. Or, conversely, it might reflect a preference for building diagnostic experts by capturing an expert's empirical symptom-problem associations. In short, a paradigm constitutes a high-level strategy for using knowledge to solve a class of problems. Today, different knowledge engineers are investigating diverse paradigms that vary along several dimensions: whether to use empirical associations or reason via first principles from underlying causal models; whether to formulate knowledge in terms of formal logic or in terms of more informal heuristics; whether to aggregate knowledge into relatively large functional units or disaggregate it so it fits a small-grain-size format; and so on. Each paradigm suggests some additional design properties for the knowledge system architecture, and a KE tool generally builds these properties directly into its knowledge base structure and inference engine. A tool such as S.1, for example, built expert systems only with rule-based, backward-chaining, monotonic, and singular line-of-reasoning architectures. Does this sound restrictive? On the one hand, these design constraints surely restrict what a knowledge engineer can do and what the consumer knowledge systems can do.
On the other hand, a tool like S.1 exploits its knowledge system design constraints to improve the quality and power of the assistance it gives. Because it knows the form of knowledge in the knowledge base, the detailed operation of the inference engine, and the organization and control of problem solving, the KE tool can simplify the development tasks considerably.

A KE tool offers a particular way to represent knowledge and, therefore, generally works well only for representing certain kinds of knowledge. Some tools emphasize heuristic rules, others emphasize categorical taxonomies, and still others address simulation and modeling. Paired with each kind of knowledge representation, KE tools generally provide one way to apply that knowledge. A tool that builds a backward-chaining knowledge system generally does not have the capability to build a forward-chaining system. A tool that helps reason with empirical cause-effect associations generally does not have capabilities to apply systematic search techniques to underlying causal models, and so forth. However, several research-oriented tools aim to provide a mixture of representations and inference techniques and may one day lead to more comprehensive KE frameworks. Examples of these include Xerox's LOOPS, Stanford's MRS and AGE, Yale's DUCK, and Inference's ART.

Tools generally provide some knowledge-programming-language facilities and knowledge-processing utilities. S.1 provides abbreviated forms for experts to express domain rules and allows the expert to browse the knowledge base for rules with arbitrary characteristics, such as rules that determine the value for the shear stress in a structural engineering
knowledge system. ROSIE, another research tool, provides a general-purpose symbolic programming language and assertional database within the context of a standard sequential, modular programming system (8). It does not provide any particular problem-solving architecture, however. As a final example, the research tool RLL (9) provides only a hierarchical knowledge base organization and a very general agenda-based control scheme, leaving the knowledge engineer to implement all domain knowledge and problem-solving heuristics directly in LISP. The low-level symbolic programming languages themselves, notably LISP and PROLOG, provide even less structure. Although they do not restrict the knowledge engineer in any way, they do not provide any specific assistance in knowledge acquisition, knowledge representation, or knowledge system evaluation. In short, KE tools today span a wide range of software aids that reflect various assumptions about what kinds of knowledge systems to build and how to build them. Some tools, however, have evolved from dozens of related applications covering tens of person-years of development. These are discussed below in somewhat greater detail.

Some Relatively Mature Knowledge Engineering Tool Classes. Throughout the history of AI many researchers have focused their efforts on developing tools to aid the construction of problem-solving systems. Generally, tools developed in advance of applications have not correctly anticipated needed capabilities. This lack of foresight reflects primarily the general naiveté of researchers in a new and uncharted territory. Several families of applications have given rise to useful paradigms, architectures, and related tools. Three of these families revolve around the MYCIN, HEARSAY-II, and R1 knowledge systems. These are illustrated in Table 2 and described briefly in turn.
The MYCIN family originated with a rule-based expert system for the diagnosis and treatment of infectious blood diseases. The general methodology employed by MYCIN gave rise to a research tool called EMYCIN and a related system called TEIRESIAS that could assist the knowledge acquisition process in EMYCIN. Figure 7 illustrates the history of EMYCIN and its descendants. PUFF, an expert system for interpreting respirometer data and diagnosing pulmonary diseases, was the first actual application built using EMYCIN. S.1 combined many of the best features of EMYCIN and TEIRESIAS and has supported numerous commercial knowledge system
Table 2. Families of Systems and Tools

Family        Systems                                    Tools
MYCIN         MYCIN, PUFF, WAVES, Drilling Advisor       EMYCIN, TEIRESIAS, S.1
HEARSAY-II    HEARSAY-II, HASP/SIAP, ACRONYM, PROTEAN    HEARSAY-III, AGE, BB1
R1            R1 (XCON), XSEL, AIRPLAN                   OPS5, OPS7, OPS83
Figure 7. Schematic history of EMYCIN. (The figure plots languages, tools, and knowledge systems against time: the MYCIN knowledge system gave rise to the EMYCIN and TEIRESIAS tools, which in turn supported later knowledge systems such as WAVES and the Drilling Advisor.)
developments. Two of these, called WAVES and Drilling Advisor, illustrate the breadth of systems that one can build using this tool. WAVES is an expert system that assesses a data analysis problem for a geophysicist and prescribes the best way to process the data using selected modules from a million-line FORTRAN analysis package. Drilling Advisor, on the other hand, determines the most likely cause for a stuck oil-drilling operation and prescribes expert curative and preventative treatments accordingly.

The second family with an extensive range of applications revolves about HEARSAY-II, one of the first 1000-word connected-speech-understanding systems (5). HEARSAY-II embodies its own general paradigm in a characteristic architecture. HEARSAY-II embodies the "cooperating experts" paradigm. This paradigm views complex knowledge systems as collections of cooperating expert subsystems. In addition, part of the HEARSAY-II paradigm concerns how cooperating systems should interact. In particular, it proposes that they should exchange ideas via a global database called a blackboard (see Blackboard systems). Each independent source of knowledge should read the state of problem solving on the blackboard and contribute its own ideas by augmenting or modifying decisions on the blackboard. Although HEARSAY-II itself solved a problem of understanding connected speech, many other applications and some tools have embraced its paradigm. The HASP and SIAP applications used the HEARSAY-II approach to interpret sonar signals (10), and the ACRONYM system exploited it to interpret photo images (11). Other applications of this general architecture have addressed problems in learning, planning, design, and information fusion. At the present time, PROTEAN, a major research project at Stanford, aims to develop a means for identifying the three-dimensional shape of proteins using an enhanced blackboard-based system called BB1 (12) (see Blackboard systems).
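The cooperating-experts regime can be sketched with a toy blackboard loop. The knowledge sources and data below are invented and vastly simpler than HEARSAY-II's; they only illustrate how independent specialists read and post hypotheses on a shared structure.

```python
# Shared blackboard: all knowledge sources read it and write to it.
blackboard = {"signal": [5, 6, 7], "hypotheses": []}

def segmenter(bb):
    """Posts segments once raw signal data appear (hypothetical KS)."""
    if bb["signal"] and "segments" not in bb:
        bb["segments"] = [bb["signal"]]

def classifier(bb):
    """Posts a hypothesis once segments appear (hypothetical KS)."""
    if bb.get("segments") and not bb["hypotheses"]:
        bb["hypotheses"].append("rising-tone")

# Deliberately listed out of dependency order: each source simply
# contributes whenever the blackboard state enables it.
knowledge_sources = [classifier, segmenter]

# Control loop: keep cycling until no source can change the blackboard.
changed = True
while changed:
    before = repr(blackboard)
    for ks in knowledge_sources:
        ks(blackboard)
    changed = repr(blackboard) != before

print(blackboard["hypotheses"])   # ['rising-tone']
```

A real blackboard system replaces the blind cycling here with a scheduler that rates pending knowledge-source activations and fires the most promising one, which is the opportunistic control the surrounding text describes.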
Three general research tools have emerged thus far to support blackboard applications of this sort: HEARSAY-III (13), AGE (14), and BB1 (12). HEARSAY-III provides a very general representation for intermediate results, for independent sources of knowledge, and for flexible control but assumes the knowledge engineer will determine the appropriate problem-solving strategy and inference techniques. It also assumes the
knowledge engineer will program knowledge directly in LISP. AGE, on the other hand, emphasizes modular and customized control algorithms to facilitate experimentation. It too provides a particular representation for intermediate results and asks the knowledge engineer to use LISP to represent knowledge. Where HEARSAY-III expects relatively large modules of knowledge, AGE expects fine-grained rules. BB1 provides both a flexible framework for building systems with cooperating knowledge specialists and a similarly flexible, blackboard-based mechanism for implementing expert heuristics for making resource-allocation and focus-of-attention control decisions. Many people believe that the blackboard architecture will become increasingly important as KE takes on more difficult and important tasks.

The third family of knowledge systems revolves around R1, which was renamed XCON, a system to configure parts of a VAX computer (6). R1 solved a problem that proved intractable to conventional data-processing methods because it required thousands of heuristic rules to capture the full variety of engineering components and relations. The tool used to implement R1 is called OPS (15) (see A* algorithm). This tool reflects the paradigm known as a "pure production system." The underlying philosophy of the pure production system holds that truly intelligent behavior, such as that of humans, consists entirely of many fine-grained, independent condition-action rules called productions. OPS makes it easy for a knowledge engineer to write such productions. OPS also includes an excellent compiler that eliminates many redundant computations and accelerates the selection of the first matching rule on each cycle of computation. OPS provides a simple but uniform representation method for describing problem-solving state data and for specifying rule conditions that match these state data.
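The recognize-act cycle just described can be sketched as follows. The rule format, working-memory tuples, and configuration rule are invented stand-ins, not OPS5 syntax; a real OPS implementation avoids this naive rematch-everything loop by means of the compiler mentioned above.

```python
# Sketch of a forward-chaining "pure production system" in the
# spirit of OPS: working memory holds simple tuples, and
# independent condition-action rules fire against it.

def matches(pattern, fact):
    # None acts as a wildcard; all other fields must agree exactly.
    return len(pattern) == len(fact) and all(
        p is None or p == f for p, f in zip(pattern, fact))

def run(rules, memory):
    # Recognize-act cycle: fire the first rule whose condition
    # matches a working-memory element and whose action actually
    # changes memory; stop at quiescence.
    changed = True
    while changed:
        changed = False
        for pattern, action in rules:
            for fact in list(memory):
                if matches(pattern, fact):
                    before = len(memory)
                    action(fact, memory)
                    if len(memory) != before:
                        changed = True
                        break
            if changed:
                break
    return memory

# Illustrative, loosely R1-flavored configuration rule (not actual
# XCON knowledge): an order flagged as needing power gets a supply.
def add_power_supply(fact, memory):
    memory.add(("component", fact[1], "power-supply"))

rules = [(("order", None, "needs-power"), add_power_supply)]
result = run(rules, {("order", "vax-1", "needs-power")})
print(("component", "vax-1", "power-supply") in result)  # True
```

Each rule is data-driven and independent of the others, which is what gives OPS applications the interrupt-driven flavor described below.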
To program the action components of rules, OPS expects the knowledge engineer simply to alter the intermediate state data by changing property values or to write specialized LISP code. OPS has been applied to a variety of applications. Two examples include XSEL, a program that DEC sales personnel can use to help a customer order and plan the layout for a VAX computer, and AIRPLAN, a knowledge system to plan and schedule flight training on board a U.S. Navy carrier. All applications of OPS exploit its capability to perform general computations specified in relatively independent rules. Each rule provides for a data-driven action. This gives OPS applications the flavor of interrupt-driven, or data flow, computations. Unlike many other KE tools, however, OPS provides little structure for representing facts, relationships, and uncertain knowledge, and it does not contain a general architecture for problem solving. Instead, it provides a novel scheme for pattern-directed inference that makes some symbolic programming tasks simple.

Current Status of Tools. Tools will play a major role in the industrialization of KE. Their power derives from the paradigms, architectures, representations, inference engines, utilities, and programming systems they embody. Good tools will offer all of these kinds of aids to the knowledge engineer. As a consequence, good tools will require considerable work to develop. They approach in complexity and value the CAD, CAM, and CAE tools used in design, manufacturing, and engineering (see Computer-aided design; Computer-integrated manufacturing). Different KE tools will be desired, however, for different kinds of applications with different design requirements using different kinds of knowledge and specialized kinds of inference. Ultimately, KE tools will diversify in ways akin to electronic instruments. Because knowledge comes in different formats for different uses, tools appropriate to those uses will vary in form, purpose, and architecture.

KE is a very young field. Today's best tools have derived from many years' experience applying the same general kind of research tools repeatedly to a wide variety of applications. Out of that experience come valid and useful criteria for tool designs. In the next few years many new kinds of applications will arise, and development of corresponding tools will lag those applications by several years.
BIBLIOGRAPHY

1. F. Hayes-Roth, D. A. Waterman, and D. B. Lenat, Building Expert Systems, Addison-Wesley, Reading, MA, 1983.
2. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill, New York, 1980.
3. W. A. Martin and R. J. Fateman, The MACSYMA System, in Proceedings of the Second Symposium on Symbolic and Algebraic Manipulation, pp. 59-75, 1971.
4. E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, 1976.
5. L. D. Erman, F. Hayes-Roth, V. Lesser, and D. Reddy, "The HEARSAY-II speech-understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12(2), 213-253 (1980).
6. J. McDermott, R1: An Expert in the Computer Systems Domain, in Proceedings of the First Annual National Conference on Artificial Intelligence, Stanford, CA, pp. 269-271, 1980.
7. H. E. Pople, J. D. Myers, and R. A. Miller, DIALOG INTERNIST: A model of diagnostic logic for internal medicine, Proc. of the Fourth IJCAI, Tbilisi, Georgia, 849-855, 1975.
8. F. Hayes-Roth, D. Gorlin, S. Rosenschein, H. Sowizral, and D. Waterman, Rationale and Motivation for ROSIE, Technical Report N-1648-ARPA, The Rand Corporation, 1981.
9. R. Greiner and D. Lenat, A Representation Language Language, in Proceedings of the First Annual National Conference on Artificial Intelligence, Stanford, CA, pp. 165-169, 1980.
10. H. P. Nii, E. A. Feigenbaum, J. J. Anton, and A. J. Rockmore, "Signal-to-symbol transformation: HASP/SIAP case study," AI Magazine 3(2), 1982.
11. R. Brooks, R. Greiner, and T. Binford, The ACRONYM model-based vision system, Proc. of the Sixth IJCAI, Tokyo, Japan, 105-113 (1979).
12. B. Hayes-Roth, "A blackboard architecture for control," Artif. Intell. 26, 251-321 (1985).
13. L. D. Erman, P. E. London, and S. F. Fickas, "The design and an example use of HEARSAY-III," Proc. of the Seventh IJCAI, Vancouver, BC, 409-415 (1981).
14. H. P. Nii and N. Aiello, "AGE: A knowledge-based program for building knowledge-based programs," Proc. of the Sixth IJCAI, Tokyo, Japan, 645-655 (1979).
15. C. L. Forgy, The OPS4 User's Manual, Technical Report CMU-CS-79-132, Computer Science Department, Carnegie-Mellon University, 1979.

General References

L. Brownston, R. Farrell, E. Kant, and N. Martin, Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming, Addison-Wesley, Reading, MA, 1985.
B. G. Buchanan and E. H. Shortliffe, Rule-Based Expert Systems, Addison-Wesley, Reading, MA, 1984.
F. Rose, Into the Heart of the Mind, Harper & Row, New York, 1984.
S. Shamoan, "The expert that thinks like an underwriter," Management Technology, February 1985, pp. 54-59.
D. Stamps, "Expert systems," PW [Software Publishing and Selling], Vol. 36, September 1984.
M. M. Waldrop, "The intelligence of organizations," Science 225, 1136-1137 (1984).

F. Hayes-Roth
Teknowledge, Inc.

The author gratefully acknowledges Addison-Wesley and IEEE Computer for granting permission to reprint figures. Drilling Advisor represents the work of numerous technical personnel at Elf-Aquitaine and Teknowledge, chief among these Cliff Hollander and Jacques Marie Corteille.
EXPLANATION

Trust in a computer system comes not only from the quality of its results but also from the assurance that the system's reasoning is sound and appropriate to the task at hand. Explanation is the problem of producing machine-generated descriptions of the operation of a computer system: what it does, how it works, and why its actions are appropriate. In addition to the problem of actually producing explanations, the area of explanation research also includes the problem of designing computer systems so that they can be explained. Because providing explanations is so much a part of being a consultant (human or machine), explanatory capabilities are crucial to the ultimate acceptance of expert systems (qv) (1). For that reason, most of the work on explanation has taken place in the context of expert systems, although other areas, such as software development, have been addressed (2).

In addition to increasing users' confidence in a system and allowing them to assess its applicability to a problem, an explanation facility can also be useful to system developers as a debugging tool. Machine-generated explanations often make errors apparent that are easily overlooked in more formal program code. The errors are revealed partly because the explanations are simply more understandable but also because the explanations provide a different viewpoint on the system, emphasizing aspects of the system obscured in the formal representation. An explanatory facility can also serve as an important component of a tutoring system (3) (see Education applications).

Approaches to Explanation

The simplest (and most primitive) way to provide explanations is to anticipate in advance all the questions that might be asked and store the answers as text. When a question arises during system execution, the explanation facility finds and displays the corresponding answer. This very simple approach (sometimes called the "canned-text" approach) is frequently used to provide limited on-line documentation for text editors and operating systems. This approach is practical for small, slowly changing systems. However, several problems limit its applicability to larger, rapidly changing systems. The fact that the program code and the text strings that explain it can be changed independently makes it difficult to assure that the text strings describe what the code actually does. Another problem is that all questions and answers must be anticipated in advance. For large systems this may be a nearly impossible task. Finally, the responses are inflexible. Because there is no model or internal representation of what is actually being said, it is difficult to customize the system's responses to particular situations or avoid repeating information that has already been presented.
These limitations are slightly ameliorated by allowing the text strings to contain blanks that are filled in depending on the context in which the explanation occurs. However, this fill-in-the-blank approach still does not solve the problems because the meaning of the response and the interrelationships among parts of the response remain implicit.

Another approach is to produce explanations by paraphrasing system code and, possibly, traces of its execution into natural language. This technique has been successfully used to describe how a system works and how it handles particular cases (1,4,5). See Figure 1 for an example of an explanation produced by MYCIN (1). Because explanations are produced by directly translating the code, they remain consistent with the system's behavior even if it is rapidly evolving. However, although this approach can describe how a system works, it cannot describe why it does what it does.
RULE009
PREMISE: ($AND (SAME CNTXT GRAM GRAMNEG)
               (SAME CNTXT MORPH COCCUS))
ACTION:  (CONCLUDE CNTXT IDENTITY NEISSERIA TALLY 800)

IF:   1) The gram stain of the organism is gramneg, and
      2) The morphology of the organism is coccus
THEN: There is strongly suggestive evidence (.8) that the
      identity of the organism is Neisseria

Figure 1. An explanation of a MYCIN rule.
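The direct-translation approach behind Figure 1 can be sketched by storing a rule as structured clauses and paraphrasing it with templates. The rule encoding and wording below are invented stand-ins; real MYCIN rules are LISP forms with a richer translation mechanism.

```python
# Sketch of the "paraphrase the code" approach to explanation:
# a MYCIN-style rule held as data, translated to English by
# templates so the explanation cannot drift from the code.

RULE009 = {
    "premise": [("gramstain", "gramneg"), ("morphology", "coccus")],
    "conclusion": ("identity", "neisseria"),
    "certainty": 0.8,
}

def paraphrase(rule):
    ifs = [f"{i}) the {attr} of the organism is {val}"
           for i, (attr, val) in enumerate(rule["premise"], 1)]
    attr, val = rule["conclusion"]
    then = (f"there is strongly suggestive evidence "
            f"({rule['certainty']:.1f}) that the {attr} "
            f"of the organism is {val}")
    return "IF: " + ", and ".join(ifs) + "\nTHEN: " + then

print(paraphrase(RULE009))
```

Because the English is generated from the same structure the interpreter runs, the paraphrase stays consistent as rules change, which is exactly the advantage (and the limit) noted in the text: it can say how, but not why.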
The problem is that justifying a system's actions requires knowledge of the design decisions behind the system. These decisions do not have to be represented explicitly for the system to perform correctly. Just as one can follow a recipe and bake a cake without ever knowing why the baking powder is there, so too an expert system can deliver impressive performance without any representation of the reasoning underlying its rules or methods. However, the absence of this knowledge makes it difficult to add explanation facilities to existing systems.

Recent work in expert system architectures has attempted to capture this missing knowledge in two ways. In NEOMYCIN (6) descriptive facts of the domain (such as causal relations and disease typologies) are explicitly represented and separated from metarules (see Metaknowledge, metarules, and metareasoning) that represent the diagnostic strategies the system employs. These metarules capture the purpose behind the system's actions and can be used to provide justifications. The XPLAIN framework (7) also separates problem-solving and descriptive domain knowledge. In addition, it uses a "program writer" to compile an efficiently executable expert system from the problem-solving and descriptive knowledge. As the program writer creates the expert system, the decisions it makes are recorded in a development history that provides explanation routines with the rationale behind the system. Figure 2 presents an example explanation produced by the XPLAIN system. Note that this explanation presents a causal rationale describing why it is important to check serum potassium. Such rationales are part of the deeper knowledge behind a system that is not part of the code but that needs to be captured to give adequate explanations.

Current Issues

To date, most of the work on explanation has focused on the problem of representing sufficient knowledge to make explanations possible.
More recently, researchers have begun to focus on the problem of producing an explanation. Current explanation facilities use simple explanation generators that are somewhat ad hoc. The first issue in producing explanations is recognizing and understanding the user's need for information. In current systems the user usually indicates that he or she does not understand something by asking a question. This is a severe limitation because the user must recognize that an explanation is needed. Some recent work has addressed the problem of inferring the need for explanation based on a user's actions (8,9). The work of Miller and Black (10) presents another style of interaction. They have created an expert system for drug therapy that does not offer advice directly but instead provides a detailed critique of therapy plans that a user provides. The system offers explanations on its own initiative when it discovers significant differences between its recommendations and the user's.

Once the need for information is identified, the next issue is to create a response that will provide it, taking into account what the user already knows and selecting a presentation style that is appropriate. Research on this problem of text planning is still at an early stage (see Refs. 11 and 12 and Natural-language generation). A final issue that is beginning to receive attention is that explanation really involves a dialogue between the system and a user (13). In a sophisticated explanation system a user should be able to interact with the
Please enter the value of serum potassium: why?
The system is anticipating digitalis toxicity. Decreased serum potassium causes increased automaticity, which may cause a change to ventricular fibrillation. Increased digitalis also causes increased automaticity. Thus, if the system observes decreased serum potassium, it reduces the dose of digitalis due to decreased serum potassium.

Figure 2. Explanation produced by XPLAIN.
system to discuss misconceptions and request further clarification of its explanations.

Other Benefits

Although explanation facilities were originally developed to make systems more acceptable to users, they have provided additional benefits. Capturing the knowledge required to provide justifications forced the development of systems to be more explicit and principled. Separation of different kinds of knowledge such as problem-solving and descriptive domain knowledge was originally necessary so that they would not be confounded in explanations, but this separation makes systems more modular and hence easier to maintain. Thus, the benefits of an explanation facility can go beyond explanation.
BIBLIOGRAPHY

1. B. G. Buchanan and E. H. Shortliffe, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA, 1984.
2. W. Swartout, The Gist Behavior Explainer, in Proceedings of the Third National Conference on Artificial Intelligence, AAAI, Washington, DC, 1983, pp. 402-407. (Also available as ISI/RR-83-3.)
3. W. Clancey, Transfer of Rule-Based Expertise Through a Tutorial Dialogue, STAN-CS-769, Stanford University Computer Science Department, Stanford, CA, 1979.
4. T. Winograd, A Computer Program for Understanding Natural Language, TR-17, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1971.
5. W. Swartout, "A Digitalis Therapy Advisor with Explanations," in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 819-825, 1977.
6. W. Clancey, "NEOMYCIN: Reconfiguring a rule-based expert system for application to teaching," Proc. of the Seventh IJCAI, Vancouver, BC, August 1981, pp. 829-836.
7. W. Swartout, "XPLAIN: A system for creating and explaining expert consulting systems," Artif. Intell. 21(3), 285-325 (1983). (Also available as ISI/RS-83-4.)
8. W. Mark, Representation and Inference in the Consul System, in Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, August 1981, pp. 375-381.
9. J. Shrager and T. Finin, An Expert System that Volunteers Advice, in Proceedings of the Second National Conference on Artificial Intelligence, AAAI, Pittsburgh, PA, 1982, pp. 339-340.
10. P. Miller and H. Black, "Medical plan-analysis by computer: Critiquing the pharmacologic management of essential hypertension," Comput. Biomed. Res. 17, 38-54 (1982).
11. W. Mann and S. Thompson, Relational Propositions in Discourse, ISI/RR-83-115, USC/Information Sciences Institute, Marina del Rey, CA, 1983.
12. K. R. McKeown, Text Generation, Cambridge University Press, Cambridge, UK, 1985.
13. M. Pollack, J. Hirschberg, and B. Webber, User Participation in the Reasoning Processes of Expert Systems, CIS-82-10, University of Pennsylvania, Philadelphia, PA, 1982. (A short version appears in Proceedings of the AAAI-82, Pittsburgh, PA, pp. 358-361.)

W. R. Swartout
University of Southern California
FEATURE EXTRACTION

Feature extraction generally refers to the reduction of a complex signal (or a piece of the signal) to a set of numbers that can be used, for example, to recognize the signal. In AI the signals are most often either speech waveforms (one-dimensional signals) (see Speech recognition; Speech understanding) or images (ordinarily two-dimensional signals, but sometimes three-dimensional signals) (see Image understanding). Only two-dimensional images are considered in this entry; similar remarks would apply to many other types of signals. For a more complete discussion of all of the material covered in this
entry, the reader is referred to one of the comprehensive textbooks on computer vision such as Rosenfeld and Kak (1) or Ballard and Brown (2).

Perhaps the most common set of feature extractors are local operators, which are applied to all small neighborhoods of the image (e.g., 3 x 3 blocks of pixels, although much larger blocks are often used). Edge detectors (qv) are described in a separate entry [see also the survey by Davis (3)]; however, it is also possible to design feature extractors for other types of local patterns, e.g., thin lines, corners, spots, thick lines (often called streaks), etc. Such features are usually detected using linear operators that can be realized as convolutions and efficiently implemented using special-purpose hardware. However, it is often the case that simple nonlinear operators can be designed that will detect the features of interest more specifically than the linear operators. For example, to detect thin, vertical lines in an image, the following linear operator might be employed:

-1  2  -1
-1  2  -1
-1  2  -1
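As a concrete sketch, the 3 x 3 vertical-line operator above can be applied to small test neighborhoods, together with the nonlinear center-brighter check discussed next. The test images are invented for illustration.

```python
# The 3 x 3 vertical-line operator from the text, applied as a
# correlation (the kernel is symmetric, so correlation and
# convolution coincide), plus a nonlinear variant that also
# requires the center column to exceed both horizontal neighbors.

KERNEL = [[-1, 2, -1],
          [-1, 2, -1],
          [-1, 2, -1]]

def response(img, r, c):
    # Linear response at pixel (r, c) over its 3 x 3 neighborhood.
    return sum(KERNEL[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def is_vertical_line(img, r, c):
    # Nonlinear check: positive response AND every center pixel of
    # the neighborhood brighter than its left and right neighbors.
    if response(img, r, c) <= 0:
        return False
    return all(img[r - 1 + i][c] > img[r - 1 + i][c - 1] and
               img[r - 1 + i][c] > img[r - 1 + i][c + 1]
               for i in range(3))

flat = [[5] * 3 for _ in range(3)]
line = [[1, 9, 1]] * 3                    # vertical line of contrast 8
spot = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]  # bright isolated pixel

print(response(flat, 1, 1))  # 0 in a flat area
print(response(line, 1, 1))  # 48 = 6 * contrast, proportional to contrast
print(response(spot, 1, 1), is_vertical_line(spot, 1, 1))  # 16 False
```

The isolated bright pixel still gets a positive linear response (16), which is exactly the nonspecificity the text describes; the logical condition rejects it at extra computational cost.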
The response of this operator at any pixel in an image is the convolution of this 3 x 3 pattern of numbers with the intensities in the 3 x 3 neighborhood of the pixel. In a completely flat area of the image, the response is 0; if a pixel is centered on a vertical line, the response is proportional to the contrast of the line (i.e., the difference between the average intensity of pixels on the line and the average intensity of the background). However, notice that this operator can also give a strong response to a bright, isolated pixel. Thus, it does not give high response specifically to vertical lines. A simple nonlinear operator might require that the central pixel in each row of the 3 x 3 neighborhood be brighter (or darker) than both its left and right neighbors. The addition of such logical conditions, although making the operation computationally more costly, also makes the detection more specific to the feature of interest. Similar remarks apply to the other types of local features described.

The detection of local features can often be enhanced by utilizing iterative feature extraction algorithms such as relaxation algorithms, which iteratively update the "probability" that a neighborhood corresponds to a given feature (e.g., a vertical edge) based on the probabilities that adjacent neighborhoods correspond to other given features. The article by Davis and Rosenfeld (4) contains a survey of relaxation processes in image processing.

The relative locations in which such features appear in an image can be used to recognize more complicated structures in the image using both statistical and structural techniques from pattern recognition (qv). More generally, given any subset of points from an image, there are many properties that can be computed for that subset that are often useful for recognition. These properties can be broadly classified as photometric, geometric, or topological.

The simplest photometric properties are the moments of the distribution of intensities for the pixels in the subset, such as the average intensity and the variance of intensity. For color images, an even wider variety of useful photometric features can be identified, e.g., hue, saturation, and various color ratios. Chapter 2 of the book by Ballard and Brown (2) contains an introduction to analysis of color images. More complex photometric properties are generally referred to as textural properties and are often based on statistics of higher order distributions of intensities or colors. The paper by Haralick (5) contains a comprehensive survey of image texture analysis.

Geometric and topological properties depend only on the positions of points in the subset and not on their intensities. The simplest geometric property is area, which corresponds to the number of points in the subset. The perimeter of a subset is somewhat more difficult to define. If S is the picture subset and if T is the complement of S, the perimeter of S can be defined as the number of distinct pairs of adjacent pixels (s, t), where s belongs to S and t belongs to T. If adjacency is restricted to being either horizontally or vertically adjacent, then if S contains only a single pixel, it would have a perimeter of 4.

One can also define geometric properties that measure aspects of the shape of a picture subset. Classically, the ratio of perimeter squared to area is a measure of shape compactness. It is generally low for simple, convex figures and high for complex figures with tortuous boundaries. Many methods have been proposed for measuring the elongatedness of a picture subset. The simplest, perhaps, is the aspect ratio of the smallest upright rectangle enclosing the subset. Other geometric features include the following:

1. The diameter of a subset (greatest distance between any two points in the subset).

2. The moments of the subset. If the subset S is composed of points {(x_i, y_i)}, i = 1, . . . , n, the (j, k)th moment of S, m_jk, is defined to be the sum of x_i^j y_i^k over i = 1, . . . , n. The moments include, as specific cases, the coordinates of the centroid of the subset, m_10/n and m_01/n. It is possible to define combinations of such moments that are invariant to rotations and scalings of the original set S. Such moment invariants are often used in shape recognition.

3. Statistics of the distance transform (the distances of each point in S from the complement of S) or, more specifically, the skeleton (the points in S equidistant from two points in the complement of S) of the subset. The skeleton of a set S is also often referred to as the medial axis transform of S. The medial axis transform can be determined analytically for a set S having polygonal boundaries (in which case it can be shown that the skeleton is composed of straight-line segments and parabolic arcs). It was originally proposed by Blum (6) as a model for biological forms.

Topological properties are based on the connectivity of S. Any set S can be decomposed into a set of connected components that are the largest connected subsets of S. A component of S is called simple if it contains in its interior no components of its complement T. If S is not simple, the components of T in its interior are called holes. The number of holes in a component could be used, for example, to distinguish between a B, which has two holes, and a D, which has a single hole.

One can generalize the idea of feature detection to include relationships between subsets of a set since such relationships are often crucial for recognition and subsequent analysis.
It is often useful to detect symmetries such as parallelism; collinearity or, more generally, "good continuation" between a pair of curves are also important relationships for describing complex shapes. Relations such as above, below, left-of, and right-of are also used for describing complex shapes; however, their definitions are ordinarily complex since simple mathematical definitions (e.g., S is above T if the y coordinate of every point in S is greater than the y coordinate of every point in T) are often too stringent to capture the connotation of these relations in common usage by people.
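Several of the geometric and topological properties described above follow directly from their set definitions; a sketch, with invented test shapes, using a set of (x, y) pixel coordinates for the subset S:

```python
# Area, perimeter (4-adjacent pairs straddling S and its
# complement), compactness (perimeter squared over area), (j, k)th
# moments with the centroid as a special case, and hole counting
# via connected components of the complement.

from collections import deque

def area(S):
    return len(S)

def perimeter(S):
    # Pairs (s, t) with s in S and t outside S, adjacency restricted
    # to horizontal and vertical neighbors.
    return sum((x + dx, y + dy) not in S
               for (x, y) in S
               for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))

def compactness(S):
    # Low for simple convex figures, high for tortuous boundaries.
    return perimeter(S) ** 2 / area(S)

def moment(S, j, k):
    return sum(x ** j * y ** k for (x, y) in S)

def centroid(S):
    n = area(S)
    return (moment(S, 1, 0) / n, moment(S, 0, 1) / n)

def components(cells):
    # 4-connected components by flood fill.
    remaining, comps = set(cells), []
    while remaining:
        comp, frontier = set(), deque([next(iter(remaining))])
        while frontier:
            x, y = frontier.popleft()
            if (x, y) in remaining:
                remaining.remove((x, y))
                comp.add((x, y))
                frontier.extend([(x+1, y), (x-1, y), (x, y+1), (x, y-1)])
        comps.append(comp)
    return comps

def holes(S, width, height):
    # Components of the complement that do not touch the picture
    # border are the holes of S.
    grid = {(x, y) for x in range(width) for y in range(height)}
    interior = lambda comp: all(
        0 < x < width - 1 and 0 < y < height - 1 for x, y in comp)
    return [c for c in components(grid - S) if interior(c)]

square = {(x, y) for x in range(4) for y in range(4)}
print(area(square), perimeter(square), centroid(square))

# A crude "B": a 3 x 5 block with two interior pixels removed.
B = {(x, y) for x in range(3) for y in range(5)} - {(1, 1), (1, 3)}
print(len(components(B)), len(holes(B, 3, 5)))  # 1 2
```

A single pixel gets perimeter 4 under this definition, matching the text, and the two-hole "B" versus a one-hole "D" illustrates the topological distinction.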
BIBLIOGRAPHY

1. A. Rosenfeld and A. Kak, Digital Picture Processing, Academic Press, New York, 1982.
2. D. Ballard and C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
3. L. Davis, "A survey of edge detectors," Comput. Graph. Img. Proc. 4, 248-270 (1975).
4. L. Davis and A. Rosenfeld, "Cooperating processes for low-level vision: A survey," Artif. Intell. 17, 245-263 (1981).
5. R. Haralick, "Statistical and structural approaches to texture," Proc. IEEE 67, 786-804 (1979).
6. H. Blum, A Transformation for Extracting New Descriptors of Shape, in W. Wathen-Dunn (ed.), Models for the Perception of Speech and Visual Form, MIT Press, Cambridge, MA, pp. 362-380, 1967.
L. S. Davis
University of Maryland
FIFTH-GENERATION COMPUTING. See Computer systems; Logic programming.
FILE-MAINTENANCE SYSTEMS. See Programming environments.
FINDSPACE PROBLEM. See Robot control systems.

FOL

Primarily a proof checker for proofs stated in first-order logic (qv), FOL was developed by Weyhrauch and Filman, around 1975, at Stanford University. FOL also provides a sophisticated, interactive environment for using logic to study epistemological problems (see R. Weyhrauch, Prolegomena to a Theory of Mechanized Formal Reasoning, Report No. STAN-CS-78-687, Computer Science Department, Stanford University, 1979).

K. S. Anone
SUNY at Buffalo
FORWARD CHAINING. See Processing, bottom-up and top-down.

FRAME THEORY

Frame theory is a paradigm for representing real-world knowledge so that the knowledge is usable by a computer. This entry reviews the history and motivation of frame theory, its use in AI, and the structure of frame-based AI languages that were in part motivated by frame theory.

Beginnings. Frame theory (1) emerged in AI primarily as the result of a technical report written by Marvin Minsky, which was subsequently published in Ref. 2 and then again published in highly abridged form in Ref. 3. Minsky's "frames paper," as it became known, represented an effort to construct a framework, or paradigm, to account for the effectiveness of commonsense thought in real-world tasks. In part, Minsky wanted to construct a database containing the encyclopedic amounts of knowledge needed in a commonsense reasoning (qv) system, but more important, he wanted to create an enormously descriptive database that encoded knowledge in a structured, yet flexible manner. The structure provided by the knowledge base would allow a computer system to impose coherence on its "experience" (input information), and the flexibility would allow the system to access appropriate information in novel situations whose occurrence could not be anticipated in advance.

Briefly, Minsky envisioned a scheme where knowledge was encoded in packets, called frames ("frame" is based on the metaphor of a single frame in a film), and frames were embedded in a retrieval network, called a frame system, so that if one frame was accessed, indices to other potentially relevant frames would become available. A frame would be activated whenever one encountered a new situation; the tricky part would be to get the appropriate frame to be activated in the appropriate situation, and this would be the responsibility of the frame system. Part of the motivation of the use of frames at all, rather than restricting oneself to the use of more elemental propositions, was that a frame would be a large enough unit of knowledge to impose structure in a new situation, yet would be modular enough to be used as an element in a flexible database. Although Minsky's frame paper served as a rallying point in AI, setting off a flurry of research, aspects of the frame notion, namely, that modules of knowledge impose coherence on experience, are traceable back to the schema notion of Bartlett (4). In linguistics Fillmore (5) used the term "case frame," and his theorizing evolved into a blend of prototype theory and frame theory (6) (see Grammar, case). In sociology Goffman (7) "borrowed the term [frame] from Gregory Bateson to refer to analytical frameworks within which human experience can be made intelligible" (8).

Minsky noted that his "frame" is in the tradition of Kuhn's paradigm used in history of science. He also noted (1) related ideas in AI, namely, Abelson (9), Minsky and Papert (10), Newell and Simon (11), Norman (12), and Schank (13). More recently, Schank and Abelson (14) and Schank (15-17) have conducted extensive work on what might be called frame-theoretic knowledge structures.

Terminology. In fact, the term "frame theory" as introduced above is ambiguous in whether it denotes the theoretical developments deriving from the frames paper or whether it simply denotes the term "higher level knowledge structure," such as the "scripts" (qv) (based on the metaphor of a script for a play) of Schank and Abelson (14), the beta structures of Moore and Newell (18), or the "schema" notion of Bartlett. (To make matters worse, there is also the so-called frame problem, which has nothing directly to do with frame theory or knowledge structures.) The term frame was most popular in the mid to late seventies. In a recent AI textbook (19), in which one of the authors is a well-known frame theorist, the term frame theory does not appear in the index, nor are frame languages or frame systems covered. Instead, the issues of frame theory are subsumed under the topics of memory organization and abductive inference. Sowa (20), in an AI textbook, includes framelike concepts under "schemas and prototypes." Winston's textbook (21) discusses frames under "representation of commonsense knowledge." In the Handbook of Artificial Intelligence (22), frame theory is discussed under "frames and scripts." Nilsson's text (23) includes frame concepts under "structured object representations" and "units."
FRAME THEORY
303
Since the general use of higher level knowledge structures was fashionable at the time, and with the influences between research groups being difficult to untangle, this entry often uses the word "frame" to generically mean "knowledge structure" or "higher level knowledge structure" when that seems appropriate. There are also frame languages, which were at least in part developed in order to implement frame-based AI programs. This entry is concerned with those as well. Often implementations of frame languages are called frame systems, but this usage is different than Minsky's (1) use of the term. For Minsky a frame system was the retrieval network in which the frames were embedded.

Intent of Frame Theory. One of the best ways to communicate the intent and spirit of frame theory is to use one of Minsky's well-known examples, consisting of an imaginary anecdote describing the correspondences and interactions between a person's expectations, perceptions, and sense experience as he opens a door and enters a room. Kuipers (24) elaborated this example with more detail, and much of the discussion is drawn from him. Suppose you are about to open a door to enter an unfamiliar room in a house. Being in a house, and prior to opening the door, you have expectations or predictions about what will be visible on the other side of the door. For instance, if you were to see a landscape or seashore upon opening the door, you would first have difficulty recognizing it; you would, upon recognizing it, be quite surprised; and finally, you would be somewhat disoriented because you could not interpret the input information and would be at a loss to choose a set of predictions about what is to happen next. This, so the analysis goes, is because a "room frame" has been activated as a function of your opening the door, and the frame plays a major role in controlling your interpretation of perceptual input.

The room frame even comes with certain default predictions (see Reasoning, default): You are expecting a room with a certain kind of shape; you would experience surprise upon seeing a cylindrical room or upon entering the inside of a geodesic dome. Upon entering the room (which you expected to find), if you saw a bed, your room frame would get specialized to a "bedroom frame." In other words, you would access the most specific frame available. Possibly, you could utilize the information that you are in a room to facilitate your recognition of furniture. This is often called top-down processing (see Processing, bottom-up and top-down), or in the context of frame theory, frame-driven recognition. However, if you saw a floating fire hydrant (25), you would again have difficulty recognizing it, experience surprise after identifying it, and probably experience disorientation because your input information is apparently inconsistent with the predictions of the currently active frame. Indeed, psychologists (e.g., Biederman) have demonstrated experimentally that drawings of objects are easier to identify (as indicated by reaction time and error rate) in their usual context than in an anomalous context.

From this example it can be seen that a frame, as originally envisioned, was a module of knowledge that became active in a presumably appropriate situation and served to offer interpretation of, and new predictions for, that situation. Minsky made vague suggestions about the nature of a data structure that could do this sort of thing. He proposed the notion of a "frame system" that consisted of a collection of related frames, many of which shared the same subcomponents (he referred to them as terminals), linked by a retrieval network. Thus, as one walked through a house, one's course of expectations would be controlled by access processes operating on the retrieval network in the frame system. In the above case the relevant frame system would be that for a house, and the door and room frames would be subsystems in this system. Given that certain frames were active, adding the information to the database that you opened a door would serve as an access trigger from the currently active door frame to a room frame. Furthermore, the door frame and the room frame would share whatever substructure was common between them. Minsky called this the sharing of terminals and considered it an important feature of frame systems because it meant a great savings on recomputing information. Although here the door frame and room frame are treated as being autonomous, they too would be embedded in frame systems. For instance, the room frame system would contain frames describing the appearance of the room at different viewpoints, and they would be linked by movements of the viewer.

In terms of the anecdotal example, descriptions were used that had the character of a folk psychology; these folk psychology descriptions are characterized below in the context of frame systems and higher level knowledge structures. Some of these are: recognition of a situation as being of a certain category (such as realizing that you are in a room); interpretation of the situation in terms of that category (such as realizing that the room is in a house); prediction of what else is to arise in the situation (such as expecting to see a piece of furniture); surprise at failed predictions (such as identifying a fire hydrant when one has construed his situation as being in a living room); disorientation when a category cannot be found to interpret the situation (as when you realize you are not in a living room, but have no alternative hypotheses); and possible reinterpretation of the situation. As mentioned in the beginning of this entry, the goal of frame theory was to account for the effectiveness of commonsense thought in the performance of real-world tasks. These phenomena, so familiar to everyone who does commonsense thought, may seem mundane. Recounting them may seem trivial. However, it was a basic tenet of frame theory that an attempt to mimic these phenomena in a computer system could lead to the development of a more intelligent computer system. This can be taken as the intent of frame theory.

Frame Languages. Minsky also introduced terminology and sketches for what he thought a frame language might look like. This terminology included terms such as "frames," "slots," "terminals," and "default assignments." The frames paper thus played a role in spawning, or influencing, two lines of research. One line was focused on the high-level goals of frame theory as stated above, and the other line was concerned with developing "frame languages" along the lines of Minsky's suggestions. Many of the same researchers were working on both lines of research simultaneously. It must also be observed that many of the researchers working in what this entry treats as "frame theory" did so for their own independent reasons and did not necessarily conceptualize in terms of frames or frame languages (e.g., Newell, Schank, Norman, and Rumelhart). More is said about frame languages at the end of the entry.

FRAME THEORY AND FOLK PSYCHOLOGY

Although the examples in the frame paper were divided between perception and language, and Minsky viewed both problems as roughly being of the same nature, there have been
many more frame-based applications to language than to visual perception (but see, e.g., Refs. 26 and 27). In particular, much frame-theoretic research has been done in the context of natural language and story understanding (see Ref. 28 for a discussion of possible reasons). This section sketches how folk psychology descriptions of thought can be characterized in terms of frame theory.

Recognition, Interpretation, and Prediction. As a simple illustration of recognition, interpretation, and prediction, consider the two sentence sequences below, taken from Schank and Abelson (14). At the global level the first sentence sequence (A) is considerably different than the second (B).

Sequence A:
1. John went to a restaurant.
2. He asked the waitress for a hamburger.
3. He paid the tip and left.

Sequence B:
1. John went to a park.
2. He asked the midget for a mouse.
3. He picked up the box and left.

Although corresponding sentences in these sequences are comparable in syntactic structure and type of semantic information conveyed in the literal meaning, comprehension for the sequences as a whole differs radically. Sequence A successfully accesses some kind of higher level knowledge structure (e.g., the restaurant frame, or the restaurant script), and sequence B fails to access a comparable structure. If A did not access such a knowledge structure, one's comprehension would be reduced to the level of B and could be characterized as disorientation. This contrast provides a striking example of the immediate payoff of invoking higher level knowledge structures.

Charniak (29) represents the earliest attempt to consider methods for the processing of stories like A, which are about stereotyped situations such as restaurant dining or a child's birthday party. Minsky (1) argued for the need for rich default structures for this kind of story. Schank (30), in a paper entitled "Using Knowledge to Understand," and Schank and Abelson (14) specifically proposed the use of a script for such stories, and Cullingford (31) and Schank and Riesbeck (32) implemented that proposal in a program called SAM, a program that answers questions about, and summarizes, such stories. A script (qv) is a kind of frame that is specialized toward describing stereotyped event sequences. For example, with respect to story A, SAM can answer the following questions, whose answers are not explicitly stated in the story, by accessing a record of the predicted event sequence for restaurant dining.

Did John sit down in the restaurant?
Did John eat the hamburger?

In terms of the folk psychology processes mentioned previously, the SAM program must recognize the situation being described as that of restaurant dining and then predict the likely event sequence. In story A, recognition was trivial and prediction was easy once recognition was achieved. Recognition involved accessing the correct higher level structure; the structure encoded predictions; interpretation involved a simple manipulation of that structure (called script application) to retrieve the predictions. In the general case recognition is not trivial and is the most important aspect of the theory.

Misrecognition, Interpretation, and Reinterpretation. Consider another story segment, given by Collins, Brown, and Larkin (33), which has been used by Charniak (34) and O'Rorke (35) as challenge cases for testing story understanding programs.

Sequence C:
1. He plunked down $5 at the window.
2. She tried to give him $2.50, but he wouldn't take it.
3. So when they got inside, she bought him a large bag of popcorn.

This example is interesting because for most people it invokes a cycle of repeated incorrect or incomplete recognition and reinterpretation. Many people, upon reading the first sentence, invoke a bet at horse race frame. Upon reading sentence 2 they interpret it as an attempt to return change. Finally, sentence 3 triggers a recovery from the misrecognition as a betting scenario to recognition as a movie scenario embedded in a dating scenario. Additionally, the role assignment for "she" in sentences 2 and 3 must be changed from cashier to dating partner.

Norvig (36) and O'Rorke (35) have implemented systems that can recover from simple misrecognitions. The example that O'Rorke's program works on is given below.

1. John put two quarters in the slot.
2. Then he started his first game.

In O'Rorke's implementation, the initial sentence signals both the vending machine and video game frames, and the system initially elaborates both frames while noticing that they are incompatible because the coin insertion event cannot be assigned to a video game slot and a vending machine slot simultaneously (a consequence of interpretation); at this point the system arbitrarily chooses the vending machine frame while preserving both elaborations. The second sentence describes an event that is recognized as part of the rejected video game frame, thus triggering dependency-directed backtracking (qv) (e.g., Ref. 37); the vending machine frame is rejected, and the video game frame is selected because it is the only frame compatible with all of the given information. This can be viewed as surprise leading to reinterpretation.

Frame-Driven Recognition. Consider the example sentences below, which contain ambiguous words, taken from Charniak and McDermott (38). Word disambiguation will be treated as a form of recognition.

Sequence R:

Example 1:
The programmer was near the terminal.
The plane was near the terminal.
Example 2:
The porter studied the case.
The lawyer studied the case.

On the basis of these examples, it appears plausible that the ambiguous word (e.g., "terminal" in example 1 and "case" in example 2) is disambiguated on the basis of the frame that was recognized earlier in the sentence (e.g., Refs. 31, 39-41). Subjectively, one is often aware of only noticing the correct sense of the word without realizing that there are alternatives. One way of approaching this is to store a lexicon with each frame. When that frame is activated, the associated lexicon is searched for word meanings prior to the global lexicon. In terms of frame theory there is recognition of a frame controlling interpretation processes which, in turn, control recognition of subsequent input. This is sometimes called top-down processing or frame-driven recognition.

A striking application of frame-driven recognition appears in DeJong's (42) FRUMP program, also described in Schank and Abelson (14) and Winston (21), for summarizing newspaper stories about certain classes of events, such as terrorism and earthquake disasters. This program keeps a tabulation of the things that are supposed to be described in each kind of story, and this tabulation drives the program's recognition process for described events.

FRAME MANIPULATION

So far essentially no detail has been provided about the specifics of frame representations and the specifics of algorithms to manipulate the representations. This is because, to a large degree, these specifics are unimportant to frame theory, and emphasis on them can be misleading. The major factor controlling the performance of a program in a commonsense domain is the knowledge that it embodies (e.g., Ref. 43). Thus, it is important for a program to, say, determine that John is likely to eat at a restaurant after being told that John walked into a restaurant. The particulars of how this knowledge is embodied are often of secondary importance.

An exemplary illustration of this methodology can be found in Wilensky (44), which contains a set of text comprehension principles for the domain of plan-based stories along with a collection of frame manipulation primitives. The frame manipulation primitives are used in service of the higher level and more important text comprehension principles.

Table 1 contains a partial, but fairly standard (24,45), list of frame manipulation primitives. In Table 1 what has been called frame recognition is now decomposed into two components: frame invocation and frame determination. Frame determination represents the latter stage of frame recognition.

Table 1. Some of Wilensky's Frame Manipulation Primitives
Invocation: Initially considering a frame.
Determination: Deciding if enough evidence exists to infer an invoked frame.
Elaboration: Filling in a slot of a determined frame.
Termination: Inferring that a determined frame is no longer relevant.

The partial list of Wilensky's text comprehension principles, shown in Table 2, attempts to identify exactly what it means for a person or computer to have comprehended text and, as such, the principles tell one what to use the frame manipulation primitives for.

Table 2. Some of Wilensky's Text Comprehension Principles
Coherence: Determine frames that provide a coherent construal of the input.
Concretion: Determine as specific a frame as possible consistent with the input.
Exhaustion: Determine enough frames to account for all of the input.
Parsimony: Determine frames that maximize connections between inputs.

For instance, the rejection of the vending machine frame in favor of the video game frame in O'Rorke's program (35) can be viewed as being consistent with Wilensky's principles of coherence and exhaustion. This is because the vending machine frame cannot coherently explain all of the input, but the video game frame can. The operation of Cullingford's program (31) is similarly consistent with the principle of coherence. Ideally, the mechanics of this process would be described in terms of the frame manipulation primitives. Notice that all of the principles in Table 2 are concerned with frame recognition (determination). This testifies to the overwhelming importance of accessing the right knowledge structure, indicating the importance of recognition, and suggesting that it is the structure of the frame system that is crucial to frame theory rather than the structure of a frame.

Memory Organization. If one does view frames as data structures, in particular, units of knowledge whose size is larger than a proposition, several questions come to mind: How are frames recognized or accessed? How large are frames? How are frames used? Where do frames come from? The first two questions fall under the topic of memory organization and are considered immediately. The third is postponed until the subsequent section, and the last question is beyond the scope of this entry.

Recognition, Matching, and Indexing. There has been much discussion in the literature of the processes involving frame recognition and the access of higher level knowledge structures (2,15,16,24,34,46-49). As was alluded to in the context of Wilensky's text comprehension principles, frame recognition is of fundamental importance. Despite the fact that humans seem to recognize frames effortlessly, for computer programs it has been quite difficult in the general case. In fact, questions of frame recognition are still very difficult and open questions in AI. Consider the sentence (34):

The man sawed the woman in half.

For most people this suggests the magic act frame, but how? The answer does not lie in the nature of the representation for the sentence. Rather, the world happens to be organized so that there is only one situation where this event occurs, and one's memory system is able to detect this regularity. The indexing scheme controlling frame recognition is thus not a function of the represented information [e.g., (saws-in-half (some man) (some woman))] but rather a function of the system's history of experience.

Focusing on the problem of frame recognition leads to consideration of basic issues in memory organization. In particular, Schank has been concerned with the folk psychology phenomenon of reminding as a clue to the architecture of memory access. Schank (15) has recorded examples of memory access by the use of very abstract indices. For instance, he reported being reminded of the situation of "waiting for an hour in a gasoline line in order to buy one dollar's worth of gas"; this is called the gas line frame. Schank was reminded of the gas line frame after being told about someone who waited for 20 min in a postal line to buy one postage stamp (the postage line frame). The postage line frame leads to access of the gas line frame, presumably on the basis of the index "waiting in a long line to do just a little bit when it would be better to do more" (the inefficient queuing frame). It seems that any frame can potentially be used as an index to any other frame. In this case it seems that inefficient queuing was initially an abstraction from the gas line frame at the time one heard about the gas line incident. This abstraction, a frame in its own right, then served as an index to subsequent access of the postage line frame. The inefficient queuing frame was subsequently reabstracted from the postage line frame, causing the reminding experience. This scenario is shown in Figure 1. Why inefficient queuing was abstracted, rather than something else, is again an open question.

How Large is a Frame? The size of a frame is more closely related to memory organization than one might first suspect. This is because, in humans, the size of a frame is not strictly determined by its semantic content but by other factors as well.
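The reminding cycle just described, in which an episode is filed in memory under an abstraction and a later episode that regenerates the same abstraction retrieves it, can be sketched in a few lines. The sketch is purely illustrative: the episode features ("wait", "payoff") and the abstraction rule are invented here, and Schank's theory does not commit to any such rule.

```python
# Hypothetical sketch of reminding through an abstract index, after the
# gas line example. Feature names and the abstraction rule are invented.

class EpisodicMemory:
    def __init__(self):
        self.index = {}  # abstraction name -> episodes filed under it

    def abstract(self, episode):
        # Crude stand-in for abstraction: a long wait for a small payoff
        # yields the "inefficient queuing" index.
        if episode["wait"] > 10 * episode["payoff"]:
            return "inefficient-queuing"
        return None

    def store(self, episode):
        # File an episode under its abstraction; any episodes already
        # filed there come back as remindings.
        key = self.abstract(episode)
        if key is None:
            return []
        remindings = list(self.index.get(key, []))
        self.index.setdefault(key, []).append(episode)
        return remindings

memory = EpisodicMemory()
memory.store({"name": "gas line", "wait": 60, "payoff": 1})
hits = memory.store({"name": "postage line", "wait": 20, "payoff": 1})
print([e["name"] for e in hits])  # the gas line episode is retrieved
```

The point of the sketch is only that retrieval is driven by the shared index, not by any similarity between the represented episodes themselves.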
One clue as to what some of these other factors might be comes from Schank's (15) analysis of an experiment by Bower, Black, and Turner (50), in which it was demonstrated that there are memory confusions between the waiting room scenes of stories describing doctor's office visits and dentist's office visits. In other words, a doctor's office visit frame is composed of subframes, one of which is the health care waiting room frame. What is the moral? The size of a frame is not dependent on the semantic content of the represented frame (such as doctor's office visit) but depends on whether components of the descriptive information in the frame (such as the waiting room component) are useful elsewhere in the memory. It appears that when some set of knowledge becomes useful in more than one situation, one's memory system detects this, then modularizes that component into a frame in its own right, and then restructures the original frame to use this new frame as a subcomponent. How the system detects when such modularization should take place and how the system actually does such restructuring are, again, open questions of memory organization. Schank and his associates (15-17,42,48,49,51,52) have constructed a theory of frame access, or memory organization, on the basis of these considerations.

Figure 1. (a) Abstraction created from an experienced instance of the gas line frame; the abstraction becomes an index to that instance. (b) Same abstraction as in (a) recreated from a description of an instance of the postage line frame; this serves as an access route to the previously indexed gas line instance.

FRAME THEORY AND ISA HIERARCHIES

Although Minsky did not emphasize the use of ISA hierarchies in his frame paper [Quillian (53,54) was one of the earliest to advocate ISA hierarchies], the fact that he was very much concerned with sharing structure across frames so that partial results would not have to be recomputed made the use of various generalization hierarchies implicit in his proposal. In actual historical fact, it is safe to say that a frame language that does not have facilities for some kind of generalization hierarchy is not a frame language. Generalization hierarchies are examined from a number of perspectives. It is via ISA hierarchies that a transition is made from frame theory to frame languages.

What are the Initial Categories? Rosch (55) has suggested that there are three levels of categories: basic, subordinate, and superordinate. In the domain of furniture the concept chair would be an example of a basic level category, whereas the concept furniture would be an example of a superordinate category.
The concept lawnchair would be an example of a subordinate category. The knowledge representation language KRL (56) was influenced by this taxonomy and included these levels as distinct data types. In humans the basic level categories are perceptually based and tend to be the first categories that humans learn; the other categories evolve from them. That is, superordinate categories initially evolve on the basis of generalization from the basic level categories, and the subordinate categories evolve on the basis of discrimination from the basic level categories. Once formed, the categories have a feature space structure that maximizes the similarity between members within a category and the differences between members across categories. Related to this is the notion of prototype. According to prototype theory, exemplars of a category that are near the center of the feature space can be viewed as prototypes of that category. For example, a robin is considered a good example of a bird, whereas a penguin is not. A robin should occur near the center of the feature space for bird and should be a good prototype for the category. This idea finds its way into frame languages [e.g., NETL (57) and KRL (56)] via the construct of a "typical member." In such a language, for the frame BIRD, there would be an associated description of the typical member; additionally, there may be a description of a few typical instances, such as robin.
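The typical-member idea can be sketched as a simple feature comparison against a prototype description. The sketch is not taken from KRL or NETL; the BIRD features below are invented for illustration.

```python
# Hypothetical sketch of a "typical member" description for the frame
# BIRD; typicality is measured as feature overlap with the prototype.

BIRD_TYPICAL = {"flies": True, "sings": True, "size": "small"}

def typicality(instance):
    # Fraction of typical-member features the instance shares.
    matches = sum(1 for k, v in BIRD_TYPICAL.items() if instance.get(k) == v)
    return matches / len(BIRD_TYPICAL)

robin = {"flies": True, "sings": True, "size": "small"}
penguin = {"flies": False, "sings": False, "size": "medium"}

print(typicality(robin))    # 1.0: near the center of the feature space
print(typicality(penguin))  # 0.0: a poor prototype for BIRD
```

A robin, matching every typical feature, sits near the center of the feature space; a penguin, matching none, is a legitimate instance of BIRD but a poor prototype for it.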
An Epistemological Viewpoint. Figure 2 shows the top of an ISA hierarchy constructed on purely epistemic grounds, in which an underlying premise is that all of the concepts that the system uses can be placed in the hierarchy. At the top of the hierarchy there is the category "all conceivable ideas." The first level divides the universe of ideas into objects, states, and events. This taxonomy is by no means accepted fact; it is only a first approximation that gets the general picture right. In terms of acquisition, new concepts are created by specializing, or discriminating, already existing concepts in the system. This differs from Rosch's (55) framework, where the earliest concepts in the system are the basic level concepts and later concepts are the superordinate and subordinate concepts. Frame languages adhere to the epistemological viewpoint; that is, they require the programmer to hand code an ISA hierarchy starting from the top.

Figure 2. What the top of the ISA hierarchy could plausibly look like.

Although the ISA hierarchy appears as a tree, this is not necessarily the case. It could be a lattice, as shown in Figure 3, where an ISA hierarchy at the intermediate level is shown. In this figure, a murder event is both a deliberate action and a kill event, and a deliberate action is both a causal event and an action.

Figure 3. A tangled hierarchy at the intermediate level.

Sometimes these lattices are called tangled hierarchies. It is also the case that when comprehending text or interacting with the world, it may be necessary to introduce new entries into the ISA hierarchy dynamically, or at least occasionally. Consider story D:

1. John was on an archeological dig.
2. He unearthed a cup.
3a. He wondered if it could still hold fluids. (instance cup-1 drinking-tool)
3b. He wondered how valuable it was. (instance cup-1 artifact)

Sentence 1 activates the archeological dig frame, and sentence 2 activates the drinking tool frame. Either sentence 3a or 3b could be the last sentence of this story. In the case of 3b the reader is led to view the cup as an archeological artifact, which he may not have done before reading the story. For instance, he may have to assign the cup to the found artifact slot of the archeological dig frame. If the reader, or John, ever abstracts the category cups found on digs, this new category will have two superordinates.

Slots and Property Inheritance. The use of ISA hierarchies is intimately connected with default reasoning (qv) as implemented by the mechanism of property inheritance. Usually in frame systems, this means the inheritance of slots that are allowed to have default values. For instance, in a frame database one might have a frame representing the concept ACTION, which is a specialization of the concept EVENT and has a slot for the actor of an action. Then if one includes the concept WALK, indicating that it is a specialization of ACTION, WALK will inherit the actor slot of ACTION and furthermore will inherit any slots that EVENT happens to have. This is illustrated in Figure 4.

[frame: EVENT
  isa: THING
  slots: (TIME (a TIME-LOCATION) (default value twentieth century))
         (PLACE (a LOCATION) (default value United States))]

[frame: ACTION
  isa: EVENT
  slots: (ACTOR (a PERSON))]

[frame: WALK
  isa: ACTION
  slots: (SOURCE (a LOCATION))
         (DESTINATION (a LOCATION))]

[frame: CORONATION
  isa: EVENT
  slots: (PLACE (a (COUNTRY with (LEADER (a KING)))))]

Figure 4. A Few Simple Frame Definitions.

In Figure 4, uppercase identifiers indicate either frame names or slot names. Note that, in that figure, slots can only take values of a certain type and defaults may be specified. So for the definition of ACTION, the actor slot can only be filled by a person. Further, ACTION, as does WALK, inherits the slots of TIME and PLACE from EVENT with the default values of twentieth century and United States. There is another partially defined concept, CORONATION, which has its own place slot with a type specification that makes the default value inherited from EVENT illegal. This more local specification takes precedence. In general, the most locally defined slot is the one that is used. Often there are procedures associated with a slot that become activated when the slot changes value (these are called "if-added demons") or when the value of a slot needs to be determined (these are called "if-needed demons"). In a tangled hierarchy there would be multiple inheritance. That is, a frame with more than one superordinate would inherit slots from each of the superordinates.

The Virtual Copy Concept. Fahlman (57) expressed the semantics of an ISA hierarchy as that of making a virtual copy of a description located higher in the hierarchy available to concepts lower in the hierarchy. If one learns that Clyde is an
elephant, one would like to have immediately available an elephant description. Assuming that stored with the frame elephant there is an elephant description, it would be useful to have this description available when one reasons about Clyde; whether the description is really copied or whether it is only a virtual copy implemented by property inheritance is immaterial. The inheritance mechanism just described is the most common way of implementing the virtual copy concept.

Subsumption and the Expressiveness of the Language. Subsumption refers to the location of a concept in an ISA hierarchy. An example of subsumption is the following. The concept WALK is subsumed by the concept ACTION, which is, in turn, subsumed by the concept EVENT. Often it is desirable to place an incoming description into the ISA hierarchy so that the subsumption relationships between it and the other concepts in the hierarchy are correct (e.g., Refs. 58 and 59). Brachman and Levesque (60) have shown that a seemingly unimportant choice in the expressiveness of primitives in a frame language can make dramatic differences in the time complexity of a subsumption algorithm. In particular, it appears that when a language reaches a certain threshold of expressiveness, determining subsumption relations for expressions in that language becomes co-NP-complete. It is not known what other types of inferences are sensitive to the expressiveness of the underlying language, nor how this phenomenon should influence the design of future languages.

Cases, Slots, and Predictability. A number of researchers have suggested (21,61,62) that the cases of linguistic case grammar (qv) (5) and the slots of frame theory are one and the same; this has been termed the "case slot identity" theory. Linguists have posited some small number of semantic cases, somewhere between 8 and 20, but have never been able to come to agreement on the exact number. Shown in Figure 5 is a partial list of cases adapted from Winston (21).
Some of the cases, such as agent, are clearly indisputable as being general enough to classify as a linguistic case, but others such as source are arguable, and something like raw material is doubtful. According to the case slot identity theory, cases of linguistics correspond to the slots of frame languages. The more distinctive cases correspond to slots that go with concepts at the top of the ISA hierarchy, whereas the more debatable cases correspond to slots that go with fairly specialized concepts. This seems to explain why linguists have had such trouble deciding on exactly how many cases there really are. A related line of reasoning comes from Sommers' (63) and Keil's (64) ontological hierarchies. These are constructed in accordance with a notion of predicability. For instance, Sommers and Keil argue that both "man" and "flea" could have the property, or predicate, "is alive," but only "man" could have the predicate "is honest." In frame theory one might say that a frame "normally living thing" has a binary valued slot "is alive," the frame "man" has a binary valued slot "is honest," and the frame "flea" would not have this slot, nor any way to inherit it.
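The slot-lookup behavior discussed above (the most locally defined slot takes precedence, defaults flow down the ISA chain, and a frame with no such slot anywhere on its chain has no way to inherit it, as with "flea" and "is honest") can be sketched in a few lines. This is a minimal sketch, assuming single inheritance and using the Figure 4 frame names; slot entries here hold only a default value, whereas a real frame language would also record the type restriction and any attached demons.

```python
# Hypothetical sketch of slot lookup with inheritance, after Figure 4.
# Each slot maps to a default value, or None when only typed.

FRAMES = {
    "THING":      {"isa": None,     "slots": {}},
    "EVENT":      {"isa": "THING",  "slots": {"TIME": "twentieth century",
                                              "PLACE": "United States"}},
    "ACTION":     {"isa": "EVENT",  "slots": {"ACTOR": None}},
    "WALK":       {"isa": "ACTION", "slots": {"SOURCE": None,
                                              "DESTINATION": None}},
    "CORONATION": {"isa": "EVENT",  "slots": {"PLACE": "a country with a king"}},
}

def lookup_slot(frame, slot):
    # The most locally defined slot takes precedence; otherwise climb
    # the ISA chain. A frame with no such slot anywhere inherits nothing.
    while frame is not None:
        slots = FRAMES[frame]["slots"]
        if slot in slots:
            return slots[slot]
        frame = FRAMES[frame]["isa"]
    return "no such slot"

print(lookup_slot("WALK", "PLACE"))        # default inherited from EVENT
print(lookup_slot("CORONATION", "PLACE"))  # the local definition wins
print(lookup_slot("WALK", "IS-HONEST"))    # no way to inherit it
```

In a tangled hierarchy the single `isa` link would become a list of superordinates, and the climb would have to search each of them, which is exactly the multiple inheritance mentioned above.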
Proposed case      Example of use
Agent              John broke a window.
Instrument         John broke a window with a hammer.
Source             John went from New York to San Francisco.
Destination        John went home.
Raw material       John made a bird out of plastic.

Figure 5. A partial list of linguistic cases.
In short, there are largely unexplored connections between slots in frame theory and related constructs in linguistics (e.g., Ref. 5), philosophy (e.g., Ref. 63), and psychology (e.g., Ref. 64).

Organizational Features Used in Frame Languages

As mentioned earlier, Minsky's frame paper is associated with two related traditions of research: higher-level knowledge structures and frame languages. Most of this entry has so far emphasized higher-level knowledge structures, and the discussion now turns to the data structures and organizational features commonly found in frame languages. Among the earliest frame languages were FRL (frame representation language) (65,66) and KRL (knowledge representation language) (56,67-69). In FRL, as in subsequent frame languages, the primary data structure is of type "frame" (or "unit," or something equivalent). Instances of this type are approximately record structures, in which the field names are called slots, the values are called terminals, and often the slots have default values. Furthermore, the frame definitions are embedded in an ISA or part-of hierarchy, in which various kinds of inheritance are allowed, such as the inheritance of slots. In short, a frame language will generally take the form of something like record structures with typed fields containing default values, and the record structures are embedded in an ISA hierarchy, as described earlier. One difficulty that has troubled many knowledge representation languages, including frame languages, is the lack of a formal semantics. This has made it difficult to compare the knowledge representation features of different frame languages. Consequently, the discussion of frame languages here is based on comparing them with predicate calculus (see Logic, predicate), a well-known and well-understood standard. Although a frame language may seem exotic, especially if it was inspired by Minsky's frame paper, it is often easier to translate into predicate calculus than one might expect.
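The organizational features just listed — record structures with named slots, default values, and slot inheritance down an ISA chain — can be sketched in a few lines of Python. The ELEPHANT and CLYDE names echo the earlier example; nothing here reproduces the actual syntax of FRL or KRL:

```python
class Frame:
    """A minimal frame: named slots with values, inherited via an ISA link."""
    def __init__(self, name, isa=None, **slots):
        self.name, self.isa, self.slots = name, isa, slots

    def get(self, slot):
        # Look for a local (filled) value first, then follow the ISA chain;
        # inherited values act as defaults -- Clyde is a "virtual copy."
        if slot in self.slots:
            return self.slots[slot]
        if self.isa is not None:
            return self.isa.get(slot)
        raise KeyError(slot)

MAMMAL   = Frame("MAMMAL", legs=4, blood="warm")
ELEPHANT = Frame("ELEPHANT", isa=MAMMAL, color="gray", trunk=True)
CLYDE    = Frame("CLYDE", isa=ELEPHANT)   # an instance frame with no local slots

print(CLYDE.get("color"))  # "gray" -- inherited from ELEPHANT
print(CLYDE.get("legs"))   # 4      -- inherited from MAMMAL via ELEPHANT
```

A local value in CLYDE (e.g., `color="albino"`) would shadow the inherited default, which is exactly the override behavior described earlier for property inheritance.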
Slots in Frames and Functions in Predicate Calculus. As a first step toward translating a frame language into predicate calculus, one can make the following observation (19,28,70,71). If an instance of a frame corresponds to an object in some domain, the frame corresponds to a predicate and its slots correspond to term-creating functions; both the predicate and the functions take the instance as argument. Consider a frame for family as shown below:

[frame: FAMILY
 slots: (MOTHER-OF (a PERSON))
        (FATHER-OF (a PERSON))
        (CHILD-OF (a PERSON))]

Then, if the identifier FAMILY-21 is taken to denote a particular instance of the frame FAMILY, one can assert this by using the expression

(FAMILY FAMILY-21)

One can apply the slot identifiers as functions to create representations of the mother, father, and child of the particular family denoted by FAMILY-21. Additionally, one can use equality assertions to assert identity as shown below:

(= (MOTHER-OF FAMILY-21) Carol)
(= (FATHER-OF FAMILY-21) Henry)
(= (CHILD-OF FAMILY-21) Johnny)
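This slot-to-function translation is mechanical enough to sketch in code. The helper below is purely illustrative: it assumes an S-expression syntax for the target logic and one term-creating function per slot, as in the FAMILY example above:

```python
def translate_instance(frame, instance, fillers):
    """Translate one frame instance into predicate-calculus assertions:
    a unary predicate for the frame itself, plus one equality assertion
    per filled slot, the slot acting as a function of the instance."""
    assertions = [f"({frame} {instance})"]
    for slot, value in fillers.items():
        assertions.append(f"(= ({slot} {instance}) {value})")
    return assertions

for a in translate_instance(
        "FAMILY", "FAMILY-21",
        {"MOTHER-OF": "Carol", "FATHER-OF": "Henry", "CHILD-OF": "Johnny"}):
    print(a)
# (FAMILY FAMILY-21)
# (= (MOTHER-OF FAMILY-21) Carol)
# (= (FATHER-OF FAMILY-21) Henry)
# (= (CHILD-OF FAMILY-21) Johnny)
```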
FRAME THEORY
The above is a simple example, but much more needs to be done to fully translate a frame language into logic. For instance, in place of the equality statement above, a frame language is likely to use the assignment operator, thus treating "Carol" as the value of the functional term "(MOTHER-OF FAMILY-21)." Doing so leads to very difficult knowledge representation problems that fall under the name "referential versus attributive distinction" (e.g., see Refs. 72-76). Also, the semantic content of slot-specific heuristics must be translated into inference rules with special control information. For instance, if a slot has an associated "if-added" demon, the demon must be translated into a forward-chaining inference rule; if a slot has an associated "if-needed" demon, it must be translated into a backward-chaining inference rule (see Demons). This requires translation into a logic programming language rather than into pure logic. Further complications stem from the fact that frames are usually embedded in an ISA hierarchy so that slots are inherited; slots may have multiple values and so cannot be strictly treated as functions; slots may have default values, which require an extension of the standard monotonic logic; and type checking is done on slot values.

Formalizing Property Inheritance. In spite of all this, some work has been done on formalizing and translating the property inheritance of frame languages into logic or a logic programming language. Etherington and Reiter (77) have formalized generic property inheritance for frame languages, such as Fahlman's (57) NETL, by the use of default logic. Tranchell (78) showed how to encode the inheritance structures of KL-ONE in the logic programming language SNePS (e.g., see Ref. 79).
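The directionality of the two demon types can be made concrete with a small sketch. In the toy `Slot` class below (an illustration, not the demon machinery of any real frame language), an if-added demon fires forward when a value is stored, and an if-needed demon computes a missing value backward on demand; the `age`/`birth_year` slots and the 1987 "current year" are hypothetical:

```python
class Slot:
    """A slot with optional demons: `if_added` runs when a value is stored
    (forward chaining); `if_needed` runs when a missing value is requested
    (backward chaining)."""
    def __init__(self, if_added=None, if_needed=None):
        self.value, self.if_added, self.if_needed = None, if_added, if_needed

    def put(self, value):
        self.value = value
        if self.if_added:                  # forward-chaining trigger
            self.if_added(value)

    def get(self):
        if self.value is None and self.if_needed:
            self.value = self.if_needed()  # computed only when needed
        return self.value

log = []
age = Slot(if_added=lambda v: log.append(f"age set to {v}"))
# Assume the current year is 1987 (this encyclopedia's publication year).
birth_year = Slot(if_needed=lambda: 1987 - age.get())

age.put(30)                  # fires the if-added demon
print(birth_year.get())      # 1957, derived on demand by the if-needed demon
print(log)                   # ['age set to 30']
```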
Often one will hear the admonition, usually made in a negative vein, that a frame language (or some other class of knowledge representation language) is at best merely a notational variant or syntactic sugar for some dialect of logic and at worst a vague, worse-than-useless, internally inconsistent language. However, one can look at this positively. One can think of frame languages as very high level knowledge representation languages that should be given a formal semantics and a translation into some logic (80-82). This offers an important payoff in terms of implementability; to implement a new frame language, one need only supply a compiler that compiles the language into some existent logic programming language (e.g., Refs. 79, 83, and 97). Work on compiling frame languages into lower level languages has been done by Greiner and Lenat (84) and Westfold (85).

Hybrid Systems. Frame systems sometimes are adapted to creating rich descriptions or definitions (e.g., Brachman's KL-ONE in Ref. 86), rather than to encoding assertions. Given this, Brachman and Levesque (87) and Brachman, Fikes, and Levesque (88) have developed a hybrid language, called KRYPTON, that consists of a frame component to provide terms and predicates and a predicate calculus component to make assertions involving the terms and predicates. A similar partitioning is used by Charniak's (89) FRAIL (FRame-based AI Language). In this language a frame can have a set of predicate calculus facts associated with it, and the facts make reference to the slots in the frame. When the frame becomes active, the facts become available to the inference engine. Rich (71) has explored similar ideas, as have Allen and Wright (90). There is also an object-oriented language, called Loops (91), that incorporates objects, logic programming, and procedures.

Uniformity, Coherence, and Expressiveness. There are framelike languages that are in part inspired by the principles of uniformity, coherence, and expressiveness. The principle of uniformity embodies the maxim that all concepts are made of the same "stuff," and hence a knowledge representation language should use exactly one data type to represent concepts. The principle of coherence is based on the intuition that related concepts in the mind are "well knit" and not fragmented. The principle of expressiveness embodies the intuition that any concept conceivable by the human mind should also be representable in a knowledge representation language. Although these principles are well known and commonly receive lip service, it appears that the present-day limiting factor in the construction of intelligent systems is putting massive quantities of information into the system (surface representation), whereas the quality of representation of the represented information (deep representation) is a minor factor. In other words, highly expressive, coherent, and uniform knowledge representation languages are not yet cost-effective in applications, although, theoretically, they are better justified. One of the earliest framelike languages to take these principles seriously was OWL (74-76,92). More recently, there have appeared languages, KODIAK (93) and UniFrame (94), that have been concerned with these epistemological issues. These languages take as their starting point that there should be exactly one data type in a frame-based memory, a concept, and not two (frames and slots), and that the representation of concepts in memory should be well knit.

Object-Oriented Languages. Parallel to the development of frame languages has been the development of object-oriented programming languages, such as Smalltalk (95) and Lisp Machine Flavors (96). (The name "flavors" was allegedly inspired by an ice cream parlor located in the vicinity of MIT.) It is often asked how frame languages differ from object-oriented programming languages. They are quite similar, differing primarily in emphasis. An object-oriented programming language is viewed as a practical programming language able to compete with standard programming languages, whereas a frame language tends to be either a research tool or a language to be used in the construction of AI databases. An object-oriented programming language uses hierarchies of classes of objects that have associated slots for "state variables" and "methods" (procedures for manipulating the state variables and objects). The methods and slots are inherited down the class hierarchy. The user can declare some object to be an instance of some class. The effect is that the object acquires its own set of state variables and has access to the methods associated with its class. An illustrative application of an object-oriented programming language is object-oriented simulation, such as video games. For instance, it is easy to define different classes of monsters and embed these classes into a hierarchy so that common properties can be inherited, just as in frame languages. An object-oriented programming language also allows the user to define multiple instances of any number of these classes, giving each monster its own state information. Each class can have associated methods (either directly or by inheritance) to erase elements of the class, display elements of the class, and move elements of the class. With such a setup, one
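The video-game illustration of class hierarchies, instance state, and inherited methods can be sketched directly. The `Monster`/`Ghost` class names are hypothetical; the point is that each instance carries its own state variables while methods are shared, and possibly overridden, down the hierarchy:

```python
class Monster:
    """Base class: 'state variables' live in each instance;
    'methods' are defined once and inherited down the class hierarchy."""
    def __init__(self, x, y):
        self.x, self.y = x, y              # per-instance state
    def move(self, dx, dy):
        self.x += dx
        self.y += dy
    def display(self):
        return f"{type(self).__name__} at ({self.x}, {self.y})"

class Ghost(Monster):                       # specialized class: inherits move()
    def display(self):                      # ...but overrides display()
        return "~" + super().display()

monsters = [Monster(0, 0), Ghost(5, 5)]
for m in monsters:                          # send the same "message" to each
    m.move(1, 0)
    print(m.display())
# Monster at (1, 0)
# ~Ghost at (6, 5)
```

Each monster responds to the shared `move` and `display` messages using its own state, which is exactly the behavior the message-sending description in the text relies on.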
can repeatedly send message sequences to each of the monsters, the content of which indicates "erase yourself," "move yourself," "display yourself," thus causing the video screen to simulate a world of several autonomous creatures moving around (the monsters save their state information between messages). This could be done in a frame language also, but execution would probably be much slower because the frame language attempts to be more general.

Summing Up

Frame theory is really a vague paradigm that inspired an era of research on intelligent systems and a host of frame languages. The research issues have evolved, with very little of the specifics of Minsky's original suggestions surviving, in part because Minsky's suggestions were for the most part interesting lines of argument rather than specific proposals. What used to be called frame theory is probably, at present, most actively developed by Schank's group at Yale under the name of memory organization. Frame languages have evolved into hybrid systems, consisting of a predicate calculus component and a frame component. The frame component is used to define predicates and terms for use by the predicate calculus component. Consistent with this, there seems to be a promising line of development where frame languages may be implemented by compiling them into code for logic engines. What is persistently useful in frame languages will probably find its way into conventional programming languages via the route of object-oriented programming languages (see Languages, object-oriented).
BIBLIOGRAPHY

1. M. Minsky, A Framework for Representing Knowledge, Artificial Intelligence Memo 306, MIT AI Lab, 1974.
2. P. Winston, The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
3. J. Haugeland (ed.), Mind Design, MIT Press, Cambridge, MA, 1981.
4. F. C. Bartlett, Remembering: A Study in Experimental and Social Psychology, The University Press, Cambridge, UK, 1932, revised 1961.
5. C. Fillmore, The Case for Case, in E. Bach and R. Harms (eds.), Universals in Linguistic Theory, Holt, Rinehart, & Winston, New York, 1968.
6. C. Fillmore, An Alternative to Checklist Theories of Meaning, in Cogen (ed.), Proceedings of the First Annual Meeting of the Berkeley Linguistics Society, Institute of Human Learning, Berkeley, CA, 1975, pp. 123-131.
7. E. Goffman, Frame Analysis, Harper & Row, New York, 1974.
8. Reference 6, p. 130.
9. R. P. Abelson, The Structure of Belief Systems, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, 1973.
10. M. Minsky and S. Papert, Progress Report on Artificial Intelligence, MIT AI Lab Memo 252, 1972.
11. A. Newell and H. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
12. D. Norman, Memory, Knowledge, and the Answering of Questions, in R. L. Solso (ed.), Contemporary Issues in Cognitive Psychology: The Loyola Symposium, W. H. Freeman, San Francisco, 1973.
13. R. Schank, Identification of Conceptualizations Underlying Natural Language, in R. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, 1973.
14. R. Schank and R. P. Abelson, Scripts, Plans, Goals, and Understanding, Erlbaum, Hillsdale, NJ, 1977.
15. R. Schank, Reminding and Memory Organization: An Introduction to MOPs, Yale University, Department of Computer Science, Research Report #170, December 1979.
16. R. Schank, "Failure-driven memory," Cog. Brain Sci. 4, 41-60 (1981).
17. R. Schank, Dynamic Memory: A Theory of Reminding and Learning in Computers and People, Cambridge University Press, New York, 1982.
18. J. Moore and A. Newell, How Can Merlin Understand?, in L. W. Gregg (ed.), Knowledge and Cognition, Erlbaum, Potomac, MD, 1973, pp. 201-252.
19. E. Charniak and D. McDermott, Introduction to Artificial Intelligence, Addison-Wesley, Reading, MA, 1985.
20. J. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
21. P. Winston, Artificial Intelligence (2nd ed.), Addison-Wesley, Reading, MA, 1984.
22. A. Barr and E. Feigenbaum, Handbook of Artificial Intelligence, Vol. 1, Kaufman, Palo Alto, CA, 1981.
23. N. J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
24. B. Kuipers, A Frame for Frames, in D. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, 1975, pp. 151-184.
25. I. Biederman, On the Semantics of a Glance at a Scene, in M. Kubovy and J. R. Pomerantz (eds.), Perceptual Organization, Erlbaum, Hillsdale, NJ, 1981.
26. T. E. Weymouth, J. S. Griffith, A. R. Hanson, and E. M. Riseman, "Rule based strategies for image interpretation," AAAI-83, Washington, DC, 429-432, 1983.
27. R. Fisher, "Using surfaces and object models to recognize partially obscured objects," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 989-995, 1983.
28. E. Charniak, "A common representation for problem solving and language comprehension information," Artif. Intell. 16, 225-255 (1981).
29. E. Charniak, Towards a Model of Children's Story Comprehension, Ph.D. Thesis and AI Lab Technical Report 266, MIT, Cambridge, MA, 1972.
30. R. Schank, "Using knowledge to understand," in R. Schank and B. Nash-Webber (eds.), Theor. Iss. Nat. Lang. Proces. 1 (1975), distributed by the Association for Computational Linguistics.
31. R. Cullingford, Script Application: Computer Understanding of Newspaper Stories, Report 116, Yale University Department of Computer Science, 1978.
32. R. Schank and C. Riesbeck, Inside Computer Understanding, Erlbaum, Hillsdale, NJ, 1981.
33. A. Collins, J. S. Brown, and K. M. Larkin, Inference in Text Understanding, in R. J. Spiro, B. C. Bruce, and W. F. Brewer (eds.), Theoretical Issues in Reading Comprehension, Erlbaum, Hillsdale, NJ, 1980.
34. E. Charniak, "With spoon in hand, this must be the eating frame," Theor. Iss. Nat. Lang. Proces. 2, 187-193 (1975).
35. P. O'Rorke, "Reasons for beliefs in understanding: Applications of non-monotonic dependencies to story processing," AAAI-83, Washington, DC, 306-309, August 1983.
36. P. Norvig, "Frame activated inferences in a story understanding program," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 624-626, 1983.
37. J. Doyle, "A truth maintenance system," Artif. Intell. 12, 231-272 (1979).
38. Reference 19, p. 598.
39. E. Charniak, Cognitive Science is Methodologically Fine, in W. Kintsch, J. Miller, and P. Polson (eds.), Methods and Tactics in Cognitive Science, Erlbaum, Hillsdale, NJ, 1984, pp. 263-276.
40. R. Granger, "FOUL-UP: A program that figures out meanings of words from context," Proc. of the Fifth IJCAI, Cambridge, MA, 172-178, 1977.
41. P. Hayes, "On semantic nets, frames, and associations," Proc. of the Fifth IJCAI, Cambridge, MA, 99-107, 1977.
42. G. DeJong, "A new approach to natural language processing," Cog. Sci. 3(3), 251-273 (1979).
43. A. Newell, "The knowledge level," Artif. Intell. 18, 87-127 (1982).
44. R. Wilensky, Planning and Understanding: A Computational Approach to Human Reasoning, Addison-Wesley, Reading, MA, 1983.
45. S. Fahlman, Frame Verification (pp. 264-267 of Minsky's unabridged article, A Framework for Representing Knowledge), in P. Winston, The Psychology of Computer Vision, McGraw-Hill, New York, pp. 211-277, 1975.
46. S. Fahlman, A Hypothesis-Frame System for Recognition Problems, Working Paper 57, MIT AI Lab, 1974.
47. Reference 14, Chapter 2.
48. J. Kolodner, "Maintaining organization in a dynamic long-term memory," Cog. Sci. 7(4), 243-280 (1983).
49. J. Kolodner, Conceptual Memory: A Computational Model, Erlbaum, Hillsdale, NJ, 1984.
50. G. Bower, J. Black, and T. Turner, "Scripts in text comprehension and memory," Cog. Psychol. 11, 177-220 (1979).
51. J. Kolodner, "Reconstructive memory: A computer model," Cog. Sci. 7, 281-328 (1983).
52. M. Lebowitz, "Generalization from natural language text," Cog. Sci. 7, 1-40 (1983).
53. M. R. Quillian, Semantic Memory, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968.
54. A. Collins and M. R. Quillian, "Retrieval time from semantic memory," J. Verb. Learn. Verb. Behav. 8, 240-247 (1969).
55. E. Rosch, "Cognitive representations of semantic categories," J. Exper. Psychol. 104, 192-233 (1975).
56. D. Bobrow and T. Winograd, "An overview of KRL-0, a knowledge representation language," Cog. Sci. 1, 3-46 (1977).
57. S. E. Fahlman, NETL: A System for Representing and Using Real-World Knowledge, MIT Press, Cambridge, MA, 1979.
58. J. Schmolze and T. Lipkis, "Classification in the KL-ONE knowledge representation system," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 330-332, 1983.
59. T. Lipkis, A KL-One Classifier, in J. G. Schmolze and R. Brachman (eds.), Proceedings of the 1981 KL-One Workshop, pp. 128-145, 1981, Bolt, Beranek and Newman Inc., Cambridge, MA.
60. R. Brachman and H. Levesque, "The tractability of subsumption in frame-based description languages," AAAI-84, Austin, TX, 34-37, 1984.
61. C. Fillmore, The Case for Case Reopened, in P. Cole and J. M. Sadock (eds.), Syntax and Semantics, Vol. 8, Grammatical Relations, Academic Press, New York, 159-181, 1977.
62. E. Charniak, "The case-slot identity theory," Cog. Sci. 5, 285-292 (1981).
63. F. Sommers, "Structural ontology," Philosophia 1, 21-42 (1971).
64. F. Keil, Semantic and Conceptual Development: An Ontological Perspective, Harvard University Press, Cambridge, MA, 1979.
65. I. Goldstein and B. Roberts, "NUDGE: A knowledge based scheduling program," Proc. of the Fifth IJCAI, Cambridge, MA, 257-263, 1977.
66. R. Roberts and I. Goldstein, The FRL Manual, MIT AI Lab Memo 409, Cambridge, MA, 1977.
67. D. Bobrow, R. Kaplan, M. Kay, D. Norman, H. Thompson, and T. Winograd, "GUS: A frame-driven dialog system," Artif. Intell. 8, 155-173 (1977).
68. W. Lehnert and Y. Wilks, "A critical perspective on KRL," Cog. Sci. 3, 1-28 (1979).
69. D. Bobrow and T. Winograd, "KRL: Another perspective," Cog. Sci. 3, 29-42 (1979).
70. U. Reimer and U. Hahn, "A formal approach to the semantics of a frame data model," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 337-339, 1983.
71. C. Rich, "Knowledge representation languages and predicate calculus: How to have your cake and eat it too," AAAI-82, Pittsburgh, PA, 193-196, 1982.
72. K. Donnellan, "Reference and definite descriptions," Philos. Rev. 75, 281-304 (1966).
73. R. Moore, "D-SCRIPT: A computational theory of descriptions," Proc. of the Third IJCAI, Stanford, CA, 223-229, 1973.
74. W. A. Martin, Descriptions and the Specialization of Concepts, in P. Winston (ed.), Artificial Intelligence: An MIT Perspective, MIT Press, Cambridge, MA, 1979.
75. W. A. Martin, Roles, Co-descriptors, and the Formal Representation of Quantified English Expressions, MIT Laboratory for Computer Science, TM-139, Cambridge, MA, 1979.
76. W. A. Martin, "Roles, co-descriptors, and the formal representation of quantified English expressions (abridged)," Am. J. Computat. Ling. 7, 137-147 (1981).
77. D. Etherington and R. Reiter, "On inheritance hierarchies with exceptions," AAAI-83, Washington, DC, 104-108, 1983.
78. L. Tranchell, A SNePS Implementation of KL-One, Technical Report 198, Department of Computer Science, SUNY at Buffalo, 1982.
79. S. Shapiro, The SNePS Semantic Network Processing System, in N. V. Findler (ed.), Associative Networks: Representation and Use of Knowledge by Computers, Academic Press, New York, 179-203, 1979.
80. P. Hayes, "In defense of logic," Proc. of the Fifth IJCAI, Cambridge, MA, 559-565, 1977.
81. D. McDermott, "Tarskian semantics, or no notation without denotation," Cog. Sci. 2, 277-282 (1978).
82. D. McDermott, Artificial Intelligence Meets Natural Stupidity, in J. Haugeland (ed.), Mind Design, MIT Press, Cambridge, MA, 1981.
83. W. F. Clocksin and C. S. Mellish, Programming in Prolog, Springer-Verlag, New York, 1981.
84. R. Greiner and D. Lenat, "A representation language language," AAAI-80, Stanford, CA, 165-169, 1980.
85. S. Westfold, "Very-high-level programming of knowledge representation schemes," AAAI-84, Austin, TX, 844-849, 1984.
86. R. Brachman, On the Epistemological Status of Semantic Networks, in N. V. Findler (ed.), Associative Networks: Representation and Use of Knowledge by Computers, Academic Press, New York, 1979.
87. R. Brachman and H. Levesque, "Competence in knowledge representation," AAAI-82, Pittsburgh, PA, 189-192, 1982.
88. R. Brachman, R. Fikes, and H. Levesque, "KRYPTON: Integrating terminology and assertion," Proc. AAAI-83, Washington, DC, 31-35, 1983.
89. E. Charniak, The Frail/Nasl Reference Manual, Technical Report CS-83-06, Department of Computer Science, Brown University, 1983.
90. B. P. Allen and J. M. Wright, "Integrating logic programs and schemata," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 340-342, 1983.
91. M. Stefik, D. G. Bobrow, S. Mittal, and L. Conway, "Knowledge programming in LOOPS," AI Magazine 4(3), 3-13 (1983).
92. P. Szolovits, L. Hawkinson, and W. Martin, An Overview of OWL, a Language for Knowledge Representation, MIT/LCS/TM-86, MIT Laboratory for Computer Science, Cambridge, MA, 1977.
93. R. Wilensky, KODIAK: A Knowledge Representation Language, in Proceedings of the Sixth Annual Conference of the Cognitive Science Society, Boulder, CO, June 1984, pp. 344-352.
94. A. Maida, "Processing entailments and accessing facts in a uniform frame system," AAAI-84, Austin, TX, 233-236, 1984.
95. A. Goldberg and D. Robson, Smalltalk-80: The Language and Its Implementation, Addison-Wesley, Reading, MA, 1983.
96. D. Weinreb and D. Moon, The Lisp Machine Manual, MIT Press, Cambridge, MA, 1981.
97. P. Hayes, The Logic of Frames, in D. Metzing (ed.), Frame Conceptions and Text Understanding, Walter de Gruyter, Berlin, pp. 46-61, 1979.

A. Maida
Pennsylvania State University

FRL

A frame-oriented representation language developed around 1977 by Roberts and Goldstein at MIT. FRL stresses demons, stereotypes, and instantiation aids, in contrast to other frame-representation languages. The language designers used an earlier version, FRL-0, to implement NUDGE, a system used to understand incomplete and possibly inconsistent management-scheduling requests (see R. B. Roberts and I. Goldstein, The FRL Primer, Report AIM-408, AI Lab, MIT, Cambridge, MA, 1977).

K. S. Anone
SUNY at Buffalo

FRUMP

A script-driven newspaper-skimming and summarizing program, FRUMP (fast reading and understanding memory program) was written by DeJong at the Yale AI Project. Once a script is decided, it skims the story looking for the expected words to fill the holes in the script [see G. DeJong, Skimming Stories in Real Time, Doctoral dissertation, Yale University, New Haven, CT, 1979, and G. DeJong, An Overview of the FRUMP System, in W. G. Lehnert and M. H. Ringle (eds.), Strategies for Natural Language Processing, Lawrence Erlbaum, Hillsdale, NJ, 1982, pp. 149-176].

A. Hanyong Yuhan
SUNY at Buffalo

GAME PLAYING

An important part of AI research lies in efforts at understanding intelligent behavior as opposed to simulating it for solving a specific problem in an applications domain. For this former area of research, games remain an excellent metaphor. A considerable body of knowledge exists in mathematics about the properties of games; there has not been enough interaction between this knowledge and AI research. One reason is that it has been proven (1) for games like chess, checkers, and go that no game-playing strategy can exist that remains efficient over larger and larger boards (see Game trees). As a result, AI research in games has been restricted to two extreme viewpoints. On the one hand, efforts are made to incorporate knowledge of specific games (e.g., chess on a standard board) into the program. At the other end of the spectrum, one restricts one's attention to methods of search (qv) reduction. There are classes of games, however, for which methods of efficient play can be developed and used. Such techniques are often applicable over a wide class of seemingly unrelated games. Study of such classes yields insight into the notions of similarity and analogy, activities of recognized value in automatic learning of problem-solving and game-playing strategies. Such studies form an important third approach to the study of games.
In what follows all of these aspects of AI research into games are discussed. The second and third approaches are discussed in some detail since a general body of knowledge exists for these. For the first approach the reader is directed to specialized treatises and papers (2-4). In the next section some formal definitions are made to facilitate the later discussion.

Mathematical Formulation

In the original, most general definition (5), a game is characterized by the set of all sequences of plays possible, as made by N players, and by the payoffs to the N players corresponding to each sequence. Each play reduces the set of sequences to the subset that has that play as the initial one. The moves thus characterize a partition on the set of sequences. However, since all the players do not necessarily know what play was made (e.g., in Kriegspiel or bridge), the players' knowledge restricts the set to some superpartition of this partition. Most of the work on games in the field of AI has been on two-person games with complete information and with alternating moves, although some work on bridge and poker has been reported. As a result, one obtains a considerable simplification of the structure of the partitions over the set of all play sequences. A set of nested subpartitions results as the
"t
\
I I t
I
(b)
(a)
Figure 1. Game with 16 possible plays. (a) Partition representation of von Neumann and Morgenstern (5). The game being one with complete information, two persons, and alternating moves, the tree representation (6) is also possible. If the second player is not allowed to know the first player's first move, after his move he would not know where the play is and would merely have the play localized in either the set enclosed by the dotted line in (a) or its complement. If the player on move is determined by rules other than alternation, more subsets would appear in (a) and the simplicity of the tree representation would be lost.
representation of the plays, and one can analyze the games in terms of trees. Figure 1 indicates the relationship between the game tree and the partitions on the sequences represented by them. It also shows the kind of complications introduced by incomplete information that make the game tree less useful for general N-person games. Most of the analyses of games in AI have been in terms of game trees (qv). In these trees each node represents a class of possible continuations of the game. However, one could also consider the node to represent the history of past moves. From the latter point of view, each arc of the tree represents a move
by a player. The node also restricts how the rest of the play is allowed to continue. Of course, two distinct histories of moves do not always restrict the possible continuations in distinct ways. For instance, in chess, the sequence P-K4, Kt-KB3, Kt-KB3 leads to the same situation as Kt-KB3, Kt-KB3, P-K4. Many authors have considered it meaningful to treat the two resulting nodes in the tree as equivalent and represented by the board configuration produced by them. This identifies two nodes of the tree as a single node, and as a result, the structure becomes a graph rather than a tree (see Fig. 2). The nodes of this graph may be
Figure 2. (a) Two distinct move sequences can determine the same constraint on possible continuations and are, in that sense, "equivalent." (b) By identifying the equivalent nodes of the game tree, one obtains the game graph (6). (c) In the case of most games these equivalent nodes do indeed represent identical "game board configurations," giving concrete meaning to the nodes of the graph.
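In programming terms, identifying equivalent nodes amounts to keying nodes on a canonical board configuration rather than on the move history — the familiar transposition-table idea. A minimal sketch, using a toy game in which a position is fully determined by the set of moves played (the primed move name merely distinguishes the second player's knight move from the first player's; nothing here is chess-specific):

```python
# In this toy game, a position is determined by the SET of moves played,
# not their order -- the situation of the chess transposition above.
def position(moves):
    return frozenset(moves)   # canonical key, independent of move order

graph_nodes = {}              # transposition table: position -> node data
lines = [["P-K4", "Kt-KB3", "Kt-KB3'"],
         ["Kt-KB3", "Kt-KB3'", "P-K4"]]
for line in lines:
    pos = position(line)
    # Two histories reaching the same position share one graph node.
    graph_nodes.setdefault(pos, {"histories": []})["histories"].append(line)

print(len(graph_nodes))       # 1: both move orders collapse to one graph node
```

Collapsing the tree this way is what turns it into the game graph of Figure 2, and it is why search programs that cache positions can avoid re-evaluating transposed lines.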
considered to be represented by the configuration of pieces on the board together with a specification as to who is on move. The latter specification may or may not be uniquely determined by the node of the graph. Such subtleties need not interest us here; the interested reader will find these discussions in Ref. 6. For our purposes we shall take the formalization where each node can be considered from the point of view of either player being on the move. Conway (7-9) abstracts the graph away by defining a game (a game node in the above terminology) to be given by two sets of games (i.e., game nodes), to wit, the ones that one player could reach if he was on move and the ones that the other player could reach if the other player was on move. As Conway would put it, "A game is an ordered pair of two sets of games." The Conway approach has led to the development of a new area of nonstandard number theory and has unified some known theories of impartial games with this extended number theory. However, so far a clear way has not been found to use these results to develop new winning strategies for known interesting games. This theory shall therefore not be discussed any further. However, the interested reader is urged strongly to look into the work of Conway and his colleagues (especially Ref. 8) for some extremely exciting and amusing, albeit occasionally strenuous, reading.

Strategies

For the rest of this overview the important problem of concern is: given a node in a given game, how does the player on move assure a win if possible? A number of considerations arise in determining how such an assurance can be obtained. First, the only decision available to the player is the choice of a move available at the node. The opponent's move leads the game from the resulting node to a node where the player has to make another choice. This choice cannot be made beforehand: until the opponent has played, the player does not know what the resulting node will be.
The player's decision cannot be with respect to a single move or even a sequence of moves. He or she has to decide, not on a move, but on a method by which, given a node, a move can be chosen. In mathematical parlance, one needs a function mapping nodes to moves. Such a function is called a strategy. A winning strategy is one whose repeated application, one at each node where a choice is needed, leads to a win whenever a win is possible.

This concept of strategy in games has a very close relationship with the same concept in game theory as studied in economics. For a discussion the reader should see Minimax procedure. Suffice it to say here that in a two-person game with complete information and alternating moves the calculation of strategy (albeit still inefficient; see above) becomes much simpler than in the general case. The general method applied to the case of Figure 1 would lead to the calculation of a 32 × 1024 matrix. Only the simple special case is discussed here.

If one unfolds the game graph into a tree, one obtains a method of calculating the winning strategy by a method that, with some modifications (see below), has remained the only error-proof method known. The method can best be described in terms of a recursive definition:

If the node is a leaf (end of game), the game is already won or lost, and its value is the value of the node. No move needs to be made.

If the node is the player's move, find the value of each node to which one can move. Make the move that maximizes this value. The value of the node is this maximum value.

If the node is the opponent's move, the value of the node is the minimum value among the nodes that can be reached by a move.

Figure 3 shows the values of the nodes in a game tree computed this way. The optimum player's move from a node, and the optimal moves of each player at all the nodes reached from it, follow the leftmost branch sequence. This method of evaluation and strategy construction is called the minimax method.

Figure 3. Minimax value of nodes in a game tree. The leaves are evaluated by some intermediate evaluation function. The leaves whose values are enclosed in circles are the only ones that need to be evaluated by an alpha-beta algorithm. See text.

The trouble with this method of finding the strategy is the amount of calculation involved. Early estimates about chess revealed that such a calculation made from the starting position of chess would involve the generation of about 10^120 nodes. The most optimistic guess about the speed of calculation and space used by memory still yields the fact that such a calculation would take millennia to calculate on an impossibly large machine. Thus, playing chess "optimally" is not feasible using such a naive search technique. People do play chess well; nobody knows if anybody has ever played optimal chess. See Refs. 2 and 3 for a discussion of how machines can be made to play acceptable chess and how it compares with human play. Some of the principles on which optimal and suboptimal game playing can be based are described below.
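The recursive rule above translates directly into code. A minimal sketch (not from the article), assuming a game tree given as nested lists whose leaves are numeric game values from the first player's viewpoint:

```python
# Minimal sketch of the minimax rule described above: a game tree is
# either a leaf value or a list of subtrees; `maximizing` says whose
# move it is at the current node.

def minimax(tree, maximizing=True):
    """Return the minimax value of `tree`.

    A leaf is an int (+1 win, -1 loss, 0 draw, from the first
    player's viewpoint); an interior node is a list of subtrees.
    """
    if isinstance(tree, int):  # leaf: value of the finished game
        return tree
    children = [minimax(sub, not maximizing) for sub in tree]
    return max(children) if maximizing else min(children)

# The first player can force a win here: the first move leads to a
# node where every reply of the opponent still loses for the opponent.
example = [[1, 1], [1, -1]]
print(minimax(example))  # -> 1
```

Applied to a full game tree this computes the exact value of the game; the infeasibility discussed next is simply that the recursion visits every node of the tree.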
Search Reduction: Kernels and Evaluations

Any method of game playing has to be based on some principle that allows one to reduce the amount of search involved in the calculation of strategy. Such methods have been developed both in AI and in the mathematical theories of games. The latter are described first, since the former are somewhat more difficult to justify in any precise manner.

The set of all nodes in the game graph whose minimax value is 1 are called the winning nodes and the others the losing nodes (for the time being, draws are not considered). The minimax principle can be stated in terms of these sets of nodes: A node is winning if there is at least one node connected to it that is a losing node; a node is losing if all the nodes connected to it are winning nodes; and terminal nodes are winning if their value is 1.

Consider the property of being winning. Of course, every leaf node of the game tree (terminal node of the game graph) has this property if its value is 1. Also, if a node lacks this property, every node connected to it has this property. Again, each node having this property is connected to at least one node lacking this property. It can be seen that if a property of nodes, which is easy to calculate, is shared by all leaf nodes with value 1 and has the above characteristics, the nodes having the property are precisely the winning nodes. Having minimax value 1 at a player's move is one such property. However, to check this property, one has to search the whole game tree. In many special games, however, these properties can be calculated in terms of the board configuration in the node itself or just of a few nearby nodes.

A standard example is the following simple take-away game played in elementary school mathematics classes. There is a pile of sticks on the table. Each player in his turn removes at least one and no more than three sticks from the pile. The first player who cannot move (i.e., faces an empty table) loses. It is clear that a player can win if he is on move and there are three or fewer (i.e., fewer than four) sticks on the table. Such positions can be considered leaf nodes with value 1. If there are four sticks on the table, the player on move has to leave fewer than four and at least one stick. Hence the node with four sticks is a losing node. Induction readily shows that any node with a multiple of four sticks is losing and the rest are winning.

There is thus no need to do a minimax calculation when playing this game: If the player on her move does not have a multiple of four sticks on the table, she can reduce the number of sticks to a multiple of four, and her opponent cannot but return her to a node where the number of sticks is not a multiple of four.

Kernels. There are games where the winning nodes and winning strategies can be identified without search. The class of such games is not a trivial one: Various rather complicated (but efficient) techniques are known for calculation of the values of nodes in such games. See Refs. 6 and 8 for many examples. In what follows, the technical term "kernel" calculation means the calculation of winning and losing nodes. (In precise terms, the losing nodes are said to form the kernel of the game graph.) Such techniques, often called knowledge-based techniques in AI, have also been developed (albeit with lesser success and precision) for chess, yielding methods independent of minimax. See Pitrat (10) and Bratko (11) for examples.

The minimax technique "beats" this class of methods in one way, of course: Minimax works for all games. These game-specific methods work only when someone is clever enough to describe the winning nodes. What one needs is some method whereby the computer, given the description of the game, can develop the description of the set of winning nodes by itself. Samuel's checker-playing program (12) succeeded in doing this to an extent, and other programs have achieved similar successes (see below).

There are times when one can approximate the calculation of the set of winning nodes. The method for one such calculation was developed by Koffman (13) and by Citrenbaum (14) for another wide class of games which, for want of any better name, is called positional. This class includes such trivial games as tic-tac-toe and more difficult games like Hex and Go-Moku. The 4 × 4 × 4 tic-tac-toe game ("Qubic," as it is often called) is another nontrivial member of this class.

The major strategy for this class of games is the formation of forks. Indeed, in some sense any winning node in any game is a fork, but in the positional class of games the fork has a clear visual significance that can be very efficiently represented in a computer. The most elementary fork, of course, arises when one encounters a board position like the one in Figure 4: There are two lines of squares, each of which has two empty squares, one of which is common to the two lines. On his turn, X can play at this common intersection and produce two potentially winning lines, and the opponent cannot block both.

The concept of a force can be pushed back much further; 14- or 15-move-deep forces of this nature can be recognized on a go-moku board just by the configuration of pieces alone (15).

Figure 4. (a) Simple fork in tic-tac-toe. The two lines shown dotted each have two empty cells, and they intersect in an empty square where the forcing move is to be made. Graph (b) expresses the same fact. The two circles stand for the two lines (the numbers indicating the number of empty cells); the empty intersection is the solid node. Graph (b) describes not just the figure in (a) but many other strategically equivalent positions.
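Returning to the take-away game: its kernel can be computed directly from the fixed-point rule stated above, with no minimax search. A sketch (illustrative code, not from the article), using the pile sizes and move set of the example:

```python
# Compute the winning/losing nodes of the take-away game by the rule
# stated above: a node is winning iff some legal removal reaches a
# losing node; the empty pile (no move available) is losing.

def winning_nodes(max_sticks, removals=(1, 2, 3)):
    win = {}
    for n in range(max_sticks + 1):
        # n is winning iff some legal removal leaves a losing node
        win[n] = any(r <= n and not win[n - r] for r in removals)
    return win

win = winning_nodes(20)
kernel = [n for n, w in win.items() if not w]
print(kernel)  # -> [0, 4, 8, 12, 16, 20]: exactly the multiples of four
```

The table confirms the induction in the text: the kernel (the losing nodes) consists precisely of the piles whose size is a multiple of four.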
Without going into the depths of the discussion of the data representation needed to do this (6), the reader is asked to convince himself that the empty board in 3 × 3 × 3 tic-tac-toe is a forcing configuration: The player on move can assure himself of a win just by playing at the center.

The trouble with the Koffman-Citrenbaum technique (13,14) for recognizing forces was that some deep forces could be upset if one of the defensive moves of the opponent posed a direct threat to the attacking player. The calculation of the winning nodes is thus only approximate in their method. A node recognized to be only three moves away from a win may be further away or may even not be a winning node. Thus, the calculation has to be backed up by some search (albeit not a minimax search) of the game tree.

Evaluations. Many approximate measures of the "goodness" of a position (with "goodness" not necessarily defined in terms of kernels as above) have been suggested for various games. In most cases this has been done with the purpose of simplifying the minimax search for games. The arguments that have led to such efforts at simplification are as follows. Consider the case where there is a method for finding whether a node is winning just by looking at the board configuration. In such a case, if a node is given the value 1 if it is winning and 0 if it is losing, this "static" value will be the same as its minimax value. On the other hand, if nothing is known as to whether a node is winning or losing, a minimax search done to the end of the game also leads to the determination as to whether a node is winning. This seems to indicate that if one has a very good evaluation for a node, very shallow minimax is needed to determine whether a node is winning; conversely, if there is a very bad evaluation available for a node (e.g., no evaluation at all), a very deep minimax is needed.
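This tradeoff can be made concrete as a depth-limited minimax that falls back on a static evaluation at the search frontier. The tree and the evaluation below are made up for illustration; this is not code from the article:

```python
# Depth-limited minimax: search to a fixed depth, then let a static
# evaluation stand in for the unsearched remainder of the game.

def limited_minimax(node, depth, evaluate, maximizing=True):
    children = node.get("children", [])
    if not children or depth == 0:
        return evaluate(node)  # static value stands in for deeper search
    vals = [limited_minimax(c, depth - 1, evaluate, not maximizing)
            for c in children]
    return max(vals) if maximizing else min(vals)

# With a perfect evaluation, depth 0 would already suffice; a crude
# evaluation needs depth to compensate, which is the tradeoff above.
tree = {"score": 0, "children": [{"score": 3, "children": []},
                                 {"score": 7, "children": []}]}
print(limited_minimax(tree, 1, lambda n: n["score"]))  # -> 7
```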
One can make a very unsophisticated interpolation from this that seems to indicate that it is possible to get a good idea as to whether a node is winning even with an imperfect evaluation function, provided the minimax on it is deep enough. This has led to some extensive experimentation with game-playing programs along the following lines. Given a game to play with a computer, the researcher uses his own intuition and the literature on the specific game to decide on a method for attaching a value to the nodes in such a way that most winning nodes have a higher value than most losing nodes. One then chooses the "best" move at a node as follows: One considers all the moves one can make and then all the moves the opponent can make from each of the resulting nodes. A part of the game tree is produced this way by alternating these "expansions" of nodes to a certain depth. The leaves of the resulting tree are then evaluated by calculating the static evaluation function. These values are then propagated "up the tree" by minimax to obtain the best move at the node.

Such evaluations have been found useful in constructing game-playing programs in some cases, although very little is known as to why this is so (see below). However, the general experience has been that the technique is useful only when the depth of the tree is chosen carefully. It may even be that certain parts of the search tree should be explored deeper than other parts. For a thorough discussion of the strengths and weaknesses of such procedures, see Refs. 3, 16, and 17. Only one such technique is given here.

This technique deals with the concept of "stability" of a position and is relevant to games like chess and checkers. The basic idea is that one should not evaluate a board position in
the middle of a major skirmish, where pieces are being exchanged by a sequence of kills. In a part of the game tree where this is happening, it is better to explore the tree until the end of the skirmish, so the evaluation function does not oscillate violently. At the resulting position, the evaluation can be expected to be stable. This technique is not foolproof, since an apparently stable position can be hiding a threat that can be pushed back by irrelevant but powerful moves demanding answers, so that the threat (on the part of either player) can remain invisible. For a discussion of this so-called horizon effect, see Ref. 3 (see also Horizon effect).

Many of the game-playing programs augment the minimax evaluation with a static evaluation of moves with respect to their relevance to a position. For instance, one may not want to consider moving a rook's pawn while one is busily engaged building up one's forces at the center of the board. Pruning away branches of the game tree makes the tree grow at a smaller rate, so that greater depths can be explored.

Three distinct heuristics (qv) are being suggested at this point: one for choosing the intermediate evaluation, one for choosing the depth at which the evaluation is to be made, and one for pruning the search. How the accuracies of the three different heuristics cooperate to increase the accuracy of the final evaluation, indeed what one means by "accuracy" in the case of depth limitation, is not known. All one knows is that if the pruning heuristic is exact, it alone suffices to determine a winning strategy. Meanwhile, there is always an effort to make the minimax search as deep as resources permit, the conventional wisdom being the argument given a few paragraphs earlier. We shall have occasion to discuss this conventional wisdom again later.
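One formal, game-independent pruning rule of this kind is the alpha-beta search taken up in the next paragraph: a branch is abandoned as soon as its value can no longer affect the choice made at an ancestor node. As a preview, a minimal sketch on a nested-list game tree (an illustration, not the article's code):

```python
# Alpha-beta pruning: `alpha` is the best value the maximizer can
# already guarantee, `beta` the best the minimizer can guarantee;
# once alpha >= beta the remaining children cannot change the result.

def alphabeta(tree, alpha=float("-inf"), beta=float("inf"),
              maximizing=True):
    if isinstance(tree, (int, float)):  # leaf value
        return tree
    if maximizing:
        value = float("-inf")
        for sub in tree:
            value = max(value, alphabeta(sub, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # cutoff: ancestor will never choose this
                break
        return value
    value = float("inf")
    for sub in tree:
        value = min(value, alphabeta(sub, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

tree = [[3, 5], [2, 9]]  # the 9 is never examined: 2 already refutes
print(alphabeta(tree))   # -> 3, the same value plain minimax returns
```

The pruning is exact: the value returned is always the full minimax value, only the work is reduced.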
However, we should mention another game-independent technique that allows one to increase the depth of search by avoiding the evaluation of every "leaf" at the end of the specified depth of search. This method, known as the alpha-beta search in the literature (17,18), was informally suggested by Simon, Newell, and Shaw (19) and placed on a formal basis by McCarthy (see Alpha-beta pruning). In Figure 3 the circled leaves of the tree are the only ones that would be evaluated by an alpha-beta procedure. In recent years a number of improvements on the alpha-beta pruning technique have been suggested (20-22).

The popular line of attack on games has been on improving intermediate static evaluation of positions and on efforts at deepening the minimax by the use of some ad hoc pruning strategies and by some formal pruning techniques like alpha-beta. It has not been terribly popular to ask questions like, "Why is it better to minimax on static evaluations than to use the evaluation directly in choosing a move?" or "How can one judge the effectiveness of a static evaluation?" Nevertheless, efforts have been made to face these questions.

A Disturbing Result: The Importance of Evaluation and Learning

A recent result obtained by Nau (23) seems to confirm (as hinted at by all that has gone above) that success at game playing is obtained only if one can find automatic methods for calculating kernels (i.e., finding good static evaluation): Efforts at finding more efficient methods of minimaxing are the wrong way to go about the writing of strong game-playing programs. Nau's result can be paraphrased by saying that if one has
an imperfect evaluation function, in a large class of games the quality of the evaluation function deteriorates rather than improves with minimaxing, and its use leads to a lower probability of winning than it would if the evaluation function were used directly.

Since the result is counterintuitive and since many game-playing programs (especially in the case of chess) seem to yield greater strength the deeper the minimaxing goes, there is a need to understand what Nau's large class of games consists of and why chess does not seem to belong to this class. A number of efforts have been made by Nau to resolve this question (24). In those cases where Nau's results apply, however, it seems futile to use approximate evaluation functions and deep search: The only hope seems to be in the use of exact determination of the kernel. Unfortunately, no one knows any general method to find kernels. The fact that intuitive methods are inadequate is already obvious. The cases where mathematically solid methods have succeeded, on the other hand, have not been very general or exciting. Also, the realization has remained that these methods were by themselves the product of hard and sustained human analysis. The automation of such analysis, then, becomes the real challenge. No easy shortcut around this challenge is available, as any consideration of past history would show.

Another difficult question that arises is one on the nature of approximate strategies. After all, human game playing has always been based on such approximations. Nau's results seem to indicate that approximations cannot be improved by minimaxing. It is not known, given an approximate (in some sense) evaluation, to what extent games played on its basis approximate wins. More precisely, if an evaluation function predicts winning positions with 80% correctness, would a person using a strategy based on it win 80% of the games or would he win 10% of the games?

Pearl (see Ref. 24) has suggested that perhaps one can skirt the question raised by Nau's results by bypassing minimaxing as a method of move choice. Instead, he suggests that one consider the evaluation as a measure of the probability of a win. So if p1, p2, . . . are the probabilities of win of the nodes reachable from a given node, the probability of win of the node itself, instead of being the maximum of p1, p2, . . . , would be

1 − (1 − p1)(1 − p2) · · ·
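Pearl's product rule is a one-line computation; a small illustrative sketch (not from the article):

```python
# Back up win probabilities by Pearl's product rule instead of taking
# a maximum: the node wins unless every reachable node fails to win.

def backed_up_probability(child_probs):
    """P(win at node) = 1 - (1 - p1)(1 - p2)... over the children."""
    q = 1.0
    for p in child_probs:
        q *= 1.0 - p
    return 1.0 - q

print(backed_up_probability([0.5, 0.5]))  # -> 0.75, above max(p1, p2)
```

Note the qualitative difference from minimaxing: two mediocre options together yield a higher backed-up probability than the best single option alone.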
The consequence of this suggestion on the previous questions apparently has not been analyzed. Meanwhile, in relation to the search for better evaluation functions, there has been some successful experimentation with the automatic development of evaluation functions.

Learning. So far as present intuition goes, there are two possible ways one can go about developing kernel descriptions or evaluations on the basis of the description of a game. One can use deduction: One can, if one knows that in go-moku one can win by making five in a row, figure out that two simultaneous fours in a row would be impossible to beat. Such deductive techniques have not been tried in the field so far. Alternatively, one could develop the description of good strategies by making generalizations from experience. A technique that was tried in the very early days of AI was for the computer to "remember" the positions that lead to a win for either player. Insofar as such a large set can be remembered, this is a perfectly reasonable way of going about the business. Good performance can be obtained if one can develop an encoding
method that would enable the storing of a large list of positions. The method becomes inadequate against strong players, however, even in such comparatively simple games as three-dimensional tic-tac-toe. What is needed here is not a method of encoding individual nodes so that they can be listed and accessed easily, but a method by which one can describe sets of nodes in implicit form, that is, by developing descriptions for them in some language. The importance of language to facilitate description has been discussed at some depth in the pattern recognition (qv) literature (occasionally under the presently popular term "learning") (6,25). Learning (qv) techniques of various kinds are also known. Two techniques by which strategies have been learned on the basis of game experience are described below. It may be significant that in both these techniques the basis for the learning has not been pure experience but a reliance on the formal definition of a kernel.

It will be recalled that one of the important properties of a winning position is that its minimax value and its static value are the same. The basis of Samuel's checker-playing program was a learning technique where the description was modified any time the minimax and the static value were not close enough. The description language chosen was one that gained great popularity at the time through the independently described perceptron (qv) (26,27). Given a node in checkers, certain measurements were made on the board configuration to yield numerical values for such concepts as mobility, center control, material balance, and so on. A linear combination of these numbers was taken as the value of the node. The learning done by Samuel's program consisted of modifying the coefficients of this linear combination so that the static and minimax values of nodes came to be close to one another. The evaluation so obtained would satisfy one property of an evaluation function leading to a kernel.
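The flavor of this coefficient adjustment can be sketched as follows. The features, numbers, and the simple error-correction update below are illustrative stand-ins, not Samuel's actual procedure:

```python
# Hedged sketch of Samuel-style learning: the static value is a linear
# combination of board measurements, and the coefficients are nudged
# so the static value moves toward the deeper, backed-up minimax value.

def static_value(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def adjust(weights, features, backed_up_value, rate=0.01):
    """Move the static value a small step toward the minimax value."""
    error = backed_up_value - static_value(weights, features)
    return [w + rate * error * f for w, f in zip(weights, features)]

weights = [0.0, 0.0, 0.0]   # e.g., mobility, center control, material
features = [4.0, 2.0, 1.0]  # made-up measurements on one position
for _ in range(200):
    weights = adjust(weights, features, backed_up_value=1.0)
print(round(static_value(weights, features), 2))  # -> 1.0
```

On a single position this converges; as the text goes on to note, across many positions the real program showed oscillation, and convergence was not assured.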
The other property, that is, that the value should be high for winning positions and low for losing positions, was not confirmed. Also, there was no assurance that the technique used for modifying the coefficients would lead to convergence; as a matter of fact, there was a good bit of oscillation in the values found. Also, it was not clear that the kernel would be a linearly separable function of the measurements performed. The fact remains, however, that the program played checkers very well after a period of learning.

Efforts were later made (28) to improve the learning performance by the use of a technique that had been used previously in the field of pattern recognition under the name of learning logical descriptions (29). This technique has been discussed quite a bit in the literature in recent times (30). It seems that the expressive ability of the descriptions learned is somewhat easy to control: One can trade expressive power for efficiency of learning. However, if the basic measurements used in constructing descriptions are well suited, both efficiency and expressive power can be obtained.

The effect of a good choice of language was demonstrated very well in the late sixties in the work of Koffman (13), whose program learned approximations to the kernels of positional games in stages. Also, the nature of the approximation was clearly understood: Any winning node would satisfy the approximate description, but not all nodes satisfying the approximate description would be winning nodes. However, the reason for this discrepancy was well understood, so that once a node satisfied the approximate description, one could determine with a very limited search of the game tree whether the node was winning.
Figure 5. (a) Description of a five-deep force in positional games; the language is identical to the one used in Figure 4a. (b) Two planes on a Qubic (three-dimensional tic-tac-toe) board which obey the description. The reader is encouraged to work through the force. Notice that the two positions are not symmetrical with one another.

The nature of the descriptions has been indicated above in connection with positional games. The basic measurements of the language consisted of looking at the winning paths on the board (e.g., rows, columns, and diagonals at every plane of Qubic) and noting which of these were unobstructed by the opponent and, among these, the number of empty squares on each and which of the paths had empty intersections. Figure 5 indicates a seven-deep force in the language and two of its interpretations on a plane of the Qubic board. It will be noticed that the two positions cannot be obtained one from the other by any symmetry of the board. The basic measurements have yielded a language of considerable power.

The design of the language in the sixties could not be automatic. The problem of learning, whether in games or any other activity, lies with discovering the basic measurements. Until very recently, no method was known for the automatic discovery of such measurements. Some recent work on problem solving (qv) (25,31) has thrown some light on learning. The following was developed in the study of problem solving: A class of nodes exists on problem graphs that has a clear analogy with the winning nodes of game graphs. Languages have been automatically developed for writing easy descriptions for these nodes. A program developed by Ernst and Goldstein (31) has been effective also in discovering the similarity between a given game and games with known winning strategies. The interested reader is referred to the literature on problem solving and learning for details.

Summary

In what has gone above, a number of concepts that are of importance in research on game-playing programs have been elucidated. Concepts of game graphs and game trees have been introduced, as well as the idea of evaluating a position by complete search of game trees. Because of the prohibitive amount of computation involved in such evaluation, one is forced to introduce the idea of shallow search and intermediate evaluations. Precise discussions have been included to explain when such an evaluation can be considered useful. The difficulties in the way of improving a bad evaluation have been indicated. Programs have been described that in the past could develop such evaluations from experience. These learning efforts have been heavily dependent on the quality of the language used in these evaluations. Recent work on automatic modification of languages by definition has been mentioned.

BIBLIOGRAPHY
1. L. J. Stockmeyer and A. K. Chandra, "Intrinsically difficult games," Scientif. Am. 240(5), 140 (1979).
2. P. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, 1977.
3. H. J. Berliner, "A chronology of computer chess and its literature," Artif. Intell. 10, 201 (1978).
4. M. A. Bramer, Computer Game-Playing: Theory and Practice, Ellis Horwood Series, Wiley, New York, 1983.
5. J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ, 1944.
6. R. B. Banerji, Artificial Intelligence: A Theoretical Approach, North Holland, Amsterdam, 1980.
7. J. H. Conway, On Numbers and Games, Academic Press, New York, 1976.
8. E. R. Berlekamp, J. H. Conway, and R. K. Guy, Winning Ways, Academic Press, New York, 1982.
9. J. H. Conway, "All games bright and beautiful," Am. Math. Mon. 84, 417 (1977).
10. J. Pitrat, "A chess combination program which uses plans," Artif. Intell. 8, 275 (1977).
11. I. Bratko, "Advice and planning in chess endgames," in A. Elithorn and R. Banerji (eds.), Artificial & Human Intelligence, North Holland, Amsterdam, 1984.
12. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Devel. 3, 210 (1959).
13. E. B. Koffman, "Learning through pattern recognition applied to a class of games," IEEE Trans. Sys. Sci. Cybern. SSC-4 (March 1968).
14. R. L. Citrenbaum, "Strategic pattern generation: A solution technique for a class of games," Patt. Recog. 4, 317 (1972).
15. E. W. Elcock and A. M. Murray, "Experiments with a learning component in a Go-Moku playing program," in Machine Intelligence, Vol. 1, Oliver & Boyd, Edinburgh, UK, 1967.
16. P. C. Jackson, Introduction to Artificial Intelligence, Petrocelli, Princeton, NJ, 1974.
17. E. Rich, Artificial Intelligence, McGraw-Hill, New York, 1983.
18. D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artif. Intell. 6, 293 (1975).
19. A. Newell, J. C. Shaw, and H. A. Simon, "Chess programs and the problem of complexity," IBM J. Res. Dev. 2, 320 (1958).
20. G. M. Baudet, "On the branching factor of the alpha-beta pruning algorithm," Artif. Intell. 10, 173 (1978).
21. G. A. Stockman, "A minimax algorithm better than the alpha-beta?," Artif. Intell. 12, 179 (1979).
22. J. Pearl, "Asymptotic properties of minimax trees and game searching procedures," Artif. Intell. 14, 113 (1980).
23. D. S. Nau, "Decision quality as a function of search depth on game trees," J. Assoc. Comp. Mach. 30, 687 (1983).
24. D. S. Nau, "Pathology on game trees revisited, and an alternative to minimaxing," Artif. Intell. 21, 221 (1983).
25. T. M. Mitchell, "Learning and problem solving," Proceedings of the International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, p. 1139, 1983.
26. F. Rosenblatt, "Two theorems on statistical separability in the perceptron," Proceedings of the Symposium on the Mechanization of Thought Processes, Her Majesty's Stationery Office, London, 1959.
27. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge, MA, 1969.
28. A. L. Samuel, "Some studies in machine learning using the game of checkers II," IBM J. Res. Dev. 11, 601 (1967).
29. R. B. Banerji, "The logic of learning: A basis for pattern recognition and improvement of performance," in Advances in Computers, Vol. 24, Academic Press, New York, 1985.
30. L. G. Valiant, "A theory of the learnable," Proceedings of the 16th Annual Symposium on Theory of Computing, Washington, DC, p. 436, 1984.
31. G. W. Ernst and M. Goldstein, "Mechanical discovery of classes of problem-solving strategies," J. Assoc. Comp. Mach. 29 (1982).

R. B. Banerji
St. Joseph's University

The preparation of this paper was supported by the National Science Foundation under grant MCS-8217964 and forms a part of ongoing research on knowledge-based learning and problem-solving heuristics.
GAME TREES

Most games played by computer programs, including chess, checkers, and Go, are two-player, perfect-information games (see Checker-playing programs; Computer chess methods). There are two adversary players who alternate in making moves, each viewing the opponent's failure as his own success. At each turn the rules of the game define both what moves are legal and what effect each possible move will have, leaving no room for chance. In contrast to card games in which the players' hands are hidden or to the game of backgammon, where the outcome of a die determines the available moves, each player has complete information about his opponent's position and about the choices available to him. The game begins from a specified initial state and ends in a position that, using a simple criterion, can be declared a win for one player and a loss for the other, or possibly a draw.

A game tree is an explicit representation of all possible plays of the game. The root node is the initial position of the game, its successors are the positions the first player can reach in one move, their successors are the positions resulting from the second player's replies, and so on. Terminal or leaf nodes are those representing win, loss, or draw. Each path from the root to a terminal node represents a different complete play of the game.

The correspondence between game trees and AND/OR graphs (qv) is obvious. The moves available to one player from a given position can be represented by OR links, whereas the moves available to his opponent are AND links, since a response must be contemplated to each one of them. Another way of obtaining this correspondence is to view each game position J as a problem statement: "Find a winning strategy (for the first player) from J" or, equivalently, "Show that the first player can force a win from J." Clearly, if J admits the first player's moves, this problem is solved if a winning strategy can be found from any one of J's successors, hence the OR
links. Similarly, if it is the opponent's turn to move from J, then J is solved if the first player can force a win from each and every one of J's successors, hence the AND links. Thus, in games the process of problem reduction is completely dictated by the rules of the game; each legal move available to the opponent defines a subproblem or a subgoal, and all these subproblems must be solved before the parent problem is declared solved.

It is common to call the first player max and his opponent min. Correspondingly, one refers to game positions where it is max's or min's turn to move as max or min positions, respectively. The trees representing the games contain two types of nodes: max nodes, at even levels from the root, and min nodes, at odd levels from the root. Graphically, max and min positions are distinguished by the use of different node shapes; the former are represented by squares and the latter by circles (see Fig. 1).

The leaf nodes in a game tree are labeled win, loss, or draw, depending on whether they represent a win, loss, or draw position from max's viewpoint (see also Minimax procedure). Once the leaf nodes are assigned their win-loss-draw status, each node in the game tree can be labeled win, loss, or draw by the following bottom-up process:

Status labeling procedure: If J is a nonterminal max node, then

    Status(J) = win    if any of J's successors is a win
                loss   if all of J's successors are loss        (1)
                draw   if any of J's successors is a draw and none is a win

If J is a nonterminal min node, then

    Status(J) = win    if all of J's successors are win
                loss   if any of J's successors is a loss       (2)
                draw   if any of J's successors is a draw and none is a loss
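The bottom-up labeling of Eqs. (1) and (2) can be sketched in a few lines. This is an illustrative sketch (the tree below is a made-up example, not the one in Figure 1); statuses are ordered from max's viewpoint so that a max node takes the best and a min node the worst of its successors' labels:

```python
# Bottom-up win-loss-draw labeling, Eqs. (1) and (2).
# A node is either ("leaf", value) or ("max"/"min", [children]);
# leaf values are "win", "loss", or "draw" from max's viewpoint.
ORDER = {"loss": 0, "draw": 1, "win": 2}  # max prefers higher, min lower

def status(node):
    kind, body = node
    if kind == "leaf":
        return body
    values = [status(child) for child in body]
    # Eq. (1): a max node takes the best successor status;
    # Eq. (2): a min node takes the worst (from max's viewpoint).
    pick = max if kind == "max" else min
    return pick(values, key=ORDER.get)

# Example: max chooses between a min node worth draw and one worth loss.
tree = ("max", [("min", [("leaf", "win"), ("leaf", "draw")]),
                ("min", [("leaf", "loss"), ("leaf", "loss")])])
print(status(tree))  # -> draw
```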
The function Status(J) should be interpreted as the best terminal status max can achieve from position J if he plays optimally against a perfect opponent. Figure 1 depicts a simple game tree together with the status of all nodes. The statuses of the leaf nodes are assigned by the rules of the game, whereas those of nonterminal nodes are determined by the preceding procedure.

Solving a game tree T means labeling the root node s as win, loss, or draw. Associated with each root label there is an optimal playing strategy that prescribes how that label can be guaranteed regardless of how min plays. A strategy for max is a subtree T+ of T called a solution tree, which is rooted at s and contains one successor of every nonterminal max node in T+ and all successors of every nonterminal min node in T+. A game-playing strategy T- for min will contain, of course, the opposite types of nodes: one successor of every nonterminal min node and all successors of every nonterminal max node included in T-. Of particular interest are winning strategies, that is, strategies that guarantee a win for max regardless of how min plays. Clearly, a winning strategy for max is a solution tree T+ whose terminal nodes are all win. Figure 1 shows a winning strategy for max (in heavy lines) and one nonwinning strategy for min (following broken lines).

Consider now an arbitrary pair of strategies, one for max, T+, and one for min, T-. It is not hard to see that the
Figure 1. An evaluated win-loss-draw game tree showing a max strategy (boldface tree) and a min strategy (in broken lines).
two sets of terminal nodes associated with the two subtrees have exactly one leaf node in common. Indeed, the intersection of the two strategies defines the unique play path that results if both players adhere to their corresponding strategies, and the one common leaf node is, in fact, the end position that results from this play.

Let (T+, T-) denote the leaf node common to strategies T+ and T-. Suppose max is forced to choose a strategy T+ ahead of the play, to show it to the opponent, and then stick to it during the play: what T+ would be his best choice? Being at such a disadvantage, max should reason as follows: If I choose T+, my opponent, knowing all my plans, would definitely respond so as to lead me toward the least favorable leaf in T+, with label min_{T-} Status(T+, T-). Now that I have the option of choosing T+, I can guarantee myself max_{T+} min_{T-} Status(T+, T-). On the other hand, suppose the roles are reversed and min is put at the disadvantage of adhering to a predisclosed strategy T-. By a similar argument, min could guarantee that max would not achieve any status better than min_{T-} max_{T+} Status(T+, T-).

An important consequence of the assumption of perfect-information games is that these two guarantees are equal to each other and, moreover, they are given by the status of the root node as computed by Eqs. (1) and (2). Thus,

    Status(s) = max_{T+} min_{T-} Status(T+, T-)        (3)

    Status(s) = min_{T-} max_{T+} Status(T+, T-)        (4)
The validity of these two equalities can be easily proven using bottom-up induction on a general game tree. The interpretation of these alternate definitions, however, is rather profound; it implies that in perfect-information games it does not
matter if you choose a rigid plan ahead of time or make your decisions as the game goes along. Moreover, rather than conducting an optimization exercise over the enormous space of strategy pairs, one can find an optimal playing strategy using the status labeling procedure of Eqs. (1) and (2). Although the significance of this result is mainly theoretical, it is sometimes more convenient to view Status(s) as a product of optimization over strategies rather than the value returned by a labeling procedure. An example of such an occasion arises when one wishes to answer the following question: Suppose someone claims that the root node of a certain game tree evaluates to a draw; what kind of information must he furnish to substantiate this claim? Had the claim been that s is a win, then clearly all that is needed is to exhibit one winning strategy. Similarly, to defend the assertion "s is a loss," one need only demonstrate the existence of one winning strategy for min, that is, a min strategy with all loss leaf nodes. However, now that the claim is "s is a draw," would a single strategy suffice? Equations (3) and (4) imply that two strategies are now needed. From Eq. (3) one sees that if there exists a max strategy T+ containing no loss leaves, then no matter what min does, max can guarantee at least a draw. Moreover, if there exists a min strategy T- with no win nodes, Eq. (4) implies that, no matter what max does, min can prevent him from obtaining a win. Thus, two adversary strategies with compatible values are both necessary and sufficient to verify that the game is a draw. This result establishes an absolute limit on the number of nodes that must be examined before a game tree can be solved. All the leaf nodes of two compatible strategies, T+ and T-, must be examined in case the game is a draw, whereas a single strategy is sufficient in case of a win or a loss. Equivalently, the task of solving a game can be viewed as the task of finding
at most one pair of compatible strategies. This statement is true in general, even when the leaf nodes can take on more than three possible values (e.g., continuous); a pair of strategies is required to certify the value of any game tree. Since each strategy tree branches out once in every two moves of the game, the number of nodes contained in a typical strategy is about the square root of the number of nodes in the game tree. Therefore, every search strategy that solves or evaluates a game tree must examine at least twice the square root of the number of nodes in the entire game tree.

In practice, this lower bound of twice the square root is rarely achieved because one does not know in advance which of the partially exposed strategies are in fact compatible, and so, many incompatible strategies are partially searched only to be abandoned when more of their leaves are exposed. The knowledge required for guiding the search toward finding two compatible strategies is equivalent to knowing, at each game configuration, what the best next move is for each player. Search strategies (see Alpha-beta pruning) that use no heuristic information regarding the relative merits of the pending moves will explore, on the average, roughly the four-thirds root of the number of nodes in the game tree (see Branching factor). As the move-rating information becomes more accurate, the number of nodes examined gradually approaches the absolute square-root bound.

General References

A. Barr and E. A. Feigenbaum, The Handbook of Artificial Intelligence, Vol. 1, William Kaufmann, Los Altos, CA, 1981.
N. J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
J. Pearl, "Asymptotic properties of minimax trees and game-searching procedures," Artif. Intell. 14(2), 113-138 (1980).
J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, MA, Chapters 8, 9, and 10, 1984.
I. Roizen and J. Pearl, "A minimax algorithm better than alpha-beta? Yes and no," Artif. Intell. 21(1-2), 199-220 (1983).
C. E. Shannon, "Programming a computer for playing chess," Philos. Mag. 41(7), 256-275 (1950).
J. R. Slagle and J. K. Dixon, "Experiments with some programs that search game trees," JACM 16(2), 189-207 (1969).
G. Stockman, "A minimax algorithm better than alpha-beta?" Artif. Intell. 12(2), 179-196 (1979).

J. PEARL
UCLA

GENERALIZED CYLINDER REPRESENTATION

Generalized cylinders are used to represent primitive shape elements, i.e., volumes, surfaces, or plane figures. Complete shapes are then represented as part-whole graphs of joined primitive shapes (1). Complete shapes are like splines, i.e., piecewise smooth combinations of generalized cylinders with continuity conditions or discontinuities at joints.

A representation of a system A is a map from A to a system B that preserves the structure of A. Frequently the map is from an abstract to a more concrete type, as in representation of the rotation group by the group of orthogonal matrix transformations of Cartesian space. A description is a representation of a specific object (2), e.g., the representation of a specific block as a cylinder with specified parameters.

An ordinary cylinder is the volume swept out by translating an arbitrary cross section along an infinite straight line. A cylinder is translationally invariant along its axis of rotational symmetry. It may be truncated at either end. A circular cylinder has circular cross section, and a prism has polygonal cross section. Both are special cases of cylinders.

A cylinder may be generalized in two ways: by sweeping along a space curve called the spine or axis, instead of a straight line, and by transforming the cross section as it is swept. Sweeping a circle along a circle generates a torus. Sweeping a circle along a helix generates a helicoid, e.g., a coiled spring. A cone is the volume swept out by a circle as it is translated and scaled linearly along a straight line. The cross section can be transformed by rotation, scaling, or distortion. A screw is a circle with a notch that is rotated while it is swept along a straight line. A screw is also the set difference between a cylinder and a helicoid. If the sweep function is not constant, a generalized cone (GC) is generated. The terms generalized cylinder and generalized cone are often used interchangeably. GCs may be expressed by generalized translational invariance in which one cross section is mapped into another by a translation followed by a congruence operation. The spine is often not unique.

GC primitives are segmented at discontinuities in cross section. They may be truncated by a surface, e.g., a plane face or hemisphere. Primitives may be formed by smooth joining of elements, as in splines. A GC primitive is related to a complete shape by a rotation and translation, which may be parameterized by an articulation, e.g., a ball-and-socket joint. This has long been standard in many disciplines, especially computer graphics and physics.

Representation Issues

Considerations for representation of volumes were discussed in Refs. 1 and 3-6. Some of these representation issues follow.

Complete Specification. Specifying the cross section, sweep function, and spine determines a generalized cylinder. A complete specification is locally generative, i.e., it approximates local forms and pieces them together globally to cover a large class of primitives (1). A GC may be specified by sample cross sections and interpolating functions, i.e., as a spline. It resembles a loaf of sliced bread. It may be thought of as a mapping between cross sections. The spine is represented as a spline space curve. The cross section is a compound of primitive cross sections, each of which is a generalized cylinder in a lower dimension.

Generalized cylinders combine surface and volume representations. A volume GC primitive is generated by sweeping a surface cross section that may be planar. A surface GC primitive is generated by sweeping a curve cross section that may be planar. A surface cross section is made from surface GC primitives, a part-whole graph. A cross-section primitive is specified as a generalized cylinder, i.e., by sweeping. The same issues are relevant for cross sections and plane figures as for volumes. Implementations to date have used a weaker representation of cross sections by their boundary segments, e.g., Ref. 7.

Figure 1 shows examples of volume and surface primitives.
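A generative specification of this kind, spine plus cross section plus sweeping rule, can be sketched as a sampling procedure. This is an illustrative assumption of one simple parameterization (for brevity the cross section is placed in planes normal to the z axis rather than in planes normal to the spine tangent, as a full implementation would do); the function names and the linear taper are not from the article:

```python
# Sampling the surface of a generalized cylinder specified by a spine
# curve, a base cross section, and a sweeping (scaling) rule.
import math

def spine(t):            # helical spine: a space curve, t in [0, 1]
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t), t)

def cross_section(u):    # unit circle in the local cross-sectional plane
    return (math.cos(2 * math.pi * u), math.sin(2 * math.pi * u))

def sweep(t):            # sweeping rule: taper linearly from radius 1 to 0.2
    return 1.0 - 0.8 * t

def surface_point(t, u):
    """One sample of the GC surface: place the scaled cross section
    at the spine point (cross-sectional plane kept normal to z here)."""
    sx, sy, sz = spine(t)
    cx, cy = cross_section(u)
    r = sweep(t)
    return (sx + r * cx, sy + r * cy, sz)

# A coarse sample grid over the whole surface.
points = [surface_point(t / 10, u / 12) for t in range(11) for u in range(12)]
```

Because the sweep function is not constant, the sampled shape is a generalized cone in the article's terminology; setting `sweep` to a constant would give a (curved-spine) generalized cylinder.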
Figure 2. Area and curve relations. The utility of area relations on a branching structure. Analogous conditions hold for volume and surface relations.

Continuity. Primitives are defined by continuity of cross section. Intuitively, a block is a physical part; its faces are not parts. GCs attempt to capture this sense of volume continuity. Surface continuity determines surface primitives, i.e., faces.

Disjoint. Primitives should be constructed of disjoint elements. Fourier eigenfunctions and Blum transform neighborhoods are not disjoint (11). Cross-sectional slices of generalized cylinders are disjoint. This criterion has intuitive value because it leads to local representation, not because decomposition into orthogonal functions is difficult.

Interior. GCs combine interior (volume) and boundary (surface) representations of three-dimensional shape in three-space and interior (area) and boundary (curve) representations of two-dimensional shape in the plane. GCs are composed of cross-sectional slices, elements that have the same dimension as the shape represented. A finite number of volume elements cover a well-behaved volume. Figure 1 shows an example.

Formalization. A generalized cylinder maps one cross section into another, as discussed above. The map can be singular and crossing, as at the apex of a cone. The map must be continuous along the spine and within a cross section. To avoid "kinking," the axis of any rotation should not lie within a cross section. An effective definition of maps of cross sections is to transform the axis of a cross section into a new axis and to transform its cross section.

Structure. GCs define a boundary representation different from typical surface representations. GCs define a volume relation between "opposing surfaces," whereas typical surface representations relate adjacent surfaces across edges. The issue is not only interior vs. boundary representations but whether volume-adjacency or surface-adjacency relations are specified. Figure 2 shows the difference on a branching structure in two dimensions. In that example curves that are close in an area sense can be arbitrarily far apart in a boundary sense. Surfaces that are close in a volume sense can be very far apart in boundary order. The volume elements thus defined are convenient for important physical operations. GC volumes provide relations among surface elements in the sense of two-finger experiments, grasping, as contrasted with one-finger experiments, surface tracing. GCs are locally realizable, i.e., cross-sectional slices are closed and nonintersecting. This issue is equally important for representing areas.

Figure 1. (a) Cross-section elements of a volume generalized cylinder. (b) Surface GC generated by sweeping a curve.

Similarity and Object Class. A key problem in computational vision is identification of object classes composed of objects that are not identical but are "similar." Object classes more resemble functional classes than shape classes; shape is an indicator of function (8). A representation of shape enables a similarity classification. Generalized cylinders provide a structure of similar shapes. Shapes are similar if they have the same part-whole structure and approximately similar proportions of parts. Spine, cross section, and sweeping rule form the basis for a taxonomy of primitives (3,9,10).

Product Decomposition. GCs are specified as a product of spine and cross section. A surface cross section is itself a product. This parameterization of three-dimensional shape along a curve is especially simple for simple shapes and may define small, additive complexity in typical cases of branching objects.

Adequacy. It is often stated that generalized cylinders are adequate for elongated objects. They are also adequate for short, wide objects that are not elongated at all, e.g., coins, but that have a direction of generalized translational invariance. Generalized cylinders are not apt for spheres, for which there is no such direction. The correspondence of "opposite" surface elements, which is central to generalized cylinders, is useful
for spheres or quasi-spheres. GCs are not apt for crumpled pieces of paper or rocks, which may not have compact representations. There may be no systematic representation that is better. The opposite relation among surfaces is still useful in these cases. One theme of representation is to model fabrication. That is, a heart may be represented by a volume model. However, a better model is to represent individual muscles as generalized cylinders and to represent volume relations between them to the extent that they are coordinated. To the extent that independent objects are unaligned or have different shapes, volumes of free space are often complex and not easily represented by generalized cylinders. Free space in architecture is often well described by generalized cylinders, however.

Levels of Detail. Typical objects have branching structure. The importance of parts is not entirely related to their size, i.e., fingers are important in a model of a human, but they are small compared to the torso. It is useful to include in one description small detail, down to the level of cells if necessary, along with gross detail. Branching structures have exponential detail; typical joints preserve area, and branch sizes decrease exponentially. More generally, a discrete structure of fabrication gives a natural level of detail, like the human body built of muscles, organs, etc., each built of layers.
Computer Vision

Generalized cylinders were invented in 1971 (1) with strong influence from the Blum transform (11). They were intended for use in computer vision (qv) for symbolic description of object classes. Agin used them in describing primitive curved objects in depth data (2). Nevatia used them in segmenting depth data into complex objects, in structuring a visual memory, in indexing into the visual memory, and in recognition (3). They were used in the ACRONYM model-based system (7). A subclass of generalized cylinders was considered in Ref. 6. For vision, it is essential to compute descriptions by feasible algorithms. An original motivation was that much of the part-whole structure and shapes of parts were recoverable from images, i.e., quasi-invariant. Levels of detail could be accommodated in the part-whole graph. Two-dimensional projections of generalized cylinders have been called ribbons. Extracting generalized cylinders from depth data and from images has primarily depended on the opposite relation between curve boundary elements (3,12). This has been called "smoothed local symmetries" (12). In dealing with depth data, much current activity is concerned with surface representations, e.g., lines of curvature. Volume representations provide more global structure, e.g., Figure 2.

BIBLIOGRAPHY

1. T. O. Binford, Visual Perception by Computer, invited paper at the IEEE Conference on Systems, Man and Cybernetics, Miami, FL, 1971.
2. G. J. Agin, Representation and Description of Curved Objects, Stanford Artificial Intelligence Memo AIM-173, Stanford, CA, 1972.
3. R. Nevatia, Structured Descriptions of Complex Curved Objects for Recognition and Visual Memory, Stanford Artificial Intelligence Memo AIM-250, Stanford, CA, 1974.
4. A. J. Thomas and T. O. Binford, Information Processing Analysis of Visual Perception: A Review, Stanford Artificial Intelligence Memo AIM-227, Stanford, CA, 1974.
5. T. O. Binford, "Survey of model-based image analysis systems," J. Robot. Res. 1, 18 (1982).
6. D. Marr, Vision, W. H. Freeman, San Francisco, CA, 1982. Also D. Marr and K. Nishihara, Roy. Soc. Lond. B 200, 269-294 (1978).
7. R. A. Brooks, R. Greiner, and T. O. Binford, A Model-Based Vision System, Proceedings of the Image Understanding Workshop, Boston, May 1978. Also R. A. Brooks, "Symbolic reasoning among 3-D models and 2-D images," Artif. Intell. J., August 1981.
8. P. H. Winston, T. O. Binford, B. T. Katz, and M. Lowry, Learning Physical Descriptions from Functional Definitions, Examples and Precedents, Stanford University AIM-349, Report STAN-CS-82-950, Stanford, CA, 1983; MIT AI Memo 679, Cambridge, MA, 1983.
9. J. Hollerbach, MIT AI Tech. Rept. AI-TR-346, Cambridge, MA, Nov. 1975.
10. S. Shafer and T. Kanade, The Theory of Straight Homogeneous Generalized Cylinders and a Taxonomy of Generalized Cylinders, Carnegie-Mellon University CMU-CS-83-105, Pittsburgh, PA, 1983.
11. H. Blum, A Transformation for Extracting New Descriptors of Shape, in W. Dunn (ed.), Models for Perception of Speech and Visual Form, MIT Press, Cambridge, MA, pp. 362-380, 1967.
12. M. Brady and H. Asada, Smoothed Local Symmetries and their Implementation, MIT AI Memo 757, Cambridge, MA, 1984.
T. O. BINFORD
Stanford University

GENERATION OF EXPLANATIONS. See Expert Systems.
GOAL-DRIVEN PROCESSING. See Processing.
GPS

Developed by Newell, Shaw, and Simon, GPS is an inference (qv) system for general problem solving (qv). It solves a problem by finding, through means-ends analysis (qv), a sequence of operators that eliminate the differences between the given initial and goal states (see A. Newell, J. C. Shaw, and H. A. Simon, Report on a General Problem-Solving Program for a Computer, in Information Processing: Proceedings of the International Conference on Information Processing, UNESCO, Paris, 1960, and A. Newell and H. A. Simon, GPS, a Program that Simulates Human Thought, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 279-293).

A. HANYONG YUHAN
SUNY at Buffalo
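The difference-reduction idea can be illustrated with a toy sketch. This is not Newell, Shaw, and Simon's program; the state encoding (sets of facts), the operator format, and the example operators are all illustrative assumptions, and the sketch omits GPS's loop protection and goal ordering:

```python
# Toy means-ends analysis: eliminate each difference between the current
# state and the goal by applying a relevant operator, first achieving
# that operator's preconditions as a subgoal.
def achieve(goal, state, operators, plan):
    for fact in goal - state:
        if fact in state:                 # may already hold after earlier steps
            continue
        # find an operator relevant to this difference
        op = next(o for o in operators if fact in o["adds"])
        # subgoal: establish the operator's preconditions first
        state = achieve(op["needs"], state, operators, plan)
        state = (state - op["deletes"]) | op["adds"]
        plan.append(op["name"])
    return state

ops = [
    {"name": "walk-to-door", "needs": set(),
     "adds": {"at-door"}, "deletes": set()},
    {"name": "open-door", "needs": {"at-door"},
     "adds": {"door-open"}, "deletes": set()},
]

plan = []
achieve({"door-open"}, set(), ops, plan)
print(plan)  # -> ['walk-to-door', 'open-door']
```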
GRACEFUL INTERACTIONS. See Natural-language interfaces.
GRAMMAR, AUGMENTED TRANSITION NETWORK

Augmented transition network grammars (ATNs) have been highly successful as a formalism for expressing the syntactic
rules of natural languages in a form that can be used efficiently by a computer. They were developed for use in natural-language understanding (qv) systems such as LUNAR (1), a system that answers English questions about the Apollo 11 moon rocks. However, they have also been used as a model to predict aspects of human performance in language-understanding tasks (2) and for linguistic field work in exotic languages (3) and have influenced the development of modern linguistic theory (4). ATNs are now taught as a standard technique for constructing computerized grammars of natural language (5,6) and serve as the basis for products such as natural-language interfaces to database systems and other applications. Generalizations of ATNs have been used for modeling discourse structure (38) and for natural-language generation (7,39). An article by Bates (7) and the text by Winograd (5) are excellent sources for further information.

There are two principal kinds of transition network grammars, recursive transition networks (RTNs) and augmented transition networks (ATNs), the latter being defined as an extension of the former. A recursive transition network is essentially a network of nodes representing partial states of knowledge that arise in the course of parsing (qv) a sentence. States are connected by "arcs" indicating kinds of constituents (words or phrases) that can cause transitions from one state to another. The states in the network can be conceptually divided into "levels" corresponding to the different kinds of phrase that can be recognized. Each such level has a start state and one or more final states and can be thought of as a recognition automaton for one particular kind of phrase. A simple pictorial example of a transition network grammar is illustrated in Figure 1. In Figure 1, states are represented by small circles and arcs are represented by arrows connecting states.
Each arc is labeled with the name of the kind of constituent that will enable that transition if it is found at that point in the input string. This sample grammar has three levels: S for sentence, NP for
noun phrase, and PP for prepositional phrase. Each level begins with a state whose name indicates the kind of constituent being searched for. In the naming convention used here, a state name consists of the name of the constituent being sought, followed by a slash (/), followed by a brief mnemonic indication of what has been found so far. This naming convention is not an essential part of a transition network grammar but is a useful device for making grammars readable. Each level ends with one or more final states (indicated by a short arrow labeled POP), which mark the successful completion of a phrase. A sequence of arcs from a start state to a final state defines a sequence of constituents that can make up a phrase of the kind sought by the start state.

The first state in the sample grammar (S/) is the state in which the parser begins and is the state of knowledge corresponding to the initial assumption that a sentence is to be parsed. The topmost arc sequence in the figure shows that a sentence (S) can consist of a noun phrase (NP), followed by a verb (V), followed by another noun phrase (NP), followed by any number of prepositional phrases (PPs). Alternatively, the first noun phrase can be followed by an auxiliary (AUX) before the verb, or the sentence can begin with an AUX followed by an NP before picking up in state S/NA with the same predicted verb phrase constituents as in the first two cases.

RTNs and ATNs

The grammar model described above is called a recursive transition network or RTN because the arcs of the grammar can invoke other levels of the network to recognize subordinate constituents that can in turn invoke other levels (recursively). This process may eventually reinvoke some level "inside itself" (genuine recursion). In the above example a prepositional phrase (PP) contains a noun phrase (NP), which can contain another PP, which contains another NP, and so on for as many levels as one might care to go. This gives rise to
Figure 1. Sample transition-network grammar: S = sentence, NP = noun phrase, AUX = auxiliary, V = verb, PP = prepositional phrase, POP = end of phrase, DET = determiner, ADJ = adjective, N = noun, PREP = preposition.
sentences such as "John saw the man in the park with a telescope," which can contain as many modifiers as desired.

An augmented transition network grammar (ATN) is a recursive transition network that is augmented with a set of registers that can hold partial parse structures and with conditions and actions on the arcs that can test and set these registers. The registers can be set to record attributes of the phrases being recognized and tested to determine the acceptability of a particular analysis. For example, registers can be used to record the person and number of the subject of a sentence, and a condition can be used to check that the subject and the subsequent verb agree in person and number (thus rejecting such sentences as "the boys is tall"). The registers can also be used to record the pieces of structure that will eventually make up the analysis of the phrase being parsed, and actions on the arcs can build a variety of useful structures beyond simply a literal record of the sequence of input phrases consumed. In particular, register-setting actions can be used to build structures corresponding to deep-structure (qv) analyses à la Chomsky (8) in which, for example, passive transformations have been undone so that the surface subject of a passive sentence occupies its logical object position in the resulting structure. (Thus, the sentence "John was shot" may be parsed into a structure equivalent to "someone shot John.") Although the above presentation has described ATNs as recognizers that parse sentences, they can also be thought of (dually) as generators that produce acceptable sentences.

ATNs were the first grammar formalism that could produce deep-structure analyses of the sophistication and complexity of a transformational grammar (qv) for a substantial range of English and do so rapidly and efficiently on a digital computer.
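The register-test-action idea can be sketched with the agreement example just mentioned. The sketch below is an illustrative toy, not Woods's implementation: the state names, the tiny lexicon, and the hard-coded arc sequence are all assumptions, and a real ATN would represent arcs as data with attached conditions and actions:

```python
# A toy ATN-style agreement check: a register records the subject's
# number, and a test on the verb arc rejects "the boys is tall".
LEXICON = {"boy": ("N", "sing"), "boys": ("N", "plur"),
           "is": ("V", "sing"), "are": ("V", "plur"),
           "the": ("DET", None), "tall": ("ADJ", None)}

def parse(words):
    registers = {}
    state = "S/"
    for w in words:
        cat, number = LEXICON[w]
        if state == "S/" and cat == "DET":
            state = "NP/DET"
        elif state == "NP/DET" and cat == "N":
            registers["NUMBER"] = number      # action: set a register
            state = "S/NP"
        elif state == "S/NP" and cat == "V":
            if number != registers["NUMBER"]:  # test: subject-verb agreement
                return False
            state = "S/V"
        elif state == "S/V" and cat == "ADJ":
            state = "S/DONE"
        else:
            return False
    return state == "S/DONE"

print(parse("the boys are tall".split()))  # -> True
print(parse("the boys is tall".split()))   # -> False
```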
As Bates (7) reports: "They have proved to be flexible, easy to write and debug, able to handle a wide variety of syntactic constructions, and easy to interface to other components of a total system. They provide a useful way to give an account of linguistic structures which can be easily communicated to both humans and computers, and they may be partially presented by easily visualized diagrams." This last point is one that has been referred to as the "perspicuity" of ATNs relative to transformational grammars (9). Of course, any grammar for a substantial portion of a natural language will contain a lot of detail that requires effort to understand. What ATNs specifically have to offer in this respect is the ability to follow arcs forward and backward through the pictorial representation of the network to determine where registers might be set or tested and what the major sequences can be. This is in contrast to a transformational grammar, where the only way to tell whether one transformation can apply to the output of an earlier one is to imagine all of the intermediate structures that could be produced "in between" by other transformations.
Factoring

ATNs have the advantage of being a class of automata into which ordinary context-free phrase structure grammars (see Grammar, phrase structure) and "augmented" phrase structure grammars have a straightforward embedding but that permit various transformations to be performed to produce grammars that can be more efficient than the original. Such transformations can reduce the number of states or arcs in the grammar or can reduce the number of alternative hypotheses that need to be explicitly considered during parsing. Both kinds of efficiency result from a principle of "factoring" (10-12). Factoring amounts to merging common parts of alternative paths in order to reduce the number of alternative combinations explicitly enumerated. Two kinds of factoring can be distinguished: "conceptual factoring" results from merging common parts of the grammar to make the grammar as compact as possible, whereas "hypothesis factoring" results from arranging the grammar so as to merge common parts of hypotheses that will be enumerated at parse time. Conceptual factoring promotes ease of human comprehension of the grammar and should facilitate learning of grammars by machine. Hypothesis factoring promotes efficiency of run-time execution. The merging of common parts of otherwise separate grammar rules promotes an efficient branching decision structure analogous in some respects to a decision tree. In fact, one can think of an ATN as a generalization of the notion of decision tree to permit recursion, looping, register augmentation, and recombination of paths.

History

Augmented transition network grammars (ATNs), as known today, derive from the work of this author (9,13,14), although similar, less well-developed models appeared independently in the work of several others (15,16). The ATN model was developed at Harvard University as a means of efficiently producing syntactic analyses for input to a semantic interpretation system (see Semantics, procedural). ATNs were first applied as a front end to a natural-language question-answering (qv) system dealing with airline flight schedules (17), which was then extended to a system that could interrogate the ATN grammar itself as if it were a database (18). The first major test of the ATN formalism was in the Lunar Sciences Natural Language Information System (LUNAR) developed at Bolt, Beranek and Newman Inc. for the NASA Manned Spacecraft Center (1,19). The earliest widely available publication describing ATN grammars is Ref. 9. An earlier Harvard University technical report (13) contains a more complete description, including some theoretical results on the elimination of left and right recursion, the minimization of branching in an RTN network, and the use of RTNs in a generalization of Earley's algorithm (20), none of which has been published elsewhere.

ATN grammars can be motivated by a chain of reasoning that begins with notations commonly used by linguists in the 1960s to abbreviate certain patterns of context-free grammar rules. Specifically, linguists frequently used (and still use) the following notational devices in the right sides of context-free grammar rules: curly brackets ({ }) to indicate alternative choices, the Kleene star operator (*) to indicate arbitrarily repeatable constituents, and parentheses to indicate optional constituents. An example would be S -> NP (AUX) V (NP) PP*, indicating that the auxiliary verb and the object noun phrase are optional and any number (zero or more) of prepositional phrases are permissible. Such notations are typically thought of as abbreviations for sets of ordinary context-free grammar rules, even though the use of the star operator abbreviates what would be an infinite set of equivalent context-free rules.

Prior to the invention of recursive transition networks there was no recognized parsing formalism that could directly handle alternative and arbitrarily repeatable constituents.
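A rule such as S -> NP (AUX) V (NP) PP* can be rendered directly as a small recursive transition network with CAT, PUSH, and POP arcs. The sketch below is a hedged illustration: the network tables, state names, and lexicon are assumptions (not the encyclopedia's Figure 1), and the optional AUX arc is omitted for brevity; backtracking over alternative arcs is handled by generating every position at which a level can POP:

```python
# A minimal RTN recognizer for (roughly) S -> NP V (NP) PP*,
# NP -> DET N, PP -> PREP NP.
LEXICON = {"the": "DET", "dog": "N", "saw": "V", "a": "DET",
           "man": "N", "in": "PREP", "park": "N"}

NET = {
    "S/":      [("PUSH NP", "S/NP")],
    "S/NP":    [("CAT V", "S/V")],
    "S/V":     [("PUSH NP", "S/VNP"), ("PUSH PP", "S/VNP"), ("POP", None)],
    "S/VNP":   [("PUSH PP", "S/VNP"), ("POP", None)],   # PP* loop
    "NP/":     [("CAT DET", "NP/DET")],
    "NP/DET":  [("CAT N", "NP/N")],
    "NP/N":    [("POP", None)],
    "PP/":     [("CAT PREP", "PP/PREP")],
    "PP/PREP": [("PUSH NP", "PP/NP")],
    "PP/NP":   [("POP", None)],
}

def walk(state, i, words):
    """Yield every input position at which this level can POP."""
    for label, nxt in NET[state]:
        if label == "POP":
            yield i
        elif label.startswith("CAT"):
            cat = label.split()[1]
            if i < len(words) and LEXICON.get(words[i]) == cat:
                yield from walk(nxt, i + 1, words)
        else:  # PUSH: recursively invoke another level of the network
            sub = label.split()[1] + "/"
            for j in walk(sub, i, words):
                yield from walk(nxt, j, words)

def accepts(sentence):
    words = sentence.split()
    return any(j == len(words) for j in walk("S/", 0, words))

print(accepts("the dog saw a man in the park"))  # -> True
```

The PP* repetition is expressed as a loop on state S/VNP rather than as an infinite family of rules, which is exactly the point made in the text.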
GRAMMAR, AUGMENTED TRANSITION NETWORK
The insight that led to RTN grammars was to observe that the concise notations used by the linguists were equivalent in expressive power to Kleene's formulation of regular sets (40). Regular sets are sets of strings over some vocabulary formed by the closure of the finite strings under the operations of concatenation, set union (+), and arbitrary repeatability (*). (Set union with the empty string as an alternative can be used to indicate optionality of constituents.) Regular sets are known to be equivalent to finite-state machines, which in turn can be expressed in the form of a finite-state transition diagram (21). (A finite-state transition diagram is a labeled, directed graph similar to Figure 1, except that all transitions are labeled with elements from the terminal vocabulary of the language; i.e., one cannot express a transition that invokes a subportion of the network recursively as in RTNs.) Finite-state transition diagrams were in fact considered as a possible formalism for natural-language grammars in the early days of computational linguistics, but they failed to deal successfully with self-embedding constructions.

RTNs and Context-Free Grammars

RTNs provide a formalism that can be used by generalizations of ordinary context-free parsing algorithms to deal directly with concepts such as alternative sequences, optional constituents, and arbitrarily repeatable constituents without having
to treat them as abbreviations for (possibly infinite) sets of rules or to reexpress them in terms of rules that introduce "fictitious" phrase types whose sole purpose is to share common parts of different rules or to express iteration of repeatable constituents. One can obtain an equivalent recursive transition network from a given context-free grammar by collecting all of the rules that share a given "left side" (i.e., all of the rules for forming a given phrase type) and replacing them with a single rule whose right side is a regular expression corresponding to the union of the right sides of the original rules. One can then convert that right-side regular expression to an equivalent transition diagram by a standard mechanical algorithm (22). A result of this author (13) shows that the resulting recursive transition network can be further optimized by the elimination of left and right recursion and the application of standard state minimization techniques (originally developed for finite-state machines), whose effect when applied to a recursive transition network yields a transition network grammar with greatly reduced branching. Figure 2 illustrates this sequence. A standard theorem of formal language theory (23) proves that a language accepted by a context-free grammar can be accepted by a finite-state machine unless every context-free grammar for the language contains at least one self-embedding symbol (i.e., a phrase type that can contain a proper internal embedding of the same type of phrase, such as the middle S
Figure 2. (a) Sample context-free grammar: 1. S → IF S THEN S; 2. S → S AND S; 3. S → S OR S; 4. S → P. (b) Equivalent RTN. (c) Optimized RTN.
in the rule: S → if S then S). The RTN optimization results show that a given context-free grammar can be converted to an RTN, which can then be optimized until the only remaining PUSH transitions are for self-embedding constituents. Together, these results suggest that a context-free grammar can be thought of as having a finite-state part and a recursive part. The RTN optimization constructions show how to extract all of the finite-state part into transition network form, to which conventional finite-state optimization techniques can be applied. Note that when the standard state minimization transformations are applied to a recursive transition network, they do not quite produce a deterministic network as they do for finite-state grammars, although they do produce a network in which no two transitions leaving a given state will have the same label. This is not sufficient to guarantee determinism for an RTN because two transitions that push for different types of phrases may nevertheless recognize a common sequence of input symbols (i.e., the grammar may be ambiguous). Even if the grammar is not ambiguous, two different phrase types may begin with some common initial sequence, and the grammar would not be able to tell which of the two phrase types was present before examining the sequence further. However, the results of such transformations can produce grammars with very little nondeterminism that can be parsed quite efficiently. (In an ATN one can exploit techniques such as finite look-ahead conditions and merged subordinate networks to produce grammars whose nondeterminism is reduced still further.) Another result (13) shows that such reduced-branching RTNs can be used by a generalization of Earley's parsing algorithm (20) to minimize the number of state transitions that need to be considered in the course of parsing.
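The way an RTN handles self-embedding directly can be sketched with a miniature backtracking recognizer. The network below encodes a fragment like S → if S then S | p, in the spirit of the sample grammar discussed here; this Python rendering (state names and data layout are illustrative inventions) uses a generator to supply the nondeterminism:

```python
# Each network pairs a start state with a map from state to arcs; an
# arc is ("CAT", word, next_state), ("PUSH", network, next_state), or
# ("POP",).  PUSH invokes a subnetwork recursively, as in an RTN.
NETWORKS = {
    "S": ("S0", {
        "S0": [("CAT", "if", "S1"), ("CAT", "p", "S4")],
        "S1": [("PUSH", "S", "S2")],
        "S2": [("CAT", "then", "S3")],
        "S3": [("PUSH", "S", "S4")],
        "S4": [("POP",)],
    }),
}

def traverse(net, words, i):
    """Yield every input position reachable after traversing net
    starting from position i (backtracking via the generator)."""
    start, states = NETWORKS[net]
    def walk(state, i):
        for arc in states[state]:
            if arc[0] == "POP":
                yield i
            elif arc[0] == "CAT" and i < len(words) and words[i] == arc[1]:
                yield from walk(arc[2], i + 1)
            elif arc[0] == "PUSH":
                for j in traverse(arc[1], words, i):  # recursive invocation
                    yield from walk(arc[2], j)
    return walk(start, i)

def accepts(sentence):
    words = sentence.split()
    return any(j == len(words) for j in traverse("S", words, 0))
```

Because the self-embedded S is reached only through PUSH arcs after consuming "if", the recognizer handles arbitrarily deep embedding without loops of nonconsuming transitions.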
That is, an optimized RTN is used more efficiently by a generalization of Earley's algorithm than an unoptimized RTN or an unaltered context-free grammar. RTNs are equivalent in weak generative power (i.e., can characterize the same sets of strings) to context-free grammars or pushdown store automata. RTNs are slightly stronger than context-free grammars in terms of the tree structures they can assign (strong generative power) since they can characterize structures with unbounded branching at a single level, as in Figure 3.

Augmented Transition Networks

As mentioned above, an ATN consists of an RTN augmented with a set of registers and with arbitrary conditions and actions associated with the arcs of the grammar. ATNs were developed in order to obtain a grammar formalism with the linguistic adequacy of a transformational grammar and the efficiency of the various context-free parsing algorithms. As a sentence is parsed with an ATN grammar, the conditions and actions associated with the transitions can put pieces of the input string into registers, use the contents of registers to
build larger structures, check whether two registers are equal, and so on. It turns out that this model can construct the same kinds of structural descriptions as those of a transformational grammar and can do it in a much more economical way. The merging of common parts of alternative structures, which the network grammar provides, permits a very compact representation of quite large grammars, and this model has served as the basis for several natural-language-understanding systems. ATNs have also been used in systems for understanding continuous speech, such as the Bolt, Beranek, and Newman HWIM system (24, 25). For speech understanding (qv) the transition network grammar is one of the few linguistically adequate grammars for natural English that are at all amenable to coping with the combinatorial problems. A state in an ATN can be thought of dually as a concise representation of a set of alternative possible sequences of elements leading to it from the left or as a concise prediction of a set of possible sequences of elements to be found on the right. (Alternatively, it can be thought of in a right-to-left mode.) The reification of these states as concrete entities that can be used to represent partial states of knowledge and prediction during parsing is one of the major contributions of ATN grammars to the theory and practice of natural-language understanding. They are especially important in representing states of partial knowledge in the course of speech understanding. The ATN formalism suggests a way of viewing a grammar as a map with various landmarks that one encounters in the course of traversing a sentence. Viewed in this way, ATN grammars serve as a conceptual map of possible sentence structures and a framework on which to hang information about constraints that apply between separate constituents of a phrase and the output structure that the grammar should assign to a phrase.
For speech understanding this perspective is beneficial, for example, in attempting to correlate various prosodic characteristics of sentences, such as intonation and rhythm, with "geographical landmarks" within the structure of a sentence. Another advantage of the transition network formalism is the ease with which one can follow the arcs backward and forward in order to predict the types of constituents or words that could occur to the right or left of a given word or phrase. One of the important roles of a syntactic component in speech understanding is to predict those places where small function words such as "a," "an," and "of" should occur, since such words are almost always unstressed and difficult to distinguish from accidentally similar acoustic patterns in spoken sentences. In the HWIM speech system such words are almost always found as a result of syntactic prediction and are not even looked for during lexical analysis, where more spurious matches would be found than correct ones. Other types of grammars, such as context-free grammars, can be augmented by conditions and actions associated with the grammar rules. However, such grammars lose some of the benefits of the recursive transition networks, such as merging common parts of different rules and applying optimizing transformations.

Specifying an ATN
Figure 3. Illustration of unbounded branching.
It is important to maintain a distinction between the underlying abstract state transition automaton that constitutes the essence of an ATN and the various surface notations that can
be used to specify an ATN grammar. A variety of notations have been developed for specifying ATN grammars. This author's original ATN parser was written in LISP and used a notation in which the conditions and actions on the arcs were specified in LISP, but this is not essential. Later ATN implementations have simplified and streamlined the notations for expressing conditions and actions, and a number of other grammar formalisms can be thought of as specialized specification languages whose underlying parsing automaton is an ATN (e.g., Ref. 26). With the advent of widely available graphics interfaces, one can even visualize using the graphic presentation of an ATN transition diagram, coupled with an interactive specification of the conditions and actions on the arcs, as a specification medium. Figure 4 gives a BNF specification for one notation that can be used to specify an ATN grammar. It is similar to most ATN formalisms, except that conditions on arcs are expressed in terms of an action (VERIFY <condition>), an infix assignment operator (←) is used in place of the more customary SETR function, and functions (NE and PC) are used to refer to the next input element and the parsed constituent of a PUSH arc, respectively (in place of the asterisk, which served both purposes in Ref. 9). In this notation an ATN specification consists of a list of state specifications, each of which consists of a state name and a set of arc specifications. Arcs can be one of the five indicated types. A CAT arc accepts a word that is recorded in a dictionary as belonging to the specified syntactic (or semantic) category; a WRD arc accepts the specific word named on the arc; a PUSH arc invokes a subordinate level of the ATN to recognize a phrase beginning with the specified state; a POP arc signals the completion of a phrase and specifies an expression for the value that is to be returned as the structure for that phrase.
A JUMP arc specifies a transfer of control from one state to another without consuming any input.
<state> → (<state-name> <arc>*)
<arc> → (CAT <category-name> <augmentation>* (TO <state-name>)) |
    (WRD <English-word> <augmentation>* (TO <state-name>)) |
    (PUSH <state-name> <augmentation>* (TO <state-name>)) |
    (POP <expression> <augmentation>*) |
    (JUMP <state-name> <augmentation>*)
<augmentation> → (VERIFY <condition>) | <action>
<action> → <register-name> ← <expression> |
    (SENDR <register-name> <expression>) |
    (<defined-operator> <expression>*)
<expression> → (NE) | (PC) | (GETR <register-name>) |
    (BUILDQ <structure-schema> <expression>*) |
    (<defined-operator> <expression>*)

Figure 4. BNF specification of ATN grammar notation: NE = next element, PC = parsed constituent, GETR = get contents of a register.
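To make the register machinery concrete, here is a minimal Python rendering of an ATN interpreter in the spirit of this notation. The original parsers were LISP programs; the lexicon, the toy grammar, and the reduced action repertoire here (only register assignment from the next element or the parsed constituent) are illustrative inventions, and VERIFY and SENDR are omitted for brevity:

```python
# Hypothetical toy grammar: S -> NP V, NP -> DET N.  A CAT arc sets a
# register from the next input element (NE); a PUSH arc sets one from
# the parsed constituent (PC); a POP arc carries a function that builds
# the returned structure from the registers (standing in for BUILDQ).
LEXICON = {"the": "DET", "dog": "N", "cat": "N", "barks": "V"}

ATN = {
    "S":  ("S0", {
        "S0": [("PUSH", "NP", "subj", "S1")],
        "S1": [("CAT", "V", "verb", "S2")],
        "S2": [("POP", lambda r: ("S", r["subj"], r["verb"]))],
    }),
    "NP": ("N0", {
        "N0": [("CAT", "DET", "det", "N1")],
        "N1": [("CAT", "N", "head", "N2")],
        "N2": [("POP", lambda r: ("NP", r["det"], r["head"]))],
    }),
}

def parse(net, words, i):
    """Yield (structure, position) pairs for each parse of net at i."""
    start, states = ATN[net]
    def walk(state, i, regs):
        for arc in states[state]:
            if arc[0] == "POP":
                yield arc[1](regs), i                    # build the structure
            elif arc[0] == "CAT":
                _, cat, reg, nxt = arc
                if i < len(words) and LEXICON.get(words[i]) == cat:
                    yield from walk(nxt, i + 1, {**regs, reg: words[i]})  # reg <- (NE)
            elif arc[0] == "PUSH":
                _, sub, reg, nxt = arc
                for value, j in parse(sub, words, i):
                    yield from walk(nxt, j, {**regs, reg: value})         # reg <- (PC)
    return walk(start, i, {})

def analyses(sentence):
    words = sentence.split()
    return [s for s, j in parse("S", words, 0) if j == len(words)]
```

A VERIFY condition would be one more augmentation type consulted before the transition is taken, blocking the path when it fails.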
Augmentations on an arc indicate further conditions under which the arc may be taken and actions to be performed when the arc is taken. A (VERIFY <condition>) operation will block the transition if the condition is not satisfied. An assignment operation (←) will set a register to the value of the specified expression (this operation is known as SETR in most ATN specification languages). A SENDR action specifies an initial value to be used for a register in a subordinate invocation about to be initiated by a PUSH arc (SENDR only makes sense on a PUSH arc and is executed before the subordinate computation is begun). In addition, one can define other operators that can abbreviate complex manipulations of register contents and complex conditions under which to abort computation paths. In experimental parsing implementations, one can even send information to the parsing algorithm and/or manipulate its agendas and tables. The expressions used in register assignments and as arguments to other actions can access the next element of the input string via the function NE, access the parsed constituent on a PUSH arc via the function PC, access the contents of registers using GETR, and build structures by substituting the values of other expressions into open positions in a specified schematic structure (e.g., using BUILDQ, a primitive form of the LISP "back quote" operation). One can also invoke defined structure-building operators that encapsulate complex register manipulations and/or access to other information outside the ATN (such as potential antecedent tables for interpreting pronouns). The parsed constituent function (PC) refers to the constituent returned by a subordinate network invocation (on a PUSH arc).

Linguistic Experimentation

ATNs have been used to explore a variety of issues in linguistic theory relating to extending the abilities of grammars to specify difficult linguistic phenomena and to parse them efficiently.
A number of experimental explorations are described in Ref. 14, including:

1. VIR (virtual) arcs and HOLD actions for dealing with "left extraposition" transformations such as those that move the relativized constituent from its logical place in the structure of a relative clause to the position of the relative pronoun at the beginning of the clause (e.g., "the man that I saw," "the man that Mary said ran away"). A HOLD action can make an entry on the stack when the extraposed constituent is found, which then enables a matching VIR arc to use the extraposed constituent from the stack at the position where the grammar would normally expect it. This stack entry will also block the acceptance of the phrase until some VIR arc has used the held constituent.

2. RESUMETAG and RESUME actions for dealing with "right extraposition" transformations that leave dangling modifiers that logically belong with constituents that have been fronted or otherwise moved to the left. For example, in "What papers has Dan Bobrow written that are about natural language?" the relative clause "that are about natural language" clearly modifies the questioned noun phrase "what papers" but is not adjacent to it. A RESUMETAG action can be executed before popping a constituent that the grammar writer knows could have been moved to the left, away from a detached right-extraposed modifier. This
enables such a constituent to be reentered by a RESUME action at any point where dangling modifiers might occur, enabling the resumed constituent to consume any modifiers that it can accept at those points.

3. Selective modifier placement for dealing with the ambiguous scoping of movable modifiers such as prepositional phrases (e.g., "I saw the man in the park with a telescope"). A special POP arc (SPOP) causes manipulation of the parser's agendas and stacks to determine all of the places where a given movable modifier might be attached. These are then evaluated to determine which is the most likely candidate given a set of semantic preference criteria. The most preferred alternative is then pursued, and any others are saved on the agenda to be pursued at a later time if necessary.

4. A metagrammatical conjunction facility for handling a wide variety of conjunction constructions, including reduced conjunctions that result in apparently conjoined sentence fragments. For example, "Give me the best methods to grow and properties of alkali iodates" involves an apparent conjunction of the fragments "best methods to grow" and "properties of." A special SYSCONJ action, invoked on special active arcs associated with the conjunctions AND and OR, triggers a complex manipulation of the agendas and parsing configurations of the ATN so that the parsing of the sentence up to the occurrence of the conjunction is temporarily suspended, and some earlier configuration is restarted to parse the string beginning after the conjunction. When the restarted configuration has completed the constituent it was working on, the suspended configuration is resumed in a special mode to complete its corresponding constituent on some tail of the constituent just completed. After this, the two coordinate constituents are conjoined and the two separate configurations merged to continue the parsing.
(This produces an analysis of the above example equivalent to "Give me the best methods to grow alkali iodates and the properties of alkali iodates" by conjoining two noun phrase constituents.) A schematic characterization of the phenomenon in question is that a string of the form "r x u and v y t" can be analyzed as equivalent to "r s t," where s is a constituent whose structure is a conjunction of the form "[x u y] and [x v y]."
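The reduced-conjunction analysis in item 4 can be shown with a trivial Python sketch: given a segmentation into shared left material, the two fragments, and shared right material (finding that segmentation is the hard work that SYSCONJ's configuration surgery performs), the full conjuncts are rebuilt by sharing material across the fragments. The function and segment names are illustrative, not part of any actual SYSCONJ implementation:

```python
def expand_reduced_conjunction(x, u, v, y):
    """Given shared left material x, apparently conjoined fragments u
    and v, and shared right material y (all word lists) from a string
    "... x u and v y ...", return the two full conjuncts."""
    return [" ".join(x + u + y), " ".join(x + v + y)]
```

Applied to the article's example with x = ["the"], u = ["best", "methods", "to", "grow"], v = ["properties", "of"], and y = ["alkali", "iodates"], this yields the two conjoined noun phrases of the expanded reading.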
Formal Properties of ATN Grammars

In the face of various implementations of ATN parsers and different formulations of the specification language for ATN grammars, it is important to remember that the essence of an ATN is an abstract formal automaton in a class with finite-state machines, pushdown store automata, and Turing machines (qv). Such automata are typically defined by specifying the structure of an instantaneous configuration of a computation and specifying a transition function that expresses the relationship between any instantaneous configuration and those that can result from it in one "step" of the computation. A nondeterministic automaton is one in which the transition function determines a set rather than a single next configuration. From this perspective an ATN can be defined as an automaton whose instantaneous configurations record the position in the input string, the name of the state that is currently active, a set of register contents, and a stack context (a list of stack entries, each of which records the push arc whose actions are to be done when the parse returns to that level and a set of register contents to be used by those actions).

As pointed out above, an RTN is equivalent in generative power to a context-free grammar or pushdown store automaton. Adding augmentations to make an ATN produces an automaton that is equivalent in power to an arbitrary Turing machine if no restriction is imposed on the conditions and actions on the arcs. This is useful in the sense that one can be confident that any linguistic phenomenon that might be discovered can be characterized within the formalism but has the disadvantage that one cannot guarantee that the sentences acceptable by such a grammar would be a decidable set. However, there are simple restrictions on an ATN (14) that guarantee a decidable grammar model. If one blocks infinite looping and restricts the conditions and actions on the arcs to be totally recursive (i.e., decidable), then the resulting automaton will be totally recursive. The loop-blocking restrictions merely amount to forbidding closed loops of nonconsuming arcs (such as JUMP arcs) and forbidding arbitrary "looping" of self-embedding singleton recursion (pushing for a single constituent, which in turn pushes for a single constituent, and so on, potentially without limit). These two mechanisms are the only ones that would let an ATN parser compute for an arbitrary amount of time without consuming anything. Perrault (27) gives a restricted class of ATNs, equivalent to finite-state tree transducers, that are known to lie within the power of a context-sensitive grammar (a decidable class). Finally, although the proof has not been published, this author has shown that restricting the conditions and actions of an ATN to be primitive recursive, coupled with the loop-blocking restrictions described above, results in a parsing automaton that is itself primitive recursive (a powerful subclass of totally recursive functions). The interesting thing about this result is that almost any "sensible" ATN grammar that anyone would write automatically satisfies these restrictions, so it is reasonable to think of both ATN grammars and natural English syntax as lying in the realm of primitive recursive computation.

The ATN Perspective

One can think of ATNs as an efficient, abstract parsing automaton that can serve as a unifying underlying model for a variety of different high-level syntactic specification languages. For example, Swartout (28) has shown that Marcus's PARSIFAL (29) can be viewed as a specialized ATN, and one can think of lexical functional grammars (4) as a high-level specification language that could be parsed by an underlying ATN whose basic arc action is a kind of "unification" of sets of equations.

Moreover, the operational semantics of definite clause grammars (qv) (30) executed in PROLOG is almost identical to a standard top-down, left-to-right parser for a special class of ATN whose states correspond to the "joints" between the subgoals in a rule and whose registers are the variable bindings of the environment. Viewed as ATNs, definite clause grammars use a powerful unification operator as a universal condition-action, whose effect is to establish bindings of registers (variables) to structures. (These structures may in turn contain variables that point to other structures.) Alternatively, one could use only one register to contain the PROLOG environment as a list of
bindings. The action associated with a final state is to return the variable bindings that were established in the embedded constituent to the higher level environment that pushed for it (invoked it as a subgoal). This requires PROLOG's ability to effectively rename variables when pushing for a constituent (invoking a subgoal) in order to keep the bindings straight, and uses an open-ended set of register names, but otherwise the mechanism is very like a direct implementation of an ATN automaton.

From this point of view, a definite clause grammar can be seen as more like an augmented phrase structure grammar than a full ATN since it does not exploit the ability of its states (the "joints" between the subgoals) to support arbitrary repeatability and alternative subsequences of transitions (subgoals). Rather, such phenomena would be handled by creating new kinds of phrases. From the ATN perspective one can see a deep similarity between definite clause grammars and lexical functional grammars in the way that the equations of LFGs are used to add constraints to an environment similar to the variable bindings of DCGs. One major difference seems to be the way LFGs use access paths through the functional structure in place of some of the things DCGs would do with variables. LFGs thus appear to avoid the need to rename variables. Otherwise, both have a similar emphasis toward specifying syntactic facts in the form of constraints on attributes of phrases that are then realized by some form of unification. The above discussion is one example of the way that one can use the perspective of an abstract ATN automaton to understand a variety of different parsing formalisms and syntactic specification notations. Without such a perspective it would be difficult to see a similarity between two formalisms whose surface presentation is as dramatically different as DCGs and LFGs. Coupled with an understanding of the formal properties of various restrictions on the conditions, actions, and transition structure of an ATN, this perspective can also shed light on the expressive power of other formalisms.
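The view of an ATN as an automaton over instantaneous configurations also explains how one parser can support many control strategies: the strategy is just the discipline of the agenda holding unexplored configurations. A schematic Python sketch (the configuration representation and successor function below are stand-ins, not the LUNAR parser's):

```python
from collections import deque

def search(start, successors, is_goal, strategy="depth"):
    """Search over parser configurations.  Swapping the agenda
    discipline switches between depth-first (stack) and breadth-first
    (queue) exploration of the nondeterministic alternatives."""
    agenda = deque([start])
    while agenda:
        config = agenda.pop() if strategy == "depth" else agenda.popleft()
        if is_goal(config):
            return config
        agenda.extend(successors(config))  # one "step" of the automaton
    return None
```

A best-first strategy would replace the deque with a priority queue ordered by a weight carried in the configuration, in the manner of the weighted configurations described below.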
ATN Parsers

A variety of different parsing algorithms have been implemented for ATN grammars. The most straightforward is a simple top-down, depth-first, backtracking implementation of the ATN as a parsing automaton. A slightly more powerful implementation is described in Ref. 19. The main implementation technique is to create a data structure corresponding to an instantaneous configuration (ic) of an abstract ATN automaton and to implement the abstract transition function of the automaton as a procedure that computes the successor ic's of a given ic. The ic's of the LUNAR parser are extended from the formal definition above to include a "weight" expressing a degree of goodness of the parse so far (allowing grammars to specify degrees of grammaticality via actions on the arcs that adjust the weight), a hold list (for the HOLD-VIR mechanism described above), and a historical path (used for the experimental SYSCONJ features described above). By the setting of various flags, this parser is able to pursue parses according to a variety of control strategies including depth first, breadth first, best first, and a variety of combinations of depth first with priority ordering. There are also some special cases such as pursuing small identified sets of alternatives in parallel (SPLITS). This parser contains the experimental linguistic capabilities described above and a fairly powerful trace facility capable of producing a detailed record of the individual steps of an ATN analysis of a sentence.

The generalization of Earley's algorithm for RTNs, discussed above, can be extended in a natural way to a general ATN parser (though not maintaining Earley's n³ time bound if nontrivial use is made of the registers). In general, most of the parsing algorithms for context-free grammars have analogous versions for RTNs and can be extended to handle ATNs. Other implementations of ATN parsers include three middle-out parsers for ATNs used in the context of speech-understanding systems: one by Bates (31), one by Paxton (26), and one by this author (32). These are bottom-up, data-directed parsers that can begin in the middle of a sentence and work upward and outward in either direction. The Bates parser is capable of working on several different parts of the utterance as part of a single hypothesis. The Paxton parser provided an especially clean restricted form of ATN grammar specification (although he did not characterize it as one). The Woods parser constructs an index that records for any pair of states whether they can be connected by chains of jump, push, and pop transitions, used to quickly determine whether a new word can be connected to an existing island and to guide the computation that establishes such a connection. One can also implement ATN grammars in languages such as PROLOG in a style similar to Pereira and Warren (30), where the unification and backtracking capabilities inherent in the language can be exploited to reduce (or even eliminate) the effort of writing a parsing algorithm. Finally, one can compile ATN grammars into object code that efficiently implements a combination of the parser and the grammar (33), a technique that has produced parsing programs that are roughly 10 times faster than a general ATN parsing algorithm interpreting a grammar.

Misconceptions about ATNs
ATNs are frequently seen in different ways by different people. A common misconception is the belief that ATNs are strictly top-down, left-to-right parsing algorithms. Another is that an ATN is specified in LISP, or contains LISP code, or can only be written in LISP. As the preceding discussion makes clear, many of these beliefs are incorrect. ATNs can be defined as abstract automata, independent of any programming language, and can be implemented in a variety of programming languages. Similarly, many different parsing algorithms have been implemented for ATN grammars, including bottom-up and even middle-out parsing algorithms.

Another common misconception is that ATNs cannot handle unordered constituents (i.e., sequences of constituents whose relative order is unspecified) without enumerating all of the possible orderings. Such phenomena, in fact, are routinely handled by use of self-looping arcs, as shown in Figure 5. In Figure 5 three arcs accept locative, time, and manner adverbial phrases in arbitrary order at the end of a verb phrase. Conditions on the arcs restrict the parse to not more than one of each kind. (This could be relaxed to permit more than one manner adverbial, for example, by removing the VERIFY condition on that arc.) All three of these adverbials are optional. If one or more such constituents were to be obligatory, a condition could be added to the JUMP arc to block the transition if the appropriate registers are not set. One could express weak constraints on the order in which these modifiers could occur (e.g., not allowing manner adverbials to occur after time adverbials) by placing conditions on some of the arcs that would block them if certain registers had already been set or had not been set (e.g., adding a condition [NOT (GETR TIME)] to the PUSH MANNER arc). Figure 5 also illustrates how one-shot self-loops can be used to indicate optional constituents by conditioning the arc on the emptiness of the register that the arc sets.

(PUSH LOCATIVE (VERIFY (NOT (GETR LOC))) LOC ← (PC))
(PUSH TIME (VERIFY (NOT (GETR TIME))) TIME ← (PC))
(PUSH MANNER (VERIFY (NOT (GETR MANNER))) MANNER ← (PC))
(POP)

Figure 5. Illustration of unordered constituents: LOC = locative, PC = parsed constituent, VP = verb phrase, V = verb, POP = end of phrase.

Generalized Transition Networks

ATN grammars are very effective for specifying complex grammars of natural language as well as for a variety of other structured entities. One can think of them as a class of abstract perceptual automata for recognizing structured sequences of elements. To capture this insight, a generalization of ATN grammars has been formulated, called generalized transition networks, or GTNs (34). The idea of a GTN stems from the following observation: In an ATN the set of transitions leaving a given state of the network does double duty, both specifying the alternative possible next states that one can reach as a result of measuring additional information about the input utterance and specifying implicitly that the measurement is to be made immediately to the right of the previous measurement. Thus, in following a sequence of arcs through an ATN grammar, one is both following a sequence of tests and hypothesis refinements in an abstract search space and also following the left-to-right sequence of constituents in the input sentence. For many potential applications the input to recognition is not simply a linear sequence of symbols.
In such cases the above characterization of a sequence of information-gathering activities is still desirable even though the idea of a left-to-right sequence of constituents does not make sense. A GTN provides the appropriate automaton for such applications by keeping the general state transition structure of an ATN but removing the implicit assumptions about the kind and location of the information-gathering operations that cause transitions. When following a sequence of transitions through a GTN, one will still be following the sequence of hypothesis refinement operations, but there will no longer be an implicit left-to-right assumption about the successive measurements. Rather, explicit instructions at the state nodes or on the arcs will indicate how successive measurements relate to previous ones, and registers can be used to keep track of the positions of measurements and reference points in an arbitrary "perceptual space." For example, such GTNs could be used to parse two-dimensional mathematical equations, to analyze visual scenes, or to perform knowledge-based perceptual tasks such as medical diagnosis.

Cascaded ATNs

One of the long-standing problems in natural-language understanding has been dealing with the interaction of syntactic and semantic information. Ways of achieving close interaction between syntax and semantics have traditionally involved writing semantic interpretation rules in 1:1 correspondence with phrase structure rules (e.g., Ref. 35), writing "semantic grammars" (qv) that integrate syntactic and semantic constraints in a single grammar (e.g., Ref. 36), or writing ad hoc programs that combine such information in unformalized ways. The first approach requires as many syntactic rules as semantic rules and hence is not really much different from the semantic grammar approach. The third approach, of course, may yield some level of operational system but does not usually shed any light on how such interaction should be organized and is difficult to extend. The semantic grammar approach, though effective, tends to miss generalizations, and its results do not extend well to new domains.
It misses syntactic generalizations, for example, by having to duplicate the syntactic information necessary to characterize the determiner structures of NPs for each of the different semantic kinds of NP that can be accepted. Likewise, it tends to miss semantic generalizations by repeating the same semantic tests in the various places in the grammar where a given semantic constituent can occur.
GRAMMAR, AUGMENTED TRANSITION NETWORK
An extension of the ATN formalism, called a cascaded ATN (CATN) (12), addresses this problem. A CATN is essentially a sequence of ATN transducers, with each successive machine taking input from the output of the previous one. Specifically, a CATN is a sequence of ordinary ATNs whose arc actions include operations that transmit elements to the next machine in the sequence. The first machine in the cascade takes its input from the input sequence, and subsequent machines take their input from the transmit operations of the previous ones. The output of the machine as a whole is the output of the final machine in the cascade. Feedback from later stages to earlier ones is provided by an implicit filtering function that causes paths of the nondeterministic computation to die if a later stage cannot accept the output of an earlier one. A cascade of ATNs provides a way to reduce having to say the same thing multiple times or in multiple places, provides efficiency comparable to a semantic grammar, and maintains a clean separation between syntactic and semantic levels of description. It permits the decomposition of an ATN grammar into an assembly of cooperating ATNs, each with its own characteristic domain of responsibility. One instance of a CATN parser is R. Bobrow's RUS parser (37). Another is the interaction between the lexical retrieval component and the linguistic component of the HWIM speech understanding system (24,25).
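The cascade-and-filter control structure just described can be sketched with generators: each stage is a nondeterministic transducer, and a path of the computation dies when a later stage rejects the output of an earlier one. The two toy stages and the tiny lexicon below are invented for the illustration; they are not the RUS or HWIM grammars.

```python
# Sketch of CATN-style cascading: stage 2 implicitly filters the
# nondeterministic alternatives produced by stage 1.

def syntactic_stage(words):
    # Nondeterministically tag each word; yields alternative analyses.
    lexicon = {"saw": ["verb", "noun"], "john": ["noun"], "mary": ["noun"]}
    def expand(prefix, rest):
        if not rest:
            yield prefix
            return
        for tag in lexicon[rest[0]]:
            yield from expand(prefix + [(rest[0], tag)], rest[1:])
    yield from expand([], words)

def semantic_stage(analysis):
    # Accept only analyses with exactly one verb; otherwise the path dies.
    if sum(1 for _, tag in analysis if tag == "verb") == 1:
        yield analysis

def cascade(words):
    for analysis in syntactic_stage(words):
        yield from semantic_stage(analysis)   # implicit filtering function

surviving = list(cascade(["john", "saw", "mary"]))
print(surviving)
# Only the analysis tagging "saw" as a verb survives the cascade.
```

The point of the sketch is the control structure, not the grammars: neither stage needs to know about the other's rules, yet the filtering keeps the levels of description consistent.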
BIBLIOGRAPHY

1. W. A. Woods, Progress in Natural Language Understanding: An Application to Lunar Geology, AFIPS Conference Proceedings, Vol. 42, National Computer Conference and Exposition, 1973.
2. R. M. Kaplan, "Augmented transition networks as psychological models of sentence comprehension," Artif. Intell. 3(2), 77-100 (1972).
3. J. E. Grimes (ed.), Network Grammars, Summer Institute of Linguistics, University of Oklahoma, Norman, OK, 1975.
4. J. W. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1982.
5. T. Winograd, Language as a Cognitive Process, Vol. 1, Addison-Wesley, Reading, MA, 1981.
6. H. Tennant, Natural Language Processing, Petrocelli, Princeton, NJ, 1981.
7. M. Bates, The Theory and Practice of Augmented Transition Networks, in L. Bolc (ed.), Natural Language Communication with Computers, Springer-Verlag, Berlin, 191-259, 1978.
8. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
9. W. A. Woods, "Transition network grammars for natural language analysis," CACM 13(10), 591-606 (1970).
10. W. A. Woods, Spinoffs from Speech Understanding Research, in Panel Session on Speech Understanding in AI, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, August 22-25, p. 972, 1977.
11. W. A. Woods, Taxonomic Lattice Structures for Situation Recognition, in TINLAP-2, Conference on Theoretical Issues in Natural Language Processing-2, University of Illinois at Urbana-Champaign, pp. 33-41, July 25-27, 1978. [Also in AJCL 3, Microfiche 78 (1978).]
12. W. A. Woods, "Cascaded ATN grammars," Am. J. Computat. Ling. 6(1), 1-15 (1980).
13. W. A. Woods, Augmented Transition Networks for Natural Language Analysis, Report No. CS-1, Aiken Computation Laboratory, Harvard University, December 1969. (Available from ERIC as ED-037-733; also from NTIS as Microfiche PB-203-527.)
14. W. A. Woods, An Experimental Parsing System for Transition Network Grammars, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, 1973.
15. J. Thorne, P. Bratley, and H. Dewar, The Syntactic Analysis of English by Machine, in D. Michie (ed.), Machine Intelligence, Vol. 3, American Elsevier, New York, 281-309, 1968.
16. D. G. Bobrow and J. B. Fraser, An Augmented State Transition Network Analysis Procedure, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 557-567, 1969.
17. W. A. Woods, Procedural Semantics for a Question-Answering Machine, AFIPS Conference Proceedings, Fall Joint Computer Conference, 457-471, 1968.
18. W. A. Woods, Semantics for a Question-Answering System, Garland Publishing, New York, 1979.
19. W. A. Woods, R. M. Kaplan, and B. L. Nash-Webber, The Lunar Sciences Natural Language Information System: Final Report, BBN Report No. 2378, Bolt Beranek and Newman, Cambridge, MA, June 1972. (Available from NTIS as N72-28984.)
20. J. Earley, An Efficient Context-free Parsing Algorithm, Ph.D. Thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, 1968.
21. R. F. McNaughton and H. Yamada, "Regular expressions and state graphs for automata," IRE Trans. Electron. Comput. EC-9, 39-47 (1960).
22. G. Ott and N. H. Feinstein, "Design of sequential machines from their regular expressions," JACM 8(4), 585-600 (1961).
23. N. Chomsky, Formal Properties of Grammars, in R. D. Luce, R. R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 2, Wiley, New York, 1963.
24. W. A. Woods, M. Bates, G. Brown, B. Bruce, C. Cook, J. Klovstad, J. Makhoul, B. Nash-Webber, R. Schwartz, J. Wolf, and V. Zue, Speech Understanding Systems: Final Report, October 30, 1974 to October 29, 1976, BBN Report No. 3438, Vols. I-V, Bolt Beranek and Newman, Cambridge, MA.
25. J. J. Wolf and W. A. Woods, The HWIM Speech Understanding System, in W. A. Lea (ed.), Trends in Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, pp. 316-339, 1980.
26. W. H. Paxton, A Framework for Speech Understanding, Technical Note 142, Artificial Intelligence Center, Stanford Research Institute, June 1977. (Ph.D. Thesis, Stanford University.)
27. C. R. Perrault, "Augmented transition networks and their relation to tree transducers," Inf. Sci. 11, 93-119 (1976).
28. W. Swartout, A Comparison of PARSIFAL with Augmented Transition Networks, A.I. Memo 462, MIT Artificial Intelligence Laboratory, Cambridge, MA, March 1978.
29. M. P. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
30. F. Pereira and D. Warren, "Definite clause grammars for language analysis: A survey of the formalism and a comparison with augmented transition networks," Artif. Intell. 13, 231-278 (1980).
31. M. Bates, Syntactic Analysis in a Speech Understanding System, Ph.D. Thesis, Harvard University, Cambridge, MA, August 1975.
32. W. A. Woods, Language Processing for Speech Understanding, in F. Fallside and W. A. Woods (eds.), Computer Speech Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
33. R. Burton and W. A. Woods, A Compiling System for Augmented Transition Networks, in Sixth International Conference on Computational Linguistics (COLING 76), Ottawa, Canada, June 1976, pp. 65-83.
34. W. A. Woods, Generalizations of ATN Grammars, in W. A. Woods and R. Brachman, Research in Natural Language Understanding, BBN Report No. 3963, Bolt Beranek and Newman, Cambridge, MA, 1978.
35. F. B. Thompson, The Semantic Interface in Man-Machine Communication, Report No. RM 63TMP-35, Tempo, General Electric Co., Santa Barbara, CA, September 1963.
36. R. R. Burton, Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems, BBN Report No. 3453, Bolt Beranek and Newman, Cambridge, MA, December 1976.
37. R. J. Bobrow, The RUS System, in B. L. Webber and R. J. Bobrow (eds.), Research in Natural Language Understanding, Quarterly Technical Progress Report No. 3, BBN Report No. 3878, Bolt Beranek and Newman, Cambridge, MA, July 1978.
38. R. Reichman, Getting Computers to Talk Like You and Me: Discourse Context, Focus, and Semantics (An ATN Model), MIT Press, Cambridge, MA, 1985.
39. S. C. Shapiro, "Generalized augmented transition network grammars for generation from semantic networks," Am. J. Computat. Ling. 8(1), 12-25 (1982).
40. S. Ginsberg, The Mathematical Theory of Context-Free Languages, McGraw-Hill, New York, 1966.

W. A. Woods
Applied Expert Systems
GRAMMAR, CASE
This entry examines the linguistic notion of case as it applies to natural-language processing. Case theory suggests an approach to the representation of sentence meaning and is important in accounting for the way the structure of sentences relates to those meanings. Applications of case theory to intelligent systems have ranged from a medical model of glaucoma (see Medical advice systems) (1,2) to speech understanding (qv) (3,4). Most natural-language systems make use of these ideas in some form. For a survey of implemented systems using case grammar (qv) see Ref. 5.

The problem of meaning representation appears in discussions of deep structures (qv) for natural-language utterances and storage structures for AI programs. General issues of efficiency, flexibility, scope, and grain all need to be considered (6-9). The focus here is on a particular class of such representations, namely, case structures for natural language.

The notion of "case" has been used to refer to several related concepts. Traditionally, it has meant the classification of nouns according to their syntactic role in a sentence, signaled by various inflected forms. In English, only pronouns have these case inflections. For instance, the first person singular pronoun is "I" (nominative case), "me" (accusative/objective case), or "my" (genitive/possessive case) according to its use as subject, object, or possessive article. In languages such as Greek all nouns are given affixes that indicate their case. The idea of a direct relationship between inflections and cases is one kind of case, also called "surface" or "syntactic level" case, discussed further below. However, in understanding language, it is not sufficient to recognize the syntactic role of noun phrases (NPs). For example, in the sentence

1. Susan kicked the football with her foot.

each NP has a syntactic role: subject, Susan; direct object, the football; and object of the preposition "with," her foot. Structural features together with lexical and morphological information (see Morphology) indicate the semantic role each NP plays in the meaning of the sentence. One can thus determine that sentence 1 describes an event of kicking in which Susan is the kicker, the agent; the football is the kickee, the object; and her foot is used to perform the kicking, the instrument.

Another sense of "case" (also called "deep case," "semantic case," or "theta role") is a categorization of NPs according to their conceptual roles in the action described by a sentence. Conceptual roles are independent of the particular verb or predicate being expressed. The agent case, then, is a generalization of many ideas (kicker, reader, walker, and dancer): one who performs an action. Deep case theory and issues are also described below. Much of the discussion of deep cases has focused on identifying a small number of these conceptual roles that can be used to describe the meaning of any sentence in any language. Because deep cases describe meanings rather than the words and structure that express those meanings, they are claimed to be language independent. Such a set of cases is called a case system.

Surface Cases

The introduction discussed categorizing nouns according to their endings or inflections. For the purposes of natural-language processing, it is more useful to define surface case as a general syntactic categorization of noun phrases. Another way to think of surface case is as a property assigned to an NP that is manifested in the sentence as some syntactic marker or signal, called a case marker. Various linguistic elements can be case markers. The primary one is the case affix, that is, an ending attached to a noun form. Many would consider that prepositions (or postpositions) serve a similar function. Word order, as in English, can also be viewed as a case marker. In addition, case assignment interacts with such features as gender and definiteness of the NP. This view of case, then, generalizes the notion of surface case from simple noun inflections to a property that all NPs have and that may be expressed with word endings, word order, and so on.

How many distinct surface cases are there? One way to determine this is to consider a language in which cases are expressed by nominal inflections. In Latin, for example, five or six cases are usually distinguished: nominative, accusative, genitive, dative, ablative, and sometimes vocative. But simply identifying surface cases is not that helpful in processing natural language since surface cases are merely signals for which deep case to assign. In other words, for each conceptual role one needs to account for the case markers that identify it. The degree to which a case-based theory can account for linguistic behavior depends on the way the cases mediate between surface forms and conceptual structures.

Deep Cases and Grammatical Explanation

The notion of deep cases is not new. For instance, Sonnenschein's demand that cases "denote categories of meaning" (10) is in effect a statement that there are two levels of cases, the surface level indicated by case affixes and a deeper level that may be common to more than one language. Fillmore (11)
presents a good argument for the universality of deep cases in natural language, saying that: "What is needed is a conception of base structure in which case relationships are primitive terms of the theory and in which such concepts as 'subject' and 'direct object' are missing. The latter are regarded as proper only to the surface structure of some (but possibly not all) languages."

Because deep cases focus on (conceptual) events rather than on syntactic constructions, they can help explain the relative "acceptability" of certain sentences. For example, one concept of the event "kicking" is that in sentence 1. This concept encompasses such notions as agent, object, instrument, location, and so on. Knowledge of this concept, along with an understanding of concepts such as "football" and "foot," gives an account of how to understand sentence 1. At the same time it leads one to question sentences such as:
2. Susan kicked the new idea.
3. Susan kicked.
4. Susan and her foot kicked the football.

Sentence 2 seems strange because the sense of "kick" used here seems to require a concrete object. Sentence 3 seems strange because the object of the kicking needs to be mentioned explicitly. Strong clues from the discourse as to what was kicked (or a different interpretation of "kick") are needed to make the sentence comprehensible. Sentence 4 is also unacceptable because, although either "Susan" or "her foot" or "Susan and Joe" could be the subject of the sentence, objects that play different roles in the meaning of the sentence cannot be conjoined.

These ideas can be formalized by postulating for each verb a case frame consisting of two elements:

Case structure: What are the case slots, or set of cases, that play a role in the event denoted by the verb, for example, "kicking"? Which of these slots are optional, which are obligatory?

Selection restrictions: What are the semantic constraints on the objects that fill each slot in the case structure?

Selection restrictions may vary from global constraints on the use of a case with any predicate (e.g., "every agent must be animate") to local constraints on the use of a case with a particular predicate (e.g., "the object of 'spend' must be a resource"). So, for kicking, one might infer a case frame with the following slots and restrictions:

5. [{agent}: animate object, object: physical object, {instrument}: physical object, {source}: location, {goal}: location].

The curly brackets are used here to indicate that the slot in the case structure is optional.

As discussed under Surface Cases, the prepositions and word order in a sentence may indicate which case is intended for each NP. If the indicated cases pass the appropriate selection restrictions and if they correspond to the cases allowed by the case structure, the sentence should be easy to understand. Otherwise, it can be considered ungrammatical or at least as grounds to reinterpret the event.

Some language-understanding systems use case frames for semantic checking. A parser must check that the features of nominal constituents in the sentence satisfy the selection restrictions for the verb. The case frame may help to disambiguate among senses of the verb; either the case structure or the selection restrictions will distinguish the two senses. Furthermore, the selection restrictions can help the system identify the referent of a pronoun. For instance, consider sentence 1. The indicated cases are [agent, object, instrument], each of which is present in case frame 5, and the required object case is present. Susan is animate; the football and her foot are physical. Thus, sentence 1 can be easily mapped into the case structure for "kick." Sentence 2 also indicates an acceptable case structure, [agent, object], but a new idea is not a physical object. Since the selection restrictions for the object slot are violated, the sentence is less easily mapped into the case frame and is hence less comprehensible. In contrast, sentence 3 obeys the selection restrictions of case frame 5: Susan, the agent, is animate. However, its indicated case structure, [agent], does not contain the object case required by case frame 5. Thus, it too is problematic. For sentence 4 the case structure seems to be [agent/instrument, object]. Although either the agent or the instrument can be the subject of a kicking sentence, a case cannot be assigned when they are conjoined.

Often, discourse information can affect sentence understanding (see Discourse understanding). If sentence 2 follows a discussion of Susan's invention that would not work, the context allows the object case to be interpreted as a physical object. Or, suppose one describes Susan running toward a football and utters sentence 3. In that situation one could infer that the object is the football. Note that in neither of these situations has the case structure or selection restrictions been violated; rather, the context provides information that is missing from the sentence in isolation. Some language-understanding systems allow ellipsis of the obligatory slots in case structures; that is, if there is no filler, the system looks for recent NPs to fill the slot.

Deep Cases and Meaning Representation

Underlying the discussion thus far is the idea that people have a generic concept of an event such as kicking. A sentence such as 1 serves to describe a particular instance of such an event. That is, events are the primary entities under discussion. Of course, not everything communicated is a description of an event. Objects, states, and perhaps other entities are also described. However, events are of fundamental importance, and it is often useful to see state descriptions and even objects as special types of events. To represent the concept of kicking, imagine a unary predicate, kicking*, which can determine whether an event is a kicking. To express the existence of a kicking event, one can then quantify over the set of all events:

6. (Ex) [kicking* (x)].

Usually, the kind of event is expressed as a verb, for example, "to kick." However, an event description can also be realized as an NP. One could say "they prepared the meal" or
"their preparation of the meal." By choosing events as primary entities, the semantic similarities among these phrases are captured naturally. An event description distinguishes a particular event from other events of the same type by specifying various properties or relationships between objects and the event. Each NP denotes an object, and the relationship is conveyed by the NP's syntactic role in the sentence. By asserting several propositions about the event, sentence 1, for example, indicates which kicking is being discussed. This set of propositions can be expressed as a conjunction of binary relations:

7. (Ex) [kicking* (x) and agent (x, Susan) and object (x, the football) and instrument (x, her foot) and time (x, past)].
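The case-frame checking described above (case frame 5 applied to sentences 1-3) can be sketched in a few lines. The feature dictionary and the helper names below are illustrative assumptions for the sketch, not taken from any particular implemented system.

```python
# Sketch of checking a sentence's case fillers against a verb's case frame:
# the case structure supplies obligatory/optional slots, and the selection
# restrictions constrain each filler's semantic features.

KICK_FRAME = {
    "agent":      {"optional": True,  "restriction": "animate"},
    "object":     {"optional": False, "restriction": "physical"},
    "instrument": {"optional": True,  "restriction": "physical"},
    "source":     {"optional": True,  "restriction": "location"},
    "goal":       {"optional": True,  "restriction": "location"},
}

FEATURES = {"Susan": "animate", "the football": "physical",
            "her foot": "physical", "a new idea": "abstract"}

def check(frame, fillers):
    """Return a list of violations; an empty list means the sentence
    maps easily into the case structure."""
    problems = []
    for case, spec in frame.items():
        if case not in fillers:
            if not spec["optional"]:
                problems.append("missing obligatory case: " + case)
        elif FEATURES[fillers[case]] != spec["restriction"]:
            problems.append("selection restriction violated: " + case)
    return problems

# Sentence 1 passes; sentence 2 violates a restriction; sentence 3 omits
# the obligatory object.
print(check(KICK_FRAME, {"agent": "Susan", "object": "the football",
                         "instrument": "her foot"}))   # []
print(check(KICK_FRAME, {"agent": "Susan", "object": "a new idea"}))
print(check(KICK_FRAME, {"agent": "Susan"}))
```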
Phrases such as "Susan" and "the football" would in general be inadequate representations of the objects participating in the event but are suitable for the purpose here. These relations suggest a formalism for representing sentence meaning. Some understanding systems, assuming a small number of these fixed relations, parse sentences into their deep case structure rather than the traditional surface structure parse shown in Figure 1.

Because a class of verbs with related meanings can be used to describe similar events, these verbs share aspects of their case frames. For instance:

8. Fred bought some pickles from Reuben.
9. Reuben sold some pickles to Fred.

A case theory should capture the fact that sentences 8 and 9 describe the same event from a different perspective. The meaning of 8 could be represented as:

10. (Ex) [exchange* (x) and agent (x, Fred) and goal (x, Fred) and object (x, some pickles) and source (x, Reuben)].

The meaning of 9 differs only in that its agent is Reuben. Note that this account requires that the subject have two deep cases. Jackendoff (12) uses this example to justify his claim that an NP can have multiple deep cases. Systems that make use of these semantic similarities among verbs are described in Refs. 13-15. Identifying the case generalizations for classes of verbs based on cross-linguistic evidence is the subject of ongoing research (16).

One formalism for representing case frames is that of semantic networks (qv). These were originally proposed by Quillian (17) to capture the objective aspect of word meaning. The associative links between verb concepts (case frames) and real-world knowledge facilitate inferences made from sentence meanings. Semantic network representations with structured inheritance [e.g., KL-ONE (18)] allow information about the syntactic and semantic regularities among verbs to be shared. Discussions of inferencing and case frame representation can be found in Refs. 9, 19, and 20.

One problem is that an indefinite number of properties can be specified for a given event. For example,

11. Because her arm hurt, Susan awkwardly kicked the football to Mary in the park rather than throw it.

could be represented as
12. (Ex) [kicking* (x) and reason (x, her arm hurt) and agent (x, Susan) and object (x, the football) and time (x, past) and manner (x, awkward) and goal (x, Mary) and location (x, the park) and preference (x, throw it)].

Some of these properties distinguish one event from another, whereas some merely modify or provide additional information. For instance, the thing Susan kicks seems more significant than that she kicks it awkwardly. Unfortunately, the labeling of a property as "distinguishing" or "modifying" is rarely obvious. It is not difficult to imagine a context in which the manner in which an event happens is the distinguishing property, and the object of the event is relatively insignificant. The distinction among properties is sensitive to the purpose of the speaker and the beliefs of both speaker and hearer. Nevertheless, there is often a strong intuition that certain properties belong with certain events. One could say that properties vary in their degree of binding to an event and that those properties that are most tightly bound are the deep cases.

Case Systems

Despite the compromises that seem necessary to dichotomize properties of events, there is a strong motivation to do so. By postulating a set of binary relations that represent the distinguishing properties of some generic event, one can define events as structures: known configurations that facilitate parsing (qv) and inference (qv) (14,20,21-25). The complete set of deep cases available for describing events is called a case system. This section covers only four of the many proposed case systems.
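The relational event descriptions in 7 and 10 can be mimicked with plain dictionaries: an event is a typed entity plus a set of role bindings. The sketch below (the names are illustrative, not from a particular system) shows how sentences 8 and 9 share one underlying exchange* event, differing only in the agent role.

```python
# An event description as a type plus role bindings; buy and sell are
# the same exchange* event viewed from different perspectives.

def exchange(agent):
    return {"type": "exchange*", "agent": agent, "goal": "Fred",
            "object": "some pickles", "source": "Reuben"}

buy  = exchange(agent="Fred")    # 8. Fred bought some pickles from Reuben.
sell = exchange(agent="Reuben")  # 9. Reuben sold some pickles to Fred.

# The two descriptions agree on every role except the agent:
differing = {r for r in buy if buy[r] != sell[r]}
print(differing)  # {'agent'}
```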
[Figure 1. Surface (left) and deep (right) parse for "Susan kicked the football with her foot": S = sentence, INFL = inflection, VP = verb phrase, V = verb, PP = prepositional phrase. The deep parse links the predicate Kick to its agent (Susan), object (the football), instrument (her foot), and time (past).]
Fillmore. Fillmore has proposed a deep structure based on cases (11,26). A sentence (S) in this deep structure consists of a modality plus a proposition:

13. S -> M + P.

The modality constituent (M) includes negation, tense, mood, and aspect. The proposition (P) is a tenseless structure consisting of a verb and cases:

14. P -> V + C1 + C2 + ... + Cn.
where each Ci is a case name that generates either an NP or an embedded S. There is a global constraint on rules of the form 14: at least one case must be present, but no case may appear twice. Rules 13 and 14 are argued to be universal. Case markers are produced by the language-specific Kasus element (K):

15. Ci -> K + NP.

The Kasus element K generates a preposition, postposition, or case affix. One could generalize this notion to a Kasus function, which maps a deep structure proposition into a surface structure clause with possible word order changes. Fillmore shows by example the deep case markers (Kasus functions) of various languages. He also gives some tentative rules for English. For example (27):
"The A preposition is by; the I preposition is by if there is no A, otherwise it is with; the O and F [factitive case] prepositions are typically zero; the B [benefactive case] preposition is for; the D [dative case] preposition is typically to."

"If there is an A it becomes the subject; otherwise, if there is an I, it becomes the subject; otherwise the subject is the O."

Fillmore makes an argument for deep case relations in analyzing verbs of any language, including English. He has proposed several systems that capture various aspects of the meaning of certain verbs. An example of his case systems appears in Table 1. In addition to these cases there are also other relations "that identify the limits and extents in space and time that are required by verbs of motion, location, duration, etc." (26).

Celce. Celce's system (28) is based on five deep case relations: causal-actant, theme, locus, source, and goal. Verbs are classified into paradigms according to the case sequences they allow. For example, the ergative paradigm consists of the sequences (for the active voice):

(causal-actant 1, theme, causal-actant 2)
(causal-actant 1, theme)
(causal-actant 2, theme)
(theme)

Note that a paradigm consists of both the case structure for the verb and constraints on the order of the case fillers. For example, the ergative paradigm says that the theme can never precede the causal-actant. "Break" is an example of an ergative verb. Thus,

16. John broke the window with a hammer.
17. John broke the window.
18. The hammer broke the window.
19. The window broke.

are all well formed since in each sentence one of the case sequences is matched (where "John" is the causal-actant 1, "window" is the theme, and "hammer" is the causal-actant 2). Another example is the reflexive-deletion paradigm, in which the theme is deleted if it matches the causal-actant 1. Thus, "run" may be used in several ways:

20. John ran to school.
21. John ran a machine.
22. The machine ran.
23. The brook ran.

In each of the sentences there is a theme (John, machine, or brook). The paradigm allows the deletion of the theme if it is the same as the causal-actant. Thus the paradigm is

(causal-actant, goal)
(causal-actant, theme)
(theme)

Grimes. Grimes has developed a case system to serve as a foundation for discourse analysis (29). The definitions of the cases and their organization reflect his concern with event and episode representations. Grimes distinguishes between two kinds of generic events, each with its own set of roles or deep cases. Motion/position events have orientation roles, and changes of state have process roles. In addition, the agent and benefactive roles are common to all events. These cases are shown in Table 2. The following examples (29) illustrate the use of these cases:

24. The letter (O) fell to the floor (G).
25. His house (O) is situated on top of a hill (R).
26. The tide (V) floated the oil slick (O) into the harbor (G).
27. This idea (O) came to me (G) from Austin Hale (S).
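Celce-style paradigm matching is easy to state operationally: a sentence is acceptable for a verb if its case sequence is one of the sequences in the verb's paradigm. The data below encode the ergative paradigm for "break" from the text; the function name is an illustrative assumption.

```python
# Sketch of paradigm matching: the ergative paradigm for "break" is a set
# of allowed case sequences, and acceptability is simple membership.

ERGATIVE = [
    ("causal-actant 1", "theme", "causal-actant 2"),
    ("causal-actant 1", "theme"),
    ("causal-actant 2", "theme"),
    ("theme",),
]

def acceptable(case_sequence, paradigm):
    return tuple(case_sequence) in paradigm

# 16. John broke the window with a hammer.
print(acceptable(["causal-actant 1", "theme", "causal-actant 2"], ERGATIVE))  # True
# 19. The window broke.
print(acceptable(["theme"], ERGATIVE))                                        # True
# A theme preceding the causal-actant is not in the paradigm:
print(acceptable(["theme", "causal-actant 1"], ERGATIVE))                     # False
```

Encoding sequences rather than unordered slot sets is what lets the paradigm capture ordering constraints such as "the theme can never precede the causal-actant."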
Table 1. Fillmore's Case System (from Ref. 26)

Agent (A): The instigator of the event
Counter agent (C): The force or resistance against which the action is carried out
Object (O): The entity that moves or changes or whose position or existence is in consideration
Result (R): The entity that comes into existence as a result of the action
Instrument (I): The stimulus or immediate physical cause of an event
Source (S): The place from which something moves
Goal (G): The place to which something moves
Experiencer (E): The entity that receives or accepts or experiences or undergoes the effect of an action
Table 2. Grimes's Case System (from Ref. 29)

Orientation roles:
  Object (O): The thing whose position or motion is being described
  Source (S): The location of the object at the beginning of a motion
  Goal (G): The location of the object at the end of a motion
  Range (R): The path or area traversed during a motion
  Vehicle (V): The thing that conveys the object and moves along with it

Process roles:
  Patient (P): The thing changed by a process or the thing whose state is being described
  Material (M): The thing changed by a process in its state before the change
  Result (Re): The thing changed by a process in its state after the change
  Referent (Rf): The field or object that defines the limitation of a process (as opposed to the thing affected by the process)

Agentive complex:
  Agent (A): The one who is responsible for an action
  Instrument (I): The tool used in performing an action
  Force (F): The noninstigative cause of an action

Benefactive role:
  Benefactive (B): The someone or something on whom an action has a secondary effect
28. This book (P) costs three dollars (Rf).
29. She (A) makes dresses (P Re) from flour sacks (P M).
30. Fred (A) fixed the engine (P) with this screwdriver (I).
31. Sally (A) handed John (G) the biscuits (O).
32. He (A) parted the rope (P G) with an axe (O I).
33. The girl (P) died of malaria (F).
34. The milk (P) turned sour on me (B).
35. We (A) talked about politics (Rf).
36. A breeze (O) came to him (G) from the sea (R).
The cases Grimes distinguishes are strongly influenced by linguistic, not conceptual, considerations; for example, in sentence 27 the transfer of the idea is not a physical movement. Sentence 27 has the same surface form as sentence 36, which is a description of a physical transfer, so the two have similar case assignments.

Grimes (29) also suggests the possibility of a more tightly defined role structure based on certain similarities in the roles: "The roles set up for orientation all have counterparts on the process side, and vice versa. Both kinds could be considered complementary variants of a single set of roles." "For example, object and patient both identify what is affected, the one in terms of motion or position and the other in terms of change of state in a process." These observations suggest the combined role structure shown in Table 3.

Table 3. Interrelationships among Roles (From Ref. 29)

Orientation        Combined           Process
                   A  agent
                   Fc force
                   I  instrument
V vehicle      →   V  vehicle
O object       →   P  patient     ←   P  patient
S source       →   F  former      ←   M  material
G goal         →   L  latter      ←   Rs result
R range        →   R  range       ←   Rf referent
                   B  benefactive

Schank. Schank's cases (15,23,24,30) (see Conceptual dependency), unlike those of Fillmore (11) or Celce (28), are purely conceptual. Neither the primitive act nor its cases need be explicitly mentioned in an utterance. Instead, the argument for conceptual cases depends on considerations of the pragmatics of human communication. One postulates a conceptual case because it is a relation relevant to the typical kinds of tasks people address via language. An essential element of most communication is the description of actions. Knowledge of actions implies a "conceptual structure" built out of actions and their role fillers (30):

Actors perform actions.
Actions have objects.
Actions have instruments.
Actions may have recipients.
Actions may have directions.

One kind of conceptual structure or "conceptualization" comprises an act, with its actor, and the relations object, direction, and either recipient or instrument. Each of these relations must be present (except that only one of direction or recipient is present). Schank argues that a small number of concepts corresponding to "primitive acts" can be used to construct meaning representations for most descriptions of events. These primitive concepts are simple actions of the kind "move a body part" (MOVE), "build a thought" (MBUILD), "transfer a physical object" (PTRANS), and "transfer mental information" (MTRANS). The primitive ACTs together with the conceptual cases are the components of meaning representation with a "unique representation" feature (30): "we have required of our representation that if two sentences, whether in the same or different language, are agreed to have the same meaning, they must have identical representations."

It is questionable whether such a criterion can be met nontrivially. Do distinct utterances (by different speakers using different phrasings, at different times, in different situations) share significant portions of a conceptual network? Furthermore, a nonredundant representation such as Schank's raises serious questions of both psychological validity and efficiency for diverse tasks. Nevertheless, in many cases the mapping of utterances to conceptualizations seems to be exactly the process that humans exhibit. The unique representation also facilitates general inferencing by reducing the number of cases to be considered (30): "The use of such primitives severely reduces the inference problem in AI (see Ref. 31), since inference rules need only be written once for any ACT rather than many times for each verb that references that ACT. For example, one rule is that if you MTRANS something to your LTM [long-term memory], then it is present there (i.e., you know it). This is true whether the verb of MTRANSing was see, hear, inform, remember, or whatever. The inference comes from the ACT rather than the verb."
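The "one rule per ACT" economy just quoted can be sketched in a few lines. This is a hypothetical illustration only (the verb dictionary, function names, and the `knows` fact are inventions for this sketch, not Schank's actual notation): several surface verbs map to the single primitive act MTRANS, so the LTM inference rule is written once for the ACT, not once per verb.

```python
# Hypothetical sketch: one inference rule keyed on the primitive ACT.

VERB_TO_ACT = {"see": "MTRANS", "hear": "MTRANS", "inform": "MTRANS",
               "remember": "MTRANS", "walk": "PTRANS"}

def conceptualize(actor, verb, info):
    """Build a toy conceptualization: the actor performs the primitive
    ACT underlying the verb, transferring info to the actor's LTM."""
    return {"actor": actor, "act": VERB_TO_ACT[verb],
            "object": info, "to": ("LTM", actor)}

def infer(conc):
    """If you MTRANS something to your LTM, then you know it.
    The rule tests the ACT, never the surface verb."""
    facts = []
    if conc["act"] == "MTRANS" and conc["to"] == ("LTM", conc["actor"]):
        facts.append(("knows", conc["actor"], conc["object"]))
    return facts

for verb in ("see", "hear", "inform"):
    print(infer(conceptualize("john", verb, "the-score")))
```

All three verbs yield the same inferred fact, because all three conceptualize to MTRANS.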
Conclusion

In summary, the notion of case has evolved from an account of noun affixes to an account of how syntactic relations between NPs and sentences map into deep relations between objects and events. Application of these ideas to natural-language processing has two basic forms, semantic checking and meaning representation.

In many language systems a case frame is associated with each verb (and sometimes with nouns). In recognizing the syntactic role of an NP in a sentence, the parser uses the case frame to verify that the semantic properties of the NP are consistent with some case that can occur in that syntactic position. This process can be used to block a parse path, to reject a sentence as ungrammatical, and to identify constraints for an ellipsed item or the referent of a pronoun.

Deep case systems are an attempt to identify a fixed number of conceptual roles that can be used to describe any event. Representing deep cases as binary relations thus provides a formalization of the meaning of a sentence. This structure for describing knowledge has led to extensive research on semantic networks for knowledge representation. Out of this research, standard techniques for parsing and understanding have evolved to the extent that most current natural-language systems incorporate these techniques in some form.

An important aspect of any case system is an account of how the deep cases are realized in a sentence. Many issues related to this accounting remain unresolved, such as whether an NP can have multiple cases (32), and how to capture the regularities in the way the cases are realized (16).

Case grammar per se is no longer an active area of research within AI. However, research on the issues discussed here has continued in the work on relational grammar, lexical functional grammar, generalized phrase structure grammar (qv), and semantic grammar (qv).

BIBLIOGRAPHY

1. S. Chokhani, The Definition and Interpretation of a Causal Network Model for Disease Within the CHRONOS System, Technical Report NIH CBM-TM-17, Computer Science Department, Rutgers University, New Brunswick, NJ, 1973.
2. C. Kulikowski and S. Weiss, Computer Based Models for Glaucoma, Technical Report CBM-TR-3, Computer Science Department, Rutgers University, New Brunswick, NJ, 1971.
3. S. Baranofsky, Semantic and Pragmatic Processing in the SRI Speech Understanding System, Technical Report, Stanford Research Institute, Stanford, CA, 1974.
4. B. Nash-Webber, Semantic Support for a Speech Understanding System, in D. G. Bobrow and A. Collins (eds.), Representation and Understanding: Studies in Cognitive Science, Academic Press, New York, pp. 351-382, 1975.
5. B. Bruce, "Case systems for natural language," Artif. Intell. 6, 327-360 (1975).
6. D. G. Bobrow and T. Winograd, "An overview of KRL, a knowledge representation language," Cog. Sci. 1, 3-46 (1977).
7. J. Moore and A. Newell, "How can MERLIN understand?" in L. Gregg (ed.), Knowledge and Cognition, Lawrence, Los Angeles, CA, 1973.
8. T. Winograd, Frame Representations and the Declarative/Procedural Controversy, in D. G. Bobrow and A. Collins (eds.), Representation and Understanding: Studies in Cognitive Science, Academic Press, New York, pp. 185-210, 1975.
9. C. Sidner, M. Bates, R. Bobrow, R. Brachman, P. Cohen, D. Israel, B. Webber, and W. Woods, Research in Knowledge Representation for Natural Language Understanding: Annual Report, Technical Report 4785, Bolt Beranek and Newman, Cambridge, MA, 1981.
10. O. Jespersen, The Philosophy of Grammar, Norton, New York, 1965.
11. C. Fillmore, The Case for Case, in E. Bach and R. T. Harms (eds.), Universals in Linguistic Theory, Holt, Rinehart and Winston, New York, pp. 1-88, 1968.
12. Reference 32, pp. 34-35.
13. G. Hendrix, C. Thompson, and J. Slocum, Language Processing via Canonical Verbs and Semantic Models, in Proc. of the Third IJCAI, Stanford, CA, pp. 262-269, 1973.
14. D. A. Norman, D. E. Rumelhart, and the LNR Research Group, Explorations in Cognition, Freeman, San Francisco, CA, 1975.
15. R. Schank, Conceptual Information Processing, North-Holland, Amsterdam, 1975.
16. B. Levin et al., Lexical Semantics in Review, Technical Report 1, Lexicon Project, Center for Cognitive Science, MIT, 1985.
17. M. R. Quillian, Semantic Memory, Technical Report AFCRL-66-189, Bolt Beranek and Newman, Cambridge, MA, 1966.
18. R. J. Brachman, On the Epistemological Status of Semantic Networks, in N. V. Findler (ed.), Associative Networks, Academic Press, New York, 1979.
19. E. Charniak, A Brief on Case, Technical Report 22, Istituto per gli Studi Semantici e Cognitivi, Castagnola, Switzerland, 1975.
20. R. F. Simmons, Semantic Networks: Their Computation and Use for Understanding English Sentences, Technical Report CAI NL-6, Computer Science Department, University of Texas, Austin, TX, 1972.
21. B. Bruce, Case Structure Systems, in Proc. of the Third IJCAI, Stanford, CA, pp. 364-371, 1973.
22. W. A. Martin, Translation of English into MAPL Using Winograd's Syntax, State Transition Networks, and a Semantic Case Grammar, Internal Memo 11, Automatic Programming Group, Project MAC, MIT, Cambridge, MA, 1973.
23. R. Schank, Finding the Conceptual Content and Intention in an Utterance in Natural Language Conversation, in Proc. of the Second IJCAI, London, pp. 444-454, 1971.
24. R. Schank, The Fourteen Primitive Actions and Their Inferences, Technical Report AIM-183, Stanford University, Stanford, CA, 1973.
25. S. Shapiro, A Net Structure for Semantic Information Storage, Deduction and Retrieval, in Proc. of the Second International Joint Conference on Artificial Intelligence, London, pp. 512-523, 1971.
26. C. Fillmore, Types of Lexical Information, in D. D. Steinberg and L. A. Jakobovits (eds.), Semantics: An Interdisciplinary Reader, Cambridge University Press, London, pp. 370-392, 1971.
27. Reference 11, pp. 32-33.
28. M. Celce-Murcia, Paradigms for Sentence Recognition, Technical Report HRT-15092/7907, System Development Corporation, Santa Monica, CA, 1972.
29. J. Grimes, The Thread of Discourse, Technical Report NSF 1, Cornell University, Ithaca, NY, 1972.
30. R. Schank, Causality and Reasoning, Technical Report 1, Istituto per gli Studi Semantici e Cognitivi, Castagnola, Switzerland, 1974.
31. R. Schank, N. Goldman, C. Rieger III, and C. Riesbeck, MARGIE: Memory, Analysis, Response Generation, and Inference on English, in Proc. of the Third IJCAI, Stanford, CA, pp. 255-261, 1973.
32. R. S. Jackendoff, Semantic Interpretation in Generative Grammar, The MIT Press, Cambridge, MA, 1972.

B. Bruce and M. G. Moser
Bolt Beranek & Newman
GRAMMAR, DEFINITE-CLAUSE

Research into building grammars for understanding natural language (see Natural-language understanding) became more popular after the introduction of grammar formalisms based on Horn clauses by Colmerauer in 1975 (1). The so-called metamorphosis grammars (MGs) started a growing interest in expressing linguistic concepts in logic (qv) and supported the construction of robust front ends and interfaces (see Natural-language interfaces). The primary applications of this research were the consultation and creation of databases through natural language, generation of answers and questions, text translation, and text synthesis from formal specifications.

The notion of definite-clause grammars (DCGs), a special case of MGs, was introduced in 1978 by Pereira and Warren (2) as a grammar formalism for which PROLOG provides an efficient parsing mechanism. Some practical systems were architected around the concurrent application of syntactic and semantic linguistic knowledge to yield a logical structure, also containing the information for semantic interpretation (3-5). Other systems were architected on more than one translation level; the application of syntactic and semantic knowledge was separated, and the final product was a PROLOG Horn clause whose execution was governed by a planning (qv) mechanism (6). The technique of extraposition grammars was proposed by Pereira (7) to describe certain global relationships or extrapositions, such as the connection between a relative pronoun and its trace. Finally, developments such as the modifier structure grammars of Dahl and McCord (8,9), the tree grammars of Colmerauer, and the puzzle grammars of Sabatier (10-12) increased the power to express linguistic concepts. All this research in logic-based grammar formalisms was made possible, and easier, by choosing the programming language PROLOG, itself based on a subset of first-order logic.
Logic Grammars

Grammars describe the structure (syntax) of languages through a set of productions (rewriting rules). For example, the rule

sentence → noun-phrase verb-phrase

states a relation between three nonterminals: A sentence may consist of a noun phrase followed by a verb phrase. Such rules can be mapped into PROLOG clauses in the following way:

sentence(S1, S3) :- noun-phrase(S1, S2), verb-phrase(S2, S3).
verb-phrase(S1, S2) :- connects(S1, writes, S2).
connects(1, each, 2).
connects(2, author, 3).
connects(3, writes, 4).

(Note: Compound predicates are written in PROLOG with commas. Variables are distinguished from atoms by an initial capital letter.) In this representation numbers are used to indicate the beginning and end of each word:

1 each 2 author 3 writes 4

In order to verify that the sentence is well formed, it is necessary to add the goal

?- sentence(1, 4).

(where ?- is a unary functor supplied by any PROLOG system) and demonstrate that it is provable from the previous clauses. By using a list as the data structure to represent the sentence, numbers are no longer necessary because PROLOG has a parsing machinery able to interpret it:

?- sentence([each, author, writes], [ ]).

Definite-clause grammars (DCGs) are an extension of context-free grammars, which can likewise be translated into PROLOG clauses. DCGs allow any logic term to be a nonterminal, and they are built upon logic symbols (atoms, variables, and terms) instead of only atomic constants. They likewise have only one nonterminal symbol on the left side of each rule. Context dependencies are specified by logic variables within the arguments of grammar symbols. A DCG rule has the following form:

nonterminal-symbol → body.

where "body" is a sequence of one or more items separated by commas. Each item is either a nonterminal symbol or a sequence of terminal symbols. The meaning of the rule is that "body" is a possible form for a phrase of type nonterminal-symbol.
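The position-numbering representation above can be mimicked outside PROLOG. The following sketch is hypothetical (the function names are inventions, and it assumes the implied noun-phrase clause covering "each author" that the article does not list); it reproduces the proof of sentence(1, 4) by exhaustive search over intermediate positions, much as PROLOG's backtracking would:

```python
# Hypothetical Python analogue of the connects/3 facts: word boundaries
# are numbered 1..4 and each fact records which word spans a pair.

CONNECTS = {(1, "each", 2), (2, "author", 3), (3, "writes", 4)}

def connects(s0, word, s1):
    return (s0, word, s1) in CONNECTS

def noun_phrase(s0, s2):
    # assumed clause: a noun phrase is "each" followed by "author"
    return any(connects(s0, "each", m) and connects(m, "author", s2)
               for m in range(s0, s2 + 1))

def verb_phrase(s0, s1):
    return connects(s0, "writes", s1)

def sentence(s0, s2):
    # try every intermediate boundary, like PROLOG's search
    return any(noun_phrase(s0, m) and verb_phrase(m, s2)
               for m in range(s0, s2 + 1))

print(sentence(1, 4))  # True
```

The goal sentence(1, 4) succeeds exactly when some boundary splits the string into a noun phrase and a verb phrase, which is what the PROLOG proof establishes.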
A nonterminal symbol is written as a PROLOG term (other than a list), and a sequence of terminals is written as a PROLOG list. On the right side of a rule, in addition to nonterminals and lists of terminals, there may also be sequences of procedure calls, written within curly brackets ({ and }). These are used to express extra conditions that must be satisfied for the rule to be valid. A nonterminal symbol is translated into an (N + 2)-place predicate (having the same name) whose first N arguments are those explicit in the nonterminal and whose last two arguments are as in the translation of a context-free nonterminal. Procedure calls on the right side of a rule are simply translated as themselves.

Each grammar rule, such as

p(X) → q(X).

receives an input string, analyzes some initial part, and generates a remainder for further analysis. This particular rule is translated by the PROLOG system into

p(X, S0, S) :- q(X, S0, S).

Therefore, the PROLOG grammar notation provides a more concise notation by making the arguments for the input and output strings implicit. When a rule has terminals, they are translated by the predicate connects. For example,
the rule

p(X) → [older], q(X), [high].

is translated into

p(X, S0, S) :- connects(S0, older, S1),
               q(X, S1, S2),
               connects(S2, high, S).
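The hidden string arguments can also be read as difference lists: each nonterminal consumes a prefix of the input list and hands the remainder on. The sketch below is hypothetical (the body given for q is invented here purely so the example runs); it threads remainders through the rule p → [older], q, [high] the way the translated clause threads S0..S:

```python
# Hypothetical sketch: nonterminals as functions from an input word
# list to the list of possible remainders after consuming a phrase.

def terminal(word):
    def parse(s):
        return [s[1:]] if s and s[0] == word else []
    return parse

def q(s):
    # invented stand-in body for q: here q matches the single word "man"
    return terminal("man")(s)

def p(s):
    # p --> [older], q, [high]: thread the remainder through each item,
    # as the implicit S0..S arguments do in the PROLOG translation.
    results = []
    for s1 in terminal("older")(s):
        for s2 in q(s1):
            results.extend(terminal("high")(s2))
    return results

print(p(["older", "man", "high"]))  # [[]]  -- whole input consumed
```

An empty remainder means the phrase covered the entire input, which is what the top-level goal with the empty list as second argument checks.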
Analysis of Natural Language

In 1977 Colmerauer introduced a framework (13) for NL analysis that was a crucial step forward and attracted great interest in the use of logic grammars as an alternative to the well-established ATNs (see Grammar, augmented transition network). From a historical point of view, it may be considered today as a landmark because it provided a method for translating natural-language sentences into logical structures. The method consisted of considering elementary statements based on proper nouns, each article as a three-branched quantifier, and four precedence rules for governing the quantification hierarchy problem.

A short overview of the framework provides further motivation for the techniques of writing grammars in logic. For example, the sentence

Chomsky is (a) writer

which is constructed with a noun and the verb "to be," is translated into the formula

writer(chomsky)

In general, verbs, adjectives, and nouns introduce properties with n arguments. For verbs, n may be equal to 1 (intransitive verbs) or N + 1 (transitive verbs, where N is the number of complements). For adjectives and nouns n is equal to 1 or greater than 1 (relations, where n is the n-place of its arguments). The arguments represent objects, whose role in a sentence is the complement of a noun, verb, or adjective. For example, the sentence

Chomsky writes a book

constructed with a verb ("write"), a noun ("book"), and an article ("a"), may be replaced by the following paraphrase:

for a B such that B is (a) book          (1)
it is true that Chomsky writes B         (2)

where (1) and (2) are elementary statements. This paraphrase is a logical structure that can be written in a shorthand notation:

a(B, book(B), writes(Chomsky, B))

Note that statements (1) and (2) are translated into the formulas "book(B)" and "writes(Chomsky, B)," respectively. The logical structure is the meaning of the sentence, and each of its constituent parts corresponds to the senses of individual words, according to Frege's principle (14). Representations of such meaning are referred to as logical structures since the only aspects of meanings one knows how to represent rigorously are logical relations. Each article a introduces a three-branched quantifier q, which creates a new formula from a variable x and two formulas f1 and f2,

q(x, f1, f2)

corresponding to the statement

for a x such that e1, it is true that e2

where e1 and e2 are the elementary statements corresponding to f1 and f2. For example, the sentence

Chomsky writes a book for each publisher

constructed with a verb ("write"), two nouns ("book" and "publisher"), and two articles ("a" and "each"), may be replaced by the paraphrase

for each P such that P is a publisher it is true that for a B such that B is a book, it is true that Chomsky writes B for P

The sentence is translated into the logical structure

each(P,
     publisher(P),
     a(B,
       book(B),
       writes(Chomsky, B, P)))

The logical structure displays the following precedence rule: In a construction involving a noun and a complement of this noun, the quantification introduced by the article of the complement dominates the quantification introduced by the article of the noun. Besides this rule, Colmerauer proposed three more precedence rules to organize the scope of quantification.

Colmerauer's Analysis Applied to a Specific Natural Language

Colmerauer's framework was originally proposed for French and English. Dahl (6) later adapted it to Spanish, and Pique (15) suggested a different semantics for French articles. The application of the framework to Portuguese was done by Coelho (3).
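The nested three-branched quantifiers can be modeled as plain tuples. The sketch below is hypothetical (the tuple encoding and function names are invented for illustration); it builds the logical structure for "Chomsky writes a book for each publisher" and reads off the quantifier scope order, showing that "each" (from the complement) dominates "a" (from the object noun phrase):

```python
# Hypothetical sketch: logical structures as nested tuples, each article
# introducing a three-branched quantifier (q, var, restriction, scope).

def quant(q, var, restriction, scope):
    return (q, var, restriction, scope)

# each(P, publisher(P), a(B, book(B), writes(chomsky, B, P)))
structure = quant("each", "P", ("publisher", "P"),
                  quant("a", "B", ("book", "B"),
                        ("writes", "chomsky", "B", "P")))

def scope_order(s, acc=None):
    """List quantifiers outermost-first by walking down the scope."""
    if acc is None:
        acc = []
    if isinstance(s, tuple) and s[0] in ("each", "a"):
        acc.append(s[0])
        scope_order(s[3], acc)
    return acc

print(scope_order(structure))  # ['each', 'a']
```

The outermost-first order mirrors the precedence rule stated above: the quantification of the complement's article dominates that of the noun's article.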
Expressing Colmerauer's Framework as a DCG

DCGs support the parsing and the translation processes by capturing some of the syntax and the semantics of the natural-language subset relevant for the application chosen. The parsing process consists of the proof that a string of words is a legal and well-formed sentence according to the chosen syntax. The proof procedure is achieved by the search strategy (depth-first, top-down, and left-to-right) and the inference [resolution (qv)] rule behind the PROLOG system. The translation consists of the assignment of a logical structure as the interpretation of every sentence. This structure is composed of well-formed formulas of a certain logical system based on an extension of predicate logic (qv). The translation machinery is expressed as a set of definite clauses of logic through PROLOG grammar rules. It may contain, together or separated, syntactic and semantic knowledge of the subset of the natural language considered. The parsing machinery depends on the PROLOG system, and it may be uncovered by switching on the trace facility. Translation and parsing are independent processes, and therefore grammar changes are made easier.

A simplified grammar, called G, is considered in the following (3). It parses English sentences, and at the same time it produces their corresponding logical structures. The grammar is defined by two modules, the syntax plus semantics and the morphology, and it covers sentences such as

Hodges writes for Penguin.

Syntax Plus Semantics

sentence(S) → noun-phrase(X, O2, S),
              verb([subject-X|L], O1),
              complements(L, O1, O2).
complements([ ], O, O) → [ ].
complements([K-N|L], O1, O3) → complements(L, O1, O2),
                               case(K),
                               noun-phrase(N, O2, O3).
noun-phrase(N, O2, O3) → article(N, O1, O2, O3), noun(N, O1).
noun-phrase(PN, O, O) → [PN], {proper-noun(PN)}.
article(A, O1, O2, and(O1, O2)) → [a].
case(for) → [for].
case(direct) → [ ].

Morphology

proper-noun(hodges).
proper-noun(penguin).

For example, the rule

noun-phrase(PN, O, O) → [PN], {proper-noun(PN)}.

represents the clause

noun-phrase(PN, O, O, S0, S) :- connects(S0, PN, S), proper-noun(PN).

The first rule of the grammar G allows only for sentences comprising a noun phrase followed by a verb with possibly some complements. The first grammar rule for complements admits their absence (the terminal [ ] stands for the empty list), and the second rule defines a sequence of complements as a string composed of a complement, a case, and a noun phrase. Different arguments of different nonterminals are linked by the same logic variable. This allows building up structures in the course of the unification process.

The noun phrase "a publisher" is parsed and translated by the grammar rule

noun-phrase(N, Oa, Ob) → article(N, Oc, Od, Oe), noun(N, Of).

Note that this rule is a simplified version of the fourth rule of the grammar G presented. The nonterminal for a noun phrase has three arguments. The interpretation of the last argument Ob will depend on a property Oa of an individual N because in general a noun phrase contains an article such as "a." The word "a" has the interpretation Oe,

and(Oc, Od)

in the context of two properties Oc and Od of an individual N. The property Oc will correspond to the rest of the noun phrase containing the word "a," and the property Od will come from the rest of the sentence. Therefore, Oe will contain an overall interpretation, and it is linked to Ob by the same variable. As Of is the property of the common noun, it is linked to Oc by the same variable. Oa has the description of the properties of N, and it will depend on the properties coming from the rest of the sentence. Therefore, Oa is linked to Od by the same variable.

Each word is associated to a property. For example, the meaning of the verb "writes" is introduced by the relation "is-published-by(A, P)." The verb rule also contains information regarding the arguments of the relation, namely that "A" plays the role of subject in the sentence and that "P" imposes the use of the preposition "for." The meaning of the indefinite article "a" is introduced by the conjunction "and(O1, O2)" according to the definition often adopted in classical logic.

A more advanced grammar than G would have more elaborated definitions for nouns, verbs, adjectives, and articles (3), such as:

noun([A-[ ]&author&type-X], pr(author(X))) → no(author, A).
no(Type, GN) → [Noun], {no1(Noun, Type, GN)}.
no1(author, author, mas-sin).
verb([(G-N)-V&type-X, dir-A-W&title-Y], pr(author(X, Y))) → ve(writes, N).
ve(Type, N) → [Verb], {ve1(Verb, Type, N)}.
ve1(writes, writes, sin).
adjective([A-[ ]&author&type-X, prep(by)-_-[ ]&pub&type-Y], pr(published(Y, X))) → ad(pub, A).
ad(Type, GN) → [Adj], {ad1(Adj, Type, GN)}.
ad1(published, pub, mas-sin).
article((G-sin)-D-X, O1, O2, for([X, D], and(O1, O2), cardinality(X, greater, 0))) → art-ind(G-sin).
art-ind(mas-sin) → [a]; [some].

(Note: Anonymous variables are written in PROLOG as "_".) These definitions include syntactic and semantic checks, such as gender and number agreement and semantic types. The meaning of the article is also different. Instead of a two-branched quantifier, it is introduced by a three-branched quantifier: the first branch for the variable X to be quantified, the second for the general property "and" of X's, and the third for a property (cardinality) to specify and constrain the domain of X's.

Extensions of DCGs

Extraposition grammars (XGs) extend the power of DCGs to specify context dependencies (7). XG rules may have, on their left side, more than one nonterminal symbol, and a "gap" symbol expresses a nonspecified and arbitrary string of logic symbols (terminals and nonterminals). For example, the XG rule

relative-marker ... complement → [that].

states that the relative pronoun "that" can be analyzed as a relative marker followed by some unknown phrases and then a complement. XGs simplify the expression of syntactic concepts and therefore allow easier treatments of semantic and logic descriptions. Arguments to nonterminals are used (as in DCGs) for agreement checks, for producing a parse tree, and to restrict the attachment possibilities of postmodifiers.

Modifier structure grammars (MSGs) improve the possibility to specify nonsyntactic representations in a clearer way. MSGs simplify the automatic construction of such representations while the analysis is processed (8). Tree grammars (TGs) allow a better handling of coordination of linguistic constructions. Puzzle grammars (PGs) are tools specially oriented toward linguists, where strategy rules describe assembly order and mode and are specified independently (12).

Conclusion

Logic grammars have evolved over the years into higher-level tools, which allow users to concentrate on linguistic phenomena. Definite-clause grammars support the use of logic for natural-language processing, and they have paved the way for practical linguistic work based on the programming language PROLOG.
BIBLIOGRAPHY

1. A. Colmerauer, Les Grammaires de Métamorphose, G.I.A., University of Aix-Marseille, Marseille, France, 1975.
2. F. Pereira and D. H. D. Warren, "Definite clause grammars for language analysis—a survey of the formalism and a comparison with augmented transition networks," Artif. Intell. 13(3), 231-278 (1980).
3. H. Coelho, A Program Conversing in Portuguese Providing a Library Service, Ph.D. Thesis, University of Edinburgh, Edinburgh, U.K., and LNEC, Lisbon, Portugal, 1979.
4. V. Dahl, Un Système Déductif d'Interrogation de Banques de Données en Espagnol, Thèse de Docteur de Troisième Cycle, University of Aix-Marseille, Marseille, France, 1977.
5. V. Dahl, "Translating Spanish into logic through logic," Am. J. Computat. Ling. 7(3), 149-164 (1981).
6. D. H. D. Warren and F. Pereira, An Efficient Easily Adaptable System for Interpreting Natural Language Queries, Research Paper 155, Department of Artificial Intelligence, University of Edinburgh, Edinburgh, U.K., 1981.
7. F. Pereira, "Extraposition grammars," Am. J. Computat. Ling. 7(4), 243-256 (1981).
8. V. Dahl and M. McCord, Treating Coordination in Logic Grammars, Internal Report, Simon Fraser University, Burnaby, British Columbia, 1983.
9. V. Dahl and P. Saint-Dizier, Natural Language Understanding and Logic Programming, Elsevier Science, Amsterdam, The Netherlands, 1985.
10. J. F. Pique and P. Sabatier, An Informative Adaptable and Efficient Natural Language Consultable Database System, in Proceedings of ECAI, pp. 250-254, 1982.
11. P. Sabatier, Dialogues en Français avec un Ordinateur, G.I.A., University of Aix-Marseille, 1980.
12. P. Sabatier, Les Grammaires Logiques, Actes du Colloque Traitement Automatique du Langage Naturel, University of Nantes, Nantes, France, 1984.
13. A. Colmerauer, An Interesting Natural Language Subset, G.I.A., University of Aix-Marseille, Marseille, France, 1977.
14. G. Frege, Begriffsschrift, a Formula Language Modelled upon that of Arithmetic for Pure Thought, in J. Van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, Harvard University Press, Cambridge, MA, pp. 1-82, 1967.
15. J. F. Pique, Interrogation en Français d'une Base de Données Relationnelle, G.I.A., University of Aix-Marseille, Marseille, France, 1978.

General References

V. Dahl, Un Système de Banques de Données en Logique du Premier Ordre, en Vue de sa Consultation en Langue Naturelle, G.I.A., University of Aix-Marseille, Marseille, France, 1976.
R. Kowalski, Logic for Problem Solving, Elsevier North-Holland, New York, 1979.
M. McCord, "Using slots and modifiers in logic grammars for natural language," Artif. Intell. 18(3), 327-367 (1982).
E. Oliveira, L. M. Pereira, and P. Sabatier, "An expert system environmental resource evaluation through natural language," in Proceedings of the First International Logic Programming Conference, Marseille, France, 1982.
J. Van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, Harvard University Press, Cambridge, MA, 1967.

H. Coelho
Laboratório Nacional de Engenharia Civil
GRAMMAR, GENERALIZED PHRASE STRUCTURE

Generalized phrase structure grammar (GPSG) is a framework for defining the syntax of natural languages (1). It was developed within theoretical linguistics in the early 1980s and
has been widely applied within computational linguistics (qv) (2). Mathematically, GPSG, as formulated in Ref. 1, is simply a variant of context-free phrase structure grammar (qv) (CF-PSG). Historically, it falls within the family of theories that have developed out of Montague grammar. CF-PSGs attracted renewed interest within linguistics, following two decades of neglect, when it was realized that all of the original arguments that apparently demonstrated their descriptive inadequacy for natural languages were either invalid or dependent on false premises (3). Despite their supposed linguistic inadequacy, CF-PSGs had remained of interest within the computational linguistics community since such grammars are well understood mathematically and known to be computationally tractable. GPSG made this engineering interest theoretically respectable again (4).

In GPSG the implicit CF-PSG itself is not defined ostensively, but rather it is characterized indirectly by various techniques that have the effect of both allowing the grammar to capture linguistically significant generalizations and making the grammar several orders of magnitude more compact than a simple listing of rules would be.

Theoretical Outline

GPSG defines syntactic categories as sets of syntactic feature specifications. A feature specification is an ordered pair consisting of a feature (e.g., CASE) and a feature value. The latter may either be atomic (e.g., ACCUSATIVE) or it may be a syntactic category (i.e., features are allowed to take categories as their values). A syntactic category is then a partial function from features to their values. The internal makeup of categories is further constrained by feature co-occurrence restrictions (FCRs), which are simply Boolean conditions on combinations of feature specifications. Syntactic structures are phrase structure trees of the familiar kind whose nodes are labeled with syntactic categories as characterized above.
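Categories-as-partial-functions have a direct computational reading. The following sketch is hypothetical (the particular category, the FCR, and all names are invented for illustration, not taken from Ref. 1): a category is a dict, an FCR is a Boolean predicate over it, and extension checks whether one category agrees with every specification of another.

```python
# Hypothetical sketch: a GPSG-style category as a dict, i.e., a partial
# function from features to values (values may themselves be categories).

np_acc = {"N": "+", "V": "-", "BAR": 2, "CASE": "ACCUSATIVE"}

def fcr_case_only_on_nominals(cat):
    """Invented FCR for illustration: if CASE is specified, the
    category must be nominal (N +)."""
    return "CASE" not in cat or cat.get("N") == "+"

def extends(cat, sub):
    """cat extends sub if it agrees with every specification in sub."""
    return all(cat.get(f) == v for f, v in sub.items())

print(fcr_case_only_on_nominals(np_acc))      # True
print(extends(np_acc, {"N": "+", "BAR": 2}))  # True
```

Extension (and its symmetric cousin, unification of compatible categories) is the operation the feature instantiation principles below are defined in terms of.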
The well-formednessof the local substructures of a tree (and hence, recursively, of the tree as a whole) is determined by immediate dominance (ID) rules, linear precedence(LP) rules, principles of feature instantiation, and feature specification defaults (FSDs). ID rules are like ordinary CF-PSG rules except that they say nothing about the linear order of the items they introduce: S+ NP, VP As such, they simply permit a particular mother category to dominate the given daughter categories. Some ID rules are just listed, but others are derived from these and from each other by metarules. A metarule is a clause in the definition of the grammar that enables one to define one set of rules in terms of another set, antecedently given: VP-e X, NP > VPIPASI -- X Generalizationsthat would be lost if the two setsof rules (e.g., active VP rules and passive VP rules) were merely listed are captured by the metarule. LP rules state the relevant gen eralizations about the order of (classesof) sister constituents in the language: V
LP rules constrain only the order of sister categories. In English, for example, they allow one to say that lexical items must always precede their phrasal sisters.

GPSG employs three principles of feature instantiation: the head feature convention (HFC), the control agreement principle (CAP), and the foot feature principle (FFP). The HFC is responsible for equating one class of feature specifications as they appear on the mother category and its head daughter(s). Thus, for example, a verb phrase inherits the tense of its verb. The CAP matches agreement features between locally connected agreeing categories (e.g., between a subject noun phrase and its verb-phrase sister). And the FFP deals with the copying of category-valued features between mother and daughter categories and is responsible, for example, for unbounded dependencies in questions and relative clauses and for agreement with reflexive pronouns. The formal definitions of these principles crucially depend on notions of extension and unification definable in the partial-function theory of categories sketched above. FSDs are Boolean conditions analogous to the FCRs but employed differently. FCRs are absolute conditions that have to be met, whereas FSDs are conditions that a category must meet if certain other conditions are not met. Thus, for example, the default value for CASE might be ACCUSATIVE, but a given noun phrase could appear in some other case if it was required to do so by, say, one of the feature instantiation principles.

Applications

Among the areas of syntax that GPSG work has covered in depth are the subcategorization of verbs, the English auxiliary system, coordination, questions, relative clauses, the passive construction, noun phrases, adjective phrases (including comparatives), prepositional phrases, and infinitival and sentential complements.
The earliest GPSG work concentrated on English, but in the last few years work has also been done on the grammars of Adyge, Arabic, Basque, Catalan, Chinese, Dutch, French, German, Greek, Hindi, Irish, Japanese, Korean, Latin, Makua, Palauan, Polish, Spanish, Swedish, and Welsh (1). Many recent natural-language processing projects have employed a GPSG grammar or some derivative thereof (2). GPSG parsers, often employing the Earley algorithm, have been written in LISP, Pascal, PROLOG, and SNOBOL and are running on machines ranging from the IBM PC, through VAXen and Dandelions, to LISP machines and the DEC 20. Most of the implementations are university based, and several were specifically designed for grammar development purposes. The largest existing commercial project to use a GPSG-based parser is the Hewlett-Packard "HPSG" database front-end project (5).
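As a hedged illustration of the kind of algorithm such parsers use, here is a compact Earley recognizer over a toy CFG (the grammar is invented; a GPSG system would first compile its ID/LP rules and metarules into context-free rules of this form):

```python
# Sketch of an Earley recognizer of the kind often used for GPSG-style
# CF-PSGs. The toy grammar is invented; a GPSG system would first expand
# its ID/LP rules and metarules into ordinary context-free rules.

GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["NPR"]],
    "VP": [["V", "NP"]],
    "NPR": [["John"], ["Mary"]],
    "V": [["saw"]],
}

def earley_recognize(words, start="S"):
    # An item is (lhs, rhs, dot, origin).
    chart = [set() for _ in range(len(words) + 1)]
    chart[0] = {(start, tuple(rhs), 0, 0) for rhs in GRAMMAR[start]}
    for i in range(len(words) + 1):
        changed = True
        while changed:                              # predictor + completer
            changed = False
            for lhs, rhs, dot, org in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:      # predict
                    for alt in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], tuple(alt), 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            changed = True
                elif dot == len(rhs):                           # complete
                    for l2, r2, d2, o2 in list(chart[org]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item)
                                changed = True
        if i < len(words):                                      # scan
            for lhs, rhs, dot, org in chart[i]:
                if dot < len(rhs) and rhs[dot] == words[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, org))
    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in chart[len(words)])

assert earley_recognize("John saw Mary".split())
assert not earley_recognize("saw John".split())
```

The Earley algorithm runs in cubic time in the worst case and handles any CFG, which is one reason it is a common choice for grammar-development systems.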
BIBLIOGRAPHY

1. G. Gazdar, E. H. Klein, G. K. Pullum, and I. A. Sag, Generalized Phrase Structure Grammar, Harvard University Press, Cambridge, MA, and Blackwell, Oxford, 1985. Contains an extensive bibliography.
2. G. Gazdar, "Recent computer implementations of phrase structure grammars," Comput. Ling. 10, 212–214 (1984).
3. G. K. Pullum and G. Gazdar, "Natural languages and context-free languages," Ling. Philos. 4, 471–504 (1982).
4. Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, SRI, Menlo Park, CA, 1983, papers by A. Joshi; C. R. Perrault; G. K. Pullum; S. M. Shieber; S. M. Shieber et al.; H. Thompson; and H. Uszkoreit.
5. Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, Bellcore, Murray Hill, NJ, 1985, papers by C. Pollard and L. Creary; D. Flickinger, C. Pollard, and T. Wasow; and D. Proudian and C. Pollard.
G. Gazdar
University of Sussex
GRAMMAR, PHRASE-STRUCTURE

Phrase-structure trees provide structural descriptions for sentences. Phrase-structure grammars characterize phrase-structure trees. Both phrase-structure trees and grammars, therefore, play a crucial role in natural-language processing for computing the structural description of a sentence, which can then be used for further processing in a language-understanding or generation system (1). In this entry phrase-structure trees, phrase-structure grammars, and related grammatical systems are described.
Phrase-Structure Trees

A phrase-structure tree encodes a hierarchical structure of a sentence. This information is of two kinds: the hierarchical grouping structure (constituent structure) and the syntactic categories of these groupings (2–4). The sentence

John wanted to publish the paper.   (1)

has the following grouping structure (not necessarily unique):
(2) [an unlabeled tree grouping the words of (1): John, wanted, to, publish, the, paper]

This structure can also be represented by a bracketing structure as follows:

[[John][[wanted][[to][publish][[the][paper]]]]]   (3)

Both (2) and (3) describe the grouping structure without identifying the categories of the constituents. Such structures are called "skeletons." Skeletons characterize the phrase boundaries without assigning labels to the nodes (5). A skeleton with the category labels is a phrase-structure tree for a sentence. Thus a phrase-structure tree for (1) is

(4) [the tree (2) with its nodes labeled: S dominating NP and VP; NP dominating NPR (John); VP dominating V (wanted) and a VP that dominates P (to), V (publish), and NP; the lower NP dominating DET (the) and N (paper)]

Here "John" is a proper noun (NPR), which is also a noun phrase (NP); "wanted" and "publish" are verbs (V); "to" is a preposition (P) (more correctly, "to" should be classified as a particle or tense); "the" is a determiner (DET); "paper" is a noun (N); "the paper" is a noun phrase (NP); "to publish the paper" is a verb phrase (VP); "wanted to publish the paper" is also a verb phrase (VP); and finally, "John wanted to publish the paper" is a sentence (S). (Note that the notion of PSG should not be equated with the grammar of specific linguistic content; PSG is a type of grammar in which different linguistic contents can be instantiated.) Corresponding to (3), a labeled bracketed structure is

[S[NP[NPR John]][VP[V wanted][VP[P to][V publish][NP[DET the][N paper]]]]]   (5)

Phrase-structure trees such as (4) or labeled bracketed structures such as (5) are usually (but not always) the outputs of parsers in natural-language processing systems.

Phrase-Structure Grammars (PSG)

Phrase-structure trees can be characterized by grammatical systems, first by phrase-structure grammars and then by some related systems. Phrase-structure grammars consist of a set of nonterminal symbols (phrase-structure categories such as N, V, DET, P, NP, VP, S, etc.), a set of terminal symbols (lexical items such as "buy," "John," "eat," "in," "the," etc.), and a set of rewriting rules that allow one to rewrite a nonterminal symbol as a string of terminal or nonterminal symbols. If this rewriting is independent of the context surrounding the nonterminal, one has a context-free grammar (CFG); otherwise, one has a context-sensitive grammar (CSG). Thus a CFG has rewriting rules of the form

A → X   (6)
where X is a sequence of terminals or nonterminals and A is a nonterminal. A CSG has rewriting rules of the form

ZAW → ZXW
(7)
where X, Z, and W are strings of terminals and nonterminals and A is a nonterminal. In (7), A is rewritten as X in the environment Z _ W. The form (7) is very often written as
A → X / Z _ W
(8)
[The right side of a rule in a CFG can be a null string (called null rules). For a CFG with null rules, it is possible to construct an equivalent CFG (i.e., generating the same set of
strings) without null rules. CFGs with null rules often simplify the grammars for many context-free languages.]

Derivation in a CFG begins with the initial symbol S followed by successive applications of the rewriting rules until no further rules can be applied. It is easy to see how the following context-free rules will help characterize the phrase-structure tree in (4) (and, of course, many others). The order in which the rewriting rules are applied is not relevant because the rewriting depends only on the symbol on the left side of the rule and not on the context around the occurrence of that symbol in a string.

S → NP VP
NP → NPR
NP → DET N
VP → V VP
VP → P V NP   (9)
NPR → John, Mary, Bill
N → paper, man, cow
V → wanted, tried, publish, meet, published, want
P → to
DET → the

Context-sensitive rules are useful in constraining the rewriting of a nonterminal by specifying a necessary context. For example,

V → wanted / _ VP   (10)

constrains the lexical insertion of "wanted" in V only if there is a VP to the right, and

V → publish / _ NP
V → published / _ NP

constrain the lexical insertion of "publish" or "published" in V only if there is an NP to the right. Thus, (10) will not permit the insertion of "published" instead of "wanted" in the phrase-structure tree in (4).

Some Formal Properties of PSGs. If all rules of a PSG, G, are context free, then G is called a context-free PSG, or a "context-free grammar" (CFG). If some rules of a PSG are context sensitive, then G is called a context-sensitive PSG, or a "context-sensitive grammar" (CSG). The string language of a PSG, G, is defined as the set of all terminal strings derived in G, and this set is denoted L(G). A string w is derived in G if w can be obtained by successive rewriting of the initial symbol S using the rules in G. A string language L (i.e., a set of terminal strings) is called a "context-free language" (CFL) if there is a CFG, G, such that L(G) = L. L is called a "strictly context-sensitive language" (CSL) if there is no CFG, G, such that L(G) = L and there is a CSG, G, such that L(G) = L. Note that a grammar G may be context sensitive, but its string language L(G) need not be a CSL. The class of CSLs properly contains the class of CFLs; in this sense, CSGs are more powerful than CFGs.

There is a sense, however, in which CSGs do not have more power than CFGs. If a CSG, G, is used for "analysis," then the language analyzed by G is context free (6,7). In order to explain the use of a context-sensitive grammar G for analysis, the set of proper analyses of a given tree t must first be defined. Roughly speaking, a proper analysis of a tree is a slice across the tree. More precisely, the following recursive definition applies. The set of proper analyses of a tree t, denoted Pt, is defined as follows:

1. If t = 0 (the empty tree), then Pt = 0.
2. If t is a tree with root labeled A and immediate subtrees t0, t1, . . . , tn, then Pt = {A} ∪ P(t0)·P(t1)· · ·P(tn), where · denotes concatenation (of sets).

For example, for the tree t whose root S dominates A and B, where A dominates C and d, C dominates c, B dominates E, and E dominates e,

Pt = {S, AB, AE, Ae, CdB, CdE, Cde, cdB, cdE, cde}

Let G be a context-sensitive grammar; that is, its rules are of the form A → ω/CA, where A ∈ V − Σ (V is the alphabet and Σ is the set of terminal symbols),
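The recursive definition of Pt can be transcribed directly into code; the sketch below (the tree representation is invented for illustration) rebuilds the example tree and recovers exactly the ten proper analyses listed above:

```python
# Sketch: computing the set of proper analyses Pt, transcribing the
# recursive definition above. A tree is a (label, subtrees) pair; the
# example tree is the one whose Pt is listed in the text.

def proper_analyses(tree):
    label, subtrees = tree
    if not subtrees:                      # a leaf contributes its label
        return {label}
    combo = {""}                          # concatenation of the subtree sets
    for sub in subtrees:
        combo = {x + y for x in combo for y in proper_analyses(sub)}
    return {label} | combo

t = ("S", [("A", [("C", [("c", [])]), ("d", [])]),
           ("B", [("E", [("e", [])])])])

assert proper_analyses(t) == {"S", "AB", "AE", "Ae",
                              "CdB", "CdE", "Cde",
                              "cdB", "cdE", "cde"}
```

Each member of Pt is one "slice" across the tree, obtained by choosing, independently in each subtree, how deep to cut.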
where CA is a Boolean combination of proper-analysis and domination predicates. Let G be a finite set of local-constraint rules and T(G) the set of trees analyzable by G. It is assumed that the trees in T(G) are sentential trees; that is, the root node of a tree in T(G) is labeled by the start symbol, S, and the terminal nodes are labeled by terminal symbols. It can be shown that the string language L(T(G)) = {w | w is the terminal string of t and t ∈ T(G)} is context free (7).
Example. Let V = {S, T, a, b, c, e} and consider the finite set of local-constraint rules:

1. S → e
2. S → aT
3. T → aS
4. S → bTc / (a _) ∧ DOM(T _)
5. T → bSc / (a _) ∧ DOM(S _)

In rules 1, 2, and 3 the context is null, and these rules are context free. In rule 4 (and in rule 5) the constraint requires an a on the left and that the node be dominated (immediately) by a T (by an S in rule 5). The language generated by G can be derived by G1:

S → e
S → aT
T → aS
S1 → bTc
S → aT1
T → aS1
T1 → bSc

In G1 there are additional nonterminals S1 and T1 that enable the context checking of the local-constraint grammar, G, in the generation process. It is easy to see that under the homomorphism that removes the subscripts on the nonterminals T1 and S1, each tree generable in G1 is analyzable in G. Also, each tree analyzable in G has a homomorphic preimage in G1.

Consider once again the context-sensitive rule (10),

V → wanted / _ VP   (10)

When (10) is interpreted as a "local constraint" as described above, the lexical item "wanted" will appear under a V node only if there is a VP node to its right (in the tree in which the V appears). The predicate "VP to the right of V" is defined over the tree in which the V and VP nodes appear and not over a string in which V and VP appear. Another way of saying the same thing is to say that to the right of V there is a string that has an "analysis" VP in the tree. Context-sensitive rules in a PSG for describing linguistic grammars are used in this "analyzability" sense and not as string-rewriting rules.

Terminal Symbols in a PSG. So far the terminal symbols in a PSG have been presented as unanalyzed elements. This has been done for simplicity. It is necessary to regard the terminal elements as complexes of phonological, syntactic, and semantic features (4,8). [In principle, it is possible to eliminate all these feature complexes by introducing new nonterminals. However, the number of these new nonterminals would be extremely large (essentially corresponding to all possible combinations of features). Also, there would be enormous redundancy in the grammar.] Each such "complex symbol" for a terminal symbol is a set of features. For example, in (4) the terminal symbols are replaced by complex symbols (feature bundles such as [wanted, −DET, +Animate, . . .]) to give (4'). The possibility of associating complex symbols with intermediate nodes is not discussed in this entry. The form (4') is a "structural description" (SD) of the sentence (1): John wanted to publish the paper.
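A minimal sketch of rule (10) read as a local constraint (the tree encoding is invented for illustration): the predicate inspects the tree, not a string, checking for a VP sister somewhere to the right of a V that dominates "wanted":

```python
# Sketch: rule (10), V -> wanted / _ VP, read as a local constraint on a
# tree rather than as a string-rewriting rule. The (label, subtrees)
# encoding is invented for illustration.

def wanted_licensed(daughters):
    """Within one local tree, every V dominating 'wanted' must have a
    VP sister somewhere to its right."""
    labels = [label for label, _ in daughters]
    for i, (label, subs) in enumerate(daughters):
        if label == "V" and ("wanted", []) in subs:
            if "VP" not in labels[i + 1:]:
                return False
    return True

assert wanted_licensed([("V", [("wanted", [])]), ("VP", [])])
assert not wanted_licensed([("V", [("wanted", [])]), ("NP", [])])
```

The check is a (very small) proper-analysis predicate in the sense defined above: it asks whether some analysis of the material to the right of V contains VP.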
PSGs in a Transformational Grammar (TG). TGs are also not discussed in this entry. However, it is important to note that PSGs (and phrase-structure trees) play a crucial role in a TG. The basic idea of a TG is that certain structural descriptions (SDs) are described in a component of a TG, called the "base component," and the other SDs are then obtained from these base-derived SDs by certain tree-transforming rules, called transformations. The base component is a phrase-structure grammar and thus defines a set of base phrase-structure trees. The trees obtained by using transformation rules are also phrase-structure trees. This view of TG is a more classical and also an oversimplified view, but it is adequate for this description. Thus, for example, the phrase-structure tree for (11) below, which is shown in (12), can be base generated. The phrase-structure tree for (13) is then obtained by applying a transformational rule to (12), resulting in the phrase-structure tree (14).
John saw Mary.   (11)

[(12): the base-generated phrase-structure tree for (11): S dominating NP (NPR John), AUX (past), and VP (V see, NP (NPR Mary))]

Mary was seen by John.   (13)
[(14): the phrase-structure tree for (13): S dominating NP (NPR Mary), AUX (past be en), and a VP containing V (see) and by John]

Revival of Phrase-Structure Grammars and Phrase-Structure Trees

Although a PSG is used in a TG, it plays a subsidiary role. Beginning around 1975, it was becoming clear that, when viewed in a certain way, PSGs had more descriptive power than one would have thought, without necessarily going beyond the CFGs. The results on local constraints are a clear example of this point of view. In the late 1970s a number of grammatical formalisms were proposed that were nontransformational in character. Some of these are amendments to PSGs without necessarily going beyond CFGs (e.g., generalized phrase-structure grammar, GPSG (9,10)); others are PSGs accompanied by another level of representation to be used for filtering some structures generated by the PSGs (e.g., lexical functional grammar, LFG (11)); and some others are based on tree-building systems for generating phrase-structure trees without the use of rewriting rules (e.g., tree-adjoining grammars, TAG (12,13)). Only GPSG and TAG are described here because they are directly related to phrase-structure grammars.

Generalized Phrase-Structure Grammar (GPSG). Besides the analyzability (or node admissibility) notion described above, Gazdar (10) introduced two other notions in his framework, generalized phrase-structure grammar (GPSG). These are categories with holes, with an associated set of derived rules and linking rules, and metarules for deriving rules from one another. The categories with holes and the associated rules do not increase the weak generative power beyond that of context-free grammars. The metarules, unless constrained in some fashion, will increase the generative power because, for example, a metarule can generate an infinite set of context-free rules that can generate a strictly context-sensitive language. (The language {aⁿbⁿcⁿ | n ≥ 1} can be generated in this way.) The metarules in the actual grammars written in the GPSG framework so far are constrained enough that they do not increase the generative power.

Gazdar introduced categories with holes and some associated rules in order to allow for the base generation of "unbounded" dependencies. Let VN be the set of "basic" nonterminal symbols. Then a set D(VN) of derived nonterminal symbols can be defined as follows:

D(VN) = {α/β | α, β ∈ VN}

For example, if S and NP are the only two nonterminal symbols, then D(VN) consists of S/S, S/NP, NP/NP, and NP/S. The intended interpretation of a derived category (a slashed category, or category with a hole) is as follows: A node labeled α/β will dominate subtrees identical to those that can be dominated by α, except that somewhere in every subtree of the α/β type there will occur a node of the form β/β dominating a resumptive pronoun, a trace, or the empty string, and every node linking α/β and β/β will be of the form γ/β. Thus, α/β labels a node of type α that dominates material containing a hole of type β (i.e., a β extraction site in a movement analysis). For example, S/NP is a sentence that has an NP missing somewhere. The derived rules allow the propagation of a hole, and the linking rules allow the introduction of a category with a hole. For example, given the rule (15),
[S NP VP]   (15)

This is the same as the rule S → NP VP but written as a node admissibility condition. Two derived rules, (16) and (17), can be obtained:

[S/NP NP/NP VP]   (16)

[S/NP NP VP/NP]   (17)

An example of a linking rule is a rule (rule schema) that introduces a category with a hole, as needed for topicalization, for example,

[S α S/α]   (18)

For α = PP this becomes

[S PP S/PP]   (19)

This rule will induce a structure like (20). The technique of categories with holes and the associated derived and linking rules allows unbounded dependencies to be accounted for in a phrase-structure representation.

[(20): a topicalization structure in which a fronted PP is linked to a hole PP/PP inside its sister S/PP]
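The passage from (15) to the derived rules (16) and (17) is mechanical and can be sketched as follows (an illustrative fragment, not Gazdar's actual derived-rule definition):

```python
# Sketch: mechanically deriving slashed rules from a base rule, in the
# spirit of the derived rules (16) and (17). This is an illustrative
# fragment, not Gazdar's actual derived-rule definition.

def derived_rules(mother, daughters, hole):
    """Pass the hole category down to each daughter in turn."""
    rules = []
    for i, d in enumerate(daughters):
        slashed = daughters[:i] + [f"{d}/{hole}"] + daughters[i + 1:]
        rules.append((f"{mother}/{hole}", slashed))
    return rules

# From [S NP VP] with an NP hole we obtain (16) and (17):
assert derived_rules("S", ["NP", "VP"], "NP") == [
    ("S/NP", ["NP/NP", "VP"]),
    ("S/NP", ["NP", "VP/NP"]),
]
```

Since only finitely many slashed variants exist for a finite category set, the derived rules keep the grammar context free, as the text notes.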
The notion of categories with holes is not completely new. Harris (14) introduces categories such as S−NP or S−PP (like the S/NP of Gazdar) to account for moved constituents. He does not, however, seem to provide, at least not explicitly, machinery for carrying the "hole" downward. He also has rules in his framework for introducing categories with holes. Thus, in his framework, something like (16) would be accomplished by allowing for a sentence form (a center string) of the form (21) (not entirely his notation),

NP V O−NP   (21)

where O = object or complement of V.
This notion also appears in Kuno's context-free grammar (15). His grammar had node names with associated descriptions that reflected the missing constituent and were expanded as constituents, one of which similarly reflected the missing constituent. This was continued down to the hole. Sager (16), who has constructed a very substantial parser starting from some of these ideas and extending them significantly, has allowed for the propagation of the hole, resulting in structures very similar to those of Gazdar. She has also used the notion of categories with holes in order to carry out some coordinate-structure computation. For example, Sager allows for the coordination of S/α and S/α (16). Gazdar (10) is the first, however, to incorporate the notion of categories with holes and the associated rules in a formal framework for his syntactic theory and also to exploit it in a systematic manner for explaining coordinate-structure phenomena.

Tree-Adjoining Grammar (TAG). In a GPSG certain amendments were made (e.g., the introduction of slashed categories) that allow one to construct structural descriptions that incorporate certain aspects of transformational grammars without transformational rules. Moreover, these amendments do not increase the generative power beyond that of CFG. It is also possible to capture many aspects of a transformational grammar in a phrase-structure tree-generating system consisting of tree-building rules rather than string-rewriting rules. The tree-adjoining grammar (TAG) is such a system. A TAG, G = (I, A), consists of a finite set of "initial trees," I, a finite set of "auxiliary trees," A, and a composition operation called "adjoining." The trees in I and A together are called "elementary trees." A tree α is an "initial tree" if it is of the form
(22) [α: a tree whose root is labeled S and whose frontier nodes are all terminal symbols]

That is, the root node of α is labeled S, and the frontier nodes are all terminal symbols. The internal nodes are nonterminals. A tree β is an "auxiliary tree" if it is of the form

(23) [β: a tree whose root is labeled X and whose frontier nodes are all terminals except one, which is labeled X]

That is, the root node of β is labeled X, where X is a nonterminal, and the frontier nodes are all terminals except one that is labeled X, the same label as that of the root. The node labeled X on the frontier will be called the foot node of β. The internal nodes are nonterminals. The initial and the auxiliary trees are not constrained in any manner other than as indicated above. The idea, however, is that both the initial and the auxiliary trees will be minimal in some sense. An initial tree will correspond to a minimal sentential tree (i.e., without recursion on any nonterminal), and an auxiliary tree, with root and foot node labeled X, will correspond to a minimal recursive structure that must be brought into the derivation if one recurses on X.

A composition operation called adjoining (or adjunction) is now defined, which composes an auxiliary tree β with a tree γ. Let γ be a tree containing a node n bearing the label X, and let β be an auxiliary tree whose root node is also labeled X. [Note that β must have, by definition, a node (and only one such) labeled X on the frontier.] Then the adjunction of β to γ at node n will be the tree γ′ that results when the following complex operation is carried out: The subtree of γ dominated by n, call it t, is excised, leaving a copy of n behind; the auxiliary tree β is attached at n, and its root node is identified with n; and the subtree t is attached to the foot node of β, and the root node of t is identified with the foot node of β. Form (24) illustrates this operation.

[(24): adjunction of the auxiliary tree β into γ at node n, with the excised subtree t reattached at the foot node of β]

The intuition underlying the adjoining operation is a simple one, but the operation is distinct from a substitution operation on trees. For a TAG, G = (I, A), T(G) is the set of all trees derived in G starting from initial trees in I, and the string language L(G) is the set of all terminal strings of the trees in T(G). It can be shown that TAGs are more powerful than CFGs; that is, there are string languages that can be generated by TAGs but not by CFGs. For example, the language L = {aⁿbⁿcⁿ | n ≥ 1} can be generated by a TAG but not by any CFG, as is well known, because L is a strictly context-sensitive language. Moreover, it is possible to construct a TAG, G, such that G generates a context-free language, but the set of phrase-structure trees generated by G cannot be generated by a CFG; that is, G provides structural descriptions for the strings of a context-free language that no CFG can provide. In particular, for the language L = {aⁿebⁿ | n ≥ 1}, a well-known context-free language, a TAG, G, can be constructed that is able to provide structural descriptions for strings in L exhibiting cross-serial dependencies between the a's and b's.

For example, let G = (I, A), where:

(25) [I contains the initial tree α1, and A contains the auxiliary trees β1 and β2, each adding an a on the left and a b on the right]
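A sketch of the adjoining operation on explicitly represented trees (the encoding and the one-auxiliary-tree grammar below are simplifications for illustration; the G of the text uses two auxiliary trees, β1 and β2):

```python
# Sketch: the adjoining operation on explicitly represented trees. The
# (label, subtrees, is_foot) encoding and the one-auxiliary-tree grammar
# below are simplifications for illustration; the text's G for a^n e b^n
# uses two auxiliary trees, beta1 and beta2.

def adjoin(tree, beta):
    """Adjoin auxiliary tree beta at the first node whose label matches
    beta's root; the excised subtree is reattached at beta's foot node."""
    label, subs, _ = tree
    if label == beta[0]:
        return _plug(beta, (label, subs, False))
    return (label, [adjoin(s, beta) for s in subs], False)

def _plug(node, excised):
    label, subs, is_foot = node
    if is_foot:                 # the foot node receives the excised subtree
        return excised
    return (label, [_plug(s, excised) for s in subs], False)

def frontier(tree):
    label, subs, _ = tree
    return label if not subs else "".join(frontier(s) for s in subs)

alpha = ("S", [("e", [], False)], False)        # initial tree: S over e
beta = ("S", [("a", [], False),                 # auxiliary tree: S over
              ("S", [], True),                  #   a  S*  b  (S* = foot)
              ("b", [], False)], False)

t = alpha
for _ in range(3):
    t = adjoin(t, beta)
assert frontier(t) == "aaaebbb"
```

Each adjunction inserts one a–b pair around the recursion site, so n adjunctions yield the string aⁿebⁿ together with its derived tree.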
Some derivations in G are shown below:

γ0 = α1
γ1 = γ0 with β1 adjoined at S, as indicated in γ0 by the asterisk
γ2 = γ1 with β2 adjoined at T, as indicated in γ1 by the asterisk

[(26): the derived trees γ0, γ1, and γ2]

Clearly, L(G) = {aⁿebⁿ | n ≥ 0}. The a and b in each auxiliary tree as it enters the derivation have been coindexed to indicate that they both belong to the same auxiliary tree, that is, that they have a dependency relation between them. The terminal string of γ2 as shown in (27) below illustrates the cross-serial dependencies.

[(27): the terminal string a1 a2 e b1 b2 of γ2, with the dependency links a1–b1 and a2–b2 crossing one another]

The ability to represent such cross-serial dependencies permits one to construct cross-serial dependencies in natural languages (e.g., in Dutch). The following example illustrates, very briefly and in a highly simplified manner, how a TAG can be used for characterizing some linguistic structures. Let G = (I, A), with the elementary trees shown in (28).

[(28): the initial trees α1, α2, and α3 and the auxiliary trees β1, β2, and β3, with example lexical strings "the man left" (α1), "PRO to invite John" (α2), "who PRO to invite e_i" (α3), "who met Mary" (β1), "John persuaded Bill S′" (β2), and "did John persuade Bill S′" (β3)]

The lexical string under each tree is an example string that would result by appropriate lexical insertions in the tree. The detailed structure of each tree is not relevant and should not be taken as the unique structure assigned to the string. Much of the recent linguistic research can be characterized as the study of the constraints on the shape of the elementary trees, initial and auxiliary. The coindexing of the nodes in β1 and α3 is for the purpose of illustrating the dependencies. The following phrase-structure trees (not the only ones) in this TAG can now be derived. By adjoining β1 to α1 at the NP node, (29) can be obtained.

[(29): the derived tree for "The man who met Mary left"]
By adjoining β2 to α2 at the root node S of α2, (30) can be obtained.
[(30): the derived tree for "John persuaded Bill PRO to invite Jane"]

By adjoining β3 to α3 at S′ under the root node of α3, one has (31):

[(31): the derived tree for "Who did John persuade Bill PRO to invite e_i"]

Note that the dependency between NP[+wh] and e (the empty string, representing a gap or trace) was stated locally in the elementary tree α3. In the tree resulting from adjoining β3 to α3 the dependent elements have moved away from each other, and in general, adjoining will make them unbounded. This is an example to show that dependencies can be locally stated on the elementary trees; adjoining preserves them and may introduce unboundedness.

The TAG example illustrates how phrase-structure trees can be built out of elementary trees (elementary phrase-structure trees) such that the co-occurrence relations between elements that are separated in the surface constituent structure can be stated locally on the elementary trees in which these elements are copresent. This property of TAGs achieves the results of transformational rules (without transformations), including the generation of phrase-structure trees exhibiting cross-serial dependencies.

Pollard (17) has proposed a rewriting system, called head grammars (HG), in which the rewriting rules not only allow concatenation of strings but also the wrapping of one string around another. For example, HG has rules of the form

RW(uhv, u′gv′)

where uhv and u′gv′ are two strings with h and g as designated symbols, called heads. The result of applying the rule is the string

uhu′gv′v

That is, the string to the right of the head of the first string is wrapped around the second string. The head of the resultant is the head of the first string. The adjoining operation in a TAG is very similar to the wrapping operations in HG. It has recently been shown by Vijay-Shanker, Weir, and Joshi (18) that HGs are equivalent to TAGs (assuming the head of an empty string is defined).

Summary

Phrase-structure trees provide structural descriptions for sentences. Phrase-structure trees can be generated by phrase-structure grammars. Phrase-structure trees can be shown to be appropriate for characterizing structural descriptions of sentences, including those aspects that are usually characterized by transformational grammars, either by making certain amendments to CFGs, without increasing their power, or by generating the trees from elementary trees by a suitable rule of composition, increasing the power only mildly beyond that of CFGs. Structural descriptions provided by phrase-structure trees are used, explicitly or implicitly, in natural-language processing systems (1).
BIBLIOGRAPHY

1. T. Winograd, Language as a Cognitive Process, Academic Press, New York, 1983.
2. L. Bloomfield, Language, Holt, New York, 1933.
3. R. S. Wells, "Immediate constituents," Language 23, 212–226 (1947).
4. E. Bach, Syntactic Theory, Holt, Rinehart and Winston, New York, 1974.
5. L. S. Levy and A. K. Joshi, "Skeletal structural descriptions," Inf. Contr. 39, 192–211 (1978).
6. S. Peters and R. W. Ritchie, Context-Sensitive Immediate Constituent Analysis, Proceedings of the ACM Symposium on Theory of Computing, pp. 150–161, 1969.
7. A. K. Joshi and L. S. Levy, "Constraints on structural descriptions," SIAM J. Comput. 6, 272–284 (1977).
8. N. Chomsky, Aspects of the Theory of Syntax, The MIT Press, Cambridge, MA, pp. 131–186, 1965.
9. G. Gazdar, Phrase Structure Grammar, in P. Jacobson and G. K. Pullum (eds.), The Nature of Syntactic Representation, Reidel, Boston, MA, 1982.
10. G. Gazdar, E. Klein, G. K. Pullum, and I. A. Sag, Generalized Phrase Structure Grammar, Blackwell, Oxford, 1985.
11. R. Kaplan and J. W. Bresnan, A Formal System for Grammatical Representation, in J. W. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, pp. 173–281, 1979.
12. A. K. Joshi, L. S. Levy, and M. Takahashi, "Tree adjunct grammars," J. Comput. Syst. Sci. 10, 136–163 (1975).
13. A. K. Joshi, How Much Context-Sensitivity Is Necessary for Structural Descriptions? Tree Adjoining Grammars, in D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Parsing, Cambridge University Press, Cambridge, MA, pp. 206–250, 1984.
14. Z. S. Harris, String Analysis of Sentence Structure, Mouton, The Hague, 1962.
15. S. Kuno, The Current Grammar for the Multiple-Path English Analyzer, Mathematical Linguistics and Automatic Translation, Report No. NSF-8, Computation Laboratory, Harvard University, Cambridge, MA, 1963.
16. N. Sager, Syntactic Analysis of Natural Languages, in M. Alt and M. Rubinoff (eds.), Advances in Computers, Vol. 8, Academic Press, New York, pp. 202–240, 1967.
17. C. Pollard, Head Grammars, Ph.D. Dissertation, Stanford University, Stanford, CA, 1984.
18. K. Vijay-Shanker, D. Weir, and A. K. Joshi, Adjoining, Wrapping, and Headed Strings, Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, New York, June 1986.

A. Joshi
University of Pennsylvania
GRAMMAR, SEMANTIC

A "semantic grammar" is a grammar for language in which the categories refer to semantic as well as syntactic concepts. It was first developed in the early 1970s in the attempt to build practical natural-language interfaces to educational environments, SOPHIE (qv) (1,2), and to databases, LIFER (qv) (3,4) and PLANES (qv) (5). It has continued to be used in a variety of commercial and other applications such as ROBOT [also known as INTELLECT (qv)] (6), PHRAN (qv) (7), XCALIBUR (8), and CLOUT. The distinguishing characteristic of a semantic grammar is the type of information it encodes and not the formalism used to represent it. Semantic grammars have been represented in many different formalisms, including augmented transition networks (see Grammar, augmented transition network) and augmented phrase-structure grammars (see Grammar, phrase-structure). Unlike natural-language systems generally, the aim of semantic grammars is to characterize a subset of natural language well enough to support casual user interaction. As such, it is primarily a technique from the field of natural-language engineering rather than a scientific theory [though some researchers have proposed semantic grammars as a psychological theory of language understanding (7)].

To understand semantic grammars, it is helpful to understand a little about theories of natural language. The goal of a theory of language is to explain the regularities of language. Transformational grammars (see Grammar, transformational) and lexical functional grammars are two good examples of theories of language. The syntax part of the theory explains the structural regularities of a language, for example, things that are true about word order and inflections. The theory does this by providing rules that the words and phrases must obey. This collection of rules is referred to as a grammar.
An example of the kind of regularity that the syntactic part of a theory of language seeks to capture can be seen in the relationship between the following two sentences:

1. The boy hit the ball.
2. The ball was hit by the boy.
It is called the passive relationship and exists between an infinite number of other sentences in English as well. A good syntactic grammar will have a small number of rules that account for the passive relationship between all of these sentences. To explain these relationships, the grammar must name and relate broad, abstract concepts. For example, introducing the concept of a noun phrase (NP) as referring, roughly, to the collection of all possible phrases that name things allows a syntactic grammar to contain a rule like:

(Noun Phrase 1)(Verb)(Noun Phrase 2) → (Noun Phrase 2)(Auxiliary Verb)(Verb) by (Noun Phrase 1)

This gives rise to categories in the grammar that characterize the roles words and phrases play in the structure of language, that is, in the syntax. In semantic grammars, the choice of categories is based on the semantics of the world and the intended application domain as well as on the regularities of language. Thus, for example, in a system that was intended to answer questions about electronic circuits (such as SOPHIE), the categories might include measurement, measurable quantity, or part as well as standard categories such as determiner and preposition. For example, the rule

(Measurement) :- (Determiner)(Measurable-Quantity)(Preposition)(Part)

applies in the following phrases:

The voltage across R9.
The current through the voltage reference capacitor.
The power dissipation of the current-limiting transistor.

In Figure 1 are two parse trees of the same sentence that might be generated by typical grammars, the left one with a standard grammar, the right one with a semantic grammar.
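The (Measurement) rule just given can be sketched as a tiny recognizer. This is an illustrative sketch only: the token sets below (determiners, quantities, prepositions, part names) are invented stand-ins, not the vocabulary of SOPHIE or any real system.

```python
# Minimal sketch of one semantic-grammar rule:
# (Measurement) :- (Determiner)(Measurable-Quantity)(Preposition)(Part)
# The lexicon is a toy assumption for illustration.

DETERMINERS = {"the"}
QUANTITIES = {"voltage", "current", "power"}
PREPOSITIONS = {"across", "through", "of"}
PARTS = {"r9", "c2", "q1"}  # hypothetical circuit-part names

def parse_measurement(tokens):
    """Return the interpretation of a (Measurement) phrase, or None."""
    if len(tokens) != 4:
        return None
    det, qty, prep, part = (t.lower() for t in tokens)
    if det in DETERMINERS and qty in QUANTITIES \
            and prep in PREPOSITIONS and part in PARTS:
        # The interpretation falls directly out of the rule's constituents.
        return {"quantity": qty, "location": (prep, part)}
    return None

print(parse_measurement("the voltage across R9".split()))
# -> {'quantity': 'voltage', 'location': ('across', 'r9')}
```

Because the categories are semantic, a successful match yields the interpretation directly, with no separate semantic-interpretation pass.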
Advantages of Semantic Grammars

Semantic grammars provide engineering solutions to many problems that are important when building practical natural-language interfaces. These important issues are efficiency, habitability, discourse phenomena, bad inputs, and self-explanation. Efficiency is important because the user is waiting during the time the system spends understanding the input. Semantic grammars are efficient because they allow semantic constraints to be used to reduce the number of alternative parsings that must be considered. They are also efficient because the semantic interpretation (meaning) of the expression follows directly from the grammar rules. When considering a natural-language interface, it is often useful to think of the interpretation of a statement as the command or query the user would have had to type had he or she been talking directly to the system. For example, in a database retrieval system the interpretation of the input is the query or queries in the retrieval language that answer the question (see Semantics, procedural). Typically, in a semantic grammar each rule has an augmentation associated with it that builds its interpretation from the interpretations of the constituents. For example, the interpretation of the rule (Query) :- (Question-Intro)(Measurement) is a query to the database that retrieves the measurement specified in the interpretation of (Measurement). The interpretation of (Measurement) specifies the
[Figure 1 shows two parse trees of the question "What is the voltage across R9." The left tree, labeled "Standard structure of an English question," uses standard categories such as QUESTION WORD and Q/PRO; the right tree, labeled "Semantic grammar structure of an English question," has a top QUERY node whose children are QUESTION INTRO and MEASUREMENT, with MEASUREMENT dominating MEASURABLE QUANTITY.]

Figure 1. Examples of two parse trees of the same sentence.
quantity being measured (e.g., voltage) and where it should be measured (e.g., across R9). The interpretation of (Measurement) can be used differently in, for example, a rule like (Yes-No-Query) :- (Be-Verb)(Measurement)(Comparator), as in the question "Is the voltage across R9 low?" Having the semantic interpretation associated directly with the grammar is efficient because it avoids a separate process that does semantic interpretation.

The second important issue is habitability. It is unlikely that any natural-language interface written in the foreseeable future will understand all of natural language. What a good interface does is to provide a subset of the language in which users can express themselves naturally without straying over the language boundaries into unallowed sentences. This property is known as "habitability" (9). Although exactly what makes a system habitable is unknown, certain properties make systems more or less habitable. Habitable systems accept minor or local variations of an accepted input and allow words and concepts that are accepted in one context to be accepted in others. For example, a system that accepts "Is something wrong?" but does not accept "Is there anything wrong?" is not very habitable. Any sublanguage that does not maintain a high degree of habitability is apt to be worse than no natural-language capability because users will continually be faced with the problem of revising their input. Lack of habitability has been found to be a major source of user frustration with natural-language systems.

An important problem in designing habitable natural-language interfaces is the occurrence of discourse phenomena such as pronominal reference and ellipsis. When people interact with a system in natural language, they assume that it is intelligent and can therefore follow a dialogue. If it does not, they have trouble adapting. The following sequence of questions exemplifies these problems:

3. What is the population of Los Angeles?
4. What about San Diego?

Input 3 contains all of the information necessary to specify a query. Input 4, however, contains only the information that is different from the previous input. Systems using semantic grammars handle sentences like 4 by recognizing the categories of the phrases that do occur in the elided input. In this case, "San Diego" might be recognized as being an instance of (City). The most recent occurrence of the same category is located in a previous input, and the new phrase is substituted for the old one. In some systems, such as SOPHIE, PLANES, and XCALIBUR, this is done using the interpretation structure of previous inputs. In some systems, such as PHRAN, the substitution is made in the previous input string, which is then reparsed. Input 4 is an example of the discourse phenomenon called ellipsis (qv). Semantic grammars have also been used to handle classes of pronominal and anaphoric reference, as in the sentence "What is it for San Francisco?" Although the techniques used by semantic grammars work on many common cases of discourse constructs, there are many other more complex uses that they do not address (see Discourse understanding and Ref. 10 for more details).

Another ramification of the fact that the natural-language interface will not understand everything is that it must deal effectively with inputs that lie outside its grammar, that is, sentences that do not parse. The standard solution to this problem is to understand part of the sentence either by ignoring words (sometimes called "fuzzy parsing") or by recognizing phrases that do satisfy some of the grammar. A semantic grammar has the advantage that recognized phrases are meaningful and can be used to provide feedback to the user. For example, if the user's input contains the phrase "voltage across R9," the system can display the rules that use (Measurement) to give the user an idea of what sentences the system will accept.

A related difficulty with natural-language interfaces is conveying to the user the capabilities of the system, for example, what the system can do and what concepts it knows about. Semantic grammar systems can use the information in the grammar to provide some help. For example, LIFER allows the user to ask about possible ways of completing a sentence. In the dialogue below, the user requests help in the middle of a sentence. The system responds with the possible ways that the sentence could be completed.
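The category-based ellipsis substitution just described can be sketched very simply. The representation of a parsed input as a list of (category, phrase) pairs, and the category names themselves, are illustrative assumptions, not the internal format of any of the systems named above.

```python
# Sketch of ellipsis handling by category substitution: the fragment
# replaces the most recent phrase of the same category in the
# previous input. The (category, phrase) representation is a toy.

def resolve_ellipsis(previous, fragment_category, fragment_phrase):
    """Return a copy of the previous input with the fragment
    substituted for the phrase of matching category."""
    resolved = []
    for category, phrase in previous:
        if category == fragment_category:
            resolved.append((category, fragment_phrase))
        else:
            resolved.append((category, phrase))
    return resolved

# Input 3: "What is the population of Los Angeles?"
previous = [("Question-Intro", "what is"),
            ("Attribute", "the population of"),
            ("City", "Los Angeles")]

# Input 4: "What about San Diego?" -> a fragment of category (City)
print(resolve_ellipsis(previous, "City", "San Diego"))
# the (City) slot now holds "San Diego"
```

Systems like SOPHIE apply this substitution to the interpretation structure; PHRAN instead splices the fragment into the previous input string and reparses.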
Since the grammar is semantic, the terms are meaningful to the user.

USER: What is the voltage (help)
SYSTEM RESPONSE: Inputs that would complete the (Measurement) rule are:
    across (part)
    between (node) and (node)
    at (node)
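Completion help of this kind can be generated directly from the grammar by matching the tokens typed so far against rule prefixes. The following sketch assumes a toy rule table in which parenthesized names are categories; it is an illustration of the idea, not LIFER's actual mechanism.

```python
# Sketch of grammar-driven completion help, assuming each rule is a
# list of tokens where "(...)" marks a semantic category. Toy data.

MEASUREMENT_RULES = [
    ["the", "(quantity)", "across", "(part)"],
    ["the", "(quantity)", "between", "(node)", "and", "(node)"],
    ["the", "(quantity)", "at", "(node)"],
]

def matches(token, symbol):
    # A category symbol matches any word in this sketch; a real
    # system would check membership in the category.
    return symbol.startswith("(") or token == symbol

def help_completions(typed, rules):
    """Return the remaining tokens of every rule whose prefix
    matches what the user has typed so far."""
    out = []
    for rule in rules:
        if len(typed) <= len(rule) and all(
                matches(t, s) for t, s in zip(typed, rule)):
            out.append(" ".join(rule[len(typed):]))
    return out

# USER: What is the voltage (help)
print(help_completions(["the", "voltage"], MEASUREMENT_RULES))
# -> ['across (part)', 'between (node) and (node)', 'at (node)']
```

Because the rule categories are semantic, the listed completions are directly meaningful to the user.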
The NLMENU (11) system attacks this problem by constraining the user's input to a series of menu selections that only produce legal sentences. In addition to obviating the problem of unrecognized sentences, the approach also has the benefit of presenting in the menus an explicit picture of what the system can do.

Limitations of Semantic Grammars

Many limitations arise from the merger of semantic and syntactic knowledge that characterizes semantic grammars. The range of linguistic phenomena that have been covered is limited and excludes, for example, complex forms of conjunctions, comparatives, or complex clause-embedding constructs, for example, "Which ships does the admiral think the fourth fleet can spare?" (12). Moreover, although work in constructing semantic grammars is creating some generalizable principles of design, the grammar itself must be redone for each new domain. Categories that are appropriate to electronics are not applicable to the domain of census data. Even within a single domain, certain syntactic regularities, such as the passive transformation, must be encoded for each semantic class that allows active sentences. This not only increases the size of the grammar but, more importantly, results in a great deal of redundancy in the grammar, making it difficult to write or extend.

Attempts have been made to overcome this limitation by separating out the syntactic knowledge. The simplest approach is to reformulate the categories in the grammar to make them more syntactic. In this case the semantic distinctions that had previously been made by having distinct categories are made in the augmentations associated with each grammar rule that produce the interpretation. Another approach is to capture the syntactic knowledge in the program that applies the grammar rather than in the grammar itself. In PHRAN, for example, aspects of adverbs and relative clauses are handled by the matching process that applies the grammar rules to the input.
In a return to the more classical breakdown of linguistic information, some systems seek to maintain the advantages of semantic grammars by closely coupling separate syntactic and semantic components (13). This points to one contribution of semantic grammars to the theory of language (as contrasted with their contributions to the production of usable systems): the identification of phenomena that succumb to simple methods.
BIBLIOGRAPHY

1. R. R. Burton, Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems, BBN Report 4274, Bolt, Beranek and Newman, Cambridge, MA, 1976. Burton's Ph.D. thesis, University of California, Irvine, 1976, which introduced the term "semantic grammar" and described its use and advantages in building the SOPHIE natural-language front end. Good introduction to the issues surrounding natural-language engineering.

2. R. R. Burton and J. S. Brown, "Toward a Natural Language Capability for Computer-Assisted Instruction," in H. O'Neill (ed.), Procedures for Instructional Systems Development, Academic, New York, pp. 273-313, 1979. A more accessible paper largely based on "Semantic grammar: An engineering technique for constructing natural language understanding systems."

3. G. G. Hendrix, The LIFER Manual: A Guide to Building Practical Natural Language Interfaces, Technical Note 138, SRI Artificial Intelligence Center, Menlo Park, CA, February 1977. Complete description of the LIFER system, which includes many elegant user interface features including the ability to change the grammar during the interaction.

4. G. G. Hendrix, E. D. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3(2), 105-147 (June 1978). Provides an overview of the LIFER system.

5. D. L. Waltz, "An English language question-answering system for a large data base," CACM 21, 526-539 (July 1978). Describes the PLANES system that interfaces to relational databases.

6. L. R. Harris, "User-oriented data base query with the Robot natural language query system," Int. J. Man-Mach. Stud. 9, 697-713 (1977). Describes the system ROBOT that is marketed as INTELLECT.

7. R. Wilensky, Y. Arens, and D. Chin, "Talking to UNIX in English: An Overview of UC," CACM 27(6), 574-593 (June 1984). Describes the PHRAN system, which pushes the domain dependence of semantic grammars.

8. J. G. Carbonell, Discourse Pragmatics in Task-Oriented Natural Language Interfaces, Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, pp. 164-168, 1983. Describes XCALIBUR, a general system for interfacing to expert systems.

9. W. C. Watt, "Habitability," Am. Document. 19, 338-351 (1968).

10. B. L. Webber, So What Can We Talk About Now? in M. Brady and R. C. Berwick (eds.), Computational Models of Discourse, MIT Press, Cambridge, MA, pp. 331-371, 1983. Describes the difficult problems of anaphoric reference that arise in natural discourse.

11. H. R. Tennant, K. M. Ross, R. M. Saenz, C. W. Thompson, and J. R. Miller, Menu-Based Natural Language Understanding, Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, pp. 151-158, 1983. Describes NLMENU, a menu-driven natural-language input system.

12. T. Winograd, Language as a Cognitive Process, Vol. 1, Syntax, Addison-Wesley, Menlo Park, CA, p. 381, 1983. Excellent introduction to the area of natural-language understanding.

13. R. J. Bobrow and B. L. Webber, Knowledge Representation for Syntactic/Semantic Processing, Proceedings of the 1st AAAI, Stanford, CA, pp. 316-323, 1980. Describes the RUS system that arose from attempts to extract knowledge common to semantic grammars in several domains.

R. Burton
Xerox PARC

GRAMMAR, TRANSFORMATIONAL

Transformational grammar is a theory for describing human languages based on the idea that the full range of sentences in a language can be described by variations, or transformations, on a set of basic sentences. Developed by Noam Chomsky in the early 1950s and building on the earlier work of Zellig Harris (1,2), the theory of transformational grammar is now probably the single most widely studied and used linguistic model in the United States. (Ref. 1, a revised version of Chomsky's thesis work of the early 1950s that initiated the study of transformational grammar, gives a brief review of the intellectual background at the time. Although it is difficult reading, it is still a good source on the overall framework of generative grammar, including the theory of linguistic levels of description.) Transformational grammar has also been the subject of experiments in human language processing and the basis for several computer models of language processing, database retrieval, and language acquisition. The theory has had
an enormous influence on the practice of linguistics as a scientific discipline, particularly as part of a general approach to the study of human cognition that posits the existence of mental representations that have a central role in mental processing. Many of the core proposals of the theory, those regarding the exact representation of linguistic knowledge, remain controversial and have given rise to a variety of competing linguistic models (2,3-9).

Language

Transformational grammar seeks to answer three key questions about human language: what is knowledge of language; how is that knowledge put to use; and how is knowledge of language acquired? It aims to answer the first question by providing a finite representation, a grammar, for each possible human language. This use of the term "grammar" in the transformational framework is to be contrasted with its colloquial usage. Given a particular human language like English, a grammar for that language is to show how each sentence of that language is pronounced and how its sound can be paired with its meaning (or meanings, if the sentence has more than one meaning); that is, the grammar completely characterizes a set of (sound, meaning) pairs for that language. Note that this description is not meant to have any direct computational interpretation but is just meant to describe in an abstract way the representation of grammatical knowledge that a person might have, sometimes called linguistic competence (see Linguistics, competence and performance). Transformational grammar answers the question of how language is used by claiming that the grammar enters into the mental computations speakers and hearers carry out when they produce or understand sentences. A full theory of language use would thus include some account of actual computational algorithms, memory requirements, and the like, entering into language processing; this would be an account of linguistic performance.
Finally, the theory tries to answer the question of how language is acquired by assuming that all human languages are cut from the same basic design, called universal grammar. Universal grammar is not itself the grammar for any natural language but is like a chassis that is to be fleshed out and built upon by the actual linguistic experience of a child in a particular language community. Much of the theoretical effort in transformational grammar is directed to showing how human languages vary from each other in small enough ways that a child can learn English or Chinese or German without explicit instruction.
Generative Grammar

Along with much other work in linguistics, transformational grammar notes that since the speakers of a language like English have the potential to understand and produce an infinite number of sentences, such speakers must have some way of generating an infinite number of sentences from finite means. The use of the term "generate" here does not mean in the sense of a speaker being able to say some particular sentence but rather in the mathematical sense of an axiom system being able to produce or derive a set of theorems. For this purpose, transformational grammar relies on the mathematical devices formulated from the 1930s onward that allow one to specify infinite sets recursively by finite means. One branch of this mathematical study, known as formal language theory, grew out of Chomsky's study of rule systems for generating languages (10,11). Transformational grammar is thus part of the so-called generative paradigm in linguistic theory and is sometimes called transformational generative grammar. Other grammatical theories for human languages may be constructed that are generative but do not include transformations as part of their grammars. Over the past 30 years several such alternative theories have been advanced, such as relational grammar (5), arc-pair grammar (6), and more recently lexical-functional grammar (3) and generalized phrase structure grammar (4).

Syntactic and Semantic Rules

As mentioned, a central idea of transformational theory is that the variety of surface forms of any particular language (its sentences) is the result of the interaction of several modular subsystems. Most versions of transformational grammar assume that two of the basic subsystems are a set of syntactic rules or constraints and a set of semantic rules. The syntactic rules (from the Greek syntaxis, "arranged together") specify the legal arrangements of words in sentences, for example, that the English sentence "John will eat the ice cream" is legal because it consists of a subject noun phrase "John" preceding a verb phrase or predicate "will eat the ice cream." The semantic rules specify how a particular arrangement of words is to be interpreted, for example, that "Will John eat the ice cream" is a question. The syntactic rules may be further subdivided into a set of rules, a base grammar, that generates a set of basic sentences (at one time called kernel sentences and later deep structures, though the terminology is no longer applicable) and a set of transformations that operate on these basic sentences to produce derived sentences or surface structures. Additional rules operate on the surface structures to yield pronounceable output sentences (1,10).

Transformations

Roughly and intuitively, the transformations are designed to account for the systematic relationship between sentences, such as active-passive sentence pairs; global sentence relationships, such as the relationship between "what" and "eat" in "What will John eat," where "what" is the questioned object of "eat"; and ambiguities in sentences such as "They are flying planes," where one and the same string of words is derived from two different base sentences (one where "flying planes" is a kind of plane, with "flying" an adjective, and one where "flying" is the main verb). For instance, in one version of transformational grammar developed about 1965 (11), the sentence "John will eat the ice cream" would be generated by a set of syntactic rules, and then a transformational rule operating on this basic sentence would invert "John" and "will" to produce the derived question "Will John eat the ice cream." Another series of transformational operations could act on this sentence yet again to produce the passive sentence "Will the ice cream be eaten by John." This last sequence of operations involves adding new elements "be" and "by," moving old elements around, and changing the form of existing elements in a sentence. Figure 1 gives the overall block diagram for this grammar system, showing how sounds and meaning are paired.
Base grammar rules
    ↓
Output: base (deep) structures → Semantic interpretation (meaning)
    ↓
Transformational rules
    ↓
Output: surface structures
    ↓
Phonological rules
    ↓
Output: sound

Figure 1. A block diagram of the components of a transformational grammar, ca. 1965 (11).

Note that in this version of the theory the meaning of a sentence is determined by rules operating on the output of the base grammar, that is, the deep structures. The workings of this model, known as the standard theory, are described in more detail below.

Derivation Process

The process of deriving a sentence (surface structure) such as "Will the ice cream be eaten by John" has been the source of considerable confusion for computational analysis. A derivation does not give an algorithmic description or flowchart for how a derived sentence could be mechanically produced or analyzed by a machine; that is, it does not directly give a parsing procedure for natural languages. The latter part of this article gives more detail on how transformational grammars may actually be used for sentence analysis or production.

Over the course of 30 years the theory of transformational grammar has greatly altered the mechanisms used to generate the basic sentences, the definition of transformations, and the way that the final complexity of sentences is accounted for by the various subcomponents of the grammar. The general trend has been to have less and less of the final sentence form be determined by particular transformations or rules in the base grammar. Instead, relatively more of the constraints are written in the form of principles common to all languages or encoded in the dictionary entries associated with each word (12,13).

This approach is controversial (2). Other researchers in generative grammar have adopted quite different viewpoints about how best to describe the variations within and across natural languages. In general, these alternative accounts adopt means other than transformations to model the basic variation in surface sentences or assume other predicates than phrase structure relations are centrally involved in grammatical descriptions (7). In the recent theory dubbed lexical-functional grammar (LFG) (3), there are no transformations. Instead, the difference between, for example, an active sentence like "John kissed Mary" and a passive sentence like "Mary was kissed by John" is encoded in the form of different lexical entries (dictionary entries) for "kiss" and "kissed" plus a connection between those lexical entries and the grammatical relations of subject and object. In this example, among other things the lexical entry for "kiss" says that "John" is the subject, and that for "kissed" says that "Mary" is the object. There is no derivation from deep structure but simply the direct construction of a kind of surface structure plus the assignment of grammatical relations like Subj. In another current theory, generalized phrase structure grammar (4), the active-passive relationship is described by a metagrammar that, given a rule system that can produce the active sentence, derives a rule system to produce the corresponding passive form. This derived grammar contains no transformations but simply generates all surface forms directly without a derivation from a deep structure.

Variations of TG

As a concrete example of how a transformational grammar factors apart the sound-meaning relationship and of how the form of transformations and base rules has changed, consider one version of transformational grammar, the so-called extended standard theory of the 1970s (2,12,13). First it is shown how this version of transformational grammar differs from the version of the mid-1960s, which was briefly sketched above. This gives a detailed example of how the components of a transformational grammar work together. Then it is shown how these components have been modified in the most recent version of transformational grammar, known as government-binding theory (14).

Reviewing what was described earlier, the 1965 transformational theory had a syntactic component with two types of rules: First, it had a base grammar consisting of phrase structure rules, which represented or marked the basic categorial relationships or phrases of sentences, such as the fact that a noun phrase (NP) follows a verb phrase (VP) in "John will eat the ice cream." This defined what is called a set of phrase markers or basic sentences. Semantic interpretation was assumed to take place via a set of rules operating on the output of the base grammar (called deep structures). Second, this theory contained transformational rules that mapped phrase markers to other phrase markers, producing surface structures as output. Phonological rules operated on the surface structures to yield sentences in their final "pronounceable" form (11).

Base Grammar. The basic phrase markers are described by a phrase-structure grammar, in the simplest case a context-free grammar (15). A simple example of a phrase structure grammar helps to clarify this notion and illustrates how a grammar can generate a language. This grammar is given in the form of context-free phrase structure rules (10,11,15):

(1) S → NP Aux VP
(2) VP → Verb NP
(3) NP → Name
(4) NP → Determiner Noun
(5) Auxiliary → will
(6) Verb → eat
(7) Determiner → the
(8) Noun → ice cream
(9) Name → John

The first rule says that a sentence (S) is a noun phrase (NP) followed by an auxiliary verb and then a verb phrase (VP). The arrow can be read as an abbreviation for "is a" or as an instruction to generate the sequence of symbols NP Aux VP from the symbol S. That is, this rule is a command to replace the symbol S with the sequence NP Aux VP. For the purposes of the rule, the symbols NP, VP, and so on are regarded as atomic. Similarly, the second rule says that a VP consists of a verb followed by an NP, while the third and fourth rules describe NPs as either a name or a determiner followed by a noun. The last five rules are lexical rules that introduce actual words like "ice
cream" or "John." In a full grammar this representation would be in a form suitable for pronunciation, but conventionally printed versions just spell out words in their written form. Symbols like "ice cream" are called terminal elementsbecause they do not appear on the left side of any rules. Therefore, no more rules apply to them, and they terminate any further action. All other symbolslike S, NP, VP, Name, and so on, are called nonterminals. All the rules in this grammar are called context-free because they allow one to freely replace whatever symbol is to the left of the arrow with whatever sequenceof symbols is to the right of the arrow. Formally, context-freerules have only a single atomic symbol like S, VP, or NP to the left of the "is a" arrow. To use this grammar for generating a base phrase marker, one applies the rules of the grammar beginning with a designated initial symbol, in this case S, until no further rules can apply. This is called a derivation because it derives a new string of symbols from the starting symbol S. If the derivation consists of only words, it generates a sentence.The set of all sentencesthat is derived from S given somegrammar is called the language generated by the glammar. For example, applying rule 1, one can replaceS with NP Aux VP. Now apply rule 3 and replace the symbot NP with Name, producing the sequenceof symbols Name VP. Note how the presenceof the VP context to the right of the symbol NP did not matter here; that is why the application of rule 3 is called context free. Continuirg, now apply rule 9 and replace Name with "John," obtaining "John" VP. Since "John" is a terminal element, no more rules apply to it, but rule 5 applies to Aux. One can replaceit with "will" and get "John will" VP. Now rule 2 applies to VP. Replacing VP with Verb NP yields "John" Verb NP. Passing rapidly over the remaining steps,rules 6, 4,7, and 8 apply in turn, yielding "John will eat the ice cream," a sentencederived by this grammar. 
This derivation not only says that this sentence can be generatedby the given grammar but it also specifies, by means of the rules that were applied, what the implicit structural relationships are in a sentence,someof which are of linguistic importance. For instance, by defining the subject of a sentence as the first NP immediately contained in S, it is clear by inspecting the derivation that "John" is the subject of the sentence.This information can be made explicit either by recording the sequencerules that were applied to generate each sentenceor by associatingwith the grammar that explicitly marks phrase boundaries by wrapping left and right brackets around each nonterminal symbol, Iabeled with the name of that symbol. In addition, just to get things going, the grammar must include a special initial rule Start + [sS1:
will]lvp fv [s [Np [N"*"J ohn][Auxiliary "raeatf -cream]lJl [Np [o"t/he][Noonice Conventionally, rule systemslike the onejust describedare augmented to exclude the possibility of generating nonsentenceslike "The ice cream ate" or "John took." To do this, the context-freelexical rules of the original grammar are replaced with context-sensitiverules (10,15)that rewrite the symbols like Verb or Noun only in certain contexts. For example, the symbol Verb should be replaced with a verb like "took" only when there is an NP object to the verb's right. The theory of transformational grammar from the time of the 1965 work by Chomsky, "Aspectsof the Theory of Synto)c"(11), has placed such context-sensitive lexical insertion rules in the diction ary rather than in the base phrase structure component.That is, instead of using a context-freerule to replace the symbol Verb with an appropriate work, the basegrammar is expandeduntil there are just symbols like Verb, Noun, or Determiner left. Then the diction ary is consulted to see what words can be inserted given the surrounding context of the other symbols. For example, the dictionary entry for eat might look like this: eat: Verb, Noun[ f Animate] Auxili ary-,-Determiner Nounl -Abstract] This entry says that "eat" is a verb; that is can occur to the right of an animate noun (Iike "John") followed by an auxiliary verb; and that it can occur to the left of a determiner followed by a non-abstract noun (like "ice cream" but not like "fear"). In addition, the dictionary contains implicational statementsof the form: If a word is a person'sname, then it is also an animate noun. Therefore, the verb can be replaced in a sequenceof symbols like Name Verb Determiner Noun with the word "eat" because the diction ary entry for this word meets all the given context conditions. 
Together, the dictionary, consisting of lexical insertion constraints and implicational rules, plus the base phrase structure rules generate the set of possible base phrase structures. At one time these were called deep structures, to indicate that they "underlay" the surface forms of sentences, but this terminology proved confusing; such forms are not "deeper" in the sense that they are more fundamental or their meaning is deeper. This terminology was therefore discarded (11,12).
Transformational Component. Referring to Figure 1, the base structures may now be fed to the transformational component, where zero or more transformations can be applied to generate additional sentences; the output of this process is a surface structure, ready to be "spelled out" and pronounced as a sentence. If no transformations apply, the surface structure is the same as the base phrase structure. This will roughly be the case for ordinary declarative sentences, such as "John will eat the ice cream." If transformations do apply, they produce new phrase markers, such as "Will John eat the ice cream." Each transformation is defined by a structural description that defines the phrase markers to which it can apply and a structural change that describes how that phrase marker will be altered. That is, a transformation must match the description of a phrase marker and produces as output a new phrase marker. Further transformations may then apply to this new phrase marker, and so on. In this sense, transformations are much like an if-then production rule system, with the domain of the rules being phrase markers. The bracket-marking version of the grammar is:

(0) Start → [S S]
(1) S → [NP NP] [Auxiliary Auxiliary] [VP VP]
(2) VP → [Verb Verb] [NP NP]
(3) NP → [Name Name]
(4) NP → [Determiner Determiner] [Noun Noun]
(5) Auxiliary → will
(6) Verb → eat
(7) Determiner → the
(8) Noun → ice cream
(9) Name → John

If the reader follows through the derivation of "John will eat the ice cream" as before, but with the new grammar, it will be seen that the string of symbols generated is exactly the phrase marker, or labeled bracketing, of the sentence given earlier.
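Following that derivation through can itself be sketched in code. The rule encoding and the `expand` helper below are illustrative assumptions (the article gives no code); the nonterminal-wrapping step is exactly the bracket-marking idea described above:

```python
# A sketch of the bracket-marking grammar.  Each right-hand side is a
# list of items: a plain string is a nonterminal (wrapped in labeled
# brackets when expanded); a ("t", word) tuple is a terminal.
RULES = {
    "Start": [["S"]],
    "S": [["NP", "Auxiliary", "VP"]],
    "NP": [["Name"], ["Determiner", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Auxiliary": [[("t", "will")]],
    "Verb": [[("t", "eat")]],
    "Determiner": [[("t", "the")]],
    "Noun": [[("t", "ice cream")]],
    "Name": [[("t", "John")]],
}

def expand(symbol, choose):
    """Expand symbol, wrapping each nonterminal in labeled brackets."""
    parts = []
    for item in RULES[symbol][choose(symbol)]:
        if isinstance(item, tuple):                      # terminal word
            parts.append(item[1])
        else:                                            # nonterminal
            parts.append("[%s %s]" % (item, expand(item, choose)))
    return " ".join(parts)

# Derivation choices: the first NP (the subject) uses NP -> Name,
# the second NP (the object) uses NP -> Determiner Noun.
state = {"NP": 0}
def choose(symbol):
    if symbol == "NP":
        i = state["NP"]; state["NP"] += 1
        return 0 if i == 0 else 1
    return 0

print(expand("Start", choose))
# -> [S [NP [Name John]] [Auxiliary will] [VP [Verb eat] [NP [Determiner the] [Noun ice cream]]]]
```

The output string is the labeled bracketing of "John will eat the ice cream" shown earlier.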
For instance, one such transformation creates an interrogative sentence by starting with a phrase marker of the form X wh Y, where X and Y are any strings of symbols in phrase markers and wh is any phrase with a wh word at the front of it, like "what," "who," or "what ice cream." It then moves the wh phrase to the front of the sentence. For example, given the phrase marker corresponding to the sentence "John will eat what" (12), the phrase marker portion corresponding to "John will eat" matches X, "what" matches the wh phrase portion of the transformation, and the empty string matches Y; therefore, this transformation can apply. Moving the wh phrase to the front gives "What John will eat." An additional transformation, subject-auxiliary inversion, can now apply to this new phrase marker, interchanging the NP "John" and the auxiliary phrase "will" to produce the question "What will John eat." Note that transformational rules manipulate only whole phrases, like the wh phrase above. Conventionally, structural descriptions and structural changes are written by labeling the elements in the pattern to be matched with numbers and then showing how those elements are changed (moved, inverted, or deleted) by indicating the appropriate changes on the numbers. In this format, for example, the wh phrase rule would be written as follows:

Structural description: (X, wh, Y) = (1, 2, 3)
Structural change: (2, 1, 3)

Extended Standard Theory. As described, this version of transformational grammar, the standard theory, was current from the mid-1960s to about 1970. In this theory deep structures were also the input to another component of the grammar, dealing with semantic interpretation and then, ultimately, rules of inference, belief, and so forth. Among other reasons, this position was discarded when it became clear, for example, that sentences with the same base structure could have different meanings.
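Read as a pattern-matching rule, the wh-fronting structural description (X, wh, Y) and structural change (2, 1, 3) given above can be sketched as follows. This is a deliberately simplified, word-level sketch (the wh-word list and flat word-list representation are assumptions for illustration); real transformations operate on whole phrases in phrase markers:

```python
# Sketch of the numbered-element notation: match (X, wh, Y) = (1, 2, 3)
# and rearrange as (2, 1, 3), fronting the wh element.
WH_WORDS = {"what", "who", "which"}

def wh_front(words):
    """Apply (X, wh, Y) -> (2, 1, 3) to the first wh word found."""
    for i, w in enumerate(words):
        if w in WH_WORDS:
            x, wh, y = words[:i], [w], words[i + 1:]   # elements 1, 2, 3
            return wh + x + y                          # change: (2, 1, 3)
    return words  # the structural description does not match

print(wh_front(["John", "will", "eat", "what"]))
# -> ['what', 'John', 'will', 'eat']
```

Subject-auxiliary inversion would then apply to this output to yield "What will John eat."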
As a simple example, consider the sentences "Everyone in this room speaks two languages" versus "Two languages are spoken by everyone in this room." If Bill and Sue are in the room, the first sentence is usually taken to mean that Bill speaks, for example, German and French, and Sue speaks English and French; they each speak two languages but not necessarily the same languages. The second sentence is different: It is ordinarily interpreted to mean that there are exactly two languages, the same two languages, that everybody in the room speaks. But, assuming that the second sentence is derived from the first by the passive transformation, this means that both sentences would have the same deep structure, and therefore the same meaning, unless something more than just deep structure enters into the determination of meaning. To deal with such problems, among others, the extended standard theory (EST) of the early 1970s added new representational devices and new constraints designed to simplify the statement of transformations and give a better format for semantic interpretation (12,13). First, it was proposed that when a phrase is moved by a transformation, it leaves behind an empty category of the kind moved, a trace, indicating the position from which it was moved. For example, the wh phrase question transformation applied to "John will eat what" now gives:

What John will eat [NP e]
where [NP e] denotes an empty NP or empty category (hence the "e") that is the object of "eat." The theory assumes that "what" and its trace are associated, for example, by the notation of coindexing: a subscript is assigned to "what" (say, i) and the same subscript to [NP e]. This empty NP will not be "pronounced" by the rules of phonetic interpretation, so the final spoken sentence will not reveal the empty category directly. The trace is to be understood as a kind of variable bound to "what," and semantic interpretation rules will now assign to "what" the meaning "for which thing," thus yielding the following representation: For which thing X, will John eat X. In this way the enriched surface structure (now called S-structure) will provide the format for semantic interpretation and retain the relationships, such as that between verb and object, that were transparently represented by deep structure (now called D-structure to avoid any confusion with the earlier approach). Questions regarding the interpretation of sentences such as "everyone in this room speaks two languages," which involve mainly the interpretation of the quantifier-like terms "everyone" and "two," are now determined via the operation of rules that operate on S-structure, deriving a new level of representation quite close to S-structure, but one that substitutes "for which thing X" for terms such as "what," binds traces considered as variables to their wh antecedents, interprets quantifiers, and so forth. This new representation, called LF (for logical form), completes the picture of the extended standard theory model, shown in Figure 2. Again, the diagram depicts simply the logical relationship among elements, not the temporal organization of a processing system that would use them.

Constraints. The second major shift in transformational grammar from the mid-1960s through the 1970s, pursued today with renewed effort, involved the role of constraints.
From the outset it was noted that the transformational rules for any particular grammar, say, for English, would have to be quite complex, with elaborate structural descriptions. For example, the simple rule given earlier to move a wh phrase to the front of a sentence,

(X, wh, Y) → (2, 1, 3)
will give the wrong result when applied to the following example, even though the structural description of the rule matches:

(1) I wonder a picture of  (2) what  (3) is on the table
→ (2) What  (1) I wonder a picture of  (3) is on the table
since after several other transformations it eventually produces the incorrect sentence "What do I wonder a picture of is on the table." However, complicating the structural descriptions of rules leads to a large, complex rule system that becomes harder to learn, since it is difficult to explain why a child would posit a complex structural description rather than a simple one like moving a wh phrase to the front of the sentence. Starting about 1964 linguists began to formulate constraints that allowed one to retain and even simplify transformational rules like "front wh." These constraints were not
Base grammar phrase-structure rules
        |
        v
  D-structures
        |
        v
Transformational rules
        |
        v
S-structures (with traces)
     /           \
LF rules      Phonological rules
    |               |
    v               v
Logical form (LF)   Phonetic form (PF)
Figure 2. Block diagram of extended standard theory (EST) (12,13).
part of any particular grammar, like English, but part of all human grammars. For example, the A-over-A principle, applying to all transformational rules, states that one cannot move a phrase of type A out of another phrase of type A. This prohibits the wh phrase rule from applying in the errant example above, since "what," considered an NP, cannot be moved out of the NP "a picture of." Further simplifications became possible when it was realized that many other particular transformations were just the result of such general principles operating in conjunction with just a few very simple, general transformations. As an example, in earlier work in transformational grammar there were many different rules that acted on NPs, among these a passive transformation, exemplified below:

John ate the ice cream.
The ice cream was eaten by John.

This rule could be written with the following structural description and structural change, moving the third element to the front, adding a past tense "be" form and altering the tense of the verb (details of the latter change being omitted here), and moving the subject after a "by" phrase to the end:
V,
NP)
1
2
3+
(3,
be-en 2
bY 1)
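The passive structural change (NP, V, NP) → (3, be+en 2, by 1) can be sketched as a function on (subject, verb, object) triples. This is a toy sketch: the participle table and flat string output stand in for the real be+en morphology and phrase-marker manipulation:

```python
# Sketch of the passive structural change: element 3 (the object) moves
# to the front, the verb takes a "be" + past-participle form, and the
# subject (element 1) follows "by".  Morphology is a lookup table here.
PARTICIPLE = {"ate": "eaten", "saw": "seen"}

def passivize(np1, v, np2):
    """(NP, V, NP) -> (3, be+en 2, by 1), as a string for readability."""
    return "%s was %s by %s" % (np2, PARTICIPLE[v], np1)

print(passivize("John", "ate", "the ice cream"))
# -> the ice cream was eaten by John
print(passivize("John", "saw", "Bill"))
# -> Bill was seen by John
```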
Another transformational rule affecting NPs, called "raising," moves the NP of an embedded sentence to an empty position at the front of a sentence:

e seems John to like ice cream.
John seems e to like ice cream.

Given other, general, constraints, modern transformational theory shows that there is no passive or raising rule but just the interaction of the constraints with the single rule Move-NP. In addition, the rule moving wh phrases can be shown to have many properties in common with the Move-NP rule, so that in fact there is essentially just one transformational rule in the modern theory, namely, the rule Move-alpha, where alpha is any category (NP, wh phrase, etc.). There are no structural descriptions or structural changes in the statement of this single rule. It is therefore incorrect in the modern theory of transformational grammar to speak of a separate rule of passive or raising (13,14).
To give a simple example of some of these constraints and how they simplify the statement of rules, consider again the passive transformation. In modern terms, the structure underlying the surface passive form would be:

e was seen Bill by John.

The modern theory assumes that there is a general principle requiring every NP that will be pronounced to have Case, where Case can be roughly thought of in traditional grammatical terms; for example, "her" has objective Case, "she" has nominative, and so on. This is called the case filter. Case is assigned by transitive, tensed verbs: nominative case to the subject, objective case to the object; case can also be assigned via a preposition to its object. Verbs with passive morphology, like "seen," are in effect intransitive; therefore, they cannot assign case. The result is that unless "Bill" moves, it does not get case; therefore, the rule Move-alpha (here with alpha an NP) applies, moving it to the empty subject position and obtaining the passive sentence form. Similarly, in

e seems John to like ice cream

"John" does not receive case because it is in a sentence without a tensed verb. Therefore, it must move to a position that does receive case, for example, the subject position. In both cases, movement to the subject position is not a property of the rule of passive or raising but is a side effect of the case filter along with the particular properties of verbs for a given language. Much current work is devoted to describing the variation from language to language in these constraints, dubbed parameters, so that the full range of human languages can be accounted for. For example, in Romance languages like Spanish and Italian, the object of a passive verb can still assign case, and so something like the surface form "was seen Bill by John" is permitted.
Besides the case filter, in the modern theory of transformational grammar there are a variety of other constraints that interact with the rule Move-alpha to yield particular surface sentences. Among the most important of these are certain locality principles that limit the domain over which a phrase can be moved. One such principle, Subjacency, states that Move-alpha cannot move a phrase more than one sentence away from its starting position. This prohibits surface sentences like the following, where "John" is moved across two sentences ("it is certain" is one and "to like ice cream" is the other), and allows movement across just one sentence. Note that the permitted example also substitutes a dummy "it" into the subject position, a particular requirement in English but not in Romance languages like Italian or Spanish, that will not be covered here.

e seems e is certain John to like ice cream.
John seems it is certain e to like ice cream. (forbidden)
It seems John is certain e to like ice cream. (allowed)

X-Bar Theory. Besides these constraints on transformations, there have been significant restrictions placed on the system of base phrase structure rules (generating the set of D-structures). It was noted by 1970 that a (noun, verb, preposition, adjective) phrase consists of a head that is itself a noun, verb, preposition, or adjective, respectively, and its complements as determined by the head. For example, a verb such as "eat" takes optional NP complements corresponding to the thing eaten and an instrument used for eating: "John ate the ice cream with a spoon," while a verb like "know" takes either an NP or a sentence complement: "John knows Bill, John knows that Bill likes ice cream." Importantly, the same requirements show up in the noun form of the verb, if there is one: "knowledge that Bill likes ice cream, knowledge of Bill." This suggests that we need not write separate rules to expand verb phrases and NPs but one general template for all phrase types in a language. This subtheory of transformational grammar is called X-bar theory, after the idea that all phrases may be built on a single framework of type X, where X is filled by the values of word categories like verb, noun, or preposition to get verb, noun, and prepositional phrases. Following X-bar theory, modern transformational grammar stores the properties of words in a dictionary, along with the requirement that, in English, the complements follow the head and that all complement restrictions of a head as expressed in the dictionary be represented in the syntactic representation as well; this last constraint is called the projection principle. If this is done, then an elaborate set of phrase structure rules is not needed to generate D-structures; all that is needed is the set of dictionary entries plus the general constraints of X-bar theory and the projection principle.
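The idea that a head's dictionary entry projects its phrase can be sketched as follows. The lexicon entries and `project` helper are illustrative assumptions, not the actual X-bar formalism (which also provides specifier and intermediate X' levels); the point is only that one template serves every category:

```python
# Sketch of X-bar projection: a single template XP -> head + complements,
# where the complements come from the head's dictionary entry.
LEXICON = {
    "eat":       ("V", ["NP"]),  # "John [VP ate [NP the ice cream]]"
    "know":      ("V", ["S"]),   # "know [S that Bill likes ice cream]"
    "knowledge": ("N", ["S"]),   # same complement as its verb form
}

def project(word):
    """Build the phrase template projected by `word` from its entry."""
    category, complements = LEXICON[word]
    return "[%sP %s %s]" % (category, word, " ".join(complements))

print(project("eat"))        # -> [VP eat NP]
print(project("know"))       # -> [VP know S]
print(project("knowledge"))  # -> [NP knowledge S]
```

Note how "know" and "knowledge" project the same S complement from their entries, as the text observes; no separate VP and NP expansion rules are written.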
Government-Binding Theory. The most recent version of transformational grammar, known as government-binding theory (12-14), incorporates all of the changes described above: X-bar theory and general principles instead of base phrase structure rules, and reduction of transformations to a single transformational rule Move-alpha with general constraints on its application (the case filter, Subjacency, etc.). In addition, a rich source of investigation in the government-binding theory centers on the rules that map from S-structure to LF, having to do with the relationship between traces and the phrases that "bind" them (in the sense of variable binding), and the constraints governing certain primitive configurational relationships, such as that between a verb and its complements, dubbed the notion of government. The resulting picture is quite different in detail from earlier work, as there are now no particular transformations; the bulk of research now focuses on discovering particular patterns of constraints or parametric variation from language to language in the way case assignment, X-bar theory, or locality principles apply. Nonetheless, the underlying principle of the theory still stands: to describe all the possible sentences in a language by means of a factored set of representations plus mappings between these representations.

Alternative Theories. As mentioned earlier, this model is still quite controversial. The existence of representations like traces and constraints like the projection principle have been called into question, as well as the multilevel organization of the grammar as a whole. Competing approaches often assume that there is a single level of phrase structure, rather than a derivation from D- to S-structure. Other alternatives emphasize different representations that avoid the use of traces. The following two examples illustrate these alternatives.

Generalized phrase structure grammar (qv) (4) generates all possible surface structures via a single context-free grammar. For instance, there is a rule expanding declarative sentences roughly in the form S → NP Auxiliary-Verb VP. Instead of a transformational rule of subject-auxiliary verb inversion, there is in effect a separate context-free rule expanding S as Auxiliary-Verb NP VP. The systematic correlation between inverted and noninverted forms is captured by an implicational statement, a metarule in a metagrammar, stating that if there is a rule in the noninverted form, then there is a rule in the inverted form. However, there is no notion of a derivation from a D-structure to an S-structure. Such theories are sometimes said to be monostratal because they posit just one level of syntactic structure, in contrast to a multiple-level approach like GB theory. In addition, the effect of transformational movement is accommodated by expanding the set of nonterminal names to record the link between a displaced element and the position from which it is interpreted in the predicate-argument sense. For example, in the sentence "who did John see," one can augment the sequence of phrases to record "who" as being interpreted in the position after "see" as follows:

Who [S/wh did John [VP/wh see [Wh/Wh e]]]

Here, the categories S/wh and VP/wh record the link between the position after "see" (marked with Wh/Wh) and "Who." Note that there is a phonologically empty element after "see," as there would be in the transformational analysis. The difference is that this is generated by a context-free rule rather than a transformation.

Lexical-functional grammar (qv) also avoids transformations but retains a multiple-levels approach (3). Instead of D-structure or S-structure, lexical-functional grammar proposes the representations of functional structure (F-structure) and constituent structure (C-structure). F-structure differs from D-structure in that it takes grammatical relations such as subject, object and oblique object, and object of a preposition (the prepositional phrase) as central primitives. C-structure, generated by context-free rules, gives a representation of phrasal and hierarchical relationships. A pairing of F-structures and C-structures associates grammatical relations like subject and object to phrasal elements. In this theory some variations in surface sentences, such as subject-auxiliary verb inversion, are generated directly by C-structure context-free rules, whereas other variations, like passive sentences, are produced by the operation of rules in the lexicon, which convert active verbs to their passive forms and alter the default association of grammatical relations with syntactic (C-structure) elements. Like generalized phrase structure grammar, lexical-functional grammar has been argued to provide a better representation for computational operations as well as a more adequate basis for describing languages that evidently do not depend on syntactic configurations to fix the association between verbs and their arguments like subject and object. This claim remains to be established.

Government-binding theory is a topic of current research and as such is undergoing continual change; for details of recent work the reader is urged to consult a survey such as that by Van Riemsdijk and Williams (13) or the journal Linguistic Inquiry.

Transformational Grammar and Computational Models

Several computational models have incorporated one or another version of transformational grammar. These may be divided into two sorts: those based on Aspects-style grammars, ca 1965, or even earlier versions of transformational grammar, and those based on post-Aspects models. In general, the more recent post-Aspects models have proved to be more adaptable to computational implementation, whereas the earlier versions did not prove very useful for direct computational analysis (16). Early approaches to using transformational grammars for sentence analysis adopted the model of Syntactic Structures or Aspects. In these models a sentence is generated by the operation of the base component followed by zero or more transformations. Sentence analysis is the reverse of this: A procedure must start with a sentence, such as "The ice cream was eaten by John," and then must try to determine how it could be derived given a particular context-free base grammar and set of transformations. If no transformations are involved, this problem reduces to that of standard context-free parsing, for which there are several known efficient algorithms. However, surface sentences that are derived by transformations, like the example just given, are not directly generated by the base but by a sequence of one or more transformations. Inverting this process may be difficult because transformations can delete some parts of a sentence and rearrange others, and certain other transformations may be optional. The problem is that transformations work only on trees, so to compute an inverse transformation, an algorithm must start with a tree. However, given an input sentence, the procedure does not have a tree but only a string of words. For example, in the sentence "The book which was written by the Boston author became a best seller," an Aspects theory might propose an optional transformation to delete "which was," yielding "the book written by the Boston author became a best seller." To analyze this sentence, a computer procedure must guess that material has been deleted and apply an inverse transformation.
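For the transformation-free case, the reduction to standard context-free parsing can be illustrated with a minimal CYK-style recognizer. The toy grammar (in Chomsky normal form) and lexicon are assumptions for this sketch, not drawn from the article:

```python
# Minimal CYK recognition sketch: binary rules A -> B C plus a lexicon.
GRAMMAR = {                       # (B, C) -> {A}
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
LEXICON = {"John": {"NP"}, "ate": {"V"}, "the": {"Det"}, "ice-cream": {"N"}}

def recognize(words):
    n = len(words)
    # table[i][j] holds the categories spanning words[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][0] = set(LEXICON[w])
    for span in range(1, n):              # span length minus one
        for i in range(n - span):
            for k in range(span):         # split point within the span
                for b in table[i][k]:
                    for c in table[i + k + 1][span - k - 1]:
                        table[i][span] |= GRAMMAR.get((b, c), set())
    return "S" in table[0][n - 1]

print(recognize("John ate the ice-cream".split()))  # -> True
```

This cubic-time table-filling procedure is one of the "known efficient algorithms" alluded to above; the difficulty described next arises only once transformations enter the picture.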
In general, since deleted material does not appear in the surface sentence and since the inverses of transformational rules may not be functions, this procedure is highly nondeterministic. One approach to solving the sentence analysis problem is purely enumerative. Given some transformational grammar and an input sentence, one can try to generate or synthesize all sentences one after the other. After each sentence is generated, it is checked against the input sentence to see if it matches; if it does, the sentence is recognized; if not, the procedure keeps going. This procedure, analysis by synthesis, is computationally expensive because in general one would have to generate many irrelevant candidate sentences that could not possibly be related to the actual input sentence. For instance, it makes little sense to analyze our example sentence, "the ice cream was eaten," as an active sentence, but the analysis-by-synthesis procedure will blindly try to do so. In addition, the procedure was judged to be psychologically unrealistic, because it calls for the entire sentence to be read before any synthesis match is attempted. Because of these problems, analysis by synthesis was not considered a serious algorithm for transformational sentence analysis (16,17). Instead of enumerating all possibilities, the transformational parsers of the 1960s used a two-step procedure: First, analyze the sentence using a context-free grammar that generates a superset of the possible surface sentences produced by the transformational grammar. This gives a candidate set of trees to work with. The second step applies inverse transformations to these trees, checking to see if a tree that could have been generated by the base grammar is obtained.

The Petrick System. The most widely known algorithms built along these lines were developed in the mid-1960s by Zwicky and colleagues at Mitre (18) and by Petrick at MIT and then IBM (19). The Petrick system (19) was originally designed to provide a parsing procedure automatically, given a transformational grammar. A revised version of this system is part of a question-answering system earlier called REQUEST and now dubbed TQA (20,21). The original Petrick system contained a set of reverse transformations that mapped sentences to candidate deep structures. The idea was to have an automatic procedure to construct the reverse transformations given the structural descriptions and structural changes of the original transformational grammar. The deep structures were then parsed by a (context-free) base grammar component. The problem here is that the process of reversing transformations is still highly nondeterministic. For example, given the sentence "John was certain that Bill liked ice cream," such a parser would have to guess that "John" was originally moved from an embedded sentence, as it would be in "John was certain to like ice cream." To get around this difficulty, one must try to make the reverse transformations restrictive enough to cut down on the possibility of multiple reverse transformations applying to the same sentence. The current Petrick system, TQA, uses a restrictive set of reverse transformations that operate on trees rather than sentence strings, with results good enough for efficient question answering. For the most part, though, progress on transformational parsing was blocked by computational difficulties (21).

The Marcus Parser. The first modern transformational parser, based on the extended standard theory of the mid-1970s, was that of Marcus (22).
Marcus developed a computer program that would map from a surface sentence, such as "John was persuaded to leave," to its surface structure (S-structure) representation as defined by the extended standard theory, indicating phrase boundaries as well as traces (the subscript i indicates coindexing):

[S [NP Johni] [VP was persuaded ei [S ei [VP to leave]]]]

The Marcus parser PARSIFAL used a basically bottom-up design in conjunction with a set of production rules that took the place of reverse transformations. That is, it would wait until it had seen several words of the sentence and then build a piece of the analysis tree corresponding to the S-structure analysis of those words. Each rule, called a grammar rule, had a pattern and an action. An important element of the parser was the addition of a three-cell look-ahead buffer that could inspect upcoming words in the sentence in order to determine what action to take. The pattern was a triggering predicate that could look at part of the S-structure analysis already completed plus the words in the look-ahead buffer. The action would build a tiny piece of the output S-structure representation. For example, given the following sequence of elements in the input buffer:

was eaten o

the Marcus parser could determine that a trace should be inserted after "eaten," thus undoing the effect of the Move-NP rule while building the S-structure representation corresponding to this sentence.

It is instructive to see how this design avoids the problems of early transformational parsers. The key problems with standard transformational parsing were: constructing candidate trees suitable for inverse transformations; correctly determining what elements had been deleted, if any; and guessing whether an optional transformation had been applied. The second problem is handled by relying on the extended standard theory representation. In this theory nothing can be deleted without leaving behind a trace, and there are other severe constraints that limit the appearance of traces (such as the locality principles and case filter described above). The first problem, constructing candidate trees, is also aided by building S-structures rather than D-structures. Since the S-structure representation looks very much like that produced by a context-free grammar, plus the addition of traces, it now becomes possible to just build phrase structure directly rather than performing string-to-string transformational inverses or first constructing a partial tree and then applying reverse transformations to it. Finally, the third problem, determining which transformational rule may have applied, is greatly alleviated by means of the look-ahead buffer. In most cases, Marcus argued, this reduced the candidate choices down to just one; that is, the problem of mapping from an input sentence to an S-structure became deterministic. Those sentences where parsing remained nondeterministic included cases of obvious global ambiguity ("They are flying planes") or cases where people are not able to analyze the sentence deterministically (such as "The horse raced past the barn fell," so-called garden path sentences) (17,22). Because it was deterministic, the resulting parsing procedure was also quite efficient.
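The flavor of such a buffer-driven grammar rule can be sketched as follows. This is a drastic simplification for illustration only: the trigger pattern, participle list, and flat word-string output are inventions here, whereas PARSIFAL's actual rules build pieces of tree structure:

```python
# Sketch of a look-ahead-buffer decision: on seeing "was" + participle
# with no NP object following in the three visible cells, drop a trace
# into the object position (undoing Move-NP), all deterministically.
PARTICIPLES = {"eaten", "seen", "persuaded"}

def insert_traces(words):
    out, buffer = [], list(words)
    while buffer:
        cells = buffer[:3]                     # three-cell look-ahead
        out.append(buffer.pop(0))
        # trigger: "was <participle>" with no determiner following
        if (len(cells) >= 2 and cells[0] == "was"
                and cells[1] in PARTICIPLES
                and (len(cells) < 3 or cells[2] not in {"the", "a"})):
            out.append(buffer.pop(0))          # emit the participle
            out.append("[NP e]")               # trace in object position
    return out

print(insert_traces("the ice cream was eaten".split()))
# -> ['the', 'ice', 'cream', 'was', 'eaten', '[NP e]']
```

Run on "John was persuaded to leave," the same rule yields "John was persuaded [NP e] to leave," placing a trace where the moved NP originated, as in the S-structure shown above.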
Large-scale versions of the Marcus parser are now being developed for several industrial applications, including speech recognition at Bell Laboratories. Marcus's model has also served as the basis for several more recent transformational models, some grounded more explicitly on X-bar theory, the case filter, and the like (12). Work on adapting the principles of government-binding theory to this design is currently underway.
BIBLIOGRAPHY
1. N. Chomsky, The Logical Structure of Linguistic Theory, University of Chicago Press, Chicago, IL, 1985.
2. F. Newmeyer, Linguistic Theory in America, Academic Press, New York, 1980. Provides an intellectual history of transformational grammar.
3. J. Bresnan, The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1983.
4. G. Gazdar, E. Klein, G. Pullum, and I. Sag, Generalized Phrase Structure Grammar, Harvard University Press, Cambridge, MA, 1985.
5. D. Perlmutter, Studies in Relational Grammar, University of Chicago Press, Chicago, IL, 1985.
6. D. Johnson and P. Postal, Arc Pair Grammar, Princeton University Press, Princeton, NJ, 1980.
7. M. Brame, Base Generated Syntax, Noit Amrofer, Seattle, WA, 1978.
8. G. Lakoff, "On Generative Semantics," in D. Steinberg and L. A. Jakobovits (eds.), An Interdisciplinary Reader in Philosophy, Linguistics, and Psychology, Cambridge University Press, Cambridge, UK, 1971.
9. R. Hudson, Arguments for a Non-transformational Grammar, University of Chicago Press, Chicago, IL, 1971. Presents arguments against transformational approaches generally.
10. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
11. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965. Summarizes transformational theory as of 1965.
12. A. Radford, Transformational Syntax, Cambridge University Press, Cambridge, UK, 1981. Gives a textbook introduction to the extended standard theory.
13. H. Van Riemsdijk and E. Williams, Introduction to the Theory of Grammar, MIT Press, Cambridge, MA, 1986.
14. N. Chomsky, Lectures on Government and Binding, Foris Publications, Dordrecht, The Netherlands, 1982. The first full-scale treatment of the current theory of transformational grammar (ca 1985).
15. T. Winograd, Language as a Cognitive Process, Vol. 1, Addison-Wesley, Reading, MA, Chapter 4, 1983. Gives a short introduction to the Aspects theory.
16. J. Fodor, T. Bever, and M. Garrett, The Psychology of Language, McGraw-Hill, New York, 1974. Gives psychological and computational studies of the relevance of transformational grammar up through the late 1960s.
17. R. Berwick and A. Weinberg, The Grammatical Basis of Linguistic Performance, MIT Press, Cambridge, MA, 1985.
18. A. Zwicky, J. Friedman, B. Hall, and D. Walker, The MITRE Syntactic Analysis Procedure for Transformational Grammars, Proc. 1965 Fall Joint Computer Conference, Thompson Books, Washington, DC, 1965.
19. S. Petrick, Transformational Analysis, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, 1973.
20. W. J. Plath, "REQUEST: A natural language question-answering system," IBM J. Res. Dev. 20, 326-335 (1976).
21. F. Damerau, "Operating statistics for the transformational question answering system," Am. J. Computat. Ling. 7(1), 30-42 (1981).
22. M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
23. K. Wexler and P. Culicover, Formal Principles of Language Acquisition, MIT Press, Cambridge, MA, 1980.
24. R. Berwick, The Acquisition of Syntactic Knowledge, MIT Press, Cambridge, MA, 1985. Gives a computer model of acquisition.

General References

Current research in transformational grammar may be found in the journals Linguistic Inquiry and The Linguistic Review. For opposing viewpoints consult the journals Natural Language and Linguistic Theory, Linguistics and Philosophy, and Language. Proceedings of the Linguistic Society of America Conference (LSA), Proceedings of the Meetings of the Chicago Linguistic Society (CLS), and Proceedings of the New England Linguistics Conference (NELS) are good sources for extremely recent work, both pro and con.

R. Berwick
MIT
GRAMMAR, WEB. See Grammar, phrase-structure.
GUIDON

An automated tutoring system for teaching about any domain representable by EMYCIN, GUIDON was written in 1979 by Clancey at the Stanford Heuristic Programming Project. GUIDON explores the problem of carrying on a coherent, task-oriented, mixed-initiative dialogue with a student by expert systems (see W. J. Clancey, Dialogue Management for Rule-Based Tutorials, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 155-161, 1979).

M. Taie
SUNY at Buffalo

HACKER

A program by Sussman that creates plans for solving problems in the "blocks world" (see G. J. Sussman, A Computer Model of Skill Acquisition, Elsevier, New York, 1975). HACKER's creation was guided by introspection of the human problem-solving process. HACKER is viewed as a programmer who first tries to find a solution to a given problem by looking into an "answer library." If no answer is available, the programmer tries to "write" a plan by adapting a known plan with a similar "activation pattern." A "criticizer" then looks for any bugs in the plan and tries to use "patches" to fix them. Skill acquisition is achieved by generalizing and reusing these patches. The implementation of HACKER is based on the CONNIVER (qv) language [see D. V. McDermott and G. J. Sussman, The CONNIVER Reference Manual, AI Memo 259, MIT AI Lab, Cambridge, MA (May 1972)].

J. Geller
SUNY at Buffalo

HARPY

A speech-understanding (qv) system, HARPY was written by Lowerre in 1976 at Carnegie-Mellon University under the ARPA Speech-Understanding Research project. Understanding a sentence is realized as a path in a precompiled transition network of words, where each word is a template of all possible allophones [see B. Lowerre, The HARPY Speech Recognition System, Ph.D. Dissertation, Carnegie-Mellon University, Pittsburgh, PA, 1976, and B. Lowerre and R. Reddy, The HARPY Speech Understanding System, in W. Lea (ed.), Trends in Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, pp. 340-360, 1980].

A. HanYong Yuhan
SUNY at Buffalo

HEARSAY-II

A speech-understanding (qv) system, HEARSAY-II was written by Lesser et al. in 1976 at Carnegie-Mellon University under the ARPA Speech-Understanding Research project. Asynchronous activation of different knowledge-source modules communicating through a blackboard allows island parsing [see V. Lesser, R. Fennell, L. Erman, and D. Reddy, "Organization of the HEARSAY-II speech understanding system," IEEE Trans. Acoust. Speech Sig. Proc. ASSP-23, 11-24, 1975, and L. Erman, F. Hayes-Roth, V. Lesser, and D. Reddy, "The HEARSAY-II speech understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12(2), 213-253, 1980].

A. HanYong Yuhan
SUNY at Buffalo

HERMENEUTICS

Recent debates about the theoretical foundations of AI refer to hermeneutics, the branch of continental philosophy that treats the understanding and interpretation of texts. Applying certain hermeneutic insights, Dreyfus (1), Winograd (2), and Winograd and Flores (3) have questioned the functionalist cognitive-science paradigm that guides most contemporary AI research, particularly in natural-language processing (see Natural-language entries) and commonsense reasoning (see Reasoning, Commonsense). Dreyfus draws upon the hermeneutic philosophy of Heidegger (4) to deny the possibility of formalizing mental processes and therefore creating artificial intelligences. [In a personal communication (March, 1986), Dreyfus indicated that he has recently moderated his views and now considers AI very difficult but not necessarily impossible.] Winograd and Flores reach a similar conclusion based on a hermeneutically informed technical argument. Yet, in addition to being a source of doubts, hermeneutics may illuminate problems like the nature of meaning and understanding and thereby help reconstruct the functionalist paradigm (2).

To help clarify the relevance of hermeneutics for AI research, this entry first reviews the major strains of hermeneutic thought. These positions include the naive hermeneutics of early modern Europe and Dilthey's (5) more historically conscious, nineteenth-century methodological hermeneutics, which sought to produce systematic and scientific interpretations by situating a text in the context of its production. In the twentieth century Heidegger's (4) and Gadamer's (6) philosophical hermeneutics shifted the focus from interpretation to existential understanding, which was treated more as a direct, nonmediated, authentic way of being in the world than as a way of knowing. Reacting to the relativism of this position, Apel (7) and Habermas (8) introduced critical hermeneutics, a
methodologically self-reflective and comprehensive reconstruction of the social foundations of discourse and intersubjective understanding. Finally, Ricoeur (9), in his phenomenological hermeneutics, attempted to synthesize the various hermeneutic currents with structuralism and phenomenology.

This background situates AI researchers and critics who draw from the various hermeneutic traditions. In their investigations of the affective structure of texts and in their concern with systematic rules for identifying the genres of narrative, Alker, Lehnert, and Schneider (10) in effect pursue a classical hermeneutical program tempered by phenomenological hermeneutics. Other researchers (2,11,12) draw from philosophical hermeneutics to propose strategies for developing computer systems that understand natural language. A third approach (3), aligned with philosophical hermeneutics, argues that computer understanding of natural language is exceedingly difficult and probably intractable. A fourth group (13) has developed an implementation guided in part by ideas from phenomenological hermeneutics but informed by the other variants as well.

Hermeneutic theories differ in several characteristic ways from approaches to meaning and understanding that are better known to AI researchers. Hermeneutics grounds the meaning of texts in the intentions and histories of their authors and/or in their relevance for readers. In contrast, analytic philosophy usually identifies meaning with the external referents of texts, and structuralism finds meaning in the arrangement of their words. Hermeneutics regards texts as means for transmitting experience, beliefs, and judgments from one subject or community to another. Hence the determination of specific meanings is a matter for practical judgment and commonsense reasoning, not for a priori theory and scientific proof.
This attitude reflects the origin of hermeneutics in ancient-world efforts to determine systematically the meaning, intent, and applicability of sacred and legal texts. Hermeneutic theories and applications also share the idea of the hermeneutic circle, the notion that understanding or definition of something employs attributes that already presuppose an understanding or a definition of that thing. Circles or spirals of understanding arise in interpreting one's own language, a foreign language, or an observed action, in confirming a theory, and in distinguishing between background knowledge and facts (14). The existence of these circularities raises questions for hermeneutics regarding the grounding and validity of understanding.

The philosophical concept of the hermeneutic circle resembles the distinctly computational notion of bootstrapping, a process that uses a lower order component (a bootstrap component) to build a higher order component that is used in turn to reconstruct and replace the lower order component. Bootstrapping has been introduced in the design of certain knowledge bases (15-17) and in AI-oriented theories of cognitive development (18-21) and should be distinguished from hierarchical layering in systems that do not include the "strange loop" of replacing the bootstrap component. The similarity of the hermeneutic circle and bootstrapping suggests the possibility of an important contribution from hermeneutics to AI architectures for natural-language processing and for commonsense reasoning.

Classical Methodological Hermeneutics

Origins. Hermeneutics as a general science of interpretation can be traced back to more domain-specific applications in
the ancient Greeks' study of literature and in ancient Biblical exegesis. The word hermeneutics was coined in the seventeenth century (22) on the basis of the Greek hermeneuein, "to interpret," which signified equally a declamation of a text, an explanation of a situation, or a translation from a foreign tongue. (Hermeneuein itself derived from the name of Hermes, the winged messenger god of ancient Greece, who both delivered and explained the messages of the other gods.) Regarding texts as organic or coherent wholes rather than as collections of disjointed parts, the Greeks expected a text to be consistent in grammar, style, and ideas. Accordingly, they codified rules of grammar and style that they used to verify and emend textual passages. By extending the logic of part and whole to a writer's or school's entire output, the Greeks were also able to attribute works with uncertain origin. Although the Jewish Rabbis and the early Church Fathers deployed similar philological tools, their biblical exegeses were better known for the development of allegorical readings, frequently at the expense of the texts' literal meaning. Their interpretations found within the visible sign a hidden sense in agreement with the intention they beforehand ascribed to the text. Since instances of this method are found for the Vedas, Homer, the Koran, and other sacred writings, it seems a typical strategy for reconciling an enlightened or moral world-view with texts whose "outward" earthiness or banality seems beneath the dignity of the gods being celebrated (23). The Middle Ages witnessed the proliferation of nonliteral interpretations of the Bible. Christian commentators could read Old Testament stories simultaneously as precursors of analogous episodes in the New Testament, symbolic lessons about Church institutions, and allegories about spiritual traits (24).
In each case the meaning of the signs was constrained by imputing a particular intention to the Bible, such as teaching morality, but these interpretive bases were posited by the religious tradition rather than suggested by a preliminary reading of the text. Thus, when Martin Luther argued that Christians could rediscover their faith by reading the Bible themselves, Catholic Church officials not surprisingly responded that the Bible was too obscure to read without their guidance. The Protestant exegesis, which appeared after Luther's translation of the Bible, tended to view the texts as responses to historical or social situations rather than expressions of theological principles. Assuming that the New Testament documented the Christian faith, one reader's guide proposed that contradictory statements and difficult passages in the New Testament could be clarified by comparing their possible meanings with contemporaneous Christian practices. The example suggests that interpretation might rely on empathetic understanding, the interpreter's self-projection into the author's space. Indeed, it was just such empathy that Schleiermacher and Dilthey raised to a methodological principle in their attempt to create a general hermeneutics.

Methodological Hermeneutics of Schleiermacher and Dilthey. Schleiermacher (25) proposed to join classical philology's focus on grammar and style and biblical exegesis' concern for themes, creating a general hermeneutics with principles independent of domain-specific interpretation principles. Schleiermacher compared the reader's approach to a text with the efforts by participants in a dialogue to understand each other, and he depicted the dialogue in terms of a speaker who puts together words to express his thoughts and a listener who understands this speech as part of a shared language and as
part of the speaker's thinking (26). The listener can comprehend the words and sentences because they are drawn from the language's lexicon and follow its grammatical rules, but the listener can also recognize the intentions behind the words by virtue of being in the same situation and sharing a common human nature with the speaker. Since Schleiermacher's concept of understanding includes empathy (projective introspection) as well as intuitive linguistic analysis, it is much richer than the idea in modern communication theories that understanding is merely the decoding of encoded information. Interpretation is built upon understanding and has a grammatical as well as a psychological moment. The grammatical thrust has a bootstrapping flavor: It places the text (or expression) within a particular literature (or language) and reciprocally uses the text to redefine the character of that literature. The psychological thrust is more naive and linear. In it the interpreter reconstructs and explicates the subject's motives and implicit assumptions. Thus Schleiermacher claimed that a successful interpreter could understand the author as well as, or even better than, the author understood himself because the interpretation highlights hidden motives and strategies.

Broadening Schleiermacher's hermeneutics, Dilthey (27) developed a philosophy of method for history and the human sciences that he believed could produce objective knowledge but avoid the reductionist, mechanistic, ahistorical explanatory schema of the natural sciences. Dilthey argued that texts, verbal utterances, works of art, and actions were meaningful expressions whose "mental contents" or intentions needed to be comprehended. He claimed that investigating human interactions was more like interpreting a poem or discourse than doing physics or chemistry experiments (5).
Dilthey termed the desired comprehension of events and expressions "understanding" (verstehen) and attempted to distinguish it from the explanatory knowledge (erkennen) generated by the hypothetico-deductive method of the natural sciences. Dilthey initially followed Schleiermacher in identifying understanding as empathy guaranteed by the notion of a common human nature. Although he recognized that the outlook and values of people varied over different historical periods and cultures, Dilthey argued that because historians themselves thought and acted, they could relive and understand what people in the past were trying to express and accomplish in their writings, speeches, actions, and art. Nevertheless, many of his contemporaries criticized this position because it relied on introspection and an underspecified, noncritical psychology. Stung by this criticism and influenced by the neo-Kantian idea that works of art and literature embodied the formal values of their respective periods, Dilthey revised his position. He began to emphasize that texts and actions were as much products of their times as expressions of individuals, and their meanings were consequently constrained by both an orientation to values of their period and a place in the web of their authors' plans and experiences. In this revision meanings are delineated by the author's weltanschauung, or world-view, reflecting a historical period and social context. Understanding (verstehen), the basis for methodological hermeneutics, involves tracing a circle from text to the author's biography and immediate historical circumstances and back again. Interpretation, or the systematic application of understanding to the text, reconstructs the world in which the text was produced and places the text in that world. [See Dilthey (5) for a sampling of Dilthey's writings on history and
the human sciences and Ermarth (28) and Plantinga (29) for their discussion.] This circular process precludes an interpretation of a text from being unique and scientifically objective, like the explanation of a chemical reaction, inasmuch as knowledge of the author's or agent's world may itself critically depend on the present interpretation (14). Dilthey and his recent followers, Hirsch (30) and Betti (31), claim, however, that interpretations become more valid as they assimilate more knowledge about the author and the author's values, instead of reflecting the interpreter's own values or sense of reality. Dilthey's method in effect bootstraps from a whole (a biography, a set of works) whose themes may be repeatedly respecified through the elaboration of one of its parts (the action or work). The process eventually reaches stability because successive interpretations of the work or action serve to constrain subsequent refinements in the background model of the author. The strength and validity of such constraints depends on the currency and robustness of that model. Increases in temporal and cultural distance between the speaker and interpreter decrease the reliability of interpretation, but this neither forecloses the possibility of such a model nor denies the potential for a valid interpretation.

Philosophical Hermeneutics

Heidegger's Ontological Hermeneutics. In Being and Time (4) Heidegger undermines the notion of objectivity in Husserl's phenomenology (32) and, by extension, in methodological hermeneutics. [Schmitt (33) and Zaner (34) present concise overviews, and Ricoeur (35) provides an extensive analysis of phenomenology (qv).] Husserl argues that objective interpretation is possible using his transcendental phenomenological method, which requires bracketing the subjectivity inhering in the interpreter's life-world (Lebenswelt), the world of personal experience and desires.
Heidegger denies that this bracketing is possible. He claims instead that the understanding of a situation is directly mediated by a foreknowledge, or sensitivity to situations, that is comprised by the understander's life-world. Therefore, suspending that life-world would preclude the possibility of understanding altogether. Heidegger reaches his conclusion by contending that, as a necessary part of human "being-in-the-world" (Dasein), things are perceived according to how they are encountered and used in one's everyday routines and tasks. Perception and apprehension thus move from foreknowledge to an existential understanding, a largely unreflective and automatic grasp of a situation that triggers a response. This understanding must be incomplete because Dasein is both historical and finite. It is historical in that understanding builds from the foreknowledge accumulated from experience. It is finite due to "thrownness," the necessity of acting in situations without the time or ability to grasp the full consequences of actions or plans in advance. Only when actions fail to meet the exigencies of the situation and "breakdown" occurs do individuals stand back and assume the theoretical attitude of science, which sees things "objectively," as discrete objects separate from the self and resistant to one's will.

Heidegger brings hermeneutics from a theory of interpretation to a theory of existential understanding. He "depsychologizes" hermeneutics by dissociating it from the empathetic perception of other beings. Understanding now appears as a
no-longer-conscious component of Dasein; it is embedded within the context of specific situations and plans, with, in effect, finite computational resources. Therefore, interpretation (Auslegung) that depends on such existential understanding (Verstehen) is not the general logical method found in classical philology but refers to a conscious recognition of one's own world. Dilthey's methodological hermeneutic circle is consequently supplanted by the more fundamental ontological hermeneutic circle, which leads from existential understanding situated in a world to a self-conscious interpretive stance. This self-consciousness, however, cannot escape its limitations to achieve a transcendental understanding in the sense of Hegel (36,37), who considered rationality the ability to reflectively accept or reject (transcend) the received sociocultural tradition (38). According to this reading of Heidegger, foreknowledge is accumulated over time and constrains successive exercises of existential understanding. But self-conscious understanding cannot choose which elements in the experience-based foreknowledge are respecified in the bootstrapping process. Green (39) presents a concise overview of Heidegger's contributions to philosophy. Steiner (40) and Palmer (22) provide accessible introductions to Heidegger's thought. Murray (41) contains an informative collection of essays discussing Heidegger's thought.

Gadamer's Philosophical Hermeneutics. In his philosophical hermeneutics Gadamer (6) follows his teacher Heidegger in recognizing that the ties to one's present horizons, one's knowledge and experience, are the productive grounds of understanding. However, Gadamer argues that these limits can be transcended through exposure to others' discourse and linguistically encoded cultural traditions because their horizons convey views and values that place one's own horizons in relief.
[This position remedies what Green (39) contends is Heidegger's failure to show how the historicity of the individual relates to the history of a broader community.] He stresses the role of language in opening the subject to these other subjectivities and their horizons. In forcefully stressing the role of language in opening the subject to other subjectivities and in constituting traditions, Gadamer places language at the core of understanding. Gadamer's (42) position approximates the hypothesis advanced by the American linguists Sapir (43) and Whorf (44), which holds, in its strong version, that the individual's language partially determines his or her conceptual system and world-view. According to the Sapir-Whorf hypothesis, complete translation between languages is impossible, and understanding another language requires complete immersion accompanied by a change in thinking. Consequently, understanding for Gadamer does not scientifically reconstruct a speaker's intention but instead mediates between the interpreter's immediate horizon and his emerging one. For Gadamer, understanding is bound and embedded in history because understanding deploys the knower's effective-history, or personal experience and cultural traditions, to assimilate new experiences. Thus, the initial structure of an effective-history constrains the range of possible interpretations, excluding some possibilities and calling forth others. As effective-history constitutes the prejudices brought to bear in understanding, it simultaneously and dialectically limits any self-conscious attempts to dissolve those prejudices. Gadamer thus explicitly opposes the scientific ideal of prejudiceless objectivity in interpretation. In this respect, he moves beyond Heidegger, who regarded so-called scientific objectivity as a derivative of existential understanding. Gadamer does not deny the importance of either scientific understanding or critical interpretation, a form of interpretation that introspectively questions assumptions unreflectively inherited from cultural traditions. His focus on the human context of knowledge emphasizes the need for repeated attempts at critical understanding, through which people can gain the insight needed to correct their prejudices. But if prejudices may be individually overcome, their fact is inescapable. It imposes a priori limitations on the extent to which a self-reflective methodology can eliminate distortions from scientific inquiry. The critical self-consciousness of a rational agent who introspectively questions received traditions may counter distorting consequences of effective-history, but it at best only leads to successive approximations of objectivity. Gadamer's position prompts the philologists Betti (31) and Hirsch (30) to complain that its relativism destroys all bases for validating an interpretation and so defeats the purpose of interpretation. Social theorist Habermas (45) also criticizes Gadamer's relativism.

The resulting theory of meaning differs from the methodological hermeneutics of Schleiermacher and Dilthey, which identifies the meaning of a text with its author's intentions and seeks to decipher the text by uncovering the world-view behind it. For Gadamer, understanding recreates the initial intention embodied in the text by elucidating the subject matter that the text addresses (its aboutness). The process moves the text beyond its original psychological and historical contexts and gives it a certain "ideality" of meaning, which is elaborated in a dialogue between the interpreter and the text.
The dialogue is grounded in the concern the interpreter and the author share toward a common question and a common subject matter. In confronting a viewpoint reflecting a different set of horizons, the interpreter can find his own horizons highlighted and reach critical self-consciousness. In seeking the key question, the interpreter repeatedly transcends his own horizons while pulling the text beyond its original horizons until a fusion of the two horizons occurs. The interpreter's imagination can also play a role in the dialogue with texts and carry the understanding of the subject matter beyond the finite interpretation realized in methodological hermeneutics. Nevertheless, the interpretations are constrained by the questions posed since each question calls forth frameworks within which the subject matter must be understood. The meaning of a text then is not fixed but changes over time according to how it is received and read. Thus, for Gadamer, to understand is to understand differently than the author or even one's own earlier interpretations precisely because the process involves creating new horizons by bootstrapping from the old horizons they replace. But the notion of bootstrapping in Gadamer moves beyond the one in Heidegger because Gadamer allows prejudices to come into a conscious focus that may direct their individual supersession.

Gadamer does not merely work through Heidegger's philosophical program. He also redirects philosophical hermeneutics along partly Hegelian lines by appropriating substantial parts of the Hegelian transcendental philosophy that Heidegger eschewed (46). Gadamer's concepts of the openness of language and the ability of people to transcend their interpretive horizons are based on Hegel's dialectic of the limit, in which the recognition of limits constitutes the first step in transcending them. The concept of understanding as a concrete fusing of horizons is derived ultimately from Hegel's idea that every new achievement of knowledge is a mediation, or a refocusing of the past within a new, present situation (47), which attempts to explain mind and logic on the basis of the dialectical resolution of more basic and antithetical concepts (36). As each opposition is resolved, the resulting synthesis is found to be opposed to yet another concept, and that opposition must also be dialectically resolved. This purely subjective and continual unfolding interacts with and is conditioned by experience, particularly the experience of language, which tends to mold the developing subject in conformity with the traditions encoded in linguistic utterances and in the language itself. However, Gadamer clearly resists Hegel's notion of the self-objectifying, transcendental subject. Instead, he views the logical and ontological categories with which Hegel marks the unfolding of thought as distillations of the logic inherent in language, particularly the German language, whose use as a tool for speculative philosophy Hegel brought to perfection (48). This view affirms the relativist position that thought and reason are always determined by the historical traditions of a linguistic community (49).

Critical Hermeneutics

Strategic Orientation. Heidegger's and Gadamer's critique of objectivity was particularly challenging for social theorists because empirical social science and normative social theory depend ultimately on the characterization of events and situations. At a minimum, the practical need to assess truth-claims and interpretations had to be reconciled with the critique of objectivity. Apel (50) and Habermas (8,51) sought the means for the reconciliation in conjoining methodological hermeneutics with ordinary language philosophy.
Their point of departure was the critique of ideology originated by Marx, which argues that beliefs and ideas reflect the holders' social class interests. (Although implying that an objective social reality might ultimately be described, this view also helps explain conflict in beliefs among members of the same society.) Armed with it, Apel and Habermas could conceive of a hermeneutically inspired analysis of communication in linguistic communities. Thus, just as Heidegger's ontological hermeneutic concentrates on the individual's apperception of experience, from the inside out, critical hermeneutics concentrates on individuals situated in groups, from the outside in. Apel and Habermas argue that of the three divisions of the study of language (syntax, semantics, and pragmatics) only the first two have been adequately studied by the school of ordinary language philosophy descending from Wittgenstein (52). They believe that no account of human understanding can be believed if explained as a theory about a single, asocial and ahistorical being. On the contrary, understanding may only be explained by reference to the social and historical setting in which understanding occurs and in the discursive or dialogical situation in which communication takes place. Truth and meaning do not await discovery but are negotiated by actors who come to consensus on issues of truth and meaning through social discourse. This perspective may be contrasted with the first principles of research programs, such as Chomsky's (53-55), which seek to explicate language use and language learning on the basis of an examination of a monological model of the competence of an ideal speaker-hearer abstracted from his social situation (7). Although studies of
syntax and semantics are surely necessary for an adequate grasp of the human linguistic faculty, they are by no means sufficient. Any adequate understanding of language, Habermas (56,57) asserts, must be grounded in the practical purposes for which speakers use language.

Universal Pragmatics. To provide such grounding, Habermas (56) proposed a universal pragmatics (see Ref. 58 for a short overview and discussion), the primary task of which is the identification and reconstruction of the necessary preconditions for the possibility of understanding in discursive communication. Turning to ordinary language philosophy, he attempts this reconstruction by linking Austin's (59) and Grice's (60) notions of felicity conditions underlying discourse to Searle's (61) theory of speech acts and to a consensus theory of truth, which holds that truth claims are resolved through reasoned discussion culminating in consensus. Habermas does not confine universal pragmatics to the analyses of language and speech. Rather, because he sees language as the medium in which all human action is explicated and justified, he intends "universal pragmatics" as the groundwork for a general theory of social action. The resulting critical hermeneutics holds that intersubjective communication is possible despite differences in the participants' preunderstandings, because the participants in effect posit as an ideal the attainment of a consensus (concerning the validity of statements). The desired consensus is free from constraints imposed on them by others and from constraints that they might impose on themselves. That is, a participant posits a situation in which all participants can freely try to convince others (or be convinced by them) and in which all have equal opportunity to take dialogue roles. Participation in dialogue thus admits the possibility of reinterpreting and changing the perceived situation.
Habermas and Apel term this idealization the ideal speech situation and consider it the participants' emancipatory interest, the situation of freedom to which they aspire. This ideal might never be attained, but even to approach it, the participants must overcome systematically distorted communication, which suppresses and conceals the speakers' interests. According to Habermas, these distortions are produced by the division of labor and disguise its correlated structure of domination. Habermas turns to a Freudian psychotherapeutic model to prescribe treatment for the pathological consequences of the systematically distorted horizons produced under these conditions. According to him, the task of the social theorist is to act as therapist, encouraging citizens (patients) to reject internalizations of distorted institutional arrangements (class domination). For Habermas, then, understanding involves compensating for these distortions, and interpretation requires an account of how they were generated.

The Habermas-Gadamer Debate. Gadamer (62) attacks Habermas's position by pointing out that the psychotherapist or social theorist is not immune from the preunderstandings of tradition and that these preunderstandings are not themselves necessarily free of distortion. Gadamer sees Habermas's effort as part of the traditional social-scientific goal of attaining "objective" knowledge of the social realm. Habermas (45) appears to believe that the social theorist, like Schleiermacher's interpreter, can understand the social actor better than the social actor understands himself. That is beyond belief for Gadamer, given his notion of ontological preunderstanding. For his part, Habermas sees Gadamer as too ready to submit to the authority of tradition and too reticent to offer any methodological considerations (apart from the exceedingly abstract notion of "interpretive horizons"), thereby giving unwitting support to positivist degradations of hermeneutics. In reply to Gadamer's claim that prejudices are inescapable, Habermas insists that a self-reflective methodology can overcome prejudices and that an objective social theory can be approached by bootstrapping from an initial understanding of society. Habermas argues that the systematic distortions in communication that bias an initial understanding of society can be analyzed and reduced using generalization from empirical knowledge of society, quasi-causal explanation (deductive verification), and historical critique. To build this comprehensive social theory, Habermas must provide a theory of knowledge grounded in

1. a general theory of communicative action;
2. a general theory of socialization to explain the acquisition of the competence that underpins communicative action;
3. a theory of social systems to show the material constraints on socialization and their reflection in cultural traditions; and
4. a theory of social evolution that allows theoretical reconstruction of the historical situations in which communicative action obtains.

But this move apparently fails to counter Gadamer's objection, since the theoretical tools used to forge this theory may themselves be subject to interpretations, other than Habermas's, that vary across the cultural traditions of social interpreters. McCarthy (63,64) reviews the debates, discusses various problems in Habermas's position, and provides a systematic rendition of Habermas's arguments. Ricoeur's proposed resolution of this debate is discussed below.

Theory of Communicative Action. Gadamer's objections notwithstanding, Habermas has embarked on a multivolume statement of a comprehensive social theory centered on communicative action.
In the first volume Habermas (65) concentrates on the connection between the theory of universal pragmatics and the general theory of action descending from Weber (66) through Parsons (67) to Schutz (68) and Garfinkel (69). His strategy is to align the various types of communication, their inherent truth claims, and their counterparts in rational action. Cognitive communication, in which the correspondence of terms to objects and events is at issue, has its rational action counterparts in instrumental and strategic action. These types of action are oriented toward success and are validated by instrumental reason, which reflects on the efficacy of plans as means to desired ends. Habermas ties interactive communication, in which claims to moral rightness and appropriateness are thematized, to normatively regulated action, in which the norms of a community and the social roles of actors become important constraints on the perceived appropriateness of actions. Finally, Habermas links expressive communication, in which the truthfulness of communicative actions is thematized, to dramaturgical action, which focuses on the fact that actors respectively constitute a public for each other. Dramaturgical action attends to phenomena involving each actor's presentation of the self to others (70), to those aspects of the actor's subjectivity he chooses to reveal to others
and to those he chooses to conceal. These revelations and concealments are, in turn, important factors that rational actors must assess when interpreting the actions of others and when planning their own.

Phenomenological Hermeneutics

Faced with the diversity of hermeneutics and other continental philosophies, including structuralism and phenomenology, Ricoeur strives for a grand synthesis in his phenomenological hermeneutics. For his interpretation of earlier hermeneuticists, see Ref. 71. Ricoeur (72) argues that phenomenology and hermeneutics presuppose each other. The connection between hermeneutics and phenomenology traces to Heidegger, who took the term "hermeneutics" from Dilthey to distinguish his own philosophical investigation of everyday being from Husserl's transcendental phenomenology, which tried to achieve objective knowledge by suspending concern for the subject's life-world. To capture knowledge of that world, Heidegger retained Husserl's notion of eidetic phenomenology, which assumes immediate registration of phenomena in a picturelike but uninterpreted manner. Like Heidegger, Ricoeur also follows Husserl to eidetic phenomenology, but like the later Heidegger and, particularly, Gadamer, Ricoeur recognizes the ontological basis of understanding in language. For Ricoeur, then, the subject's being is not identical with immediate experiences. So, instead of attempting a direct description of Dasein like Heidegger (4) and Merleau-Ponty (73,74), Ricoeur sees the need for a hermeneutic theory of interpretation to uncover the underlying meaning constituting Dasein.
Through its emphasis on the prelinguistic, eidetic phenomenology supplies a means of distancing observation from linguistic descriptions and their implicit preconceptions. This distanciation (75) is precisely what is required for interpretation to proceed. Since the task of uncovering the underlying objectivity cannot be achieved through the suspension of subjectivity, Ricoeur concludes that Husserl's project of transcendental phenomenology can only be realized through the application of a methodological hermeneutics to an eidetic phenomenology. Ricoeur also argues that structuralism and hermeneutics can be complementary approaches to analyses of language, meaning, and cultural symbolism, for reasons similar to those he advanced for the complementarity of eidetic phenomenology and hermeneutics. Structuralism refers to a mode of inquiry that inventories elements of a system and notes the grammar of possible combinations. It is exemplified by Saussurean linguistics and Levi-Strauss's anthropology (76). Ricoeur finds that the value of structuralist analysis lies in its ability to catalogue phenomena and describe their possible (grammatical) combinations, but its weakness lies in its inability to provide anything more insightful than behavioral descriptions of closed systems. Nevertheless, the ability to generate structural descriptions complements the hermeneutic method, which interprets these descriptions by assigning functional roles to the phenomena.

In his treatment of psychoanalysis, particularly the interpretation of dreams, Ricoeur (77) shows the complexity involved in the hermeneutic task of assigning functional roles to words and symbols. The analyst must develop an interpretive system to analyze the dream-text and uncover the hidden meanings and desires behind its symbols, particularly those that have multiple senses (polysemy). Allowing for the possibility of multiple levels of coherent meaning, hermeneutics aims at ascertaining the deep meaning that may underlie the manifest or surface meaning. Ricoeur distinguishes two approaches for getting at the deeper meaning: a demythologizing one that recovers hidden meanings from symbols without destroying them (in the manner of the theologian Bultmann) and a demystifying one that destroys the symbols by showing that they present a false reality (in the manner of Marx, Nietzsche, and Freud). The demythologizers treat the symbols as a window into a sacred reality they are trying to reach. But the demystifiers treat the same symbols as a false reality whose illusion must be exposed and dispelled so that a transformation of viewpoint may take place, as, for example, in Freud's discovery of infantile illusions in adult thinking. Thus, there are two opposing tendencies, a revolutionary and a conservative hermeneutics. Whereas the critical hermeneutics of Apel and Habermas falls within revolutionary demystification, the phenomenological hermeneutics of Ricoeur and the philosophical hermeneutics of Gadamer fall in the more conservative camp of the demythologizers.

Ricoeur (78) attempts a dialectical resolution of the Habermas-Gadamer debate by arguing that the hermeneutics of tradition and the critique of ideology require each other. He denies the alleged antinomy between the ontology of tradition, which limits possible meanings (Gadamer), and the eschatology of freedom, which seeks to transcend these constraints (Habermas). If, as Gadamer believes, understanding should be conceived as the mediation between the interpreter's immediate horizons and his emerging horizon, the interpreter must distance himself to some degree if he hopes to understand the text. That is, when confronted with a text, the interpreter must adopt a stance of critical self-understanding not unlike the stance adopted in the critique of ideology. Hermeneutics thus incorporates a critique of ideology.
Likewise, the critique of ideology incorporates tradition. The ideal of undistorted communication and the desire for emancipation do not begin with Habermas. They arise from a tradition: from the tradition of the Greek conception of "the good life," from the Exodus, and from the Resurrection. Thus, the interests voiced by Gadamer and Habermas are, in Ricoeur's view, not incompatible. One is an interest in the reinterpretation of traditions from the past and the other is the utopian projection of a liberated humanity. Only when they are radically and artificially separated, argues Ricoeur, does each assume the character and tenor of ideology.
The Hermeneutic Arc: Ricoeur's Theory of Interpretation. Ricoeur's theory of interpretation (79) seeks a dialectical integration of Dilthey's dichotomy of explanation (erklären) and existential understanding (verstehen). Ricoeur begins by distinguishing the fundamentally different interpretive paradigms for discourse (written text) and dialogue (hearing and speaking). Discourse differs from dialogue in being detached from the original circumstances that produced it: the intentions of the author are distant, the addressee is general rather than specific, and ostensive references are absent. In a surprising move, Ricoeur extends his theory of interpretation to action, arguing that action evinces the same characteristics that set discourse apart from dialogue. A key idea in Ricoeur's view is that once objective meaning is released from the subjective intentions of the author, multiple acceptable interpretations become possible. Thus, meaning is construed not just according to the author or agent's world-view but also according to its significance in the reader's world-view.

Ricoeur's hermeneutic arc combines two distinct hermeneutics: one that moves from existential understanding to explanation and another that moves from explanation to existential understanding. In the first hermeneutic, subjective guessing is objectively validated. Here, understanding corresponds to a process of hypothesis formation based on analogy, metaphor, and other mechanisms for "divination." Hypothesis formation must not only propose senses for terms and readings for text but also assign importance to parts and invoke hierarchical classificatory procedures. The wide range of hypothesis formation means that possible interpretations may be reached through many paths. Following Hirsch (30), explanation becomes a process of validating informed guesses. Validation proceeds through rational argument and debate based on a model of judicial procedures in legal reasoning. It is therefore distinguished from verification, which relies on logical proof. As Hirsch notes, this model may lead into a dilemma of "self-confirmability" when nonvalidatable hypotheses are proposed. Ricoeur escapes this dilemma by incorporating Popper's notion of "falsifiability" (80) into his methods for validation, which he applies to the internal coherence of an interpretation and the relative plausibility of competing interpretations.

In the second hermeneutic, which moves from explanation to understanding, Ricoeur distinguishes two stances regarding the referential function of text: a subjective approach and a structuralist alternative. The subjective approach incrementally constructs the world that lies behind the text but must rely on the world-view of the interpreter for its preunderstanding. Although the constructed world-view may gradually approximate the author's as more text is interpreted, the interpreter's subjectivity cannot be fully overcome.

In contrast, Ricoeur sees the structuralist approach as suspending reference to the world behind the text and focusing on a behavioral inventory of the interconnections of parts within the text. As noted earlier, the structural interpretation brings out both a surface and a depth interpretation. The depth semantics is not what the author intended to say but what the text is about, the nonostensive reference of the text. Understanding requires an affinity between the reader and the aboutness of the text, that is, the kind of world opened up by the depth semantics of the text. Instead of imposing a fixed interpretation, the depth semantics channels thought in a certain direction. By suspending meaning and focusing on the formal algebra of the genres reflected in the text at various levels, the structural method gives rise to objectivity and captures the subjectivity of both the author and the reader.

Like the other traditions, Ricoeur's hermeneutic arc can be interpreted as a bootstrapping process. Because it grounds the bootstrapping in an eidetic phenomenology, incorporates an internal referential model of the text, and begins interpretation with a structural analysis, Ricoeur's theory of interpretation may be easier to envision in computational terms. But the central bootstrapping engine in his theory is the alternation between forming hypotheses about meanings and validating those hypotheses through argument. This view resonates strongly with computational ideas about commonsense reasoning (qv). Indeed, these ideas lead Ricoeur to identify metaphor as the main source of semantic innovation (81,82) and linguistic evolution, and therefore as a major question for hermeneutics (83). For an excellent overview and comparison of the treatments of language and cognition found in phenomenological hermeneutics and in other nonhermeneutical traditions of philosophy, see Dallmayr (84).

Hermeneutics as Metascience

The hermeneutic tradition provides a basis for prescribing and criticizing the conduct of inquiry and the development of knowledge in the natural, social, and cognitive sciences (qv). Its representatives have figured prominently in debates concerning how valid knowledge can be acquired and whether there is a need for a separate methodology in the social sciences. Since AI is a new discipline, occupying a middle ground between the natural and social sciences, its researchers can benefit from knowledge of these debates. The choice of the appropriate methodology for inquiry in AI research remains unsettled for such areas as natural-language processing, human problem solving, belief systems, and action. On one hand, the substantial contributions to AI from logic, mathematics, engineering, and the natural sciences, like physics, seem to make their strategies for inquiry uncontested. On the other hand, when the subject matter is clearly linked to the human sciences (particularly linguistics, anthropology, and psychology), methods devised for those areas might be more appropriate.

Hermeneutics and the Social Sciences. Dilthey distinguished the natural sciences from the cultural and social sciences (Geisteswissenschaften) on the basis of their objects and the appropriate means for knowing them. The natural sciences concerned phenomena that, opaque to thought, could only be studied from the "outside" through observation of uniformities in their behavior and through the construction of causal laws to explain those uniformities. In contrast, the human sciences had objects such as texts, verbal expressions, and actions that could be investigated from the "inside" through an understanding of their authors' experiences and intentions.
An interpretive or hermeneutic methodology could more reliably and intelligibly account for these objects by reconstructing the internal cognitive processes that motivated and gave meaning to each of them. The use of hypothetico-deductive methods employed in the natural sciences could only capture the external correlations among these objects at some high level of abstraction. Dilthey's arguments were embraced in the early twentieth century by many social scientists, including the sociologist Weber (66), whose paradigmatic studies of social institutions interpreted human behavior as intentional action, structured by the agents' goals and beliefs. However, the physics model of the social sciences also persists and is currently manifested in such techniques as Skinnerian stimulus-response modeling of human behaviors and statistical content analysis, which determines the meaning of texts through frequency counts of their words.

Contemporary hermeneuticists, such as Apel (85,86), Habermas (8), and Ricoeur (9), strengthen Dilthey's distinction by noting that in the human sciences the subject of investigation and the investigator can communicate with each other. This equality suggests that an appropriate methodology will resemble discussions in which members of a community justify their actions. The tools of the natural sciences are simply incapable of representing the key concepts in such discussions, namely motivation, belief, and intention, and the complexity
of their interactions. Intentional actions are embedded in groups of varying size and are constrained by (re-)created rules and norms: sociocultural traditions. Because of the complexity of these intertwined and mutually defining webs of relationships, scientific access to them is difficult, and "uncertainty principles" abound. These involve the difficulties of isolating the object of study from its milieu and of preventing the changes that communication between the investigator and the subject produces in the subject. These conditions support the notion that cultural and social studies have the role of clarifying the beliefs, plans, motivations, and social roles that led cognitive agents to produce their texts and actions. The inquiry becomes a "dialogue" through which the inquirer comes to understand the tradition in which the author or agent is embedded, so that the inquirer may either accept or repair the tradition, as Gadamer demands, or even reject it, as Habermas permits. Phases of understanding may be alternated with phases of validating knowledge, as Ricoeur's hermeneutic arc suggests, or of seeking explanations of opaque behaviors, as suggested in Apel's model of psychoanalysis. In any event, hermeneutic studies are inherently interactive and produce self-understanding. In this way they extend the original mission of hermeneutics to mediate cultural traditions by correcting misreadings or distortions. Logical positivists have nevertheless rejected the claims for a separate method for the social and cultural sciences as groundless challenges to their own program of creating a unified scientific method based on an unambiguous observation language (87).
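The alternation between phases of understanding (hypothesis formation) and phases of validation that Ricoeur's hermeneutic arc suggests can be caricatured as a bootstrapping loop. The sketch below is purely illustrative: the numeric scores standing in for "internal coherence" and "relative plausibility" (Ricoeur's two validation criteria, via Popper's falsifiability) and all identifiers are inventions for this example, not part of any system discussed in the article.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    reading: str        # a proposed sense of the text
    coherence: float    # internal coherence of the interpretation (0..1)
    plausibility: float # plausibility relative to rival readings (0..1)

def hermeneutic_arc(guess: Callable[[List[Hypothesis]], List[Hypothesis]],
                    rounds: int = 3) -> Hypothesis:
    """Alternate 'divinatory' guessing with argumentative validation.

    Incoherent readings are falsified (dropped); the most plausible
    survivors seed the next round of guessing.
    """
    pool: List[Hypothesis] = []
    for _ in range(rounds):
        pool = guess(pool)                             # understanding phase
        pool = [h for h in pool if h.coherence > 0.5]  # validation: falsify
        pool.sort(key=lambda h: h.plausibility, reverse=True)
        pool = pool[:3]                                # keep defensible rivals
    return pool[0]

def toy_guess(pool: List[Hypothesis]) -> List[Hypothesis]:
    # Each round refines the best surviving reading (marked with '+').
    base = pool[0].reading + "+" if pool else "reading"
    return [Hypothesis(base, 0.4 + 0.2 * i, 0.5 + 0.1 * i) for i in range(3)]

best = hermeneutic_arc(toy_guess)
print(best.reading)  # "reading++": two rounds of refinement after the first guess
```

The design point is the loop itself: neither guessing nor validation terminates the process alone; each round's surviving interpretations become the preunderstanding for the next, which is the bootstrapping structure the hermeneutic traditions keep returning to.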
Abel (88), Hempel, and others argue that empathetic understanding and the attribution of rule following are psychological heuristics, unverifiable hunches, or intuitions based on personal experience. Although Abel concedes that they may be useful in setting up lawlike hypotheses for testing, he concludes that they are neither necessary nor sufficient to constitute a human science. There are several rebuttals to these claims. First, methodological hermeneutics, which Dilthey initiated and Betti (31) and Hirsch (30) continue, holds that an interpretation can be "objective" and "valid," if not verifiable, provided the investigator resists temptations to make the text relevant for her own practical affairs. This strategy regards the text as an embodiment of the values of its time and suspends credibility regarding its truth and acceptability according to present standards. But knowledge of values expressed in other texts and records from the period is allowed to constrain the possible interpretations. Second, the idea of an interpretive or hermeneutic social science has received indirect support from ordinary language philosophy, an analytic philosophy that eschews the mentalism to which the logical positivists so strenuously object. The support comes from the sociologist Winch (89), who generates recommendations for a social science on the basis of the later Wittgenstein's analysis (52) that particular word use and discourse patterns ("language games") reflect and constitute activities in semi-institutionalized, functional areas of life ("life-forms"). Winch contends that the analysis of social actions (both verbal and nonverbal) has a necessarily holistic, situation-oriented, interpretive character rather than a generalizing, explanatory one: "Understanding . . . is grasping the point or meaning of what is being done or said. This is a notion far removed from the world of statistics and causal laws: it is closer to the realm of discourse and to the internal relations that link the parts of . . .
a discourse" (90). Third, philosophical hermeneutics is not concerned with verifiable accounts, and, as noted above, it denies the possibility of objective knowledge. Instead, it argues that only a person who stands in history, subject to the prejudices of his age, can hope to understand it. A valid understanding of an event, interaction, or text is one that bridges history or sociocultural differences to highlight the inquirer's situation. By this standard, Winch's recommendations are not hermeneutic because they are based on the idea of ahistorical language games. They do not recognize that interpretation includes both "translation" and "application," that is, the mediation between the disintegrating and the emerging language games, on one hand, and the revitalization of the past and its assimilation into the present life-form, on the other hand (85).

Hermeneutics and the Natural Sciences. Kuhn's influential The Structure of Scientific Revolutions (91) developed a hermeneutics of the natural sciences by portraying them as historically embedded, linguistically mediated activities organized around paradigms that direct the conceptualization and investigation of the objects of their studies. Scientific revolutions occur when one paradigm replaces another and introduces a new set of theories, heuristics, exemplars, and terms. The notion of a paradigm-centered scientific community consequently seems analogous to Gadamer's notion of a linguistically encoded social tradition. Kuhn (92) reports that his own development toward this idea began with his distress over Aristotle's theory of motion and the eventual discovery that Aristotle meant by "motion" something other than what the word signified in Newtonian mechanics. This effort corresponds closely to a programmatic definition of hermeneutics as the study of human actions and texts with the view to discover their meaning in order to understand them, agree with them, or even amend them (87).

Debates around Kuhn's thesis have spurred often grudging concessions that data, facts, and lawlike relations are theory-dependent rather than verifiable, coherent, and independent of the scientific theories in which they are embedded (93). Noting the inescapable theory dependence of observational sentences and the incommensurabilities across paradigms, Feyerabend (94,95) reaches the radical conclusion that no methodological standards can legitimately be applied. He therefore advocates a "methodological anarchism" that proceeds from the slogan "in science, anything goes!" Feyerabend's doubts about the possibility of interparadigm communication closely resemble Gadamer's doubts regarding the accessibility of alien traditions.

Putnam (96), however, argues that Feyerabend conflates concepts with conceptualization. According to Putnam, communication across paradigms does not require that the concepts be the same across paradigms but only that members of one paradigm make their ideas intelligible to members of another paradigm. They can do so provided the fundamental mechanisms of conceptualization are the same across paradigms (language communities). According to Putnam, the mechanisms of conceptualization must be universal and a priori or empirical experience would not be possible. But making ideas intelligible across paradigms can require rederiving the concepts upon which a paradigm's theories rely as well as reconstructing the grounds for those concepts, and so on, recursively. Interparadigmatic communication accordingly requires a "critique of ideology" similar to the one proposed by Apel and Habermas. Apel (85) clarifies this process of reconstructing paradigms from first principles when he notes that justifications for scientific statements ultimately rely on a common ground in ordinary language statements. This common ground, the "communicative a priori," provides procedural norms regarding the admissibility of evidence and the validity of argumentation. Thus, despite paradigmatic differences, scientific discourse can still reach a consensus, and avoid arbitrariness or dogmatism, by falling back on principled argument stated in ordinary language.

Notion of an Emancipatory Science. The hermeneutics tradition also provides the methodological starting point for Marx's critique of ideology, Freud's psychoanalysis, and other studies that seek human emancipation by dissolving obsolete, restrictive, and self-denying traditions and practices. Their initial strategy is to unmask the justifications given for these practices as distortions of the actors' true needs and the conditions of the situation. Yet, hermeneutic understanding will not reveal why the actors accept these justifications. In presenting psychoanalysis as the paradigmatic emancipatory science, Apel (50,86) emphasizes that human beings cannot fully know their own motives or the intentions in their expressions. Consequently, empathy and introspection need to be supplemented by a quasi-naturalistic turn that applies the causal analysis of natural science to the actor's behavior. Any resulting explanations can then be fed back to the actor and appropriated as self-knowledge.

As mentioned earlier, Gadamer and Habermas debated the validity of rejecting past traditions, especially in regard to the critique of Western political and social institutions. Gadamer considers this move incoherent and ungrounded since it rejects the very tradition, including the value of rational, noncoerced consensus, with which the investigator must begin the explication. In response, Habermas (51) and Apel (7) claim that the preference for reason and understanding, that is, for hermeneutics, is not just arbitrary or an inherited prejudice grounded in the Western cultural tradition. Instead, they assert that a communicative a priori underlies all speech and communication: speech (and speechlike action) must be grammatical and sincere, as well as appropriate, to be meaningful. Since these validity claims imply a process for reaching agreement, the act of speaking itself commits speakers to prefer reason.

Hermeneutics in AI

Thus far, few AI researchers have incorporated ideas from hermeneutics into their computational models of understanding and interpretation. Hermeneutics, instead, has provided a fertile source of arguments for doubting the possibility of the "hard AI" project, creating true artificial intelligences that can pass the Turing test (qv), which can be thought of basically as the ability to converse in natural language just like a human. Nevertheless, as AI interest in action theory and social interaction deepens, researchers will need to glean the insights of hermeneutics and their cognitive foundations if their programs are to adequately mirror social phenomena. Efforts that fail to consider the variability of meaning according to the intentions and histories of actors as well as the perceptions of observers will not solve the difficult questions of understanding, and may not even perform very well in microworlds. Indeed, they are more likely to impute the implementor's theory, as embodied in the program, rather than recognize the particular organization in the phenomena under study.

Analyzing the Affective Structure of Text. Alker, Lehnert, and Schneider (10,97) present a bottom-up model for extracting the affective structure of a text. Their "computational hermeneutics" builds from Lehnert's earlier work on "plot units" (98,99). Plot units provide an unvalidated but nevertheless interesting vocabulary for designating affective relationships for participants in events and actions. Working within "conceptual dependency" theory (100), Lehnert identified various combinations of plot units for use in summarizing narrative texts. These "story molecules" relate changes in actors' affects to successes and failures in the resolutions of problems involving them. In their work Alker, Lehnert, and Schneider manually reduced passages from Toynbee's retelling of events leading up to Christ's crucifixion to a large number of these molecules. The molecules were interrelated through the actors involved and by virtue of some molecules being antecedent conditions for others. After the input of these manual reductions, the central subgraph of the plot structure was computationally extracted, using a program for finding the most strategic and highly connected molecules. This central subgraph was labeled the "essential" Jesus story.

After studying this affective core, Alker, Lehnert, and Schneider concluded that the Jesus story involves an ironic victory born from adversity and conforms to a well-known genre, the romance of self-transcendence. Their method resembles classical hermeneutics in seeking to uncover the essential structure of text based on systematic linkages between the parts and the whole and in emphasizing the use of explicit rules for objective interpretation. However, their willingness to tolerate multiple interpretations and their structuralist orientation also aligns them with phenomenological hermeneutics. Alker, Lehnert, and Schneider suggest that the Jesus story has been emotively potent because it provides a step-by-step account of affective change in self-transcendence and thus can open its readers to the experience of this process. In its present form, however, this work does not implement a bootstrapping process even though, ironically, the theme of self-transcendence presupposes a mechanism capable of consciously directed bootstrapping.

What Does it Mean To Understand Natural Language? Winograd (2) uses insights primarily from philosophical hermeneutics to sketch a new approach to natural-language understanding (qv). He intends to overcome the pitfalls of earlier approaches that succumbed to the phenomenological critique advanced by Dreyfus (1). Focusing on the theory of meaning, Winograd argues that previous efforts, including his own SHRDLU (qv) (101), fell into the trap of "objectivism," or the misplaced belief that the contents of a theory or model correspond directly to reality (the correspondence theory of truth). [Prior (102) provides a concise overview of the correspondence theory of truth, which holds that the structure of theoretical knowledge corresponds to reality.] Winograd adds that the deductive nature of the formalisms used by AI researchers forced them to adopt an objectivist position but that these formalisms failed to account for the informal, phenomenological knowledge or experience that an understander deploys when interpreting utterances. Hermeneuticists identify this problem as the historicity of understanding or the role of background knowledge in mediating understanding. Moreover, these deductive formalisms are subject to Heidegger's ontological critique of Husserl. Their failure to address the fundamental ontology of language typified by the conversational situation leads to an inability to account for the role of context in speaker-hearer identification of the intended meanings of utterances (2). Thus, Winograd supports the Heideggerian critique with arguments and examples drawn from ordinary language philosophy (59,61,103,104). In a vein reminiscent of Gadamer, he argues that making sense (qv) of a statement requires knowing how it is intended to answer (implicit or explicit) questions posed by the conversational context. He concludes that deductive logic can account for only a small fraction of human reasoning, and therefore new advances in natural-language understanding require "a calculus of natural reasoning" (105).

Winograd proposes knowledge-representation language (KRL) (qv) (106) as a starting point for an alternative approach. KRL's reasoning based on limited computational resources captures Heidegger's thesis of the finitude of Dasein and also echoes Simon's notion of "bounded rationality" in the theory of decision making (107). For Winograd, effective reasoning strategies under limited or variable computational resources provide a "natural reasoning," which, although formally incomplete, can account for more of everyday natural-language usage than can the small fraction that fits the mold of a complete deductive logic (105). Moreover, this approach must have the ability to deal with partial or imprecise information if it is to work at all. Winograd proposes a control structure that uses matching of the current processing context to trigger actions appropriate for the situation. This view of situated action, in which situational responses are unreflective, resembles the concept of "thrownness" as developed by Heidegger. The combination of situated action as a control structure and resource-limited reasoning grounded in commonsense, stereotype-based reasoning (2) resonates with recent work on analogy (108-110), precedential reasoning (111), and metaphor (21,112). At its core KRL also incorporates a notion of bootstrapping similar to the one found in the various hermeneutic traditions, particularly in the works of Heidegger and Gadamer.

Winograd argues that spurious reification, or misplaced concreteness, has plagued earlier efforts to develop a formalism for representing natural language. Spurious reification occurs when a competence is imputed to an understander, not because the understander actually employs the specified competence in performance, but because the observer classifies performances as instances of a particular competence and then mistakenly imputes the competence to the understander. Instead of building from domain-level concepts and structures, Winograd attempts to avoid spurious reification by constructing formal representations based on ontological considerations borrowed from methodological hermeneutics (113). Since no substantial AI project has been attempted using KRL, the ideas that its designers hoped to capture remain more theoretical than practical.

In discussing hermeneutics, Winograd not only proposes a new research program for AI but also problematizes the philosophical basis of current natural-language research. Funda-
372
HERMENEUTICS
mental assumptions and philosophical orientations underlying research must now be explicitly analyzedand justified. In rejecting "objectivism," Winograd advocates a "subjectivist" hermeneutical position that builds from Maturana's (114) notion of the nervous system as "structure determined,"plastic, and closed.According to this model, activities outside the system (stimuli) perturbate the structure of the system,and these perturbations in turn lead to "patterns of activity that are different from those that would have happenedwith different perturbations." Winograd's parallel notion of understanding posits a closed system in which preunderstanding evolves through acts of interpretation. As in Heidegger'shermeneutic circle, the possible horizons that can be understoodare constrained by the historically determined structure of preunderstanding or set of stored schemas(2). Understanding is open to the environment only within this range. Unlike Heidegg€r, who recognizedthe importance of the environment but failed to analyze it, Winograd is led to the analysis of the environment by several influences. These include Garfinkel's (69) ethnomethodology,which emphasizessocial context, Searle's focus on speechas social action, and Lakatos' (115) argument that even in mathematics the meanings of terms are contingent on outside context. Winograd Q) grounds his theory of meaning in terms of social action, and so takes a position close to critical hermeneutics,between relativism and objectivism. Stimulated in part by Winograd (2), Bateman (11,12)examof Heidegger'sexistential phenomenolines the consequences ogy and agrees with Dreyfus (1) that this philosophy denies the possibility of modeling thought and action using the specific formalizations proposedby the functionalist paradigm of cognitive science.Bateman saysthese formalisms are basedon the "ontological assumption" of an interpreter who follows rules in acting upon a mental representation of a situation. 
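The central-subgraph extraction in the Alker, Lehnert, and Schneider work discussed earlier (finding the most strategic and highly connected "story molecules" in the graph of manual reductions) can be approximated with ordinary graph operations. The sketch below is only illustrative: the molecule names are hypothetical, and node degree stands in for whatever connectivity measure the original program actually used.

```python
from collections import defaultdict

def central_subgraph(edges, k=3):
    """Return the k most highly connected nodes and the edges among them.

    A crude stand-in for extracting a "central subgraph" of plot-unit
    molecules: connectivity here is simply node degree.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Keep the k best-connected molecules and the edges they share.
    core = set(sorted(degree, key=degree.get, reverse=True)[:k])
    return core, [(a, b) for a, b in edges if a in core and b in core]

# Hypothetical molecules linked by shared actors or antecedent
# conditions (illustrative data, not the actual Toynbee reductions).
edges = [("betrayal", "arrest"), ("arrest", "trial"),
         ("trial", "crucifixion"), ("betrayal", "trial"),
         ("denial", "trial"), ("crucifixion", "transcendence")]
core, core_edges = central_subgraph(edges, k=3)
```

The retained `core` plays the role of the "essential" story; peripheral molecules with few interconnections drop out.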
Heidegger's notion of "being-in-the-world," which includes both situatedness and understanding as ontological modes, precludes the subject-object dichotomy in this assumption. Since one is always in a situation, and its structure and significance are determined by its relevance to one's plans and purposes, no context-free representation is possible.

Bateman, however, does not dismiss the possibility of a functionalist paradigm for cognitive science. He wants instead to ground it on the later Heidegger's idea of language, which, according to Bateman, seeks to make intelligible the experience of "being-in-the-world" as it is for "anyone," that is, for a generalized subject or member of a language community. As a collective artifact, a language is considered to encode partially the history of the language community through both the admissible and inadmissible combination (association) of words and phrases. The resulting connotational structure captures a kind of collective background knowledge and imposes a priori constraints on the actions of individuals who contemplate actions in terms of the language. In Halliday's "systemic grammar" (116) there is the notion of a "social semiotic" that acknowledges that a group's culture can restrict the possible meanings of utterances through constraints on possible ways of acting in situations. Bateman considers this orientation compatible with the hermeneutic view and believes that "systemic grammar," with appropriate revisions, can provide an adequate theoretical framework for natural-language understanding. Yet despite this openness to social constraints, Bateman does not consider hermeneuticists who came after Heidegger, most notably Gadamer and Habermas.

Foundations of Understanding. In a more recent work Winograd and Flores (3) draw upon philosophical hermeneutics and Maturana's (117) work on the biology of cognition to deny the possibility of constructing intelligent computers. They argue that to the extent Heidegger and Gadamer make a persuasive case that certain features of human existence are fundamental, the quest for intelligent machinery is quixotic. These concepts include "thrownness," "blindness," and "breakdown." "Thrownness" denotes that people are thrown into the situations of everyday life and rarely have time to reflect on alternative courses of action. They cannot be impartial, detached observers of the world in which they live; they must decide and act using heuristics they have as part of their effective-histories. Although these heuristics enable some action possibilities, the same heuristics also "blind" people to other action possibilities that might have predominated had their effective-histories been different. When faced with situations where their effective-histories fail to provide an adequate guide for action and also "blind" them to those actions that support their purposes, people experience a kind of "breakdown." In breakdown, actions become problematic, and tools that had previously been taken for granted are perceived in isolation as objects.

If an expert system (qv) is designed to present a user with possible courses of action in particular situations, the concepts of "thrownness," "blindness," and "breakdown" also come into play. Although expert systems may operate successfully in well-understood, constrained domains, expert systems in complex domains may be "thrown" into situations where they cannot evaluate all possible actions, and they consequently "break down."
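Winograd's proposed control structure, in which the current processing context is matched against patterns that trigger situationally appropriate actions, can be sketched minimally as follows. The trigger patterns, the breakdown handler, and all names here are illustrative assumptions, not constructs from KRL itself; the point is only that responses are selected by context matching rather than by deduction, and that an unmatched context is a "breakdown."

```python
def run_situated(context, triggers, on_breakdown):
    """Fire the first action whose pattern matches the current context.

    A toy rendering of a situated control structure: each trigger is a
    (pattern, action) pair, and a pattern matches when every one of its
    key/value constraints holds in the context. When nothing matches,
    the system has no unreflective response, which we treat as a
    "breakdown" requiring a fallback.
    """
    for pattern, action in triggers:
        if all(context.get(k) == v for k, v in pattern.items()):
            return action(context)
    return on_breakdown(context)

# Illustrative triggers for a hypothetical conversational domain.
triggers = [
    ({"speech_act": "question"}, lambda c: "answer it"),
    ({"speech_act": "greeting"}, lambda c: "greet back"),
]
result = run_situated({"speech_act": "question", "topic": "weather"},
                      triggers, lambda c: "breakdown: reflect on tools")
```

Note the parallel to "thrownness": the matched action fires without deliberation, and only the unmatched case forces the system to step back.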
Systems targeted at complex domains must therefore rely on heuristic rules, but these may "blind" the program to more propitious courses of action. Winograd and Flores add that the expert-system programmer introduces his own "blindness," or preconceptions, into the program. Because of these difficulties, Winograd and Flores recommend reformulation of the goals of artificial intelligence. Instead of directing efforts toward the putatively impossible goal of creating machines that can understand, programs should be designed to serve as tools for enhancing the quality of life. This could be done by recognizing the role of such programs in the web of conversations (speech acts) that constitute social existence, by attempting to minimize the "blindness" they engender, and by anticipating the range of their potential "breakdowns."

Winograd and Flores present a reasoned critique of two specific categories of AI research. The first comprises AI approaches that incorporate rigidly fixed means of interpretation, such as much work in knowledge-based systems. The second category includes those approaches that proceed from the dualist presumption that truth, meaning, and reference are established by means of a correspondence between entities in the world and entities in the mind (the correspondence theory of truth) rather than in the everyday discourse of intelligent agents. Although they acknowledge that learning approaches might eventually be able to address the criticisms they raise, they do not expect progress in learning during the near term. Thus, their work amounts to a critique of the tractability of the "hard AI" project. As such, it constitutes a continuation of the critique of AI begun by Dreyfus (1) but differs in that it comes from within AI and is argued in more computational terms.

However, Winograd and Flores fail to demonstrate convincingly that computer understanding exceeds the range of the possible. They demonstrate only that the goal is much more difficult than many people, including many AI practitioners, may have thought. Unfortunately, Winograd and Flores unfairly characterize as "blind" those AI approaches that come closest to overcoming their objections, such as Winston's (108,110,118) approach to learning and reasoning by analogy. Winograd and Flores misconstrue Winston's approach as capable of producing results only because it operates in a microworld with primitives fixed in advance by the implementors. Although this criticism may be leveled fairly at many AI programs, Winston's program is in principle not so limited, precisely because it is not based on domain-specific primitives. Indeed, Winston's program is general enough to perform well in any domain because it processes linguistically derived data according to the data's form rather than its specific content. Moreover, because it learns rules on the basis of its experience (the effective-history over which it can draw analogies), Winston's program represents a first computational approximation of the basic hermeneutic notion of a preunderstanding grounded in effective-history.

Grounding Meaning in Eidetic Phenomenology. Mallery and Duffy (13) present a computational model of semantic perception, the process of mapping from a syntactic representation into a semantic representation. Some computational and noncomputational linguists (100,119,120) advocate determining equivalent meanings (paraphrases) through the reduction of different surface forms to a canonicalized semantic form comprised by some combination of semantic universals (e.g., "conceptual-dependency" primitives).
Mallery and Duffy reject this view on the grounds that most meaning equivalences must be determined in accordance with the specific linguistic histories of individual language users (or at least of linguistic communities based on social groups) and the intentional context of the utterance. Their alternative is lexical-interpretive semantics, an approach to natural-language semantics that constructs semantic representations from canonical grammatical relations and the original lexical items. On this view, semantic representations are canonicalized only syntactically, not semantically or pragmatically. Instead of relying on static equivalences determined in advance, lexical-interpretive semantics requires meaning equivalences to be determined at their time of use, reference time. To meet this requirement, Mallery and Duffy introduce the concept of a meaning congruence class, the set of syntactically normalized semantic representations conforming to the linguistic experience of specific language users and satisfying their utterance-specific intentions. Meaning equivalences are then given by the meaning congruence classes to which utterances belong. Lexical-interpretive semantics differs from approaches relying on semantic universals because meaning equivalences are determined dynamically at reference time for specific language users with individual histories rather than statically in advance for an idealized language user with a general but unspecific background knowledge.

The major assumption underlying lexical-interpretive semantics is that meaning equivalences arise because alternative lexical realizations (surface forms) accomplish sufficiently similar speaker goals to allow substitution. Determining meaning congruences in advance, based on static analysis, is hopelessly intractable. This follows from the need to predict in advance all potential utterance situations, intentional contexts, and combinations of language-user effective-histories. Although semantic canonicalization on the basis of a general "semantic and pragmatic competence" renders static analyses of language-user combinations tractable by fiat, it also reduces nuances so dramatically that intentional analysis and individual linguistic histories play a drastically diminished role.

Lexical-interpretive semantics is hermeneutic because it emphasizes interpretation based on the individual effective-history of language users and the specific intentional structure of communicative situations. By virtue of its emphasis on innovation in language and polysemy, lexical-interpretive semantics is perhaps most closely aligned with the phenomenological hermeneutics of Ricoeur (72). Interpretation builds from an eidetic level of representation, the syntactically normalized semantic representation. The determination of meaning congruence classes becomes an early level of a more general and open-ended hermeneutic interpretation.

Stimulated by recent debates about perception (121,122), Mallery and Duffy consider semantic perception to be a process of mapping from sense-data, in this case natural-language sentences, to a semantic representation. But instead of providing an account of perception suited to a theory of meaning based on semantic universals, like Feigenbaum and Simon (122), Mallery and Duffy provide one suited to a hermeneutic theory of meaning. Mallery and Duffy have implemented this theory, up to the level of eidetic representation, in the RELATUS Natural-Language System (123).
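A rough rendering of the meaning congruence class idea may help: representations are canonicalized only syntactically (sorted grammatical relations keeping the original lexical items), and equivalence is then judged at reference time against a particular user's linguistic history rather than against a fixed table of semantic primitives. Everything in this sketch, including the triple format and the substitution table, is a hypothetical simplification for illustration, not the RELATUS implementation.

```python
def normalize(triples):
    """Syntactic canonicalization only: sort the grammatical-relation
    triples but keep the original lexical items (no semantic primitives)."""
    return tuple(sorted(triples))

def congruent(u, v, user_equiv):
    """Judge two normalized representations equivalent relative to one
    user's linguistic history, modeled here as a substitution table."""
    if len(u) != len(v):
        return False
    def key(triple):
        # Map each lexical item through this user's equivalences.
        return tuple(user_equiv.get(item, item) for item in triple)
    return sorted(map(key, u)) == sorted(map(key, v))

# Hypothetical history: this speaker treats "auto" as substitutable
# for "car"; a different user might not.
history = {"auto": "car"}
u1 = normalize([("subject", "buy", "alice"), ("object", "buy", "car")])
u2 = normalize([("subject", "buy", "alice"), ("object", "buy", "auto")])
same_for_this_user = congruent(u1, u2, history)   # equivalent for this user
same_for_naive_user = congruent(u1, u2, {})       # not equivalent without that history
```

The same pair of utterances thus falls into one congruence class for one user and two classes for another, which is the dynamic, user-relative behavior the static approaches give up.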
Although they share some of the hermeneutically oriented views and concerns articulated in Winograd (2) and Bateman (11,12), their implementation allows more concrete specification and testing of their theory, which currently focuses on earlier processing levels. For example, Mallery and Duffy (13) have proposed constraint-interpreting reference (124) as a model that conforms to lexical-interpretive semantics, just as discrimination nets are well suited to approaches relying on semantic primitives (122,125-127). They ground this choice both in the available experimental psycholinguistic evidence and in the desirable computational properties of reference based on constraint interpretation. These properties include maximizing monotonicity (minimizing backtracking) in the syntactic processing that precedes reference and optimizing subgraph isomorphism (search) as it arises in reference and in other reasoning operations, particularly commonsense reasoning grounded in analogy.

Conclusions

This entry has presented hermeneutics primarily as a philosophy of understanding rather than as a set of technologies for interpretation in specific domains. As such, the hermeneutic tradition seems able to speak to AI researchers in two distinct ways. First, hermeneutics provides some basis for arguing against the feasibility of the AI project, at least under its present dispensation. Whether represented by Dilthey's idea of empathetic understanding or Heidegger's idea of situated understanding, hermeneutics seems to have discovered a quality in the human situation that is vital for knowledge of others and oneself but has not yet been simulated mechanically. Because these doubts are generated from an ongoing intellectual tradition and because they refine some fairly common intuitions, they cannot easily be dismissed as "irrational technological pessimism." On the other hand, these doubts should stimulate attempts by AI researchers to overcome them, just as some doubts raised by Dreyfus (1) stimulated earlier research. At the very least, then, the insights of the various hermeneutical camps can be expected to receive increasing attention in the artificial intelligence community.

Second, hermeneutics can suggest constraints, orientations, and even criteria in the design of AI systems that are intended either to understand natural language or to represent knowledge of the social world. The lessons of this tradition are, however, equivocal. Dilthey, Heidegger, Gadamer, Habermas, Ricoeur, and others provide very different notions of what constitutes understanding and its grounding. Nevertheless, researchers who are aware of these debates might be more cognizant of the choices they make in their own designs. As a consequence, systems would not merely illustrate isolated and perhaps idiosyncratic theories about linguistic phenomena but would begin to support (or deny) major philosophical positions in ontology, epistemology, and philosophy of mind. But the generally precomputational nature of contemporary hermeneutics calls for specific formulations that can be tested computationally. Computational experimentation, an empirical philosophy, can then feed back into the reformulation and refinement of ideas about both hermeneutics and AI.
BIBLIOGRAPHY

1. H. Dreyfus, What Computers Can't Do: A Critique of Artificial Reason, W. H. Freeman, San Francisco, 1972. A 2nd edition with a new preface was published in 1979.
2. T. Winograd, "What does it mean to understand natural language," Cog. Sci. 4, 209-241 (1980).
3. T. Winograd and F. Flores, Understanding Computers and Cognition: A New Foundation for Design, Ablex, Norwood, NJ, 1986.
4. M. Heidegger, Being and Time, J. Macquarrie and E. Robinson (trans.), Harper & Row, New York, 1962. Originally published as Sein und Zeit, Neomarius Verlag, Tubingen, F.R.G., 1927.
5. W. Dilthey, "The Rise of Hermeneutics," T. Hall (trans.), in P. Connerton (ed.), Critical Sociology: Selected Readings, Penguin, Harmondsworth, U.K., pp. 104-116, 1976. Excerpted from W. Dilthey, "Die Entstehung der Hermeneutik," 1900, in W. Dilthey, Gesammelte Schriften, B. G. Teubner, Leipzig and Berlin, pp. 317-320, 323-331, 1923.
6. H. Gadamer, Truth and Method, Continuum, New York, 1975. Originally published as Wahrheit und Methode, Tubingen, F.R.G., 1960.
7. K. Apel, Towards a Transformation of Philosophy, G. Adey and D. Frisby (trans.), Routledge & Kegan Paul, London, 1980. Originally published as Transformation der Philosophie, Suhrkamp Verlag, Frankfurt am Main, F.R.G., 1972, 1973.
8. J. Habermas, Knowledge and Human Interests, J. J. Shapiro (trans.), Heinemann, London, 1972. Originally published in 1968.
9. P. Ricoeur, Main Trends in Philosophy, Holmes and Meier, New York, 1979. Reprinted from Main Trends in The Social and Human Sciences-Part II, UNESCO, New York, 1978; see Ref. 71.
10. H. R. Alker Jr., W. G. Lehnert, and D. K. Schneider, "Two reinterpretations of Toynbee's Jesus: Explorations in computational hermeneutics," Artif. Intell. Text Understand., Quad. Ric. Ling. 6, 49-94 (1985).
11. J. A. Bateman, Cognitive Science Meets Existential Phenomenology: Collapse or Synthesis? Working Paper No. 139, Department of Artificial Intelligence, University of Edinburgh, Edinburgh, April 1983.
12. J. A. Bateman, The Role of Language in the Maintenance of Intersubjectivity: A Computational Investigation, in G. N. Gilbert and C. Heath (eds.), Social Action and Artificial Intelligence, Gower, Brookfield, VT, pp. 40-81, 1981.
13. J. C. Mallery and G. Duffy, A Computational Model of Semantic Perception, AI Memo No. 799, Artificial Intelligence Laboratory, MIT, Cambridge, MA, May 1986.
14. W. Stegmuller, The So-called Circle of Understanding, in W. Stegmuller (ed.), Collected Papers on Epistemology, Philosophy of Science and History of Philosophy, Vol. 3, Reidel, Dordrecht, The Netherlands, 1977.
15. D. B. Lenat, AM: Discovery in Mathematics as Heuristic Search, in R. Davis and D. B. Lenat (eds.), Knowledge-Based Systems in Artificial Intelligence, McGraw-Hill, New York, pp. 1-227, 1982.
16. D. B. Lenat, "Eurisko: A program that learns new heuristics and domain concepts: The nature of heuristics III: Program design and results," Artif. Intell. 21, 61-98 (1983).
17. K. W. Haase, ARLO: The Implementation of a Language for Describing Representation Languages, AI Technical Report No. 901, Artificial Intelligence Laboratory, MIT, Cambridge, 1986.
18. J. Piaget, The Origins of Intelligence in Children, M. Cook (trans.), W. W. Norton, New York, 1952.
19. J. Piaget, Genetic Epistemology, Columbia University Press, New York, 1970.
20. G. L. Drescher, Genetic AI: Translating Piaget Into LISP, AI Memo No. 890, Artificial Intelligence Laboratory, MIT, February 1986.
21. M. Minsky, The Society of Mind, Simon & Schuster, New York, 1986.
22. R. Palmer, Hermeneutics: Interpretation Theory in Schleiermacher, Dilthey, Heidegger, and Gadamer, Northwestern University Press, Evanston, IL, 1969.
23. J. Bleicher, Contemporary Hermeneutics: Hermeneutics as Method, Philosophy, and Critique, Routledge & Kegan Paul, London, 1980.
24. B. Smalley, The Study of the Bible in the Middle Ages, 2d ed., Blackwell, Oxford, U.K., 1952.
25. F. Schleiermacher, in H. Kimmerle (ed.), Hermeneutik, Carl Winter Universitatsverlag, Heidelberg, F.R.G., 1959.
26. J. B. Thompson, Critical Hermeneutics: A Study in the Thought of Paul Ricoeur and Jurgen Habermas, Cambridge University Press, Cambridge, U.K., 1981.
27. W. Dilthey, in H. P. Rickman (ed.), Selected Writings, Cambridge University Press, Cambridge, U.K., 1976.
28. M. Ermarth, Wilhelm Dilthey: The Critique of Historical Reason, University of Chicago Press, Chicago, IL, 1978.
29. T. Plantinga, Historical Understanding in the Thought of Wilhelm Dilthey, University of Toronto Press, Toronto, 1980.
30. E. D. Hirsch Jr., Validity in Interpretation, Yale University Press, New Haven, CT, 1967.
31. E. Betti, Hermeneutics as the General Methodology of the Geisteswissenschaften, in Ref. 23, pp. 51-94. Originally published as Die Hermeneutik als allgemeine Method der Geisteswissenschaften, Mohr, Tubingen, F.R.G., 1962.
32. E. Husserl, Ideas: General Introduction to Pure Phenomenology, W. R. B. Gibson (trans.), George Allen and Unwin, London, 1931. First published in 1913.
33. R. Schmitt, Phenomenology, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vols. 5 and 6, Macmillan, New York, pp. 135-151, 1967.
34. R. M. Zaner, The Way of Phenomenology: Criticism as a Philosophical Discipline, Pegasus, New York, 1970.
35. P. Ricoeur, Husserl: An Analysis of His Phenomenology, E. G. Ballard and L. E. Embree (trans.), Northwestern University Press, Evanston, IL, 1967.
36. G. W. F. Hegel, The Philosophy of Mind, Part 3 of The Encyclopedia of the Philosophical Sciences, W. Wallace (trans.), Oxford University Press, Oxford, U.K., 1971. First published in 1830.
37. G. W. F. Hegel, The Science of Logic, Part 2 of The Encyclopedia of the Philosophical Sciences, W. Wallace (trans.), Oxford University Press, Oxford, U.K., 1975. First published in 1830.
38. P. Singer, Hegel, Oxford University Press, Oxford, U.K., 1983.
39. M. Green, Martin Heidegger, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vols. 7 and 8, Macmillan, New York, pp. 457-465, 1967.
40. G. Steiner, Martin Heidegger, Penguin, New York, 1980.
41. M. Murray, Heidegger and Modern Philosophy: Critical Essays, Yale University Press, New Haven, CT, 1978.
42. H. Gadamer, Man and Language, in D. E. Linge (ed. and trans.), Philosophical Hermeneutics, University of California Press, Berkeley, pp. 59-68, 1976.
43. E. Sapir, Selected Writings of Edward Sapir, University of California Press, Berkeley, 1947.
44. B. Whorf, Language, Thought and Reality, MIT Press, Cambridge, MA, 1967.
45. J. Habermas, A Review of Gadamer's Truth and Method, in F. R. Dallmayr and T. A. McCarthy (eds.), Understanding and Social Inquiry, University of Notre Dame, Notre Dame, pp. 335-363, 1977. Originally published in Zur Logik der Sozialwissenschaften, Suhrkamp Verlag, Frankfurt am Main, 1970.
46. H. Gadamer, Hegel's Dialectic: Five Hermeneutical Studies, P. C. Smith (trans.), Yale University Press, New Haven, CT, 1976. German edition published in 1971.
47. D. E. Linge, Editor's Introduction, in Ref. 42, pp. xi-viii.
48. H. Gadamer, Hegel and Heidegger, in Ref. 46, pp. 100-116.
49. H. Gadamer, The Idea of Hegel's Logic, in Ref. 46, pp. 75-99.
50. K. Apel, Scientistics, Hermeneutics and the Critique of Ideology: Outline of a Theory of Science from a Cognitive-Anthropological Standpoint, in Ref. 7, pp. 46-76.
51. J. Habermas, "Knowledge and human interest," Inquiry 9, 285-300 (1966).
52. L. Wittgenstein, Philosophical Investigations, 3d ed., Macmillan, New York, 1968. Earlier edition published in 1953.
53. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
54. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
55. N. Chomsky, Lectures on Government and Binding, Foris, Dordrecht, 1981.
56. J. Habermas, What is Universal Pragmatics? in Communication and the Evolution of Society, T. McCarthy (trans.), Beacon Press, Boston, pp. 1-68, 1979. First published in German in 1976.
57. J. Habermas, "Some distinctions in universal pragmatics," Theor. Soc. 3, 155-167 (1976).
58. J. B. Thompson, Universal Pragmatics, in J. B. Thompson and D. Held (eds.), Habermas: Critical Debates, MIT Press, Cambridge, MA, pp. 116-133, 1982.
59. J. L. Austin, How To Do Things with Words, Harvard University Press, Cambridge, MA, 1962.
60. P. H. Grice, Logic and Conversation, in P. Cole and J. L. Morgan (eds.), Studies in Syntax, Vol. 3, Academic Press, New York, pp. 41-58, 1975.
61. J. R. Searle, Speech Acts, Cambridge University Press, Cambridge, U.K., 1970.
62. H. Gadamer, On the Scope and Function of Hermeneutical Reflection, in Ref. 42, pp. 18-43.
63. T. McCarthy, Rationality and Relativism: Habermas's "Overcoming" of Hermeneutics, in Ref. 58, pp. 57-78.
64. T. McCarthy, The Critical Theory of Jurgen Habermas, MIT Press, Cambridge, MA, 1978.
65. J. Habermas, The Theory of Communicative Action, Vol. 1, Reason and the Rationalization of Society, T. McCarthy (trans.), Beacon, Boston, 1981. German edition published in 1981.
66. M. Weber, in E. Shils and H. Finch (eds. and trans.), The Methodology of the Social Sciences, Free Press, Glencoe, IL, 1949.
67. T. Parsons, The Structure of Social Action, McGraw-Hill, New York, 1937.
68. A. Schutz, The Phenomenology of a Social World, Northwestern University Press, Evanston, IL, 1967.
69. H. Garfinkel, What is Ethnomethodology? in Ref. 46, pp. 240-261. Originally published in H. Garfinkel, Studies in Ethnomethodology, Prentice-Hall, Englewood Cliffs, NJ, 1967.
70. E. Goffman, The Presentation of Self in Everyday Life, Doubleday, New York, 1959.
71. P. Ricoeur, "The task of hermeneutics," Philos. Tod. 17 (1973), D. Pellauer (trans.). Reprinted in Ref. 41, pp. 141-160. Also reprinted in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, U.K., pp. 43-62, 1981.
72. P. Ricoeur, Phenomenology and Hermeneutics, translated and reprinted in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, England, pp. 101-128, 1981. Originally published as "Phenomenologie et Hermeneutique," Phanomenologische Forschungen, Vol. 1, E. W. Orth (ed.), Karl Alber, Freiburg, pp. 31-77, 1975.
73. M. Merleau-Ponty, Phenomenology of Perception, C. Smith (trans.), Routledge & Kegan Paul, London, 1962. Originally published as Phenomenologie de la Perception, Paris, 1945.
74. F. A. Olafson, Maurice Merleau-Ponty, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vols. 5 and 6, Macmillan, New York, pp. 279-282, 1967.
75. P. Ricoeur, "The hermeneutical function of distanciation," Philos. Tod. 17, 129-143 (1973). Reprinted in Ref. 9, pp. 131-144.
76. C. Levi-Strauss, Structural Anthropology, C. Jacobson and B. G. Schoepf (trans.), Penguin, Harmondsworth, U.K., 1968.
77. P. Ricoeur, Freud and Philosophy: An Essay on Interpretation, D. Savage (trans.), Yale University Press, New Haven, CT, 1970.
78. P. Ricoeur, Hermeneutics and the Critique of Ideology, in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, U.K., pp. 63-100, 1981. Originally published as Hermeneutique et critique des ideologies, in E. Castelli (ed.), Demythisation et ideologie, Aubier Montaigne, Paris, pp. 25-64, 1973.
79. P. Ricoeur, "The model of the text: Meaningful action considered as text," Soc. Res. 38, 529-562 (1971). Reprinted in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, U.K., 1981.
80. K. Popper, The Logic of Scientific Discovery, Basic Books, New York, 1959.
81. P. Ricoeur, "Creativity in language," Philos. Tod. 17, 97-111 (1973).
82. P. Ricoeur, The Rule of Metaphor: Multi-Disciplinary Studies of the Creation of Meaning in Language, R. Czerny (trans.), University of Toronto Press, Toronto, 1977. Originally published as La Metaphore vive, Editions du Seuil, Paris, 1975.
83. P. Ricoeur, Metaphor and the Main Problem of Hermeneutics, New Literary History, Vol. 6, pp. 95-110, 1974-75. Reprinted in C. E. Reagan and D. Stewart (eds.), The Philosophy of Paul Ricoeur: An Anthology of His Work, Beacon, Boston, pp. 134-148, 1978.
84. F. R. Dallmayr, Language and Politics: Why Does Language Matter to Political Philosophy? University of Notre Dame Press, Notre Dame, IL, 1984.
85. K. Apel, The Communication Community as the Transcendental Presupposition for the Social Sciences, in Ref. 7, pp. 136-179.
86. K. Apel, Understanding and Explanation, G. Warnke (trans.), MIT Press, Cambridge, MA, 1984. Originally published as Die Erklaren-Verstehen-Kontroverse in Transzendental-Pragmatischer Sicht, Suhrkamp, Frankfurt am Main, F.R.G., 1979.
87. G. Radnitzky, Continental Schools of Metascience: The Metascience of the Human Sciences Based upon the "Hermeneutic-Dialectic" School of Philosophy, Vol. 2 of Contemporary Schools of Metascience, Scandinavian University Books, Goteborg, Sweden, 1968.
88. T. Abel, The Operation Called Verstehen, in Ref. 46, pp. 81-92. Originally published in Am. J. Soc. 54, 211-218 (1948).
89. P. Winch, The Idea of a Social Science and its Relation to Philosophy, Routledge & Kegan Paul, London, 1958.
90. Reference 89, p. 115.
91. T. S. Kuhn, The Structure of Scientific Revolutions, University of Chicago Press, Chicago, IL, 1962.
92. T. S. Kuhn, The Essential Tension: Selected Studies in Scientific Tradition and Change, University of Chicago Press, Chicago, IL, 1977.
93. R. J. Bernstein, Beyond Objectivism and Relativism: Science, Hermeneutics, and Praxis, University of Pennsylvania Press, Philadelphia, 1983.
94. P. Feyerabend, Consolations for the Specialist, in I. Lakatos and A. Musgrave (eds.), Criticism and the Growth of Knowledge, Cambridge University Press, Cambridge, U.K., pp. 197-230, 1970.
95. P. Feyerabend, Against Method, Verso, London, 1978.
96. H. Putnam, Reason, Truth and History, Cambridge University Press, Cambridge, U.K., 1981.
97. W. C. Lehnert, H. R. Alker Jr., and D. K. Schneider, The Heroic Jesus: The Affective Plot Structure of Toynbee's Christus Patiens, in S. K. Burton and D. D. Short (eds.), Proceedings of the Sixth International Conference on Computers and the Humanities, Computer Science Press, Rockville, MD, pp. 358-367, 1983.
98. W. C. Lehnert, "Plot units and narrative summarization," Cog. Sci. 4, 293-331 (1981).
99. W. C. Lehnert, Plot Units: A Narrative Summarization Strategy, in W. C. Lehnert and M. H. Ringle (eds.), Strategies for Natural Language Processing, Erlbaum, Hillsdale, NJ, pp. 375-414, 1982.
100. R. C. Schank and R. Abelson, Scripts, Plans, Goals, and Understanding, Erlbaum, Hillsdale, NJ, 1977.
101. T. Winograd, Understanding Natural Language, Academic, New York, 1972.
102. A. N. Prior, Correspondence Theory of Truth, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vols. 1-2, Macmillan, New York, pp. 223-232, 1967.
103. J. R. Searle, "A taxonomy of illocutionary acts," in K. Gunderson (ed.), Language and Knowledge: Minnesota Studies in Philosophy of Science, 11, University of Minnesota Press, Minneapolis, pp. 344-369, 1975.
104. J. R. Searle, "The intentionality of intention and action," Cog. Sci. 4, 47-70 (1980).
105. Reference 2, p. 219.
106. D. G. Bobrow, and T. Winograd, "An overview of KRL, a knowledgerepresentationlanguage," Cog. Sci. 1,3-46 (L977). 107. H. A. Simon, "Rational decision making in business organizations," Am. Econ. Reu. 69,493-513 (1979). 108. P. H. Winstort, "Learning and reasoning by analogy," CACM 23, (December1980). 109. J. G. Carbonell, Learning by Analogy: Formulating and Generalizing Plans From Past Experience,in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.),Machine Learning: An Artificial Intelligence Approach, Tioga, PaIo Alto, CA, pp. 137-L62, 1983. 110. P. H. Winston, Artificial Intelligence,Addison-Wesley,Reading, MA, 1994. 111. H. R. Alker Jr., J. Bennett, and D. Mefford, "Generalizedprecedent logics for resolving insecurity dilemmas," Int. Interact. 7, 165-206 (1980). LL2. J. G. Carbonell, Metaphor: An InescapablePhenomenonin Natural Language Comprehension,in Ref. 99, pp. 415-434. 113. Reference2, p. 227 LL4. H. R. Maturana, Biology of Knowledge,in R. W. Reiber (ed., The Neurophysiology of Language, Plenum, New York, L977. 115. I. Lakatos, Proofs and Refutations, Cambridge University Press, Cambridge,MA, 1976. 116. M. A. K. Halliday, Language as Social Semiotic,Edward Arnold, London, 1978. LL7. H. R. Maturana, Biology of Cognition, in H. R. Maturana and F. Varela (eds.),Autopoeisis and Cognition: The Realization of the Liuing, Reidel, Dordrecht, 1980,2-62. 118. P. H. Winston, "Learning new principles from precedentsand exercis€s,"Artif. Intell. 19, 321-350 (1982). 119. J. J. Katz and J. A. Fodor, "The structure of a semantic theory:' Language 39(2), L70-210 (1963). L20. R. C. Schank, "Conceptual dependency:A theory of natural language," Cog. Psychol.3,552-63 0972). LZL. L. W. Barsalou and G. H" Bower, "Discrimination nets as psychological modelsl' Cog.Scl. 8, L-26 (1984). L22. E. A. Feigenbaumand H. A. Simon,"EPAM-like modelsof recognition and learningi' Cog. Sci. 8, 305-336 (1984). t23. G. Duffy and J. C. 
Mallery, Relatus: An Artificial Intelligence Tool for Natural Language Modeling, AI Memo No. 847, Artificial Intelligence Laboratory, MIT, Cambridge, MA, 1986. L24. J. C. Mallery, Constraint-Interpreting Reference,AI Memo No. 827, Artificial Intelligence Laboratory, MIT, Cambridge, MA, 1986. L25. E. A. Feigenbaum, An Information ProcessingTheory of Verbal Learning, RAND, Santa Monica, CA, 1959. L26. J. L. Kolodner, "Reconstructive memory: A computer model," Cog.Sci. 7, 280-328 (1983). L27. J. L. Kolodner, "Maintaining organization in a dynamic longterm memoryi' Cog. Sci. 7,243-280 (1983). J. C. MallERY and R. Hunwlrz MIT G. Duppv University of Texas at Austin
HEURISTICS

Heuristics are approximation techniques for solving AI problems. AI deals primarily with problems for which no practical exact solution algorithms are known, such as finding the shortest proof of a given theorem (see Theorem proving) or the least costly plan for robot actions (see Planning). Heuristics provide approximate methods for solving these problems with practical computational resources but often at some cost in solution quality. Their usefulness is derived from the fact that the trade-offs among knowledge, computation time, and solution quality are generally favorable. In other words, a small amount of approximate knowledge often buys a large improvement in solution quality and/or computation time.

Candidate problems for heuristic methods generally fall into two classes: those for which no exact algorithms are known at all and those for which the known exact algorithms are computationally infeasible. As an example of the first class, consider the problem of computer vision (see Vision). The task is to take the output of a digitizing camera in the form of a two-dimensional matrix of pixel values representing color and light intensities, and transform it into a high-level symbolic description of objects and their spatial relationships. Unfortunately, there are no known algorithms for solving this problem that are guaranteed to always yield a "correct" interpretation of the scene.

Computer chess is an example of the second class of problem (see Computer chess methods). In principle, there is an exact deterministic algorithm for always making an optimal move in a chess game. It requires generating all moves and countermoves in the game until only won, lost, and drawn positions remain, and propagating the ultimate outcomes of these positions back to the current position in order to choose an optimal move (see Minimax procedure). Unfortunately, the number of positions that would have to be generated by such an algorithm could be as large as 10^120. Thus, although an exact solution to this problem is known, the computational cost of running the algorithm is prohibitive. In either case, arriving at an exact solution is either impossible or impractical.
Thus, AI programs must resort to heuristic techniques that provide approximate solutions. Their power lies in the nature of the trade-offs between domain knowledge (qv), computation, and solution quality. If the domain knowledge is fixed, increased computation results in improved solution quality. Alternatively, if the amount of computation is held constant, more accurate domain knowledge produces better solutions. Finally, for a given level of solution quality, improved domain knowledge reduces the amount of computation required. The value of more accurate domain knowledge is that it improves the trade-off between computation and solution quality.

For example, given no knowledge of chess, two algorithms suggest themselves: one is the complete minimax procedure for playing perfect chess, and the other is to make legal moves randomly. The minimax procedure produces perfect play but at a tremendous cost in computation, whereas the random algorithm is very efficient but generates very poor play. Introducing some heuristic knowledge allows some additional computation to produce large improvements in quality of play. For example, one heuristic for chess is to always make the move that maximizes one's relative piece or material advantage. Although less efficient than random play, this heuristic provides a relatively efficient means of selecting a next move that results in play that is far superior to random play but still inferior to perfect play. Returning to the vision example, heuristics such as "adjacent pixels with the same intensity values probably belong to the same object" can dramatically improve the ability of programs to interpret visual scenes, but at the risk of occasionally making mistakes.
The nature of these trade-offs among knowledge, computation, and solution quality determines the usefulness of heuristic knowledge. If it were the case that a large percentage of the knowledge and/or computation necessary for perfect performance was required for even minimal performance, heuristic techniques would not be practical. For example, if it were necessary to examine any significant fraction of the 10^120 chess boards in order to achieve even beginner skill levels, good chess programs could not be built. On the other hand, if significant performance levels can be achieved with relatively small amounts of knowledge and computation, heuristics become very cost-effective, at least until near-optimal performance levels are reached. In computer chess, for example, if quality of play is measured as the percentage of human players that can be beaten by a given program, small amounts of knowledge and computation provide large improvements in performance, at least initially. Only when Expert- or Master-level performance is achieved is a point of diminishing returns reached, where additional performance increments come only with a large amount of knowledge or at great computational cost.

One of the empirical results of the last 30 years of AI research is that for many problems the knowledge, computation, and solution-quality trade-off is initially quite favorable. Thus, a little knowledge and computation goes a long way, and heuristic programs have been spectacularly successful at achieving moderate levels of performance in a large number of domains. At the same time, it becomes increasingly difficult to improve the performance of programs as they begin to approach expert levels of competence.

Heuristic Evaluation Functions

Given this general discussion of heuristics as background, note that almost all of the analytic and experimental work on heuristics per se has concerned a special case, namely heuristic evaluation functions.
The only exceptions to this rule are the development of heuristic production rules for particular problem domains and the EURISKO (qv) project, which is discussed below. A heuristic evaluation function is a function that maps problem situations to numbers. These values are then used to determine which operation to perform next, typically by choosing the operation that leads to the situation with the maximum or minimum evaluation. Heuristic evaluation functions are used in two different contexts: single-agent problems and two-player games.

Single-Agent Problems. The classic AI example of a single-agent problem is the Eight Puzzle (see Fig. 1). It consists of a 3 x 3 square frame containing eight numbered square tiles and one empty position. Any tile horizontally or vertically adjacent to the empty position can be slid into that position. The task is to rearrange the tiles from a given initial configuration into a particular goal configuration by a shortest sequence of legal moves.

The brute-force solution to this problem involves searching all move sequences up to the length of the optimal solution. Since the Eight Puzzle has roughly 180,000 solvable states (9!/2), this approach is feasible by computer. However, for even the slightly larger 4 x 4 Fifteen Puzzle, which has approximately 10 trillion solvable states (16!/2), this brute-force approach is computationally intractable.
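The 9!/2 figure can be checked directly: a breadth-first enumeration of every position reachable from the goal by legal moves visits exactly half of the 9! = 362,880 tile arrangements. A minimal sketch, not from the article (the tuple state encoding, with 0 marking the blank, is my own):

```python
# Enumerate all Eight Puzzle states reachable from the goal by legal
# slides. States are 9-tuples in row-major order; 0 is the blank.
from collections import deque

def neighbors(state):
    """Yield each state obtained by sliding a tile into the blank."""
    z = state.index(0)
    r, c = divmod(z, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            t = list(state)
            t[z], t[nr * 3 + nc] = t[nr * 3 + nc], t[z]
            yield tuple(t)

def count_reachable(start):
    """Breadth-first count of all states reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

print(count_reachable((1, 2, 3, 4, 5, 6, 7, 8, 0)))  # 181440 = 9!/2
```

The run touches all 181,440 solvable states in a second or two, which is exactly the sense in which brute force is "feasible by computer" for the Eight Puzzle but hopeless for the Fifteen Puzzle.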
Figure 1. Eight and Fifteen Puzzles.

The standard heuristic approach to this problem makes use of an evaluation function to guide the search. The heuristic evaluation function is interpreted as an estimate of the number of moves required to map the current state to the goal state. For example, the best known heuristic function for the Eight Puzzle is called the Manhattan Distance heuristic. It is computed by taking each tile individually, measuring its distance from its goal position in grid units, and summing these values for each tile. Note that this measure in general underestimates the number of moves since it does not take into consideration interactions between the tiles.

Given such an estimate, there are a number of different algorithms that make use of it to decide which move to consider next in order to find a solution in less time than is required by brute-force search (see Search). The simplest, often referred to as pure heuristic search or the greedy algorithm, is to always move next to the state with the minimum heuristic estimate of the distance to a goal. As the accuracy of the heuristic improves, the amount of search required to find a solution and the cost of the resulting solution both decrease.

A slightly more complex algorithm, called A* (see A* algorithm), adds to the heuristic estimate the actual number of moves that were used to get from the initial state to the current state and then always selects next the state for which this sum is a minimum. This amounts to selecting states in increasing order of the estimate of the total cost of a solution that passes through that state. Given an additional constraint on a heuristic function that it never overestimate the actual cost of a solution, a constraint that is satisfied by the Manhattan Distance for the Eight Puzzle, it can be shown that A* finds an optimal solution. In that case a more accurate heuristic reduces the amount of search required to find the optimal solution. A number of theoretical results quantify this trade-off between heuristic accuracy and search efficiency (1).

Heuristics From Simplified Models. Where do these heuristic evaluation functions come from, and how can their discovery be automated? One answer to the first question, which suggests an approach to the second, is that heuristic evaluation functions are derived from simplified models of the original problem (1). For example, one way of describing the legal move rule for the Eight Puzzle is that a tile can be moved from position X to position Y iff position X is adjacent to position Y and position Y is empty. If either of these constraints is removed, the result is a simpler problem that is easier to solve. The idea for generating heuristics is that the exact number of moves required to solve the simpler problem may be easy to compute and can serve as an estimate of the number of moves needed to solve the original problem.

For example, if the constraint that position Y be empty is removed, the resulting problem allows any tile to move along the grid regardless of where the empty position is. The number of moves required to solve this simplified problem is exactly equal to the Manhattan Distance. If both constraints are removed, the resulting problem allows any tile to move directly to its goal position in one move. The number of moves needed to solve this problem is exactly equal to the number of tiles that are out of place. This is an obvious heuristic estimator for the original problem that is even cheaper to compute than Manhattan Distance but is also less accurate. Finally, if only the constraint that positions X and Y be adjacent is removed, the resulting problem allows one to move any tile into the empty position, adjacent or not. The number of moves required to solve this problem is the number of times the empty position must be swapped with another tile, which suggests another heuristic estimate for the original problem. Although it is not as obvious how to express this value in closed form, it is not necessary. A program can simply count the number of steps required to solve each simplified problem and use this count as a figure of merit for the moves in the original problem. A simplification scheme of this type was implemented for discovering heuristics in constraint-satisfaction (qv) problems (2).

Two-Person Games. Although a heuristic evaluation function for a single-agent problem is normally an estimate of the distance to the goal, the exact meaning of a heuristic function for a two-player game is not as precise. Generally speaking, it is a function from a game situation to a number that measures the strength of the position for one player relative to the other. Large positive values reflect good positions for one player, whereas large negative values indicate strong positions for the opponent. One player, called Max, always moves to positions that maximize the heuristic evaluation function, whereas the other player, Min, moves to positions that minimize it.

For example, a simple evaluation function for the game of chess is the weighted sum of the values of Max's pieces minus the weighted sum of the values of Min's pieces. The weights reflect the different utilities of the pieces, and the classic values are Queen-9, Rook-5, Bishop-3, Knight-3, and Pawn-1. Note that the goal of chess is checkmate, not to maximize material. Material, however, represents an approximate goal in the game, the status of which can be computed efficiently. Even if the object of the game is to maximize material, as in the game of Othello, it is not necessarily true that maximizing material in the short term is the best way to maximize it over the long run. A more accurate evaluation function for chess would include additional components such as center control, pawn structure, and mobility.

Another technique used to increase the accuracy of a heuristic evaluation function at the cost of increased computation is called look-ahead. The basic idea is that, instead of directly evaluating the successors of the current position and picking the best, a more accurate evaluation can be obtained by searching forward several moves, evaluating the positions at that level, and then backing up the values to the successors of the current position by the minimax algorithm. The minimax (qv) algorithm computes the value of a position where Min is to move as the minimum of the values of its successors and the value of a position where Max is to move as the maximum of the values of its successors.
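The two relaxation-based estimators described above for the Eight Puzzle, Manhattan Distance and the number of misplaced tiles, can be sketched in a few lines. This is an illustrative sketch, not from the article; the tuple state encoding (row-major, 0 for the blank) is my own:

```python
def manhattan_distance(state, goal):
    """Sum over tiles of the grid distance from current to goal position."""
    total = 0
    for tile in range(1, 9):                  # skip the blank (0)
        r, c = divmod(state.index(tile), 3)
        gr, gc = divmod(goal.index(tile), 3)
        total += abs(r - gr) + abs(c - gc)
    return total

def misplaced_tiles(state, goal):
    """Number of tiles out of place: cheaper to compute, less accurate."""
    return sum(state.index(t) != goal.index(t) for t in range(1, 9))

goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
state = (0, 2, 3, 4, 5, 6, 7, 8, 1)           # tile 1 in the far corner
print(manhattan_distance(state, goal))        # 4 (tile 1 is 4 grid units away)
print(misplaced_tiles(state, goal))           # 1 (only tile 1 is out of place)
```

The example shows the accuracy gap the text describes: one displaced tile costs 4 under Manhattan Distance but only 1 under the misplaced-tiles count, and both are lower bounds on the true solution length.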
For most games, minimax look-ahead search improves the accuracy of the evaluation with increasing search depth. Since improved accuracy results in better-quality play, look-ahead provides a nearly continuous trade-off between computation cost and quality of play. In practice, programs search as far ahead as possible given the computational resources available and the amount of time allowed between moves.

Unifying One- and Two-Player Evaluation Functions. Although most of the literature on heuristic search in single-agent problems overlaps very little with that on two-player games, there is a consistent interpretation of heuristic evaluation functions in both domains (3). In both cases an ideal heuristic evaluation function has two properties: When applied to a goal state, it returns the outcome of the search; and the value of the function is invariant over an optimal move from any given state. The outcome of a search is the figure of merit against which success is measured, such as the cost of a solution path or the win, lose, or draw result of a game. Note that the constraints of determining the outcome and invariance over the best moves guarantee that suboptimal moves will have a different evaluation than the optimal moves. Taken together, these two properties ensure a function that is a perfect predictor of the outcome of pursuing the best path from any state in the problem space. Therefore, a heuristic-search algorithm using such a function will always make optimal moves. Furthermore, any successful evaluation function should approximate these properties to some extent.

For example, the evaluation function for the A* algorithm is f(s) = g(s) + h(s), where g(s) is the cost of the best path from the initial state to the state s and h(s) is an estimate of the cost of the best path from state s to a goal state. Typically the h term is called the heuristic in this function, but for this text the entire function f is referred to as the heuristic evaluation function.
When this function is applied to a goal state, the h term is zero, the g term represents the cost of reaching the goal from the initial state, and hence f returns the cost of the path, or the outcome of the search. If h is a perfect estimator, then in moving along an optimal path to a goal state, each move increases g by the cost of the move and decreases h by the same value. Thus, the value of f remains invariant along an optimal path. If h is not a perfect estimator, f will vary somewhat depending on the amount of error in h. Thus, a good evaluation function for an algorithm such as A* will determine the outcome of the search and is relatively invariant over single moves.

Now consider a two-person game using minimax search and a heuristic evaluation function. The heuristic evaluation reflects the strength of a given board position. When applied to a state where the game is over, the function determines the outcome of the game, or which player won. This is often added as a special case to an evaluation function, typically returning positive and negative infinity for winning positions for Max and Min, respectively. When applied to a nongoal state, the function is supposed to return a value that predicts what the ultimate outcome of the game will be. To the extent that the evaluation is an accurate predictor, its value should not change as the anticipated moves are made. Thus, a good evaluation function should be invariant over the actual sequence of moves made in the game. Therefore, in both domains a good evaluation function should have the properties of determining outcome and being invariant over optimal moves.
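The ordering by f(s) = g(s) + h(s) described above can be sketched as a short best-first search. This is an illustrative sketch, not the article's code; the function names and the toy graph are invented, and setting h = 0 reduces it to uniform-cost search:

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """Expand states in increasing order of f(s) = g(s) + h(s); with an
    admissible h (never overestimating), the first goal popped is optimal."""
    tie = count()                     # tie-breaker so states are never compared
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return g, path
        for nxt, step in successors(state):
            ng = g + step
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(
                    frontier, (ng + h(nxt), next(tie), ng, nxt, path + [nxt]))
    return None

# Toy weighted graph: the direct edge a-d costs 10, the detour a-b-c-d costs 3.
edges = {"a": [("b", 1), ("d", 10)], "b": [("c", 1)], "c": [("d", 1)], "d": []}
print(a_star("a", lambda s: s == "d", edges.get, lambda s: 0))
# → (3, ['a', 'b', 'c', 'd'])
```

Note how the invariance property shows up operationally: along the optimal path each expansion raises g by exactly the step cost, so with a perfect h the f values popped from the frontier would never change.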
Learning Evaluation Functions. The idea that heuristic evaluation functions should remain invariant over optimal moves can also be used to automatically learn (see Learning) evaluation functions. The basic idea is to search in a space of evaluation functions for a function that has this invariance property. This is done by computing the difference between direct evaluations of positions and the values returned by look-ahead and modifying the evaluation function to reduce this difference. This idea was originally used by Samuel in a pioneering program that automatically learned a very powerful evaluation function for checkers based on a large number of different factors (4) (see Checkers-playing programs).

A refinement of Samuel's technique used linear regression to automatically learn a set of relative weights for the chess pieces in an evaluation function based just on material (3). The basic idea is that any board position gives rise to an "equation" that constrains the ideal evaluation function. The left side of the equation is the function as applied to the given position, and the right side is the backed-up value of the function resulting from look-ahead search. In an ideal evaluation function these two values would indeed be equal. By generating a large number of such "equations," one from each board position, linear regression can be used to find the set of weights that provides the best approximation to an invariant evaluation function. Iterating this entire process over successive approximations of the heuristic function produces a converging sequence of weights for the pieces.

Heuristic Rules

Although most work on heuristics has focused on numerical evaluation functions for one- or two-player games, the EURISKO (qv) project has addressed the nature of heuristics in general (5). The lessons learned from EURISKO are consistent with, but more general than, the results concerning heuristic evaluation functions.
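The regression scheme described above, in which each position contributes an "equation" equating a weighted sum of material features with the value backed up by look-ahead, can be illustrated with a toy least-squares fit. All numbers and feature choices here are invented for illustration and are not from ref. 3; the sketch solves the two-feature normal equations directly:

```python
def fit_weights(rows, targets):
    """Ordinary least squares for two features via the normal equations."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x1, x2), y in zip(rows, targets):
        a11 += x1 * x1; a12 += x1 * x2; a22 += x2 * x2
        b1 += x1 * y; b2 += x2 * y
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

# Each "position" contributes (pawn difference, knight difference) on the
# left and a backed-up value on the right; the data are constructed so the
# exact solution is one pawn = 1 and one knight = 3.
rows = [(1, 0), (0, 1), (2, -1), (-1, 1)]
targets = [1.0, 3.0, -1.0, 2.0]
print(fit_weights(rows, targets))  # → (1.0, 3.0)
```

In the full scheme this fit would be repeated: the new weights change the backed-up values, which yield new equations, and iterating produces the converging sequence of piece weights the text describes.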
Recall that heuristic evaluation functions derive their power from their relative invariance over single moves in the problem space. In other words, the value of a given state is roughly equal to the value of the state resulting from making the best move from the given state. This can be viewed as a form of continuity of the evaluation function over the problem space.

This idea was originally suggested in the more general context of heuristic production rules for determining what action to apply in a given situation (5). A production rule is composed of a left side that describes the situations in which the rule is applicable and a right side that specifies the action of the rule (see Rule-based systems). Consider the function Appropriateness(Action, Situation), which returns some measure of the appropriateness of taking a particular action in a particular situation. The claim is that heuristics derive their power from the fact that this function is usually continuous in both arguments. Continuity in the situation argument means that if a particular action is appropriate in a particular situation, the same action is likely to be appropriate in a similar situation. Continuity in the action argument means that if a particular action is appropriate in a particular situation, a similar action is also likely to be appropriate in the same situation. Furthermore, this appropriateness function is time-invariant, which amounts to a strong form of continuity in a third variable, time. In other words, if a particular action is appropriate in a particular situation, that same action will be appropriate in that same situation at a later time.

If the notion of an action is broadened to include an evaluation, the invariance of an evaluation function can be viewed as a special case of this idea where the situation variable ranges over different states of the same problem. Similarly, the use of an exact evaluation from a simplified problem as a heuristic evaluation for the original problem can be viewed as another example of this general rule where the situation variable is allowed to range over similar problems.

The notion of continuity of appropriateness over actions and situations was used to automatically learn heuristic production rules. In EURISKO, both the situation and action sides of a rule are described using a large number of relatively independent features or parameters. Given a useful heuristic, EURISKO generates new heuristics by making small modifications to the individual features or parameters in the situation or action sides of the given heuristic. The continuity property suggests that a large number of heuristics derived in this way will be useful as well.

Conclusions

Heuristics are approximation techniques for solving AI problems. Approximations, however, are only useful in domains with some form of continuity. Thus, the power of heuristic techniques comes from continuities in their domains of applicability. The success of heuristic techniques in AI can be taken as evidence that many domains of interest contain continuities of various kinds.

BIBLIOGRAPHY

1. J. Pearl, Heuristics, Addison-Wesley, Reading, MA, 1984.
2. R. Dechter and J. Pearl, The Anatomy of Easy Problems: A Constraint-Satisfaction Formulation, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, CA, August 1985, pp. 1066-1072.
3. J. Christensen and R. E. Korf, A Unified Theory of Heuristic Evaluation Functions and Its Application to Learning, Proceedings of the Fifth National Conference on Artificial Intelligence, Philadelphia, PA, August 1986, pp. 148-152.
4. A. L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 71-105.
5. D. B. Lenat, "The nature of heuristics," Artif. Intell. 19(2), 189-249 (October 1982).

R. E. Korf
UCLA

This work was supported in part by NSF Grant IST 85-15302, by an NSF Presidential Young Investigator Award, and by an IBM Faculty Development Award.

HORIZON EFFECT

Two-person zero-sum, strictly competitive games such as chess, checkers, and Othello can be played quite skillfully by a computer. The methodology most commonly used today dates back to a seminal paper by Shannon (1). A state space representation is employed in which specific piece configurations represent discrete states, and the moves that are legal from these positions represent the permissible operators. A look-ahead game tree (qv) is developed by generating all of the positions that could be produced by every possible move sequence for the two players. Since it would take literally millions of years to examine all possible lines of play until each reached a terminal state (win, lose, or draw), existing game programs search only a few moves ahead (usually three to six) and then artificially declare the position as "terminal" and make a heuristic (qv) evaluation of whether it is good for the player on the move. The values assigned to these end points are then "backed up" to the initial position by using a minimax strategy (qv) (2). The backed-up value for each of the potential moves at the initial position determines which is the best.

Terminal Positions

Positions that are declared terminal may be, in fact, very turbulent. For example, in chess, a so-called terminal position might be one that is in the middle of a queen exchange. The heuristic evaluation calculated for such a position will be inaccurate because the queen discrepancy will be corrected on the next move. This common problem has been addressed routinely in chess by developing a quiescence function that assesses the relative material threats for each side and adjusts the evaluation function accordingly. Sometimes this is done by direct calculation and sometimes by a miniature look-ahead search from each terminal position examining only capturing moves and a subset of checking moves. This approach is usually reasonably accurate with respect to material considerations but is often blind to positional factors, which may be in a turbulent state. An example of positional turbulence is a piece en route to an important location where it will exert a commanding presence. Despite its attractive destination, its current position may appear to be weak or even dangerous. Other dynamic positional factors include a trapped piece, a pawn in a crucial lever role, and a pawn aspiring for promotion.

Current quiescence functions often misevaluate these positions. Berliner (3) provided the name horizon effect to this class of problems because the arbitrary search termination rule caused the program to act as if anything that was not detectable at evaluation time did not exist. Berliner defined two different versions of this phenomenon, a negative-horizon effect and a positive-horizon effect. The negative-horizon effect involves a form of self-delusion in which the program discovers a series of forcing moves that push an inevitable unpleasant consequence beyond the search horizon. The program manages to convince itself the impending disaster has gone away when in fact it is still lurking just beyond the search horizon. In essence, the negative-horizon effect is an unsuccessful attempt to avert an unpleasant outcome. The positive-horizon effect is a different form of self-delusion. In this effect the program attempts to accomplish a desired consequence within the search horizon even when the outcome would be much better if postponed a few moves. In Berliner's words the program "prematurely grabs at a consequence that can be imposed on an opponent later in a more effective form." Both of these effects are based on improper quiescence, and usually this has to do with the evaluation of positional factors.
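The depth-limited minimax backup that gives rise to these effects can be sketched in a few lines. This is an illustrative sketch under my own naming and a tiny invented game tree, not any program's actual code; it shows how a "terminal" cutoff in the middle of an exchange produces a misleading backed-up value that one more ply corrects:

```python
def minimax(position, depth, successors, evaluate, max_to_move=True):
    """Depth-limited minimax: positions at the cutoff are declared
    "terminal" and scored by the static evaluation function."""
    moves = successors(position)
    if depth == 0 or not moves:
        return evaluate(position)
    values = [minimax(m, depth - 1, successors, evaluate, not max_to_move)
              for m in moves]
    return max(values) if max_to_move else min(values)

# Toy mid-exchange position: the static score after taking the queen
# looks like +9 for Max, but one ply deeper Min's recapture erases it.
tree = {"take_queen": ["recapture"], "recapture": []}
static = {"take_queen": 9, "recapture": 0}
print(minimax("take_queen", 0, tree.get, static.get))                     # 9
print(minimax("take_queen", 1, tree.get, static.get, max_to_move=False))  # 0
```

The gap between the depth-0 and depth-1 values is exactly the turbulence a quiescence function is meant to resolve: the cutoff landed in the middle of an exchange, so the static score overstated the material balance.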
Negative-Horizon Effect

An excellent example of the negative-horizon effect occurred in a computer chess match (4) at the sixth North American computer chess championship (Minneapolis, 1975) between programs from Northwestern University and the University of Waterloo. Figure 1 depicts the game position after black's twelfth move, Ra8 to b8, attacking the advanced white pawn at b7. This position resulted from an early exchange of queens and minor pieces. At this juncture, white is destined to lose the advanced pawn, which will even up the material but leave white with a slight positional advantage (its king is castled and its rook dominates the queen file). The Northwestern program placed a high value on the passed pawn on the seventh rank. Instead of accepting the inevitable loss of the pawn, white devised a plan to "save" it by making liberal use of the negative-horizon effect. In its look-ahead search white discovered that it could advance pawns on the rook file and knight file, which would force black to retreat the bishops. The tempos used in these pawn thrusts were sufficient to push the eventual capture of the white pawn at b7 over the search horizon. White continued the actual game by playing 13. a3, forcing the black bishop at b4 to retreat. White followed with 14. h3, forcing the black bishop at g4 to retreat. White's next move continued the same theme, 15. g4, forcing the black bishop to move again and substantially weakening white's defensive position. From the computer's perspective these attacking pawn moves were effective because each one saved the pawn at b7. In reality, these moves, especially 15. g4, weakened white's position.

Positive-Horizon Effect

The positive-horizon effect can be demonstrated with the position presented in Figure 2 with white to move. In this situation white's pawn advantage provides excellent winning chances. For most programs the look-ahead search will not be
Figure 2.
sufficiently deep to "see" the pawn promotion. Therefore, the correct move choice must be based on heuristic factors such as moving the pawn closer to the eighth rank. With a typical shallow search, white is likely to push the pawn immediately, ignoring the black knight's threat to capture because white can recapture. Heuristic evaluation functions usually consider a knight to be worth as much as three pawns, and therefore the program would assume that black would not initiate such a foolish exchange. In reality, the exchange of the knight for the pawn is good for black since it transforms a losing situation into a draw. This conclusion is based on the knowledge that white can only win by promoting the pawn, and thus the pawn in this situation is much more valuable than the knight. Programs that know about the future only in terms of their immediate look-ahead search underestimate the value of the pawn because its "moment in the sun" lies beyond their search horizon. Most chess programs would throw away the win by giving their opponent the opportunity to exchange the knight for the pawn. This positive-horizon effect differs from the negative-horizon effect in that it results from an inability to understand long-range consequences and is not influenced dramatically by moving the search horizon one or two plies deeper (see also Computer chess methods).
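The negative-horizon effect can be reproduced in miniature with a depth-limited minimax search over a contrived game tree. In the sketch below (the tree, move names, and static scores are invented for illustration and are not taken from any program discussed here), the maximizing side can accept a small loss now or play a forcing delaying sequence that pushes a worse loss past a shallow search horizon:

```python
# Depth-limited minimax over a toy game tree. The program can accept a
# small loss now ("accept", static score -1) or play a forcing delaying
# sequence ("delay") that pushes a worse loss (-2) past a shallow search
# horizon, where the static evaluation sees nothing wrong (score 0).

def minimax(node, depth, maximizing, tree, evaluate):
    children = tree.get(node, [])
    if depth == 0 or not children:
        return evaluate(node)
    values = [minimax(c, depth - 1, not maximizing, tree, evaluate)
              for c in children]
    return max(values) if maximizing else min(values)

def best_move(tree, evals, depth):
    scores = {move: minimax(move, depth - 1, False, tree, evals.get)
              for move in tree["root"]}
    return max(scores, key=scores.get)

tree = {"root": ["accept", "delay"],
        "delay": ["forced"],      # forcing move; opponent must reply
        "forced": ["quiet"],      # position still looks quiet here
        "quiet": ["loss"]}        # the disaster reappears deeper down
evals = {"accept": -1, "delay": 0, "forced": 0, "quiet": 0, "loss": -2}

shallow = best_move(tree, evals, depth=2)  # deluded: prefers "delay"
deep = best_move(tree, evals, depth=4)     # sees the truth: "accept"
```

At depth 2 the delaying line scores 0 because the loss lies beyond the horizon, so the shallow search "saves" the position exactly as the Northwestern program "saved" its pawn; at depth 4 the loss is visible and the search correctly accepts the smaller cost.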
Figure 1.
BIBLIOGRAPHY
1. C. E. Shannon, "Programming a computer to play chess," Philos. Mag. 41, 256-275 (1950).
2. P. W. Frey, An Introduction to Computer Chess, in P. W. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 54-91, 1983.
3. H. J. Berliner, Some Necessary Conditions for a Master Chess Program, Proceedings of the Third International Joint Conference on Artificial Intelligence, Stanford, CA, pp. 77-85, 1973.
4. B. Mittman, A Brief History of the Computer Chess Tournaments: 1970-1975, in P. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 27-28, 1983.
P. W. Frey
Northwestern University

LOGIC PROGRAMMING. See HORN CLAUSES.

HOUGH TRANSFORM

The Hough transform (HT) denotes any of several parameter estimation strategies based on histogram analysis, in which histogram peaks (modes) in a transform space identify phenomena of interest in an input feature space. The name originates from a 1962 invention for locating lines in bubble chamber photographs (1). Since then the idea has become widespread and of considerable engineering importance. In computer vision (qv) it was first used to identify parameterized curves (e.g., conics) in images (2). HT has been generalized to detect nonparametric shapes of arbitrary scale and rotation (3,4). The HT process has been postulated to occur in abstract feature spaces during human perception (5) and is a widely applicable form of evidence combination.

Description

In the HT, features of phenomena (e.g., shape features) in an input space produce votes for phenomena in a parameterized space of causes or explanations (e.g., shape location) with which the features are compatible (see Feature extraction). Explanations garnering the most votes are those that account for the most features. For example, points in (x, y) input space may lie on (be explained by) a line described in parameter space by the two parameters m and b in the equation y = mx + b. A point in input (x, y) space presumed to lie on a line produces a locus of votes in parameter space for all lines on which it could lie. (This locus happens to be a straight line in (m, b) space.) The vote locus of a second point intersects the first (adds to it) only at the (m, b) parameters of the single (infinite) line containing both feature points. All other feature points collinear with the first two contribute votes to this (m, b), and no other points do. If the input space is ideal edge elements ((x, y, orientation) triples describing image brightness discontinuities), each edge element casts a single vote for the one line passing through it at the correct orientation. After voting, peaks (modes) in the parameter space correspond to image lines through the greatest number of lined-up edge elements regardless of their sparseness or other confusing edges in the image. Multiple lines in the input do not interfere but give multimodal results in parameter space. Figure 1 shows circle detection with edge element input.

Figure 1. Circle detection. An input grayscale image (a) is processed with an edge detector, yielding an orientation at each point. The edge strength, or contrast, is shown in (b). For each of several radii Ri there is an accumulator array Ai the same size as the image. Each edge element votes into each Ai for two possible centers, Ri away from the edge in both directions orthogonal to it. The accumulator for one of the larger radii is shown in (c). Peaks in the three-dimensional (x, y, R) accumulator are interpreted as circles and displayed in (d).

An HT implementation of general shape matching is formally equivalent to template matching (matched filtering). With HT, the computational effort (voting) grows with the number of matchable features in the input, not the size of the input array (6).

Practical Issues

HT is a form of mode-based parameter estimation that is complementary to mean-based (such as least-squared error) techniques. Least-squared error methods may be preferable if all the data originate from a single phenomenon, there are no "outlier" points, and data are corrupted by zero-mean noise processes. Mode-based estimation is indicated if there are several instances of the phenomenon of interest in the input or if the data are incomplete or immersed in clutter or outliers. Parametric HT finds parameters that may describe infinite objects. Line detection is a good example: Further processing is needed to find end points of line segments. Noise of several varieties can affect HT (6,7) and can be combated by standard techniques. Uncertainty in any feature parameter (e.g., edge orientation) may be accommodated either by using a set of votes spanning the uncertainty range or by smoothing the accumulator array before peak finding. Votes may be weighted according to the strength of the feature producing them. Votes are usually collected in discrete versions of parameter space implemented as arrays. Parameter spaces involving three-dimensional directions are often represented with more complex data structures, such as spheres or hyperspheres. High-resolution or high-dimensionality arrays can have large memory requirements. A solution is to implement the accumulator as a hash table. If each feature detector is prewired to its associated parameters in transform space, the "voting" happens in parallel instantaneously and can be considered as excitation in a network (8).
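The (m, b) voting scheme described above can be sketched in a few lines. The following illustration is not from the article; the point set, candidate slope grid, and quantization step are invented, and a Python dictionary plays the role of the hash-table accumulator mentioned for high-resolution parameter spaces:

```python
# Hough line detection in (m, b) space: each point (x, y) votes for
# every candidate slope m at intercept b = y - m*x, so its vote locus
# is a line in (m, b) space. Collinear points pile their votes onto a
# single quantized (m, b) cell; the accumulator peak identifies the
# line. A Counter (hash table) serves as the sparse accumulator.
from collections import Counter

def hough_lines(points, slopes, b_step=0.1):
    acc = Counter()
    for x, y in points:
        for m in slopes:
            b = y - m * x
            acc[(round(m, 2), round(b / b_step) * b_step)] += 1
    return acc

# Four points on y = 2x + 1, plus one outlier that cannot disturb the peak.
pts = [(0, 1), (1, 3), (2, 5), (3, 7), (5, 0)]
slopes = [i / 10 for i in range(-30, 31)]   # candidate m from -3.0 to 3.0
(m, b), votes = max(hough_lines(pts, slopes).items(), key=lambda kv: kv[1])
```

The peak lands at (m, b) = (2.0, 1.0) with four votes, one per collinear point; the outlier contributes to other cells but cannot outvote them. Ideal edge-element input would instead cast a single vote per element, at the slope given by the element's orientation.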
In two-dimensional shape detection the parameter space is usually (x, y, theta, s), for location, orientation, and scale. A high-dimensional parameter space may sometimes (with ingenuity) be decomposed into a sequence of lower dimensional spaces, making voting less expensive. Parameters in accumulator space must be independent if a mode is to correspond to a unique tuple of parameters. The global nature of the HT, accumulating evidence from the entire input space, can be a drawback for some applications. One remedy is to decompose the input space into a set of regions small enough to enforce the desired locality. The histogram generation and analysis needed for HT admit parallel solutions.
BIBLIOGRAPHY

1. P. V. C. Hough, Method and Means for Recognizing Complex Patterns, U.S. Patent 3,069,654, December 18, 1962.
2. R. O. Duda and P. E. Hart, "Use of the Hough transform to detect lines and curves in pictures," Commun. Assoc. Comput. Mach. 15, 11-15 (1972).
3. D. H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Patt. Recog. 13(2), 111-122 (1981).
4. D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
5. H. B. Barlow, "Critical limiting factors in the design of the eye and visual cortex," Proc. Roy. Soc. Lond. B 212(1), 1-34 (1981).
6. C. M. Brown, "Inherent bias and noise in the Hough transform," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-5, 493-505 (September 1983).
7. S. D. Shapiro and A. Iannino, "Geometric constructions for predicting Hough transform performance," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-1(3), 310-317 (July 1979).
8. D. H. Ballard, G. E. Hinton, and T. J. Sejnowski, "Parallel visual computation," Nature 306(5938), 21-26 (November 3, 1983).

C. M. Brown
University of Rochester
HUMAN-COMPUTER INTERACTION

The recent history of advances in the study and techniques of human-computer interaction has been intertwined with that of AI; each has contributed to the other. At times research in AI has developed techniques to improve user-computer communication, and at other times the unique demands placed on the users and programmers of AI systems have led them to be the first to apply innovative techniques for human-computer communication. Because AI systems are often designed to perform complicated and poorly understood tasks, they need to interact with their users more intimately than other systems and in more complex, less stereotyped ways. AI programs are also among the most complicated programs written, least amenable to being specified clearly in advance, and most unpredictable. Hence their programmers have been the first to need such advances as powerful interactive debuggers, editors, programming tools, and environments, and they have developed many of them. This entry examines the reciprocal connections between the study of human-computer interaction or human factors and AI from each of the following directions:

1. specific fields of AI directly useful in constructing human-computer interfaces, such as speech recognition (some of these topics are covered in separate entries in this volume and are mentioned only briefly here);
2. by-products of AI programming that have proven useful in designing human-computer dialogues; and
3. developments in the study of the human factors of human-computer interaction that are helpful in designing user interfaces for complex AI systems.

Finally, this entry indicates how the two fields of study overlap in their concerns and how insights into cognitive psychology from both fields will help to build more intelligent, natural user interfaces in the future.
Specific AI Applications to Human-Computer Interaction

Natural Language. Among those areas of AI research useful in improving human-computer interaction, the most obvious is the study of natural language (see Natural-language understanding). Research into how natural language is understood can permit human-computer dialogues to be conducted in such a language (although this is not always an unalloyed benefit, as discussed subsequently). The study of natural-language input has its roots in early work in machine translation and, later, in query-answering systems. Systems such as ELIZA (qv), SHRDLU (qv), and BASEBALL (qv) demonstrated that computers could conduct plausible natural-language dialogues in restricted domains. But proceeding from that point to a general ability to accept a wide range of natural language has been difficult. A natural-language processing system generally contains three parts: a dictionary or lexicon of the words it accepts; a grammar, which describes the structures of the sentences it accepts; and a semantic component, which assigns interpretations to or performs actions in response to the input. Syntax is typically represented in the second component by a set of productions or an augmented transition network (see Grammar, augmented transition network). Some systems combine the latter two components into a semantic grammar, putting the semantic rules or actions directly into the syntax grammar. They use a specialized grammar designed for a particular domain of discourse and subset of the language (1). This approach provides an effective way to build systems that accept a relatively constrained subset of natural language in a particular domain, but it is difficult to expand to larger, more general areas of the language. The alternative, use of a purely syntactic grammar and leaving the semantics in a separate component, is helpful for building a system with broad coverage, but the syntactic component will often identify a wide range of possible parses of a sentence, which can only be narrowed by the semantic component. Thus, such systems tend to perform searches with considerable backtracking.
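The semantic-grammar idea can be made concrete with a toy sketch. Everything below (the lexicon, the single production, and the database) is invented for illustration and is not drawn from any system cited here; the point is only that the semantic action is attached directly to the syntax rule:

```python
# A one-production semantic grammar for a toy query domain. The
# lexicon tags words, the production QUERY -> VERB FIELD PREP NAME
# defines the only accepted sentence shape, and the semantic action
# looks the answer up in a database. All names here are invented.

LEXICON = {"list": "VERB", "show": "VERB",
           "wins": "FIELD", "losses": "FIELD",
           "by": "PREP",
           "karpov": "NAME", "korchnoi": "NAME"}

def parse(sentence, database):
    words = sentence.lower().split()
    tags = [LEXICON.get(w) for w in words]
    if tags == ["VERB", "FIELD", "PREP", "NAME"]:   # the only production
        _, field, _, name = words
        return database[name][field]                # semantic action
    raise ValueError("sentence falls outside the accepted subset")

db = {"karpov": {"wins": 12, "losses": 3}}
answer = parse("list wins by karpov", db)
```

Even this toy exhibits the trade-off described above: the production handles its narrow domain reliably but accepts nothing else, and none of the domain knowledge built into it transfers to another application.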
Still other alternative approaches, such as systems driven by semantically based scripts (qv) rather than syntax (2) and menu-based natural-language (qv) systems (3), have also been used successfully. Finally, to complete a dialogue in natural language, it is necessary to generate sentences from internally stored information, and approaches that go beyond simply storing canned responses have been studied (4,5). Given the present state of the art, it is possible to construct a practical natural-language system for a specified subset of a language in some narrow, well-defined domain. Such a system requires that a considerable amount of knowledge about that domain be built into the lexicon, grammar, and semantic component and thus much effort that cannot be reused in another natural-language system. Systems that can handle a broad range of language on many topics remain a research goal.

Speech. Another important area of AI research is the processing of speech, both accepting it as input and generating it as output. Speech is an attractive input medium because people already know how to use it, they can generally speak faster than they can write or type, and it leaves their hands free for other tasks. Recognition of isolated words is a relatively well-understood problem, and commercial systems are available for this task (6). Accepting continuous speech has proven significantly less tractable, largely because normal speakers do not pause between words. It is generally not possible to divide a speech input signal into words simply through signal processing; it requires knowledge of the meaning and context of the utterance. Thus, speech understanding (qv) involves both a signal-processing or pattern recognition component, which identifies words or other parts of the input signal, and a semantic component, which assigns meanings to the utterance. For systems that go beyond isolated words, there must be feedback between the two; and to function effectively, the latter
component requires considerable knowledge about the underlying language and the domain of the discourse. Thus, work in speech input is intimately connected to the study of natural language and knowledge representation. Much of the principal work on speech understanding was performed under the aegis of the ARPA Speech Understanding Research Program between 1971 and 1976. The principal projects, which included HEARSAY (qv), HARPY (qv), and HWIM, all emphasized the problems of representation and use of knowledge about the spoken language, and each used different approaches to them. More recent work has extended these ideas, but robust, production-quality continuous speech input continues to be an elusive goal (7). The area of speech generation is also important, but much of it is sufficiently well understood and widely available that it is no longer considered a topic in AI (8). Current research focuses more on reducing the cost of large vocabularies through coding techniques and on improving naturalness.

Pattern Recognition. Computer vision (qv) or pattern recognition (qv), appropriately applied, is also relevant to human-computer interaction, as it can permit computer input in the form of gestures much as people use in communicating with one another. An example of this approach, without using sophisticated pattern recognition, was demonstrated by Bolt (9). Similar AI techniques could also be used to accept rough sketches drawn by people as a form of computer input, again similar to the mode used for communication between people. Going further, such gesture input can be combined with improved displays to be an important component of a user interface that resembles a real-world environment (10).

"Intelligent" User Interfaces and Computer-Aided Instruction.
The above has examined some specific techniques or modalities of human-computer interaction derived from AI research. What can be said of a human-computer interface that begins to exhibit more generally intelligent behavior, beyond simple competence in one or more of the specific interaction media discussed? An intelligent human communication partner can: accept and compensate for many types of incorrect or incomplete inputs; realize when the conversational partner has misperceived something and provide explanations to rectify the underlying misconception; infer the underlying goals in a statement or question, even where they are at odds with those stated; follow the changing focus of a conversation; maintain and update an estimate of the partner's interests, knowledge, and understanding; and construct replies that take into account that current estimate. There is research in AI that attempts to understand and duplicate some of these processes. The bulk of it has thus far been conducted in the area of computer-aided instruction (CAI; see Intelligent CAI) in order to build "intelligent tutors." Such systems attempt to model a student's (incomplete) understanding of material and present new material or leading questions appropriate to the student's current level of knowledge and ability.
For example, SOPHIE (qv) watches a student troubleshoot electronic equipment, answers his or her questions, and criticizes his or her hypotheses. WEST and WUMPUS both observe students playing computer games and offer suggestions based on inferences about the students' skill made from watching their moves. SCHOLAR (qv) asks its student leading questions when it finds deficiencies in his knowledge. MENO-II finds bugs in student programs and identifies the underlying misconception that caused the bug. GUIDON (qv) is built upon a rule-based system. By presenting example cases, it attempts to deduce which of the rules in its knowledge base the student already knows and which he or she is ready to learn. It also manages the overall flow of the dialogue with the student, selects topics for study, selects appropriate presentation techniques, maintains context, and allows for unsolicited inputs (11,12). Some such work has extended outside traditional CAI. For example, the University of California (13) uses these techniques in an intelligent help system. It attempts to infer the user's underlying goals and intentions and provides answers that take this information into account in addition to the specific question asked. Other intelligent help systems volunteer advice when appropriate (14). This sort of research into problems such as modeling a user's information state in a dialogue, inferring users' misconceptions, and constructing appropriate replies has been concentrated in the area of CAI, but it is applicable to the design of intelligent user interfaces or intelligent dialogues in any area.
By combining many of these individual techniques, one can take the notion of an intelligent user interface and carry it somewhat further, to build a user-modeling system that can describe and reason about what its user knows and conduct a dialogue with the long-term flow and other desirable properties of dialogues between people. Such a system would maintain and use information about the user and his or her current state of attention and knowledge, the task being performed, and the tools available to perform it (15,16). For example, when the underlying application program sends information to the user, this user agent can control its presentation based on its model of what the user already knows and is seeking and remove information irrelevant to his current focus. It is important to remember that such an intelligent user interface is by no means restricted to natural language. Most research on the processes needed to conduct such dialogues has concentrated on natural language, but they apply to any human-computer dialogue conducted in any language. For example, STEAMER (17) demonstrates a dialogue in a rich graphical language using powerful and natural state-of-the-art input and output modalities. The user's side of the dialogue consists almost entirely of pointing and pressing mouse buttons, and the computer's of animated pictorial analogs. A dialogue in such a language could also exhibit the intelligent user interface properties discussed here: following focus, inferring goals, correcting misconceptions. Further, knowledge-based techniques can be applied to improve the selection, construction, and layout of appropriate graphical representations for the output side of the dialogue (18).

Adaptation. An intelligent user interface would also exhibit some learning and adaptation to the user.
The simplest form such adaptation could take uses explicit input: A user enters instructions about the way he or she wants the dialogue to be conducted, and the subsequent dialogue uses this information.
This is already available in, for example, facilities for defining aliases or command procedures or using profiles. A more intricate form of adaptation uses implicit inputs: The computer obtains information about the user without actually asking him for it. This can be done in two ways: using information intrinsic to the dialogue or using external information about the user (19). Examples of the former are: using information about the user's errors, choice of commands, or use of help features to decide whether he is an expert or novice user; inferring the focus within which a command should be interpreted from the preceding sequence of user commands; and measuring and using the length and distribution of user idle periods. The other possibility is to use implicit measurements obtained from inputs outside the actual dialogue. For example, sensors might try to determine whether the user was actually sitting at his terminal (or had left the room) or what the user was looking at and, from that, the context within which his commands should be interpreted (20). Another way to classify adaptation is by time span. Changes like renaming a command are intended to be long term. Explicit inputs are generally used only for such long-term adaptation because it is too much trouble to enter them more frequently. Short-term adaptation to changes in the user's state relies on implicit inputs. A system could use the fact that he or she is typing faster, making more semantic errors, or positioning a cursor inaccurately to make short-term changes in the pace or nature of the dialogue. Short-term adaptation using implicit inputs is a potentially powerful technique for creating adaptive human-computer dialogues. Some beginning steps in this direction are demonstrated by Edmonds (21).
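A minimal sketch of such implicit adaptation might look as follows. The error-rate threshold, the two-way novice/expert split, and the prompt wording are all invented for illustration; a real system would draw on many more implicit signals:

```python
# Sketch of implicit short-term adaptation: classify the user as novice
# or expert from the error rate in the recent command history (an input
# intrinsic to the dialogue), then adapt the prompt's verbosity.
# Threshold and categories are invented for illustration.

def classify_user(history, threshold=0.3):
    """history: list of (command, succeeded) pairs from this session."""
    if not history:
        return "novice"            # no evidence yet; err on the helpful side
    errors = sum(1 for _, ok in history if not ok)
    return "novice" if errors / len(history) > threshold else "expert"

def prompt_for(history):
    if classify_user(history) == "novice":
        return "Enter a command, or type 'help' for a menu: "
    return "> "                    # terse prompt for the expert

session = [("copy a b", True), ("mvoe b c", False), ("move b c", True),
           ("delte c", False), ("del c", False)]
style = prompt_for(session)        # 3 errors in 5 commands: novice prompt
```

Because the estimate is recomputed from the recent history, it tracks short-term changes in the user's state, in the spirit of the implicit-input adaptation described above.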
Other AI Developments in Human-Computer Interaction

Because of the complexity of AI programs, their programmers have been pioneers in the development and use of innovative human-computer interaction techniques, which are now used in other areas. The development of powerful interactive programming environments was spearheaded by AI programmers developing large LISP programs. They required and developed complex screen-oriented editors, break packages, tracing facilities, and data browsers for LISP programming environments (22,23). More recent interaction methods, such as overlapping display windows, icons, multiple contexts, use of mice, pop-up menus, and touch screens, had their roots in AI programming. Many of these were developed by workers at Xerox PARC, both in Interlisp and Smalltalk (in parallel and with considerable interaction between the two). Many of these ideas were spawned and made practical by the availability of powerful graphics-oriented personal computers in which a considerable fraction of the computing resources in the unit was devoted to the user interface. Recent programming systems that combine and exemplify these include Interlisp (qv), the MIT LISP Machine (qv), Smalltalk (qv), and LOOPS (24). The combination and effective use of many of these techniques have been demonstrated by a variety of systems (e.g., Ref. 25) and most notably in STEAMER (17). These techniques are moving out of the AI community into all areas of human-computer interaction, including small personal computers. One continuing problem is that, although interfaces involving such techniques are often easier to learn and use than conventional ones, they are currently considerably more difficult to build since they are typically programmed in a low-level, ad hoc manner. Appropriate higher level software engineering concepts and abstractions for dealing with these new interaction techniques are needed.

Designing Human-Computer Interfaces

The study of human factors and user psychology over the past few years has paralleled that of AI. Its results are now finding application in the design of better user interfaces for AI and other complex systems. AI systems stretch the limits of what has been or can be done with a computer and thus often generate new human-computer communication problems rather than alleviating them. The methods of human factors (task analysis, understanding of interaction methods and cognitive factors, and empirical testing of alternatives with users) are thus especially applicable to designers of AI systems. Design of a human-computer interface begins with task analysis, an understanding of the user's underlying tasks and the problem domain. It is desirable that the user-computer interface be designed in terms of the user's terminology and conception of his or her job, rather than the programmer's. A good understanding of the cognitive and behavioral characteristics of people in general as well as the particular user population is thus important, as is knowledge of the nature of the user's work. The task to be performed can then be divided and portions assigned to the user or machine based on knowledge of the capabilities and limitations of each. AI often expands the capabilities of the computer side, but for all but fully autonomous systems, the user is likely to play some role in performing or guiding the task and hence will have to interact with the machine.

Styles of Human-Computer Interfaces. A style of user interface appropriate to the task should be selected. The principal categories of user interfaces currently in use are command languages, menus, natural language, and graphics or "direct manipulation" (26).
Command-language user interfaces use artificial languages, much like programming languages. They are concise and unambiguous but are often more difficult for a novice to learn and remember. However, since they usually permit a user to combine their constructs in new and complex ways, they can be more powerful for advanced users. They are also most amenable to programming, that is, writing programs or scripts of user input commands. Menu-based user interfaces explicitly present the options available to a user at each point in a dialogue. Thus, they require only that the user be able to recognize the desired entry from a list rather than recall it, placing a smaller load on long-term memory. They are highly suitable for novice users. A principal disadvantage is that they can be annoying for experienced users who already know the choices they want to make and do not need to see them listed. Well-designed menu systems, however, can provide bypasses for expert users. Menus are also difficult to apply to "shallow" languages, which have large numbers of choices at a few points, because the option display becomes too big. Natural-language user interfaces are considered above. Their principal benefit is, of course, that the user already knows the language. However, given the state of the art, such an interface must be restricted to some subset of natural language, and the subset must be chosen carefully, both in vocabulary and range of syntactic constructs. Such systems often behave poorly when the user veers even slightly outside the subset. Since they begin by presenting the illusion that the computer really can "speak English," the systems can trap or frustrate novice users. For this reason, the techniques of human factors engineering can help. A human factors study of the task and the terms and constructs people normally use to describe it can be used to restrict the subset of natural language in an appropriate way based on empirical observation (27). Human factors study can also identify tasks for which natural-language input is good or bad. Although future research in natural language offers the hope of human-computer communication that is so natural it is "just like talking to a person," such conversation may not always be the most effective way of commanding a machine (28). It is often more verbose and less precise than computer languages. In some settings people have evolved terse, highly formatted languages, similar to computer languages, for communicating with other people. For a frequent user the effort of learning such an artificial language is outweighed by its conciseness and precision, and it is often preferable to natural language. Finally, recent advances have led to a graphical or direct-manipulation (26) style of user interface, in which objects are presented on the screen, and the user has a standard repertoire of manipulations that can be performed on them. There is no command language to remember beyond the set of manipulations, and generally any of them can be applied to any visible object. This approach to user interfaces is in its infancy. Some current examples include Visicalc, the Xerox Star, STEAMER (17), and, of course, many video games.
Although object-oriented languages like Smalltalk and powerful graphic input and output techniques make such interfaces easier to build, an important difficulty in designing them is to find suitable manipulable graphical representations or visual metaphors for the objects in the problem domain. The paper spreadsheet (Visicalc), desk and filing cabinet (Star), and engine control panel (STEAMER) were all fortunate choices. Another problem with direct-manipulation interfaces is that it is often difficult to create scripts or parameterized programs in such an inherently dynamic and ephemeral language. Various modalities of human-computer communication may also be employed as appropriate in designing a user interface. Keyboards and text displays are common, but some more modern modalities include, for output, graphics, windows, icons, active value displays, manipulable objects, speech, and other sounds. Techniques for input include keys that can be dynamically labeled, interactive spelling correction and command completion, mice, speech, gesture, and visual line of gaze. Each must be matched to the tasks for which it is used.

Design Techniques and Guidelines. A variety of tools, techniques, and guidelines from human factors engineering can be brought to bear on the design of the user interface (29,30). One important principle is that of empirical measurement. Decisions about user interface design should be based on observations of users rather than on a designer's or programmer's notions. Careful use of empirical measurement also encourages the establishment of precise performance objectives and metrics early in the development of a system. Alternative designs can then be tested against them empirically as the work progresses (31-33).
In addition to specific tests of proposed user interfaces, some general principles have been derived from laboratory experiments. For example, a user interface should be consistent; similar rules should apply for interpreting commands when the system is in what appear to the user to be similar states. Command names, order of arguments, and the like should be as uniform as possible, and commands should generally be available in all states in which they would be plausible. The system should also be predictable; it should not seem erratic. A small difference in an input command should not result in a big difference in the effect (or time delay) of the response. Unpredictability makes the user anxious, continually afraid of making an irrevocable mistake. A general backup facility, which lets the user undo any command after it has been executed, is one way to allay this anxiety. A fully general undo facility is difficult to implement but has been demonstrated in the Interlisp programming environment. More generally, a system should exhibit causality; the user should be able to perceive that the activity of the system is caused directly by his or her actions rather than proceeding seemingly at random. The state of the system should be visible to the user at all times, perhaps by a distinctive prompt or cursor or in a reserved portion of the screen. Systems can be easy to learn and/or easy to use, but the two are different, sometimes conflicting goals. Designs suitable for novice users may interfere with expert users; features like help facilities or command menus should be optional for experienced users. A good command language should consist of a few simple primitives (so as not to tax long-term memory) plus the ability to combine them in many ways (to create a wide variety of constructs as needed, without having to commit all of them to long-term memory). The user interface should also exploit nonsymbolic forms of memory.
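The backup facility described above is commonly built by having every executed command record its own inverse. The following minimal sketch (class and method names invented; insertion only, far short of a fully general undo) illustrates the idea:

```python
# Sketch of a "general backup facility": each executed command pushes
# its inverse onto an undo stack, so the user can revoke commands after
# they have been executed. Names invented; handles insertion only.

class TextEditor:
    def __init__(self):
        self.text = ""
        self._undo_stack = []             # inverses of executed commands

    def insert(self, s):
        self.text += s
        self._undo_stack.append(len(s))   # inverse: delete len(s) chars

    def undo(self):
        if self._undo_stack:
            n = self._undo_stack.pop()
            self.text = self.text[: len(self.text) - n]

ed = TextEditor()
ed.insert("undo ")
ed.insert("me")
ed.undo()                                 # revokes only the last insert
```

Extending this to every command in a system, including ones with side effects outside the program, is what makes a fully general undo facility difficult, as noted above.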
For example, it can attach meaning to the spatial position on a display screen (certain types of messages always appear in certain positions) or to icons, typefaces, colors, or formats.

One way to help design a user interface is to consider the dialogue at several distinct levels of abstraction and work on a design for each. This simplifies the designer's task because it allows him or her to divide it into several smaller problems. Foley and Wallace (34) divide the design of a human-computer dialogue into the semantic, syntactic, and lexical levels. The semantic level describes the functions performed by the system. This corresponds to a description of the functional requirements of the system but does not address how the user will invoke the functions. The syntactic level describes the sequences of inputs and outputs necessary to invoke the functions described. The lexical level determines how the inputs and outputs are actually formed from primitive hardware operations. With appropriate programming techniques, these aspects of the dialogue can be designed and programmed entirely separately (35).

Another approach that can help the designer and software engineer is the user interface management system (UIMS). A UIMS is a separate software component that conducts all interactions with the user; it is separate from the application program that performs the underlying task. It is analogous to a database management system in that it separates a function used by many applications and moves it to a shared subsystem. It removes the problem of programming the user interface from each individual application and permits some of the effort of designing tools for human-computer interaction to be amortized over many applications and shared by them. It
also encourages the design of consistent user interfaces to different systems since they share the user interface component. Conversely, it permits dialogue independence, where changes can be made to the dialogue design without affecting the application code (36). It is also useful to have a method for specifying user interfaces precisely so that the interface designer can describe and study a variety of possible user interfaces before building one (37,38).

AI and Human Factors: Toward "Natural" Human-Computer Interfaces

The recent histories of research in AI and human factors have been interconnected in many ways. Each has contributed techniques and ideas to the other, and each has found applications in the other. How will these two disciplines cross paths in the future? The answer is in the domain of understanding the user's cognitive processes. Much work in human factors has been devoted to understanding the mental models and processes by which users learn about, understand, and interact with computer systems. Its purpose is to build systems that are easier to learn and use because they fit these processes more closely. For example, some command languages, text editors, and programming language constructs have been improved by studying and using carefully, but not overloading, the capabilities of human short- and long-term memory in their design (39). Much of AI research, too, is devoted to understanding people's cognitive processes. The results of such study can be a better understanding of how people (specifically, computer system users) process information: perceive data, focus attention, construct knowledge, remember, make errors. The insights into cognitive psychology developed by research in both fields can be used to make human-computer interfaces more "natural," to fit their users better. The goal of such work is to produce a more intelligent and natural user interface: not specifically natural language, but a naturally flowing dialogue.
Such a development will begin with human factors study of good user interface design using insights from cognitive psychology. Appropriate visual and other metaphors for describing, and proxies for manipulating, the objects and activities of the task at hand must then be chosen. AI techniques can permit the system to obtain, track, and understand information about its user's current conceptions, goals, and mental state well beyond current dialogue systems, where most of the context is lost from one query or command to the next. The system will use this information to help interpret users' inputs and permit them to be imprecise, vague, slightly incorrect (e.g., typographical errors), or elliptical. This approach, combined with powerful techniques such as direct manipulation or graphical interaction, can produce a highly effective form of human-computer communication.

The research in AI pertinent to human-computer interaction has attempted to discover users' mental models, to build systems that deduce users' goals and misconceptions, and to develop some forms of adaptive or personalizable user interfaces. A collection of powerful interaction modalities has also been developed. The challenge for the future is for research into cognitive psychology in both human factors and AI to combine with new interaction and programming techniques to produce a style of interface between user and computer more closely suited to the human side of the partnership.
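The general backup facility discussed earlier, which lets the user undo any command after it has been executed, is commonly built from commands that know how to reverse their own effects. The following is a minimal, purely illustrative Python sketch (the class names are invented; this is not the Interlisp mechanism itself):

```python
# Sketch of a general "undo" facility: every command records enough
# information to reverse itself, and a history stack replays reversals.
# All names here are invented for illustration.

class InsertText:
    """An undoable command: inserting text at a position in a buffer."""
    def __init__(self, pos, text):
        self.pos, self.text = pos, text

    def do(self, buf):
        return buf[:self.pos] + self.text + buf[self.pos:]

    def undo(self, buf):
        return buf[:self.pos] + buf[self.pos + len(self.text):]

class Editor:
    def __init__(self):
        self.buffer = ""
        self.history = []          # stack of executed commands

    def execute(self, cmd):
        self.buffer = cmd.do(self.buffer)
        self.history.append(cmd)

    def undo(self):
        if self.history:           # reverse the most recent command
            self.buffer = self.history.pop().undo(self.buffer)

ed = Editor()
ed.execute(InsertText(0, "hello"))
ed.execute(InsertText(5, " world"))
ed.undo()
print(ed.buffer)   # -> hello
```

The design cost noted in the text is visible even here: every command class must supply a correct inverse, which is why a fully general undo facility is difficult to implement.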
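The context tracking described above, in which the system carries information across commands so that elliptical input can be completed, can be sketched as follows (an invented toy example, not any cited system; real dialogue systems are far more elaborate):

```python
# Illustrative sketch of a dialogue manager that keeps context between
# commands so an elliptical follow-up can be interpreted. The command
# format ('verb object') is invented for the example.

class DialogueContext:
    def __init__(self):
        self.last = {}                     # slots carried over from prior input

    def interpret(self, utterance):
        """Parse 'verb object' commands; a bare verb reuses the last object."""
        words = utterance.split()
        cmd = {"verb": words[0]}
        if len(words) > 1:
            cmd["object"] = words[1]
        elif "object" in self.last:        # ellipsis: inherit the missing slot
            cmd["object"] = self.last["object"]
        self.last = cmd
        return cmd

dm = DialogueContext()
print(dm.interpret("print report.txt"))   # {'verb': 'print', 'object': 'report.txt'}
print(dm.interpret("delete"))             # {'verb': 'delete', 'object': 'report.txt'}
```

The second command is elliptical; the system completes it from the retained context instead of rejecting it, which is the behavior the text argues AI techniques can extend well beyond single-slot inheritance.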
BIBLIOGRAPHY

1. G. G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3, 105-147 (1978).
2. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals, and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977.
3. H. Tennant, K. Ross, R. Saenz, C. Thompson, and J. Miller, Menu-Based Natural Language Understanding, Proceedings of the Association for Computational Linguistics Conference, Cambridge, MA, pp. 151-157, 1983.
4. W. C. Mann, An Overview of the PENMAN Text Generation System, Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, pp. 261-265, 1983.
5. B. Swartout, The GIST Behavior Explainer, Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, pp. 402-407, 1983.
6. J. L. Flanagan, Speech Analysis, Synthesis, and Perception, Springer-Verlag, New York, 1972.
7. W. A. Lea (ed.), Trends in Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1980.
8. B. A. Sherwood, "The computer speaks," IEEE Spect. 16(8), 18-25 (1979).
9. R. A. Bolt, "Put-that-there: Voice and gesture at the graphics interface," Comput. Graph. 14(3), 262-270 (1980).
10. A. Lippman, "Movie-maps: An application of the optical videodisc to computer graphics," Comput. Graph. 14(3), 32-42 (1980).
11. W. J. Clancey, Dialogue Management for Rule-Based Tutorials, Proceedings of the Sixth International Joint Conference on AI, Tokyo, Japan, pp. 155-161, 1979.
12. B. Woolf and D. D. McDonald, "Building a computer tutor: Design issues," IEEE Comput. 17(9), 61-73 (1984).
13. R. Wilensky, Y. Arens, and D. Chin, "Talking to UNIX in English: An overview of UC," CACM 27, 574-593 (1984).
14. J. Shrager and T. W. Finin, An Expert System that Volunteers Advice, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, 1982.
15. P. Hayes, E. Ball, and R. Reddy, "Breaking the man-machine communication barrier," IEEE Comput. 14(3), 19-30 (1981).
16. E. L. Rissland, "Ingredients of intelligent user interfaces," Int. J. Man-Mach. Stud. 21, 377-388 (1984).
17. J. D. Hollan, E. L. Hutchins, and L. Weitzman, "STEAMER: An interactive inspectable simulation-based training system," AI Mag. 5(2), 15-27 (1984).
18. F. Zdybel, N. R. Greenfeld, M. D. Yonke, and J. Gibbons, An Information Presentation System, Proceedings of the Seventh International Joint Conference on AI, Vancouver, BC, pp. 978-984, 1981.
19. E. Rich, "Users are individuals: Individualizing user models," Int. J. Man-Mach. Stud. 18, 199-214 (1983).
20. R. A. Bolt, Eyes at the Interface, Proceedings of the ACM SIGCHI Human Factors in Computer Systems Conference, Gaithersburg, MD, pp. 360-362, 1982.
21. E. A. Edmonds, Adaptive Man-Computer Interfaces, in M. J. Coombs and J. L. Alty (eds.), Computing Skills and the User Interface, Academic, London, pp. 389-426, 1981.
22. D. Weinreb and D. Moon, Lisp Machine Manual, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1981.
23. W. Teitelman, Interlisp Reference Manual, Xerox PARC Technical Report, Palo Alto, CA, 1978.
24. M. Stefik, D. G. Bobrow, S. Mittal, and L. Conway, "Knowledge programming in LOOPS: Report on an experimental course," AI Mag. 4(3), 3-13 (1983).
25. R. G. Smith, G. M. E. Lafue, E. Schoen, and S. C. Vestal, "Declarative task description as a user-interface structuring mechanism," IEEE Comput. 17(9), 29-38 (1984).
26. B. Shneiderman, "Direct manipulation: A step beyond programming languages," IEEE Comput. 16(8), 57-69 (1983).
27. P. R. Michaelis, M. L. Miller, and J. A. Hendler, Artificial Intelligence and Human Factors Engineering: A Necessary Synergism in the Interface of the Future, in A. Badre and B. Shneiderman (eds.), Directions in Human/Computer Interaction, Ablex, Norwood, NJ, pp. 79-94, 1982.
28. D. W. Small and L. J. Weldon, "An experimental comparison of natural and structured query languages," Hum. Fact. 25, 253-263 (1983).
29. B. Shneiderman, Software Psychology: Human Factors in Computer and Information Systems, Winthrop, Cambridge, MA, 1980.
30. B. R. Gaines and M. L. G. Shaw, Dialog Engineering, in M. E. Sime and M. J. Coombs (eds.), Designing for Human-Computer Communication, Academic Press, London, pp. 23-53, 1983.
31. H. Ledgard, J. A. Whiteside, A. Singer, and W. Seymour, "The natural language of interactive systems," CACM 23, 556-563 (1980).
32. J. D. Gould, J. Conte, and T. Hovanyecz, "Composing letters with a simulated listening typewriter," CACM 26, 295-308 (1983).
33. T. K. Landauer, K. M. Galotti, and S. Hartwell, "Natural command names and initial learning: A study of text-editing terms," CACM 26, 495-503 (1983).
34. J. D. Foley and V. L. Wallace, "The art of graphic man-machine conversation," Proc. IEEE 62, 462-471 (1974).
35. R. J. K. Jacob, An Executable Specification Technique for Describing Human-Computer Interaction, in H. R. Hartson (ed.), Advances in Human-Computer Interaction, Ablex, Norwood, NJ, 1985.
36. H. R. Hartson and D. H. Johnson, "Dialogue management: New concepts in human-computer interface development," Comput. Surv. (1987) (in press).
37. P. Reisner, "Formal grammar and human factors design of an interactive graphics system," IEEE Trans. Soft. Eng. SE-7, 229-240 (1981).
38. R. J. K. Jacob, "Using formal specifications in the design of a human-computer interface," CACM 26, 259-264 (1983).
39. R. B. Allen, Cognitive Factors in Human Interaction with Computers, in A. Badre and B. Shneiderman (eds.), Directions in Human/Computer Interaction, Ablex, Norwood, NJ, 1982.

R. Jacob
Naval Research Laboratory
IMAGE ANALYSIS. See Scene analysis; Vision, early.

IMAGE UNDERSTANDING

Think about the process by which you understand what you see. Can you determine what is happening and how it is happening when you look out the window and notice that your best friend is walking toward your door? As you may guess, the process by which you arrived at this conclusion, and which caused you to go and open the door before your friend knocked, is not a simple one. Ancient philosophers worried about this problem. Biological scientists have been studying the problem in earnest since Hermann von Helmholtz (1821-1894), commonly credited as the father of modern perceptual science. Computer scientists began looking at this problem only recently in these terms, and the discipline of computer vision is a very young one. The miracle of vision is not restricted to the eye; it involves the cortex and brain stem and requires interactions with many other specific brain areas. In this sense, vision may be considered as an important aspect of AI. It is the major source of input for man's other cognitive faculties.

This entry discusses the aspects of vision that deal with the "understanding" of visual information. Understanding in this context means the transformation of visual images (the input to the retina) into descriptions of the world that can interface with other thought processes and can elicit appropriate action. The representation of these descriptions and the process of their transformation are not understood currently by the biological sciences. In AI, researchers are concerned with the discovery of computational models that behave in the same ways that humans do, and thus, representations and processes are defined using the available computational tools. This encyclopedia is a collection of such tools and their application, and this entry assumes that the reader will refer to other appropriate entries for details on specific topics only mentioned here.

Image understanding (IU) is the research area concerned with the design and experimentation of computer systems that use one or more methods for matching features with models using a control structure. Given a goal, or a reason for looking at a particular scene, these systems produce descriptions of both the images and the world scenes that the images represent. The goal of an image-understanding system (IUS) is to transform two-dimensional (2-D) spatial (and, if appropriate to the problem domain, time-varying) data into a description of the three-dimensional spatiotemporal world. In the early to mid-seventies this activity was termed "scene analysis." Other terms for this are "knowledge-based vision" or "high-level vision." Several survey papers have appeared on this topic. The interested reader is particularly referred to papers by Binford (1), Kanade (2), Matsuyama (3), and Tsotsos (4) as well as the excellent collection of papers in the book Computer Vision Systems (5) and Part IV of the book Computer Vision (6). Those readers interested in the biological side of image understanding are referred to the excellent book by Uttal, A Taxonomy of Visual Processes (7). This entry assumes that the reader has a basic familiarity with vision as provided by the overview entry (see Vision).

Integration is the key phrase when describing an IUS. Research on IUSs has experimented with ways of integrating existing techniques into systems and, in doing so, has discovered problems and solutions that would not otherwise have been uncovered. Unfortunately, there are no truly general vision systems yet, and much further research is necessary on all aspects of the problem. Integrated within a single framework, an IUS must:

Extract meaningful two-dimensional (2-D) groupings of intensity-location-time values. Images or image sequences contain a tremendous amount of information in their raw form. The process of transformation thus begins with the identification of groups of image entities, pixels. These pixels are grouped by means of similarity of intensity value, for example, over a particular spatial location. They can also be grouped on the basis of intensity discontinuity or similarity of change or constancy over time. The assumption is that groups of pixels that exhibit some similarity in their characteristics probably belong to specific objects or events. Typical groupings are edges, regions, and flow vectors.

Infer 3-D surfaces, volumes, boundaries, shadows, occlusion, depth, color, motion. Using the groupings of pixels and their characteristics, the next major transformational step is to infer larger groupings that correspond, for example, to surfaces of objects or motion events. The reason for the need for inference is that the pixels by themselves do not contain sufficient information for the unique determination of the events or objects; other constraints or knowledge must be applied. This knowledge can be of a variety of forms, ranging from knowledge of the imaging process, knowledge of the image formation process, and knowledge of physical constraints on the world to knowledge of specific objects being viewed. Typically, the most appropriate knowledge to use is an open question, but the simplest and least application-specific knowledge is preferred, and the current belief is that no application-specific knowledge is required at this stage.

Group information into unique physical entities. Surfaces can be connected to form 3-D objects, and changes in trajectories can be joined to describe motions of specific types. Again, the original pixel values do not contain sufficient information for this process, and additional knowledge must be applied. This knowledge is perhaps in the form of connectivity and continuity constraints, and in many cases these are embedded in explicit models of objects of the domain.

Transform image-centered representations into world-centered representations. To this point the descriptions created have all been in terms of a coordinate system that is "image centered" (also called "viewer centered" or "retinotopic"). A key transformation is to convert this coordinate system to one that is "world centered" (also called "object centered"), that is, the description is no longer dependent
on specific locations in images. This is a crucial step; otherwise, the stored models must be replicated for each possible location and orientation in space.

Label entities depending on system goals and world models. It almost never occurs that humans are given a picture or told to look out the window and asked to describe everything that is seen in a high and uniform degree of detail. Typically a scene is viewed for a reason. What exactly this goal is has direct impact on how the scene is described: which objects and events are described in detail and which are not. Second, scenes are always described based on what is known about the world; they are described in terms of the domain that is being viewed. A factory scene, for example, is almost never described in terms of a hospital environment; that would not be a useful description (unless metaphoric use is the goal!). This knowledge base permits the choice of the most appropriate "labels" to associate with objects and events of the scene. Labels are typically the natural-language words or phrases that are used in the application domain. The process of finding labels and their associated models that are relevant is called "search." Models that are deemed relevant may be termed "hypotheses." Each hypothesis must be "matched" against the data extracted from the images. In the case where the data is insufficient to verify a model, "expectations" may be generated that guide further analysis of the images. Labels are necessary for communication to other components of a complete intelligent system that must use interpreted visual information. The label set forms the language of communication between the vision module and the remainder of the brain.

Infer relationships among entities. In viewing a scene, not only are individual objects and events recognized but they are also interrelated. Looking out the window, for example, one may see a tree in a lawn, a car on a driveway, a boy walking along the street, or a girl playing on a swing set. The relationships may play an important role in assisting the labeling process as well. These relationships form a spatiotemporal context for objects and events.

Construct a consistent internal description. This really applies to all levels of the transformation process that is being described here. The output of an image-understanding system is a representation of the image contents, usually called an "interpretation." Care is required, however, in defining what an interpretation actually involves. Little attention has been given to this, and current systems employ whatever representation for an interpretation is convenient and appropriate to the problem domain. Basically, an interpretation consists of inferred facts, relationships among facts, and representations of physical form. Issues of consistency and foundations of the underlying representational formalism are important, yet they have not received much attention within the IUS community. The output of an IUS usually takes one of two forms: a graphic rendition of the objects recognized is displayed, perhaps with natural-language labels identifying various parts, or textual output describing the characteristics of the objects observed and recognized is generated. Some systems employ both methods, and the choice depends on the particular problem domain being addressed.
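The first requirement in the list above, extracting 2-D groupings by similarity of intensity value, can be sketched as a simple region-growing procedure (a deliberately minimal Python illustration; practical systems use far more sophisticated grouping and also exploit discontinuity and change over time):

```python
# Minimal sketch of grouping pixels by similarity of intensity: a flood
# fill that labels 4-connected pixels whose intensity is close to the
# region's seed. The image and threshold are invented example data.

def group_pixels(image, threshold=10):
    """Return a label array; similar 4-connected pixels share a label."""
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] is not None:
                continue
            seed = image[sy][sx]              # region's reference intensity
            stack = [(sy, sx)]
            labels[sy][sx] = next_label
            while stack:                      # grow the region from the seed
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] is None
                            and abs(image[ny][nx] - seed) <= threshold):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels

img = [[10, 12, 90],
       [11, 13, 95],
       [10, 94, 92]]
print(group_pixels(img))   # -> [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
```

The two labels correspond to the dark and bright regions of the toy image, illustrating the working assumption stated above: pixels with similar characteristics probably belong to the same object or event.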
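The labeling vocabulary introduced above (search over models, hypotheses, matching, and expectations) can be illustrated with a toy labeler; every model and feature name here is invented, and real systems use far richer models and match procedures:

```python
# Hypothetical sketch of goal-directed labeling: models relevant to the
# goal are activated, matched against features extracted from the image,
# and the unmatched features of the best hypothesis become expectations.

MODELS = {
    "car":  {"wheels", "windshield", "metallic-region"},
    "tree": {"trunk", "foliage-texture", "green-region"},
    "boy":  {"limbs", "walking-motion"},
}

def label(extracted_features, goal_models):
    """Score each hypothesis by the fraction of its model features found."""
    best_label, best_score = None, 0.0
    for name in goal_models:              # only goal-relevant models searched
        model = MODELS[name]
        score = len(model & extracted_features) / len(model)
        if score > best_score:
            best_label, best_score = name, score
    # Unmatched model features drive expectation-guided further analysis.
    missing = MODELS[best_label] - extracted_features if best_label else set()
    return best_label, missing

features = {"wheels", "metallic-region", "shadow"}
lbl, expect = label(features, ["car", "tree"])
print(lbl, expect)   # car {'windshield'}
```

Here the data are insufficient to verify the "car" model fully, so the missing feature is returned as an expectation that could guide another look at the image, as described in the text.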
Two basic questions always arise when describing an IUS to the uninitiated. The first question is "Why did this field arise as distinct from so-called low-level vision or early vision?" and the second is "Is image understanding computationally the same as speech understanding?" The answer to the first question follows. There are two main reasons for the distinction: the bottom-up approach (see Processing, bottom-up and top-down) embodied in early vision schemes is inadequate for the generation of complete symbolic descriptions of visual input, and there is a need to describe visual input using the same terminology as the problem domain.

There are several basic realities that impact the design of image-understanding systems. The first is that images underconstrain the scenes that they represent. The reason is straightforward: In human vision a 3-D scene undergoes a perspective projection onto a 2-D retina in order to become an image. Thus, much information is lost, particularly depth information. The image is just a snapshot in time of the scene, and both spatial as well as temporal continuity information is lost. Further, the image created is a distorted view of the scene that it represents. The distortion is not only due to the perspective transformation, but, also, there is noise involved in the image creation process.

Finally, a purely bottom-up (or data-directed) approach does not lead to unambiguous results in all cases. A data-directed scheme considers all the data and tries to follow through on every hypothesis generated. Consideration of all data and all possible models in a system of size and scope comparable to the human visual system leads to combinatorial explosion and is thus an intractable approach. Moreover, it can be nonconvergent, can only produce conclusions that are derivable directly or indirectly from the input data, and cannot focus or direct the search toward a desired solution. A general vision system must be able to represent and use a very large number of object and event models. If the input is naturally ambiguous, a purely bottom-up activation of models will lead to a much larger set of models to consider than is necessary or salient. The working hypothesis of IUSs is that domain knowledge (qv), in addition to the bottom-up processes, can assist in the disambiguation process as well as reduce the combinatorial problem. How that knowledge is to be used is a key problem.

The second question that often arises is "Is image understanding computationally the same as speech understanding?" On the surface it may seem that the techniques applicable to the speech-understanding problem are directly applicable to the image-understanding problem. A simplified view of the speech-understanding process leads to this conclusion. The differences arise if content rather than only form is considered. Speech understanding (qv) may be regarded as the recognition of phonemes (qv), the grouping of phonemes into words, the grouping of words into sequences, the parsing of word sequences into sentences, and the interpretation of the meaning of the sentences. Indeed, in a paper by Woods (8) the similarity is presented in some detail. In that paper Woods speculates on the applicability of the HWIM architecture for the image-understanding problem and concludes that it may be worth the attempt. However, a closer examination of the differences between speech and image interpretation tasks reveals that the image-understanding task is significantly different and more difficult.

The similarities between the speech and image tasks are many. Both domains exhibit inherent ambiguity in the signal, and thus signal characteristics alone are insufficient for interpretation. Reliability of interpretation can be increased by the use of redundancy provided by knowledge of vocabulary, syntax, semantics, and pragmatic considerations; and both domains seem to involve a hierarchical abstraction mechanism. The differences include the facts that: (a) speech exhibits a single spatial dimension (amplitude) with a necessary temporal dimension, whereas images display two spatial dimensions as well as the temporal dimension; (b) a speech segment has two boundary points, whereas an image segment, as a spatial region, has a large number of boundary points; (c) speech has a relatively small vocabulary that is well documented (e.g., in dictionaries) and images have much larger, undocumented vocabularies; (d) grammars have been devised for languages, but no such grammars exist for visual data; (e) although speech differs depending on the speaker, images vary much more because of viewpoint, illumination, spatial position and orientation of objects, and occlusion; (f) speech has a convenient and well-accepted abstract description, namely, letters and words, whereas images do not; and (g) the speech signal is spatially one-dimensional, and when sampled by the ear, there is no equivalent of the projection of a 3-D scene onto a 2-D retina. Thus, it seems that the image-understanding situation is radically different, particularly in combinatorial terms, and it is for this reason that very different solutions have appeared.

Representational and Control Requirements

This section attempts to summarize the experience of the IU community in the design and implementation of IUSs with a statement of components currently believed to be necessary for general vision systems. It should be clear that this is not a formal definition of an IUS in a strict sense; many of the requirements are really topics for further research. The section does not contain specific references; instead, it refers to other entries in this encyclopedia. Specific solutions and vision systems and how they deal with each of these requirements appear in a subsequent section.
Representational Requirements. Many IUSs distinguish three levels of representation: a low level, an intermediate level, and a high level. These levels do not necessarily refer to particular types of formalisms but rather simply point out that in the interpretation process, a transformation of representations into more abstract ones is required and that typically three levels of abstraction are considered. These levels can usually be characterized as follows: Low level includes image primitives such as edges, texture elements, or regions; intermediate level includes boundaries, surfaces, and volumes; and high level includes objects, scenes, or events. There is no reason why there should be only three levels, and in fact, the task of transforming representations may be made easier by considering smaller jumps between representations. It should be clear in the descriptions that follow which level or levels are being addressed.

Representation of Prototypical Concepts. A prototype provides a generalized definition of the components, attributes, and relationships that must be confirmed of a particular concept under consideration in order to be able to make the deduction that the particular concept is an instance of the prototypical concept. A prototype would be a complex structure spanning many levels of description in order to adequately capture surfaces, volumes, and other events, to construct discrete objects into more complex ones, to define spatial, temporal, and functional relationships for each object, and to assert
constraints that must be satisfied in order for a particular object in a scene to be identified.

Concept Organization. Three kinds of abstraction are commonly used, namely, feature aggregation, called "PART-OF", concept specialization, called "IS-A", and instantiation, called "INSTANCE-OF". The PART-OF hierarchy can be considered as an organization for the aggregation of concepts into more abstract ones or as an organization for the decomposition of concepts into more primitive ones, depending on which direction it is traversed. The leaves of the PART-OF hierarchy are discrete concepts and may represent image features. It should be pointed out that concept structure does not necessarily mean physical structure only, but similar mechanisms with different semantics may be used to also represent logical components of concepts. IS-A is a relationship between two concepts, one of which is a specialization of the other (or, in other words, one IS-A generalization of the other). An important property of the IS-A relationship is inheritance of properties from parent to child concept, thus eliminating the need for repetition of properties in each concept. Finally, the relationship between prototypical knowledge and observed tokens is the INSTANCE-OF relationship. These three relationships are typically used in conjunction with one another. Consideration of the semantics of these relationships is important, and such issues are discussed elsewhere (see Inheritance hierarchy).

Spatial Knowledge. This is perhaps the main type of knowledge that most vision systems employ. This includes spatial relationships (such as "above," "between," "left of"), form information (points, curves, regions, surfaces, and volumes), location in space, and continuity constraints. Much of this is described elsewhere (see Reasoning, spatial).
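The PART-OF, IS-A, and INSTANCE-OF organization described under Concept Organization can be sketched as a tiny frame system (illustrative Python with invented concepts; for simplicity, INSTANCE-OF is modeled with the same link mechanism as IS-A): a property not found on a concept is inherited along the IS-A chain, so it need not be repeated in each concept.

```python
# Sketch of concept organization: PART-OF as a parts list, IS-A as a
# parent link with property inheritance. All concepts here are invented.

class Concept:
    def __init__(self, name, isa=None, parts=(), props=None):
        self.name, self.isa = name, isa
        self.parts = list(parts)            # PART-OF decomposition
        self.props = props if props is not None else {}

    def lookup(self, prop):
        """Find a property here, or inherit it via the IS-A parent chain."""
        if prop in self.props:
            return self.props[prop]
        return self.isa.lookup(prop) if self.isa else None

vehicle = Concept("vehicle", props={"mobile": True})
car = Concept("car", isa=vehicle, parts=["wheel", "body", "windshield"])
instance = Concept("car-17", isa=car)       # INSTANCE-OF modeled as a link

print(instance.lookup("mobile"))   # -> True, inherited from "vehicle"
```

The observed token "car-17" states nothing about mobility itself; the property is found by following links upward, which is exactly the economy of representation that inheritance provides.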
Spatial constraints for grouping have appeared in the Gestalt literature in psychology and include the tendencies to group using smoothness of form, continuity of form, spatial proximity, and symmetry. The PART-OF relationship is used to aggregate simple forms into more complex ones. Properties or attributes of spatial forms are also required, namely, size, orientation, contrast, reflectance, curvature, texture, and color. Maps are common forms of spatial knowledge representation, particularly for vision systems dealing with domains such as aerial photographs or navigation tasks.

Temporal Knowledge. Information about temporal constraints and time is not only necessary for the interpretation of spatiotemporal images but can also provide a context in which spatial information can be interpreted. Time can provide another source of constraints on image objects and events. Temporal constraints for motion groupings, in the Gestalt sense, include the tendencies to group using similarity of motion, continuity or smoothness of motion, and path of least action. The basic types of temporal information include time instants; durations and time intervals; rates, such as speed or acceleration; and temporal relations such as "before," "during," or "start." Each of these has meaning only if associated with some spatial event as well. PART-OF and IS-A relationships can be used for grouping and organizing spatiotemporal concepts in much the same fashion as for purely spatial concepts. A difficulty with the inclusion of temporal information into an IUS is that an implicit claim is made of existential dependency. That is, if a relationship such as "object A appears before object B" is included in a knowledge base, and object B is observed, then according to the knowledge base, it must be
true that object A must have appeared previously. This problem is further described elsewhere (see Reasoning, temporal).

The Scale Problem. It has been well understood since the early days of computer vision that spatial and spatiotemporal events in images exhibit a natural "scale." They are large or small in spatial extent and/or temporal duration, for example. This problem is different than the image resolution or coarseness problem, and there is no relationship between the two. This is dealt with in more detail in another entry (see Scale-space methods), and it is important that an IUS deal with this as well. There are implications not only for the design of the image-specific operations that extract image events (a given operator cannot be optimal for all scales and thus is limited for a particular range of events that it detects well) but also for the choice of representational and control scheme. If spatiotemporal events require representation at multiple scales, the matching and reasoning processes must also be able to deal with the multiple scales. The unification of information from multiple scales into a single representation is important.

Description by Comparison and Differentiation. Similarity measures can be used to assist in the determination of other relevant hypotheses when matching of a hypothesis fails. This is useful in the control of growth of the hypothesis space as well as for displaying a more intelligent guidance scheme than random choice of alternates. The similarity relation usually relates mutually exclusive hypotheses. The relation involves the explicit representation of possible matching failures, the context within which the match failure occurred, binding information relevant to the alternative hypothesis, as well as the alternate hypothesis. Thus, the selection of alternatives is guided by the reasons for the failure.

Inference and Control Requirements. A brief note is in order before continuing this section on the difference between inference and control, particularly since in some works they are used as synonyms. Inference refers to the process of deriving new, not explicitly represented facts from currently known facts. There are many methods available for this task, and they are discussed in detail in other entries (see Inductive inference; Inference; Reasoning entries). Control refers to the process that selects which of the many inference, search, and matching techniques should be applied at a particular stage of processing. The remainder of this section briefly discusses these issues and others in roughly the order that a designer of a typical image-understanding system would confront them.

Search and Hypothesis Activation. The basic interpretation paradigm used in IUSs, as is developed in Historical Perspective and Techniques, is "hypothesize and test." There are several aspects to this, and these are described beginning with search and hypothesis activation. A general vision system must contain a very large number of models that represent prototypical objects, events, and scenes. It is computationally prohibitive to match image features with all of them, and therefore, search schemes are employed to reduce the number of models that are considered. Only the salient models need be considered, and the determination of which are salient is termed the "indexing" problem. The catalog of search methods includes breadth first, depth first, hill climbing, best first, dynamic programming, branch and bound, A*, beam search, constraint satisfaction, relaxation labeling, and production systems. These are all described elsewhere (see A* algorithm; Beam search; Constraint satisfaction; Rule-based systems; Search, best-first; Search, branch-and-bound; Search, depth-first). A different categorization of search types, and one that is more frequently found in the IUS literature, is in terms of knowledge interactions. The following schemes are described below: model-directed search, goal-directed search, data-directed search, failure-directed search, temporally directed search, hierarchical models, heterarchical models, blackboard models, and beam search. The choice of search method employed depends on a number of factors, including the form of the representation over which the search is to be performed, the potential complexity problems, and the goals of the search process. Saliency of a model depends on the statement of goals for the search process. The search can be guided by a number of trigger features, for example, and any models that are encountered that embody those features are selected. The selection of a model for further consideration is termed "hypothesis activation." A search process that leads to a very large set of active hypotheses is not desired since the objective of search is to reduce the space of models.

Matching and Hypothesis Testing. Once a set of active hypotheses has been determined, further consideration of each hypothesis takes place. The first task to be carried out is to match the active hypothesis to the data. It is important to note that data here do not necessarily only mean image-specific information. Matching is defined as the comparison of two representations in order to discover their similarities and differences. Usually, a matching process in vision compares representations at different levels of abstraction and thus is one of the mechanisms for transforming a given representation into a more abstract one. The result of a match is a representation of the similarities and differences between the given representations and may include an associated certainty or strength of belief in the degree of match. The specific matching methods used depend largely on the representational formalisms that are used to code the data being compared. They can range from image-image matching, subgraph isomorphisms, or shape matching to matching only selected features with a model, such as identifying structural components. Matching processes, particularly ones that involve matching images directly, are usually very sensitive to variations in illumination, shading, viewpoint, and 3-D orientation. It is preferred, therefore, to match abstract descriptions such as image features against models in order to overcome some of these problems. However, for 3-D models it is not always the case that image features can trigger proper models for consideration. Rather, the process must also involve the determination of the projection of the model that can be matched in turn (see Matching; Pattern matching).

Generation and Use of Expectations. Expectations are beliefs that are held as to what exists in the spatiotemporal context of the scene. The concept of expectation-directed vision is a common one that appears in most systems. Expectations must bridge representations in a downward direction, going from models to image appearance. "Projection" is a term commonly used to denote the connection between the representations of the same concept but in differing domains. It is, for example, the relationship between a prototypical object and its actual appearance in an image. Thus, a mechanism is required that takes into account viewpoint and object position, lighting, observer motion, temporal continuity, and information gathering to create an internal representation of an object's appearance in an image. Complete projections may not always be necessary, and in most cases it seems that expectations of important distinguishing features or structures are sufficient. The most common use of expectations is in directing image-specific processes in the extraction of image features not previously found (see also Parsing, expectation-driven).

Change and Focus of Attention. Even the best of search and hypothesis activation schemes will often lead to very large hypothesis sets. Computing resources are always limited, and thus the allocation of resources must be made to those hypotheses that are most likely to lead to progress in the interpretation task. This can be done in a number of ways, including the use of standard operating system measures for resource allocation, as were used in an augmented fashion in HEARSAY (9), ranking hypotheses by means of certainty or goodness-of-fit estimates, or by considering the potential of a hypothesis in conjunction with the expense that would be incurred in its evaluation. These best hypotheses, which are usually those that are confirmed or virtually confirmed, are also termed "islands of reliability." Not only is it important to determine a focus of attention but it is also important to determine when to abandon a current focus as unproductive. The change of focus can be determined in one of two ways: the focus could be recomputed each time it was required or it could remain fixed and only change when circumstances necessitated the change. The latter is clearly more desirable; yet mechanisms for its implementation are few. It should be pointed out that a focus of attention does not necessarily refer only to a hypothesis set but may also refer to a region on an image or a subset of some representation.

Certainty and Strength of Belief.
The use of certainty measures in computer vision arose due to two main reasons: biological visual systems employ firing rate (which may be thought of as a strength of response) as the almost exclusive means of neural communication, and currently available computational processes are quite unreliable. This strength of response may be thought of as brightness for simplicity. Lateral inhibition (one of the processes of neural communication), whereby neurons can inhibit the response of neighboring ones based mainly on magnitude of the firing rate, is a common process, if not ubiquitous. It motivated the use of relaxation labeling processes in vision. In relaxation, the strength of response is termed "certainty" and is often used as a measure of reliability of a corresponding decision process, for example, the goodness of fit of a line to the data. Since visual data are inherently noisy due to their signal nature, measures of reliability are important in the subsequent use of information derived using unreliable processes. Yet another use of certainty is in hypothesis ranking. The ranking of hypotheses is useful not only for the determination of a focus of attention but also for determining the best interpretation. Most schemes introduce some amount of domain dependence into the control structure, and this seems to lead to problems with respect to generality. An important problem is the combination of certainties or evidence from many sources.

Inference and Goal Satisfaction. Inference (qv) is the process by which a set of facts or models is used in conjunction with a set of data items to derive new facts that are not explicitly present in either. It is also called "reasoning" (qv). The many forms of reasoning include logical deduction, inheritance, default reasoning, and instantiation. These are discussed at length in other entries (see Inheritance hierarchy; Reasoning, default). However, it should be pointed out that the vision problem adds a few different wrinkles to this task that may not appear in many other reasoning processes. It is not true in general that the data set is complete or correct, and processes that can reliably draw inferences from incomplete data are required. Second, since vision is inherently noisy and as described above requires reliability measures, inference schemes should also permit reliability measures to be attached to derived conclusions. Finally, since the process of vision involves a transformation from images to a final description through many intermediate representations, a reasoning scheme must be able to cross between several representations.

Most IUSs are not explicitly driven by a goal when interpreting images. They typically have implicit goals, such as to describe the scene in terms of volumetric primitives, to describe everything in as much detail as possible, or to describe the scene in the most specific terms possible. Human vision usually does involve a goal of some kind, and the area of AI that is concerned with how to achieve goals given a problem is called "planning." Systems that can plan an attack on a problem must contain meta-knowledge, that is, knowledge about the knowledge that the system has about the problem domain (see Meta-knowledge, meta-rules, and meta-reasoning). The meta-knowledge allows the system to reason about its capabilities and limitations explicitly. Such systems have a set of operations that they can perform, and they know under which circumstances the operations can be applied as well as what the effects may be. In order to satisfy a goal, a sequence of operations must be determined that, in a stepwise fashion, will eventually lead to the goal.
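The requirement that inference schemes attach reliability measures to derived conclusions can be sketched minimally as follows. The product-style combination rule, the rule strength of 0.9, and the example facts are all illustrative assumptions, not a method taken from any particular IUS:

```python
# A minimal sketch of certainty-carrying inference. The combination
# rule (rule strength times the weakest premise certainty) is one
# simple choice among many; the facts and 0.9 strength are invented.

def apply_rule(premises, rule_strength, facts):
    """Derive a conclusion certainty from premise certainties.

    The conclusion is only as reliable as its weakest support,
    scaled by how much the rule itself is trusted.
    """
    if not all(p in facts for p in premises):
        return None  # cannot fire: a premise is missing
    return rule_strength * min(facts[p] for p in premises)

# hypothetical image-derived facts with certainties from earlier stages
facts = {"region-is-elongated": 0.8, "ends-are-parallel": 0.6}
c = apply_rule(["region-is-elongated", "ends-are-parallel"], 0.9, facts)
# c == 0.9 * min(0.8, 0.6) == 0.54: the derived fact carries an
# explicit certainty that later control stages can use for ranking.
```

The point of the sketch is only that every derived fact carries a number a later stage can rank by, which is what hypothesis ranking and focus-of-attention mechanisms consume.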
Attempts to find optimal plans are usually included in terms of minimization of cost estimates or maximization of potential for success. In vision the sequence of operators may involve image feature extraction, model matching, and so on (see Planning).

Historical Perspective and Techniques

The historical development of the techniques of image understanding provides an interesting reflection of the major influences in the entire field of AI. The emphasis in the IU community has been primarily on the control structure, and this discussion begins with the sequence of contributions that led to the current types of control mechanisms. Rather little emphasis has been placed on integrating the best of the early vision schemes into IUSs, and one notices the range of weak solutions to the extraction of features. Little discussion is thus provided; however, in the description of control structures for specific systems, appropriate notes are made.

Control Structures. The heart of virtually all IU systems is the control structure (qv). Features universal to all working IUSs are cyclic control involving feedback (see Cybernetics) and the requirement of specific solutions to the problem of uncertainty. This survey of the development of control structures highlights only those systems that require and use explicit models of objects or events of the domain. Other important contributions that impact IUSs are allocated their appropriate historical due but are not considered part of the direct line of development. Finally, with two exceptions, the
Figure 1. The control structure of Roberts (12).
hypothesis of Marr (10) and the intrinsic image concept of Tenenbaum and Barrow (11), only implemented and tested systems are described in this section.

Developing the Cycle of Perception. Roberts was the first (in 1965) to lay out a control scheme for organizing the various components of a vision system (12). They are shown pictorially in Figure 1. He defined several of the major processing steps now found in all vision systems: extract features from the image, in his case, lines; activate the relevant models using those features; project the model's expectations into image space; and finally, choose the best model depending on its match with the data. This is not a true cycle, and because of the lack of feedback, it was very sensitive to noisy input.

In 1972, Falk realized that Roberts's work involved an assumption that would rarely be satisfied in real application domains, namely, that of noise-free data. If noisy data were to be correctly handled, enhancements to Roberts's processing sequence were required (13). In Figure 2 Falk adds a new component, the fill-in-incompleteness step, and closes the loop, allowing partly interpreted data to assist in the further interpretation of the scene. His program was called INTERPRET.

Shirai, in 1973, defined a system for finding lines in blocks-world scenes and interpreting the lines using models of line junctions and vertices for polyhedral objects (14). Thus, he was able to use interpreted lines as guidance in subsequent line finding. He first extracted features from a reduced image, thus smoothing out some of the noise and smaller detail features, and then used these gross features in subsequent guidance. Shirai's cycle is shown in Figure 3.

Shirai, however, was not the first to employ reduced images in a preprocessing stage. Kelly, in 1971, had the intuition that if an image that was reduced in size was processed initially,
Figure 3. The control structure of Shirai (14).
instead of the full-size image, much of the noise could be reduced,and the resulting edgesof lines could be used as a plan for where to find edgesand lines in the full image (15). This was applied to the domain of face recognition. Kelly reduced an image to 64 x 64 pixel size, thus minimizrng noise effects, and then locatedthe outlines of the faces.Thoseoutlines then formed a plan for the full-size image, limiting the searchspace for the detailed facial outlines. However, Kelly's system contained no models and was a sequential two-step process. Several incarnations of the cycle appeared subsequently, and one example of note is presented here, namely, the L977 work of Tenenbaum and Barrow in their interpretation-guided segmentation(IGS) program (11).Their version of the cycle is shown in Figure 4. IGS experimented with several types of knowledge sourcesfor guidance of the segmentation process: unguided knowledge, interactive knowledg., both user driven and system driven; models; and relational constraints. They concludedthat segmentation is improved over unguided segmentation with the application of knowledge, and with little computational overhead-the more knowledge, the faster the filtering process.
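Kelly's plan-from-a-reduced-image idea can be sketched in a few lines. The toy image, the factor-2 block averaging, and the thresholds are all invented for illustration; Kelly's actual operators and 64 × 64 reduction are not reproduced here:

```python
# A sketch of Kelly-style planning: find coarse edges in a reduced
# image, then search the full image only near where the plan
# predicts them. Image values and thresholds are illustrative.

def reduce(img):
    """Halve resolution by averaging 2x2 blocks (smooths noise)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
             for c in range(w)] for r in range(h)]

def edge_columns(img, thresh):
    """Columns where horizontal intensity change exceeds thresh."""
    cols = set()
    for row in img:
        for c in range(len(row) - 1):
            if abs(row[c+1] - row[c]) > thresh:
                cols.add(c)
    return cols

full = [[0, 0, 0, 9, 9, 9]] * 4           # a vertical step edge
plan = edge_columns(reduce(full), 4)       # coarse plan from the reduced image
# refine only near columns the plan predicts (each reduced column c
# covers full-image columns 2c..2c+2):
candidates = {c for p in plan for c in (2*p, 2*p+1, 2*p+2)}
edges = {c for c in edge_columns(full, 4) if c in candidates}
```

The reduction both suppresses noise and shrinks the search space, which is exactly the benefit Kelly observed; the full-resolution pass then recovers the precise edge location within the planned region.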
Figure 2. The control structure of Falk (13).

Figure 4. The interpretation-guided segmentation control structure of Tenenbaum and Barrow (adapted from Ref. 11).
Perhaps the most elegant portrayal of the cycle of perception, and also the coining of the term itself, is due to a 1978 contribution by Mackworth (16) and is shown in Figure 5. This basic cycle appears, in a variety of forms, in virtually all IUSs that have appeared since. Kanade's 1981 modification of the cycle (17) explicitly included the separation of scene-domain and image-domain considerations, a requirement that was first pointed out in 1971 by Huffman (18) and also independently by Clowes (19). This refers to the difference between an object's 2-D appearance in an image versus an object's 3-D representation in the world. Figure 6 portrays Kanade's cycle.

Tsotsos and colleagues further elaborated the model for the ALVEN system by specifying exactly at which points of the cycle the different hypothesis activation (or indexing) methods are applied (20). In addition, since his task was to understand visual motion, the element of time was also added. To this point in the development of the cycle of perception, although use had been made of different representational tools for organizing models, no explicit consideration had been given to how best to take advantage of the organization. Tsotsos used the common organizational tools of specialization (IS-A), decomposition (PART-OF), and SIMILARITY (mutual exclusion of models, or winner take all) and added temporal precedence in order to organize a large set of models. His cycle is shown in Figure 7. Definitions of the different hypothesis activation methods driven by knowledge organization relationships were also provided by Tsotsos. The methods are briefly summarized below.

Goal-Directed Activation. The goal of the vision system is to find the most specific, or specialized, description in the system's repertoire for the image contents. The specialization of hypotheses involves top-down traversal, from general to specific, of an IS-A hierarchy, moving downward when concepts are verified. Verification of an IS-A parent concept implies
Figure 5. The cycle of perception of Mackworth (adapted from Ref. 16).
Figure 6. The control structure of Kanade (adapted from Ref. 17).
Figure 7. The control cycle for motion understanding of Tsotsos (adapted from Ref. 20). Hypothesis activation types: DDA, data-directed activation; GDA, goal-directed activation; FDA, failure-directed activation; TDA, temporally directed activation; MDA, model-directed activation.
that perhaps one of its IS-A children applies, although the confirmation of a concept implies that its IS-A parents must also be true. Multiple IS-A children can be activated, but a more efficient scheme would be to activate one of the children if all children form a mutually exclusive set, or one from several such sets, and then allow failure-directed search to take over.

Model-Directed Activation. The elaboration of models involves top-down traversal of the PART-OF hierarchy. This implies a constrained form of hypothesize and test for components of classes that reflect greater resolution of detail. Movement down the PART-OF hierarchy forces activation of hypotheses corresponding to each of the components of the PART-OF parent hypothesis.

Data-Directed Activation. The PART-OF hierarchy can also be traversed bottom up in aggregation mode. Bottom-up traversal implies a form of hypothesize and test, where hypotheses activate other hypotheses that may have them as components.

Failure-Directed Activation. Failure-directed search is along the SIMILARITY dimension. Typically, several SIMILARITY links will be activated for a given hypothesis, and the resultant set of hypotheses is considered as a discriminatory set, that is, at most one of them may be the correct one. SIMILARITY interacts with the PART-OF relationship in that exceptions raised that specify missing components are handled by the hypothesis's PART-OF parent, the hypothesis that contains the context within which the exception occurred.

Temporally Directed Activation. Temporal search is a special case of model-directed search along the PART-OF dimension. Concepts may represent compound temporal events, such as sequences, simultaneous events, or overlapping events. In a sequence each element of the sequence has a PART-OF relationship with the event. Thus, on activation of the class, it is meaningless to activate all parts at the same time, as stated above. Activation of the parts only occurs when their particular temporal specifications are satisfied.

Marr, usually credited with contributions only in early vision, also had specific processes in mind for the high levels of vision, presented in his 1982 book Vision (21). It would indeed have been interesting to have seen an attempt at implementation and testing of his ideas. Marr, in his own words, viewed "recognition as a gradual process that proceeds from the general to the specific and that overlaps with, guides, and constrains the derivation of a description from the image." He proposed that a catalog of models be constructed using volumetric primitives and organized using a specialization hierarchy (IS-A) as well as a decomposition hierarchy (PART-OF). Models were selected based on the distribution of components along principal axes of the derived volumetric primitives represented in the 3-D sketch. He proposed three indexing schemes: the primary one was the "specificity index," traversal from general to specific models (goal-directed); the secondary ones, used in support of the first, were the "adjunct index," traversal from models to model components (model-directed), and the "parent index," traversal from model components to parent models (data-directed). The model provided relative orientation constraints used to determine absolute orientation. An object- and image-space processor then related image-centered and object-centered descriptions and computed relative lengths of component axes. This new information can be used to disambiguate shapes at the next level of specificity. It is interesting to note that Marr did not propose a cycle of processing and that the 3-D sketch represented all possible information derivable directly from the image. In general, this is not true, and a scheme without feedback is insufficient.

In several models the issue of feedback and the relationship between explicit models and their appearance in an image was mentioned. The projection of hypotheses into image space is a difficult problem for which few solutions exist. As pointed out previously, expectations have been used in most IUSs since Kelly and Shirai's work. Expectations were used in the SEER system of Freuder (22) to guide region growing and identification of specific portions of a hammer. A thorough understanding of human body motions and a model of the allowed joint configurations enabled the design of a constraint propagation network that integrated current motions and known body positions with hypothesized expected ones, producing locations in 3-D for given body joints (23). An interesting conclusion from the ALVEN system's use of expectations is that the information contained in an IS-A hierarchy of concepts can be exploited for the generation, verification, and modification of expectations of actual object appearance in a sequence of images. If expectations fail, movement up the hierarchy to a more general concept provides the next best alternative consistent with the semantics of the interpretation. However, the key problem of relating viewpoint-independent 3-D object models to image-specific ones is still an outstanding one. A good example of work on this topic is the ACRONYM system (24). Given a geometric object model and viewpoint and illumination, ACRONYM predicts partial object appearance in the image. That is, only the important features required for identification are predicted since the whole problem is so computationally expensive.

Heterarchical Models. A heterarchical model of vision is one made up of a collection of separate modules, each module performing some specialized task and each communicating with all others as appropriate. Freuder was perhaps the first within the vision community to apply such an idea in his system for recognizing tools called SEER (22). "Active knowledge" was his term for the use of procedural knowledge about both the tools domain as well as general knowledge in directing the vision control. Knowledge was represented as semantic networks (qv). Nodes represented objects, and links represented how objects help establish one another. Each object encoded procedural knowledge, and together the objects formed the collection of modules, each communicating with other relevant modules.

Another form of heterarchy is the "demon" scheme, where each knowledge source continuously monitors a database of assertions about the images and of models to see if its prerequisites are present. If found, the demon then carries out some actions that may involve changes to the database. Badler (25) used a demon model for event analysis, and each demon represented the knowledge required to recognize a particular event (see Demons).

Two other specific versions of heterarchy are presented by Nevatia (26) and Levine (27). They provide two other views for the composition of the collection of modules. They are presented in Figures 8 and 9. Perhaps the main conclusion that can be drawn from the heterarchical models is that as the number of interacting modules grows, the communication and organization problems increase dramatically.
Figure 8. The control structure of Nevatia (adapted from Ref. 26).
Hierarchical Models. Hierarchical models are comprised of a specialized collection of modules, but the communication pathways are restricted, reflecting an ordering of both processing steps and levels of abstraction in the computation. One of the best known is due to Barrow and Tenenbaum (28) and is diagrammed in Figure 10. This model reflects a major contribution in representation, namely the idea of "intrinsic images." This is described in Spatial Relationships, below. Another important hierarchical model that elaborates on Barrow and Tenenbaum's model is that of the VISIONS system (39). This version fills in several details regarding communication and control across the multiple levels of representation that are present in all image-understanding systems.

Yet another specific type of hierarchical model emerged, conforming with the basic definition and philosophy but also attempting to provide a solution to the spatial-scale problem. Uhr called these models "recognition cones" in his 1972 contribution (29), and they have also been termed "pyramid models" [see the book by Tanimoto and Klinger for a collection of papers on this topic (30)]. The major distinctions come from the facts that each layer of the cone computes image properties at successively coarser resolutions and each computation communicates only with computations occurring in layers immediately above or below or with computations within the layer. An unfortunate result of this idea is the linking of spatial scale with resolution; as noted earlier, the optimal scale for the
Figure 10. The control structure of Barrow and Tenenbaum (adapted from Ref. 28).
detection of specific spatial forms has little relationship to image resolution.

Figure 9. The control structure of Levine (adapted from Ref. 27).

Blackboard Models. Blackboard models (qv) were borrowed for use in vision from the HEARSAY work in speech understanding. In fact, they are a specific form of heterarchy in that each knowledge source (module) can communicate with any other. Knowledge sources are organized hierarchically. The major difference and improvement over the versions of heterarchy that were presented earlier is that the communication occurred through a global data structure called a blackboard rather than the communication pathways being fixed. The VISIONS system (31) incorporates this idea as well as pyramid processes. The knowledge sources defined are inference net; 2-D curve fitting; 2-D shape; occlusion; special attribute matcher; 3-D shape; perspective; horizon; and object size. The VISIONS structure is shown in Figure 11. The advantages of blackboard models include their modularity; however, their utility in speech has not been repeated in vision, primarily because of the important differences between speech and vision.

Beam Models. Once again, speech understanding influenced the design of a vision system. In this case the HARPY system (32) influenced the 1980 design of the ARGOS system of Rubin (33). Rubin's work is interesting because it was the
Figure 11. The blackboard structure of VISIONS (adapted from Ref. 31).
only attempt to use beam search (qv) (also called locus search) in vision. Beam search produces a "beam," a pruned search tree that contains a list of near-miss alternatives around the best path. Both signal and model characteristics are included in this consideration. The scheme as realized in ARGOS is not one that has promise for general-purpose vision systems. ARGOS looked at images of downtown Pittsburgh, attempting to classify regions as sky, buildings, or mountains, for example. The network over which the beam search was performed was a large one whose nodes were pixels or image regions and whose arcs were spatial relations.

Rule-Based Approaches. Rules (of the if (premise) then (action) form) were introduced into vision at about the same time that they appeared in production systems. The 1974 introduction is due to Baird and Kelly (34), who claimed that context is a necessary consideration in the development of their paradigm for semantic picture recognition. They used inference rules to incorporate contextual considerations, and premises were features extracted from images. More recently, perhaps due to the success of the expert systems approach, several other vision systems have appeared that utilize rule-based knowledge and reasoning (see Rule-based systems). Typically, pure data-directed reasoning is insufficient as described above, and rules are fired in both goal-directed (backward-chaining) and data-directed (forward-chaining) modes (see Processing, bottom-up and top-down). Rules are used to represent various facts about images. For example, in the 1984 system of McKeown and co-workers, SPAM (35),
rules are used to encode spatial relationships among entities in the scene as well as to encode constraints on sizes and shapes of visual entities. Rule-based reasoning is used to provide the system with the best next task based on the strength of expectations as well as for the generation of expectations. Other IUSs that employ rule-based reasoning are the systems of Nagao and Matsuyama (36), Ohta (37), Ferrie and co-workers (38), and Hanson and co-workers (39).

Representational Formalisms. The development of representational tools used in the IUS community mirrors quite closely developments in other subdisciplines of AI. The use of heuristics (qv) reflects the power-based era of AI. The appearance of semantic networks (qv) in the memory-modeling community and their use by the knowledge representation (qv) and language-understanding communities (see Natural-language understanding) influenced their use in IUSs. Blackboards and beam searches were developed for the major speech-understanding systems (HEARSAY and HARPY, respectively) and subsequently appeared in vision systems. Minsky's frame theory (qv) (40), developed with a specific eye toward vision, was used in several vision systems. The success of expert systems prompted the use of rule-based approaches in IUSs as well.

Spatial Representations. Vision systems require the explicit representation of points, curves, surfaces, and volumes. There are a number of schemes that are employed, namely, points, line segments, splines, fractals, and generalized cylinders, among others. As an example, the VISIONS system employs a
IMAGE UNDERSTANDING
representation of 3-D complex surfaces and 2-D curves based on B-splines and surface patches and also makes use of the part-of and instance-of relationships in building complex structures. ACRONYM uses a generalized cylinder representation in conjunction with part-of and is-a organizations. There is no real consensus yet on what constitutes an adequate set of primitives for spatial representations. Discussions and examples of two points of view can be found in Marr and Nishihara (10), proponents of generalized cylinders, and in Pentland (41), a proponent of a fractal-based approach.

Much work in representation and reasoning about space has appeared outside the vision community. Comparison of object location and the representation of the corresponding relations is considered in Freeman (42). Kuipers (43) describes his TOUR model for route-solving problems and discusses the spatial knowledge relevant to that task. McDermott and Davis (44) also include a representation for spatial knowledge and a scheme for reasoning about it. However, both Kuipers and McDermott and Davis were concerned with spatial route-finding tasks, and this is not directly comparable to the reasoning required for vision systems. Spatial representations are covered in other entries (see Reasoning, spatial; Representation, analog). The representation of maps is quite straightforward and does not require further elaboration. The interested reader should consult papers on the MAPSEE (45) or HAWKEYE (46) systems.

Two specific representations can be considered as major contributions, namely, the schemes of Marr and of Barrow and Tenenbaum (28). Marr proposed a progression of representations that he termed the "primal sketch," the 2½-D sketch, and the 3-D sketch. The primal sketch represented information about the 2-D image, primarily intensity changes and their organization. The 2½-D sketch represented the orientation and depth of surfaces and discontinuity contours.
Finally, the 3-D sketch represented shapes and their spatial organization in an object-centered manner. In contrast, Barrow and Tenenbaum claimed that the appropriate intermediate-level representation consisted of a number of separate feature maps, all image centered, that perhaps interact in order to be computed unambiguously. These features include surface discontinuities, range, surface orientation, velocity, and color.

Heuristics. The use of heuristics (qv) appears in most vision systems in one form or another. Systems that used only heuristics, however, appeared only during the power-based era of AI and do not really qualify as IUSs using the definition requiring explicit object or event models. Those systems typically deal with blocks-world scenes.

Semantic Networks. Semantic networks (qv), that is, graph structures whose nodes represent objects or events and whose arcs represent relationships between the objects and events, have made an important impact on IUSs. Two examples are the work of Levine (27) and that of Badler (25). Levine's system deals with the interpretation of natural scenes, and he constructs a knowledge base with nodes representing entities such as sky, road, and house. Arcs represent spatial relations, such as left-of, above, or behind. Badler used the same idea but represents events as well as objects with nodes, whereas arcs represented spatial as well as temporal relations.

Frames. Minsky's frame theory (qv) (40) was one of the most influential works within the representation community, and since it was designed as a representation for vision, it left a mark on the IUS community as well. Frames are data structures representing a prototypical object or event. The components of the structure are slots that are filled with specific instances of visual entities. Slots may specify a type of instance, may specify a default value that can be used if the instance is not found, and may have associated constraints that relate one slot to others. Frames, sometimes also called "schemata," are used in the SIGMA (47), ALVEN (20), ACRONYM (24), MAPSEE (45), and VISIONS (31) systems, among others. A large collection of frames poses a serious indexing problem, and one solution for this is to organize the frames into a semantic network. In such a representation nodes are frames that represent objects or events, and arcs are network organizational primitives, such as generalization-specialization and aggregation-decomposition. The similarity relationship, motivated by Minsky, is added to the ALVEN scheme, as well as a temporal precedence dimension, as further organizational relations among frames.

Rules. Rules may be used to encode object characteristics, spatial relationships among objects, constraints on shapes and sizes, and so on, for use in an IUS. The use of rules in the SPAM system (35) has already been mentioned. In the VISIONS system (39) rules are applied to the attributes of the lines and regions in an intermediate representation. Simple rules define ranges over a feature value and, if fired, are considered as a vote for an object label. Here image features include color, texture, shape, size, and location, and feature values include length, location, orientation, contrast, and width. They allow complex combinations of simple rules. For example, they have a rule that measures excess green present in grass by computing the appropriate mean of color values in the R-G-B ranges for pixels in the region in question. This approach can also be found in the work of Nagao and Matsuyama (36) and Ohta (37).
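The range-based voting rules just described are easy to sketch. In the fragment below the features, ranges, weights, and labels are invented for illustration; they are not the actual VISIONS rule base:

```python
# Sketch of simple range-based rules that vote for object labels, in the
# style described above. Every rule, feature name, range, and weight here
# is an illustrative assumption, not drawn from the VISIONS system itself.

# Each rule: if a region feature falls in [lo, hi], cast a weighted vote.
RULES = [
    ("mean_blue",  0.6, 1.0, "sky",   1.0),
    ("texture",    0.0, 0.2, "sky",   0.5),
    ("mean_green", 0.5, 1.0, "grass", 1.0),
    ("y_position", 0.6, 1.0, "grass", 0.5),  # lower image rows
]

def vote(region_features):
    """Accumulate votes from every rule whose range test fires."""
    votes = {}
    for feature, lo, hi, label, weight in RULES:
        value = region_features.get(feature)
        if value is not None and lo <= value <= hi:
            votes[label] = votes.get(label, 0.0) + weight
    return votes

def best_label(region_features):
    """Return the label with the most accumulated support, if any."""
    votes = vote(region_features)
    return max(votes, key=votes.get) if votes else None

region = {"mean_blue": 0.8, "texture": 0.1, "mean_green": 0.3, "y_position": 0.2}
print(best_label(region))  # sky
```

Complex combinations of simple rules, such as the excess-green measure mentioned above, would then be expressed as functions over several feature values rather than single range tests.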
Reasoning and Uncertainty

Relaxation-Labeling Processes. Relaxation-labeling processes appeared first as discrete constraint propagation (qv) schemes and then as probabilistic ones (see Reasoning, plausible). The primary difference between the discrete and continuous schemes is that decisions in the discrete case are binary (a label is either true or it is removed from consideration), whereas in the continuous case labels have an associated strength that is increased or decreased depending on the constraints imposed on it by its neighboring context. One may think of strength in this context as a measure of goodness of fit; it is not a probability in the formal sense. The reader is referred to the foundational paper by Hummel and Zucker (48) on continuous relaxation. Relaxation labeling is commonly used in recognition cone approaches, within layers of the cone and hierarchically between layers. Also, an excursion into a time-varying continuous relaxation scheme called "temporal cooperative computation" is presented by Tsotsos (49).

Evidential Reasoning. One method for making decisions based on uncertain information is the use of Bayesian probabilities. This method is described elsewhere (see Bayesian decision methods). Another method for combining evidence in order to draw conclusions that has been applied in vision is the Dempster-Shafer formalism (see Uncertainty and probability in AI, representation of). The major difference between Dempster-Shafer and Bayesian probabilities is that an explicit representation of partial ignorance is provided. Belief is represented in the range [0, 1], and lower bounds within this interval are moved higher and upper bounds are moved lower, reflecting the addition of supporting or conflicting evidence, respectively. The width of the remaining interval is regarded as ignorance. This scheme is being applied in the VISIONS work (39).

Spatiotemporal Reasoning. Reasoning systems that deal primarily with axioms whose propositions are spatial relations or facts can be termed "spatial reasoners." Similarly, those dealing with temporal relations are "temporal reasoners" (see Reasoning, temporal), and those that deal with geometric information are "geometric reasoners." Grouping processes, such as those reflected by inferences along the part-of representation dimension, are also included here. It is clear that the inclusion of such reasoning processes is important in IUSs. ACRONYM (24) uses 3-D object models and can reason about complex coordinate transforms of them. It also includes an algebraic reasoner that reasons about sets of nonlinear algebraic symbolic inequalities and bounds and determines satisfying sets of those inequalities. Other systems that explicitly address the problem of spatial reasoning are SIGMA (47) and SPAM (35). In both cases the reasoning is 2-D and is based on image-centered representations. A specific type of spatial reasoning is the use of maps. The premise behind the use of maps is that explicit map-to-image correspondence can be derived using models of the imaging process and models of the terrain in the maps. The correspondence can be used to guide the interpretation of detailed features of the image. Spatial reasoning is discussed further in another entry (see Reasoning, spatial). An example of an IUS that deals with temporal reasoning is the ALVEN system (20). The form of reasoning is very different from the temporal calculus of Allen (50), which is an example of the pure temporal reasoning methods.
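For concreteness, the qualitative interval comparison at the heart of Allen's calculus can be sketched as a classifier over the 13 interval relations. The function and relation names below are an illustrative sketch, not Allen's own notation, and intervals are assumed to satisfy start < end:

```python
# Minimal sketch of Allen's 13 qualitative relations between two time
# intervals, each given as (start, end) with start < end. Names are
# illustrative; see Ref. 50 for the original formulation.

def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:  return "before"
    if b1 < a0:  return "after"
    if a1 == b0: return "meets"
    if b1 == a0: return "met-by"
    if a0 == b0 and a1 == b1: return "equal"
    if a0 == b0: return "starts" if a1 < b1 else "started-by"
    if a1 == b1: return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1: return "during"
    if a0 < b0 and b1 < a1: return "contains"
    return "overlaps" if a0 < b0 else "overlapped-by"

print(allen_relation((0, 5), (5, 9)))   # meets
print(allen_relation((1, 4), (0, 10)))  # during
```

A pure temporal reasoner of this kind classifies relations between complete, crisp intervals, which is precisely the setting the following discussion contrasts with the demands of vision.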
Allen's scheme was not intended for vision, and it therefore displays several deficiencies that are important for vision: It does not allow for strength of belief in a temporal relation; it does not provide a recognition structure for detecting and labeling temporal relations; and it does not account for the fact that in a real-time recognition situation, all data in time are not available to the system. The ALVEN framework incorporates all of these points, in addition to the fact that all temporal relations in ALVEN are really spatiotemporal.

Planning. As mentioned previously, planning (qv) has played a role in vision since Kelly first used plans in 1971 in his program for face recognition (15). Kelly applied edge operators to a reduced image in order to extract the face outline and then expanded the outline to the original image size and searched for details only within this prediction window. This type of planning, using explicit prediction windows, has been used in many systems. An example of an IUS that uses planning with goal satisfaction is Garvey's system (51). In the domain of indoor office scenes, Garvey defined operators such as "find seat" (of a chair), "validate seat," "grow seat," and similarly for all objects that were known. Sequences of operators were planned and represented in an AND/OR tree (see AND/OR graphs). Plans were scored depending on cost and confidence. On execution, the outcome of particular steps can be used to modify other parts of the plan. The system of Ballard, Brown, and Feldman (52) also has a limited planning capability. It is limited in that only a very small number of operators are available, and no plan hierarchy is constructed. In the domain of locating ribs in chest radiographs, for example, Ballard and co-workers included three independent rib-finding procedures that were managed by an executive procedure.

Example Systems for Specific Problem Domains

The description of systems is necessarily abbreviated and incomplete. Not all systems can be included, nor all details for each system. The presentation is in tabular form, giving the reader the key pieces of information for each system, along with relevant pointers to the literature. All systems employ the basic cycle of perception in some form, perhaps with important enhancements that have been described previously, unless otherwise noted. Thus, they all involve the interaction of both top-down and bottom-up methods. All systems make the assumption that knowledge can compensate for poor-quality input and image-specific segmentation processes. All systems have demonstrated some reasonable level of performance, usually on a small set of carefully chosen example images. The systems are grouped according to application domain and are listed alphabetically using the system name or the principal author's name. Within each category, at least one example of sample input and output of a system is provided. Where more than one example is given, it will be for the purpose of illustrating performance of different control structures. The reader should not assume that the omission of examples for a particular system is a statement on the system's quality.

Aerial Photographs

Name: ACRONYM (see Fig. 12)
Authors: R. Brooks, R. Greiner, T. Binford
Institution: Stanford University
References: 54, 24 (paper dates: 1979, 1981)
Domain: Airport scenes
Representation: Three-dimensional geometric models; generalized cones; ellipses, ribbons; frames; PART-OF; IS-A; object graphs for geometric constraints; restriction graphs for algebraic constraints; context graph; coarse-to-fine detail; models independent of viewpoint; user interface for model definition using volumetric primitives
Control: Line finding; rule-based problem solving; graph matching between prediction graph and picture graph of image features; prediction of object appearance based on viewpoint and illumination, but only of important features; geometric reasoning; algebraic reasoning

Name: Not available
Authors: D. Ballard, C. Brown, J. Feldman
Institution: University of Rochester
References: 52 (paper date: 1978)
Domain: Ship dock scenes in satellite photos
Representation: Two-dimensional spatial knowledge; semantic network; meta-knowledge for planning; sketch map as intermediate image representation; procedural knowledge
Control: Distributed control; model-image mapping via procedural knowledge of objects; executive chooses most likely mapping procedure
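The cost/confidence scoring of operator sequences in an AND/OR plan tree, as described under Planning above, can be sketched as follows. The operators, costs, confidences, and the confidence-per-cost selection rule are invented for illustration, not taken from Garvey's system (Ref. 51):

```python
# Sketch of scoring an AND/OR plan tree by cost and confidence, in the
# spirit of the Garvey-style planning described above. All operators and
# numbers are illustrative assumptions.

def score(node):
    """Return (cost, confidence) for a plan node.

    Leaves are ('op', name, cost, confidence); internal nodes are
    ('and', children) or ('or', children).
    """
    kind = node[0]
    if kind == "op":
        _, _, cost, conf = node
        return cost, conf
    children = [score(c) for c in node[1]]
    if kind == "and":                    # all steps must succeed
        return (sum(c for c, _ in children),
                min(p for _, p in children))
    # 'or': choose the alternative with the best confidence per unit cost
    return max(children, key=lambda cp: cp[1] / cp[0])

plan = ("and", [
    ("or", [("op", "find-seat", 2.0, 0.9), ("op", "find-back", 3.0, 0.95)]),
    ("op", "validate-seat", 1.0, 0.8),
])
print(score(plan))  # (3.0, 0.8)
```

On execution, the outcome of individual operators could be fed back to rescore the remaining alternatives, which is the plan-modification behavior described above.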
Figure 12. Example of the input and output from the ACRONYM system (from Ref. 53). An original image is shown, with three steps toward the labeling of the fuselage and wings (c-d).
Name: HAWKEYE
Authors: H. Barrow, R. Bolles, T. Garvey, T. Kremers, J. Tenenbaum, H. Wolf
Institution: SRI International
References: 46 (paper date: 1977)
Domain: Aerial photographs
Representation: Two-dimensional topographic maps as symbolic scene model; geometric camera model
Control: Parametric correspondence for map matching; camera model calibrated on landmarks, then used to predict precise locations of other features

Name: MAPSEE
Authors: A. Mackworth, W. Havens
Institution: University of British Columbia
References: 55, 45 (paper dates: 1977, 1983)
Domain: Freehand drawings of maps on satellite images
Representation: Two-dimensional spatial knowledge; cartographic elements; schemata; IS-A; PART-OF; Waltz-like primary cues in drawings such as TEE, OBTUSE L, MULTI
Control: Extended Waltz filtering (qv) to n-ary relations and hierarchies; region growing

Name: Not available
Authors: M. Nagao, T. Matsuyama
Institution: University of Kyoto
References: 36 (paper date: 1980)
Domain: Aerial photographs of roads, houses, forests, fields, and rivers
Representation: Two-dimensional spatial knowledge; regions with attributes including spectral information; objects defined using 2-D heuristics
Control: Blackboard-style (qv) specialized subsystems for specialized features; interpretation is image centered
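The Waltz filtering that MAPSEE extends is, at bottom, discrete constraint propagation over label domains. A minimal binary (arc-consistency) sketch follows; MAPSEE's actual extension handles n-ary relations and hierarchies, and the region names and adjacent-regions-differ constraint below are invented for illustration:

```python
# Binary discrete constraint propagation (AC-3-like), a simplified sketch
# of the Waltz filtering idea. Variables, domains, and the constraint are
# illustrative assumptions, not MAPSEE's cartographic relations.
from collections import deque

def revise(domains, constraint, x, y):
    """Remove labels of x that have no compatible label of y."""
    removed = False
    for vx in list(domains[x]):
        if not any(constraint(vx, vy) for vy in domains[y]):
            domains[x].discard(vx)
            removed = True
    return removed

def propagate(domains, arcs, constraint):
    """arcs: directed pairs (x, y) sharing the given compatibility test."""
    queue = deque(arcs)
    incoming = {}                       # arcs (z, t) indexed by target t
    for z, t in arcs:
        incoming.setdefault(t, []).append((z, t))
    while queue:
        x, y = queue.popleft()
        if revise(domains, constraint, x, y):
            # x's domain shrank: recheck every arc (z, x) except z == y
            queue.extend((z, t) for (z, t) in incoming.get(x, []) if z != y)
    return domains

domains = {"r1": {"sky", "grass"}, "r2": {"grass"}}
propagate(domains, [("r1", "r2"), ("r2", "r1")], lambda a, b: a != b)
print(sorted(domains["r1"]))  # ['sky']
```

The discrete scheme removes labels outright; the continuous relaxation-labeling schemes discussed earlier instead adjust a per-label strength.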
Name: SIGMA
Authors: T. Matsuyama, V. Hwang
Institution: Kyoto University, University of Maryland
References: 47 (paper date: 1985)
Domain: Aerial photographs of roads, houses, rivers, fields, and forests
Representation: Frames; PART-OF; IS-A; rules attached to slots for constraint and instantiation information; 2-D spatial knowledge; spectral knowledge
Control: Three communicating experts: geometric reasoner, model selector, and low level; intersection of prediction areas in image-centered representation; evidence accumulation in image-centered representation

Name: SPAM (see Fig. 13)
Authors: D. McKeown, W. Harvey, J. McDermott
Institution: Carnegie-Mellon University
References: 35 (paper date: 1984)
Domain: Airport scenes
Representation: Two-dimensional spatial knowledge; rules to encode object attributes and relations; simple camera model; viewpoint, illumination, scale independent
Control: Edge- and region-based segmentation (qv); finds islands and extends them using geometric models and rule-based reasoning; rule-based problem solving (qv) and focus of attention
Outdoor Scenes

Name: Not available
Authors: M. Levine
Institution: McGill University
References: 27 (paper date: 1978)
Domain: Outdoor scenes with houses, trees, and roads
Representation: Two-dimensional spatial knowledge; pyramids for image features; viewpoint dependent; short- and long-term memory; relational database
Control: Short-term memory acts as blackboard; dynamic programming for segmentation; local graph matching for intermediate-level representation; relational database operations; production system for high-level representation; confidence measures for region-object

Name: NAOS
Authors: B. Neumann, H. Novak
Institution: University of Hamburg
References: 56 (paper date: 1983)
Domain: Street and traffic scenes
Representation: Case frames based on verbs of locomotion, hierarchically organized; 3-D shape; temporal knowledge; IS-A; PART-OF
Control: Linear programming for matching; expectations in time; question answering (qv) and connection with a natural-language system

Name: Not available (see Fig. 14)
Authors: Y. Ohta
Institution: Kyoto University
References: 37 (paper date: 1980)
Domain: Outdoor color scenes of sky, trees, buildings, and roads
Representation: Two-dimensional spatial knowledge; color parameter representation; regions and attributes; rules for object properties and relations
Control: Rule-based reasoning; coarse-to-fine region growing; rule applicability ranked on correctness value; focus on best rules for execution

Name: VISIONS (see Fig. 15)
Authors: A. Hanson, E. Riseman (and many others)
Institution: University of Massachusetts at Amherst
References: 31, 57, 58, 39 (paper dates: 1978, 1980, 1981, 1984)
Domain: Outdoor color scenes of houses and trees
Representation: Initial development: 2-D spatial knowledge; 3-D spatial knowledge; schemata organized along PART-OF and IS-A; more recent development: rules for object hypothesis and focus of attention
Control: Initial development: blackboard communication; processing cones and relaxation for edge and region extraction; procedural knowledge representation; more recent development: rule-based focus of attention; region and line algorithms without relaxation; intermediate grouping and organizational processes; sensor and representation fusion during interpretation; knowledge-directed feedback to low-level processing; some effort to integrate evidential reasoning
Indoor Scenes

Name: Not available
Authors: T. Garvey
Institution: SRI International
References: 51 (paper date: 1976)
Domain: Office scenes of known objects, telephones, desks, and chairs
Representation: Three-dimensional spatial knowledge; relations; objects as conjunctions of histograms of local features; regions are lists of image samples or bounding polygons in space
Control: Based on planning of operator sequences; plans represented as AND/OR tree; involved three stages: acquire samples, validate, and bound to object model; operators are object specific; cost/confidence scoring measures

Name: IGS (see Fig. 16)
Authors: J. Tenenbaum, H. Barrow
Institution: SRI International
References: 11 (paper date: 1977)
Domain: Rooms, mechanical equipment, and landscapes
Figure 13. Example of the input and output from the SPAM system (from Ref. 35). (a) Original image of an airport scene; (b) region-based segmentation produced by SPAM; (c) the functional areas extracted by the system.
Figure 14. Example of input and output from Ohta's system (from Ref. 37). (a) Digitized input scene; (b) result of preliminary segmentation; (c) plan image; (d) result of meaningful segmentation (S = sky, T = tree, B = building, R = road, C = car, CS = car shadow).
Representation: Two-dimensional spatial knowledge; region based; relational constraints; object models as 3-D polyhedral representations
Control: Generalized Waltz filtering (qv); semantic region growing; visibility matrix for 3-D models computed using camera model
Medical Images

Name: ALVEN (see Fig. 17)
Authors: J. Tsotsos, J. Mylopoulos, H. Covvey, S. Zucker
Institution: University of Toronto
References: 59, 20, 49 (paper dates: 1980, 1985, 1986)
Domain: Evaluation of human left ventricular performance from X-ray movie
Representation: Two-dimensional spatial knowledge; spatiotemporal representation; frames organized with IS-A, PART-OF, similarity, temporal precedence; slots have attached interslot constraints for verification and instantiation
Control: Combination of model, goal, data, failure, and temporally directed hypothesis activation; temporal cooperative computation for hypothesis certainty driven by knowledge organization semantics; temporal expectation generation; expectation failure handled by prediction generalization; an example is shown in Fig. 17

Name: Not available
Authors: D. Ballard, C. Brown, J. Feldman
Institution: University of Rochester
References: 52 (paper date: 1978)
Domain: Identification of ribs in chest radiograph
Representation: See entry in Aerial Photographs
Control: See entry in Aerial Photographs
Figure 15. Example of input and output from the VISIONS system (from Ref. 39). (a, b) Original images; (c, d) final segmentation and labeling.
Figure 16. Example of input and output from the IGS system (from Ref. 11). (Final region interpretations: 1 door, 2 wall, 3 floor, 4 picture, 5 tabletop, 6 chair seat, 7 chair back, 8 waste basket.)
Name: Not available
Authors: F. Ferrie, M. Levine, S. Zucker
Institution: McGill University
References: 38 (paper date: 1982)
Domain: Tracking cell motion and morphology in microphotograph image sequences
Representation: Two-dimensional spatial knowledge; motion knowledge, including shape changes; region based; cell state changes encoded as rules
Control: Views next-state prediction and best-match selection as minimization problems; solution similar in form to a Newton-Raphson method; rule interpreter for cell identification and state changes
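The Newton-Raphson-style minimization mentioned in the control entry can be sketched in one dimension. The quadratic objective below is an invented stand-in for the actual cell-tracking cost function of Ref. 38:

```python
# One-dimensional Newton-Raphson minimization sketch: find a stationary
# point of f by Newton's method applied to its derivative. The objective
# f(x) = (x - 3)^2 is an illustrative assumption, not the cost function
# used by Ferrie and co-workers.

def newton_minimize(df, d2f, x0, tol=1e-8, max_iter=50):
    """Iterate x <- x - f'(x)/f''(x) until the step is below tol."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Minimize f(x) = (x - 3)^2, so f'(x) = 2(x - 3) and f''(x) = 2.
x_min = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0)
print(round(x_min, 6))  # 3.0
```

In the tracking setting, each frame's best-match search would minimize a cost comparing the predicted next state with candidate region measurements, using a multivariate analogue of this update.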
Research Issues

There are a great many issues outstanding in the field. Perhaps the most important one, and one that is not unique to IU, is the need for a scientific framework within which to design, describe, experiment, and document experiences in IUS building. Few if any attempts at independent verification of claims made are carried out. In other scientific fields, independent duplication of results is a crucial component of the acceptance of a result as a contribution to the field. The lack of such activity in this area may be due to the lack of an overall framework for vision research; the "big picture" within which individual contributions can be placed and interrelated is missing. Most of the topics covered in this entry require further research, and many have already been mentioned. Additional topics specifically addressing the open problems of the IU field are given below.

What Is the Role of Domain Knowledge? Is its application always necessary, or does its application depend, perhaps, on the complexity of the scenes being interpreted? Many researchers in the vision community contend that most, if not all, visual interpretation tasks can be carried out without domain knowledge, and this issue needs to be explored more fully. A growing segment of the psychology community raises the distinction between attentive and preattentive vision. These are fundamentally different from the high-level-low-level distinctions that computer vision draws and explicitly address the goals of a system in viewing a particular scene as well as scene complexity. The two visual processes are distinguished by their parallel or serial nature, and domain knowledge may play a role in each.

How Can the Best of the Early-Vision Schemes Be Integrated With High-Level Schemes in a Coherent Manner? There currently seems to be no real relationship between the techniques used to extract image features and those used to interpret them. Yet there must be an effective interface, if not also efficient representation transformations, in biological systems.

What Is the Nature of Top-Down Feedback? Does this only impact search schemes, or could it also play a role in expectation generation, in fine tuning of image operators, in priming of semantic concepts, or in bridging the gap between image-centered and world-centered representations, and if so, how?

Does There Exist a Sufficient Set of Image Features for Image Interpretation?

What Should Be Done in Parallel and What Serially, Why and How? How can computations be coordinated and organized?

What Is the Nature of the Mechanism That Allows for the Combination of Evidence or Response Strengths?

How Can the Biological Sciences Motivate the Design of Image-Understanding Systems? What goes on between the input and the output is a totally unconstrained process, and this points to the major objective in this field: the discovery of computational models that can transform images plus world knowledge into scene interpretations. Guidance from biological research on vision can assist in providing some constraints on the characteristics of the interpretation process.

Does a Representational Formalism Exist That Spans the Many Requirements Vision Systems Present, or Is There a Need for Multiple, Coordinated Representational Tools?

What Is the Nature of the Computational Mechanisms Required for Vision? Biological vision seems to be nonlinear, time varying, hierarchical, and parallel with a superimposed serial component. Do formalisms exist that can deal with this?

What Exactly Can Be Learned From System Building? The bottom line is that although image-understanding systems can be engineered to perform reasonably for a tightly constrained domain, the engineering is not yet completely based on sound scientific principles. There is still a long way to go before truly general-purpose vision systems appear.
(The textual output lists, per time interval, rates of translation and of volume, perimeter, width, and length change for the left ventricle, with specializations such as SYSTOLE, DIASTOLE, UNIFORMLY CONTRACTING, and UNIFORMLY EXPANDING, and detected exceptions such as dyskinetic contraction and ischemic isometric relaxation.)

Figure 17. Example of input and output from the ALVEN system (from Ref. 20). (a) Example of marker finding using motion hypothesis predictions; (b) highlighted extracted markers for one image of the image sequence; (c, d) inward and outward patterns of motion, respectively, for a complete heart cycle; (e) textual output describing the performance characteristics and anomalies detected by ALVEN.
BIBLIOGRAPHY
1. T. Binford, "Survey of model-based image analysis systems," Int. J. Robot. Res. 1(1), 18-64 (Spring 1982).
2. T. Kanade, Model Representations and Control Structures in Image Understanding, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 1074-1082, 1977.
3. T. Matsuyama, Knowledge Organization and Control Structure in Image Understanding, Proc. ICPR, Montreal, Quebec, pp. 1118-1127, 1984.
4. J. Tsotsos, "Knowledge of the visual process: Content, form and use," Patt. Recog. 17, 13-28 (1984).
5. A. Hanson and E. Riseman (eds.), Computer Vision Systems, Academic Press, New York, 1978.
6. D. Ballard and C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
7. W. Uttal, A Taxonomy of Visual Processes, Lawrence Erlbaum, Hillsdale, NJ, 1981.
8. W. Woods, Theory Formation and Control in a Speech Understanding System with Extrapolations towards Vision, in A. Hanson and E. Riseman (eds.), Computer Vision Systems, Academic Press, New York, pp. 379-380, 1978.
9. L. Erman, F. Hayes-Roth, V. Lesser, and R. Reddy, "The HEARSAY-II speech understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12, 213-253 (1980).
10. D. Marr and H. Nishihara, "Representation and recognition of the spatial organization of three-dimensional shapes," Proc. Roy. Soc. Lond. B 200, 269-294 (1978).
11. J. Tenenbaum and H. Barrow, "Experiments in interpretation guided segmentation," Artif. Intell. 8(3), 241-274 (1977).
12. L. Roberts, Machine Perception of Three-Dimensional Solids, in J. Tippett et al. (eds.), Optical and Electro-optical Information Processing, MIT Press, Cambridge, MA, pp. 159-197, 1965.
13. G. Falk, "Interpretation of imperfect line data as a three-dimensional scene," Artif. Intell. 3(2), 101-144 (1972).
14. Y. Shirai, "A context-sensitive line finder for recognition of polyhedra," Artif. Intell. 4(2), 95-119 (1973).
15. M. Kelly, "Edge detection in pictures by computer using planning," Machine Intell. 6, 397-409 (1971).
16. A.
Mackworth, Vision Research Strategy: Black Magic, Metaphors, Mechanisms, Miniworlds, and Maps, in A. Hanson and E. Riseman (eds.), Computer Vision Systems, Academic Press, New York, pp. 53-60, 1978.
17. T. Kanade, "Survey: Region segmentation: Signal vs semantics," Comput. Vis. Graph. Img. Proc. 13, 279-297 (1980).
18. D. Huffman, "Impossible objects as nonsense sentences," in Ref. 13, pp. 295-323.
19. M. Clowes, "On seeing things," Artif. Intell. 2, 79-116 (1971).
20. J. Tsotsos, "Knowledge organization and its role in the interpretation of time-varying data: The ALVEN system," Computat. Intell. 1(1), 16-32 (1985).
21. D. Marr, Vision, W. H. Freeman, San Francisco, CA, 1982.
22. E. Freuder, A Computer System for Visual Recognition Using Active Knowledge, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 671-677, 1977.
23. J. O'Rourke and N. Badler, "Model-based image analysis of human motion using constraint propagation," IEEE Patt. Anal. Machine Intell. 2, 522-536 (1980).
24. R. Brooks, "Symbolic reasoning among 3-dimensional models and 2-dimensional images," Artif. Intell. 17, 285-348 (1981).
25. N. Badler, Temporal Scene Analysis: Conceptual Descriptions of Object Movements, TR 80, Department of Computer Science, University of Toronto, 1975.
26. R. Nevatia, Characterization and Requirements of Computer Vision Systems,in A. Hanson and E. Riseman (eds.),ComputerVision Systems,Academic Press,New York, pp. 81-88, 1978. 27. M. Levine, A Knowledge-BasedComputer Vision System, in A. Hanson and E. Riseman (eds.), Computer Vision Systems,Academic Press,New York, pp. 335-352, 1978. 28. H. Barrow and J. Tenenbaum,RecoveringIntrinsic SceneCharacteristics from Images, in A Hanson and E. Riseman (eds.),Computer Vision Systems,Academic Press,New York, pp. 3-26, 1978. 29. L. Uhr, "Layered'recognitioncone'networksthat preprocess,classify and describe,"IEEE Trans. Comput. 21,758-768 (1972) 30. S. Tanimoto and A. Klinger (eds.),Sfructured Computer Vision, Academic Press,New York, 1980. 31. A. Hanson and E. Riseman, VISIONS: A Computer System for Interpreting Scenes,in A. Hanson and E. Riseman (eds.),Computer Vision Systems,Academic Press, New York, pp. 303-334, 1978. 32. B. Lowerre and R. Reddy, The HARPY Speech Understanding System, in W. A. Lea (ed.), Trends in SpeechRecognition, Prentice-Hall, EnglewoodCliffs, NJ, Chapter 15, 1980. 33. S. Rubin, "Natural scenerecognitionusing LOCUS search,"Com' put.Vis. Graph..Img.Process,13, 298-333 (1980). 34. M. Baird and M. Kelly, "A paradigm for semantic picture recognition," Patt. Recog.6, 6L-79 (1974). 35. D. McKeown, W. Harv€I, and J. McDermott, Rule-BasedInterpretation of Aerial Imag€ryo Proceedings of the IEEE Workshop on Principles of Knowledge-BasedSystems,Denver, CO, pp. 145-158, 1984. 36. M. Nagao and T. Matsuyama, A Structural Analysis af Complex Aerial Photographs, Plenum New York, 1980. 37. Y. Ohta, A Region-OrientedImage Analysis Systemby Computer, Ph.D. Thesis, Kyoto University, Department of Information Science,1980. 38. F. Ferrie, M. Levine, and S. Zucker, "Cell tracking: A modeling and minimization approach," IEEE Patt. Anal. Machine Intell. 4(3), 277-290 (1982). 39. E. Riseman and A. 
Hanson, A Methodology for the Development of General Knowledge-BasedVision Systems,Proceedingsof the IEEE Workshop on Principles of Knowledge-Based Systems,Denver, CO, pp. 159-L72, 1984. 40. M. Minsky, A Framework for RepresentingKnowledge,in P. Winston (ed.),The Psychologyof Computer Vision, McGraw-Hill, New York, pp. 2LL-277, L975. 4I. A. Pentland, Perceptual Organization and the Representationof Natural Form, SR/ Technical Note 357, 1985. 42. J. Freeman, "survey: The modelling of spatial relations," Comput. Vis. Graph. Img. Process.4, L56-L71 (1975). 43. B. Kuipers, "Modelling spatial knowledge,"Cog.Sci. 2, L29-154 (1e78). 44. D. McDermott and E. Davis, "Planning routes through uncertain territory," Artif. Intell. 22, 107-156 (1984). 45. A. Mackworth and W. Havens, "Representing knowledge of the visual world," IEEE Comput. 16,90-98 (1983). 46. H. Barrow, R. Bolles, T. GarveY,T. Kremers, J. Tenenbaum'and H. Wolf, Experiments in Map-Guided Photo Interpretation, Pro' ceedingsof the Fifth IJCAI, Cambridge, MA, p. 696, t977. 47. T. Matsuyama and V. Hwang, SIGMA: A Framework for Image Understanding: Integration of Bottom-Up and Top-Down Analyses,Pro ceedings of the Ninth IJCAI, Los Angeles, CA, pp. 908915, 1985. 48. R. Hummel and S. Zucker, On the Foundations of Relaxation La' TR-80-1, Department of Electrical Engineering, betling Processes, McGill University, 1980. 49. J. Tsotsos, Representational Axes and Temporal Cooperative
INDUCTIVEINFERENCE Computation,in M. Arib and A. Hanson (eds.),Vision, Brain and CooperatiueComputation,MIT Press,Bradford Books,Cambridge, MA, 1986. b0. J. Allen, "Towards a general theory of action and time", Artif. Intell. 23, t23-t54 (1984). b1. T. Garv ey,PerceptualStrategiesfor PurposiueVision, SRI Technical note IL7,1976. 52. D. Ballard, C. Brown, and J. Feldman, An Approach to Knowledge-DirectedImage Analysis, in A. Hanson and E. Riseman (eds.),Computer Vision Systems, AcademicPress, New York, pp. 27r-282, L978. b3. R. Brooks, "Model-based three-dimensional interpretations of two-dimensional images," IEEE Patt. Anal. Machine Intell. 5(2), 140-150 (March 1983). 54. R. Brooks, R. Greiner, and T. Binford, The ACRONYM ModelBased Vision System, Proceedings of the Sixth IJCAI, Tokyo, Jap&tr,pp. 105-113,L979. bb. A. Mackworth, On Reading Sketch Maps, Proceedingsof the Fifth IJCAI, Cambridge, MA, pp. 598-606, L977. b6. B. Neumann and H. Novak, Event Models for Recognition and Natural Language Description of Events in Real-World Image Sequences,Proceedingsof the Eighth IJCAI, Karlsruhe, FRG, PP. 724-726, 1983 57. C. Parm&, A. Hanson, and E. Riseman,Experiments in SchemaDriven Interpretation of a Natural Scene,COINS TR 80-10,University of Massachussettsat Amherst, 1980. 58. B. York, A. Hanson, and E. Riseman, 3D Obiect Representations and Matching with B-Splines and Surface Patches, Proceedingsof the SeuenthIJCAI, Vancouver,BC, pp. 648-651, 1981. 59. J. Tsotsos,J. Mylopoulos,H. Covvey,and S. Zucker,"A framework for visual motion understandi.g," IEEE Patt. Anal. Machine Intell. 2, 563-573 (1980).
J. K. Tsotsos
University of Toronto

INDUCTION. See Learning.

INDUCTIVE INFERENCE

Inductive inference is the process of hypothesizing a general rule from examples. It plays a part in learning as well as in activities that are not generally described as learning, for example, pattern recognition, program synthesis, and the construction of scientific theories.

The primary paradigm in mathematical studies of inductive inference has been the notion of identification in the limit, defined by Gold in his seminal paper (1). A mathematically equivalent notion was used by Putnam, in his 1963 Voice of America Lecture, to point out some of the difficulties with the mechanization of science (2). In very rough terms the idea is to look at the limiting behavior of an inference method as it is given more and more examples of some general rule. The inference method is permitted to make a finite number of mistakes as long as its guesses eventually "converge" to being correct. A formal definition must specify what is meant by an inference method, rules, examples, guesses, correctness, and convergence; these have been specified in many different ways, and the resulting inductive inference problems compared. Some of the more important ideas and results of inductive inference are described below. Other general treatments of this area may be found in the survey articles of Angluin and Smith (3) and Klette and Wiehagen (4) and the comprehensive paper of Case and Smith (5).

An Example

A simple example is given to motivate later definitions. Consider the problem of attempting to identify polynomials of one variable x from their values on x = 0, 1, 2, . . . . For example, the polynomial x^2 + 3 generates the values 3, 4, 7, 12, 19, . . . . Imagine that there is an unknown polynomial p(x) in a black box with a button. The first time the button is pressed, the value p(0) comes out of the box. Subsequent button pushes cause the box to produce the values of p(n) for successive numbers n. The button may be pressed arbitrarily often. There is no restriction as to when a guess can be made of what the polynomial p(x) is. The identification will be successful if after some finite time no more guesses are made, and the last guess made is equivalent to p(x), that is, represents the same function as p(x).

There are a variety of methods that will successfully identify all polynomials with integer coefficients. One such method relies on the existence of fast algorithms for interpolating a polynomial of degree d through any d + 1 points. This method initializes by pressing the button, receiving a value v_0, and guessing the constant polynomial v_0. Then iterate the following step. Assume that after d stages the values v_0, v_1, . . . , v_{d-1} have been received. Press the button to request another value, v_d. Next interpolate a polynomial p_d(x) of degree d through the points (i, v_i) for i = 0, 1, . . . , d. If this polynomial is different from the most recent previous guess, then produce a new guess, p_d(x); otherwise, do nothing. In either case the method then goes on to the next stage.

If the polynomial p(x) in the black box is of degree d, then after d + 1 stages this method will converge to a polynomial equivalent to p(x). However, it is not assumed that the method is informed of any bound on the degree of p(x), so it must go on requesting values and interpolating polynomials indefinitely in order to be assured of correct convergence on all polynomials with integer coefficients. As an illustration, if the sequence of values starts out 1, 1, 1, 1, 1, one might strongly suspect the constant polynomial 1, but these are also the first five values generated by the polynomial

1 + x(x - 1)(x - 2)(x - 3)(x - 4)

whose next value is 121.

Identifying Classes of Functions

The above example is a special case of the problem of identifying classes of functions. An identification problem for functions specifies a domain, a range, a set of functions from the domain to the range, a method of representing the functions in the set, and a method of presenting examples of the functions. In the example above the domain and range are the integers, the set is those functions that can be represented by polynomials in one variable with integer coefficients, the method of representation is to give a polynomial expression for computing the function, and examples are pairs (n, f(n)) given in increasing domain order.
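The interpolation method from the example can be sketched in Python. The helper names (`interpolate`, `identify`) and the use of exact rational arithmetic are illustrative choices, not part of the original description; the sketch uses Newton's divided differences to interpolate through the points (0, v_0), . . . , (d, v_d).

```python
from fractions import Fraction

def interpolate(values):
    """Coefficients (lowest degree first) of the polynomial of degree
    <= d through the points (i, values[i]), i = 0, ..., d, computed
    via Newton's divided differences."""
    n = len(values)
    xs = list(range(n))
    coef = [Fraction(v) for v in values]
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Expand the Newton form into standard coefficients (Horner-style).
    poly = [Fraction(0)] * n
    for i in reversed(range(n)):
        new = [Fraction(0)] * n
        for k in range(n - 1):
            new[k + 1] += poly[k]          # multiply by x
            new[k] -= xs[i] * poly[k]      # ... minus xs[i]
        new[0] += coef[i]
        poly = new
    while len(poly) > 1 and poly[-1] == 0:  # drop leading zero terms
        poly.pop()
    return poly

def identify(black_box, stages):
    """Run the identification method for `stages` button presses and
    return the sequence of distinct guesses produced."""
    values, guesses = [], []
    for _ in range(stages):
        values.append(black_box(len(values)))   # press the button
        guess = interpolate(values)
        if not guesses or guess != guesses[-1]:
            guesses.append(guess)               # output a new guess
    return guesses

# The unknown polynomial p(x) = x^2 + 3 in the black box.
guesses = identify(lambda n: n * n + 3, stages=6)
```

After the third stage the guess stabilizes on the coefficients of x^2 + 3, and no further guesses are produced, illustrating convergence after d + 1 stages for a degree-d polynomial.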
For a formal theory of inductive inference the general plan is to consider sets of computable functions and choose computational representations for them. For a specific application, consider those functions from lists to lists that can be represented by LISP programs which are instances of some particular recursive scheme. However, to study the power and limitations of inductive inference methods abstractly, it is convenient to restrict attention to functions whose domain and range are the natural numbers and to choose a standard representation of computable functions by a general-purpose programming system, for example, Turing machines (qv) or arbitrary LISP programs. This involves some computable encoding and decoding between the real inputs and outputs (lists, strings, terms, graphs, programs, or whatever) and the natural numbers. The encoding and decoding may affect the efficiency, but not the ultimate power, of an inference method.

Often interest focuses on functions that are total, that is, defined on every possible argument value. Assuming every function in the class of interest is total, it might be attractive to restrict all the guesses of the inference method to be programs for total computable functions since any other guesses cannot possibly be correct. Inductive inference methods restricted in this way are called Popperian since they in some sense reflect Popper's requirement that a scientific theory always be falsifiable (6). A conjectured program computing a nontotal function cannot be shown incorrect by using a value for which the program is not defined as a counterexample. To do so would require a solution to the halting problem. For the same reason it is not always possible to convert a non-Popperian inference method into a Popperian one. There is no "easy" way to weed out the nontotal guesses. In fact, even for the case of identifying sets of total functions, Popperian methods identify a strict subset of the classes identified by non-Popperian methods (7).
Hence, allowing nontotal intermediate hypotheses increases the potential power of a method, though it may be less convenient in other ways. The remainder of this entry is based on the assumption that the classes of functions to be inferred are all total functions but that the methods are not necessarily Popperian. This issue is considered further in the section on search.

Any specification of an inductive inference problem must include a definition of what constitutes examples and how they are presented. In the case of functions, examples are pairs of the form (n, f(n)). There are various ways that examples might be presented to an inference method. They might be supplied by some external agency in some arbitrary or predetermined order, or they might be supplied in response to queries from the inference method of the form "Tell me the value of f on argument n." If one has in mind a "teacher," or helpful source of examples, one might consider an order of presentation that depends on the function being presented in a useful way. In formulating this kind of setting, it is important to avoid trivial solutions that directly encode the answer in the order of presentation. In any case, it is essential to assume that every argument-value pair is available (eventually) to the inference method. Often, interest will focus on what classes of functions are identifiable and not on the efficiency of the inference method, in which case it is permissible to assume without any loss of generality that all data are always presented as f(0), f(1), f(2), . . . . A method with access to queries can be simulated on data presented in increasing domain order by reading and
storing the values of f until the one corresponding to the queried argument appears. So, to summarize, most results concerning the abstract identifiability of classes of functions are stated in terms of classes of total computable functions from the natural numbers to the natural numbers using examples presented in increasing domain order.

The next object to be specified is the actual inductive inference method. An inductive inference method is a program with special instructions to request the next input value and output its next guess. To run the inference method on a particular function f, start the program; whenever it requests the next input value, it receives the next value f(n) from the sequence f(0), f(1), f(2), . . . , and whenever it outputs a guess, that guess is appended to the end of an initially null guess sequence. The inference method may run forever (as in the example), eventually reading in every input value and outputting an infinite sequence of guesses, or it may stop reading inputs or stop producing guesses, or both. There may be no correlation between when it reads inputs and when it produces guesses. In any case, the inference method produces in the limit a finite or infinite sequence of guesses including every guess the method makes with the function f as input. Each guess is a string of symbols that may be interpreted as a program in the general programming system that has been chosen (LISP programs, Turing machines, or whatever). It is this sequence of guesses that is used to define the correct convergence of the inductive inference method. There are two basic criteria of correct convergence, called EX and BC. EX stands for explanatory.
An inductive inference method M is said to EX-identify (or explain) a function f if, when M is run with the function f as input, either the sequence of guesses of M is finite and the last guess in the sequence is a program that correctly computes f, or the sequence of guesses is infinite and after some finite point all the programs are syntactically equal and correctly compute f. The idea is that the inference method eventually stops changing its guess and settles on a correct guess of a program to compute the function. In the example above, the method described correctly EX-identifies any function specified by a one-variable polynomial with integer coefficients. EX is defined to be the class of all sets U of functions such that there is an inductive inference method that EX-identifies every function in U. If "Poly" stands for the set of functions specified by a one-variable polynomial with integer coefficients, then Poly ∈ EX.

BC stands for behaviorally correct. BC is a weaker criterion of success than EX in that the inductive inference method may continue changing its guess indefinitely as long as after some finite time all the guesses are correct programs for the function being presented. That is, an inductive inference method M BC-identifies a function f if, when presented with the function f as input, either the sequence of guesses produced by M is finite and the last guess is a program that correctly computes f, or the sequence of guesses is infinite and there is some finite point in the sequence after which all the guesses are programs that correctly compute f. BC is the class of all sets U of functions such that there is an inductive inference method that BC-identifies every function in U.
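The two criteria can be contrasted with a toy in which guesses are program texts. This is only a finite-prefix illustration (convergence proper is a property of infinite sequences, and neither criterion is decidable from a prefix); all names here are invented for the sketch.

```python
# Guesses are program texts for the constant function f(x) = 3.
# An EX-style sequence becomes syntactically constant; a BC-style
# sequence keeps changing its text while every guess still computes
# the same correct function.
ex_guesses = ["lambda x: 3"] * 5
bc_guesses = ["lambda x: 3 + 0*{}".format(n) for n in range(5)]

def syntactically_stable(guesses):
    """True if every guess in the prefix equals the last one."""
    return all(g == guesses[-1] for g in guesses)

def behaviorally_correct(guesses, target, tests):
    """True if every guess computes `target` on the test points."""
    return all(eval(g)(x) == target(x) for g in guesses for x in tests)
```

Here `bc_guesses` fails the syntactic-stability check but passes the behavioral one, which is exactly the gap between EX-style and BC-style convergence.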
One intuition behind BC is that an inference method might be continuing to "tinker" with its guess, patching it, making it faster, making it smaller, without quite knowing whether program equivalence was being preserved. If one always uses the most recent conjecture of an inference method that is correctly BC-identifying some
function, then eventually this tactic will only produce correct behavior (outputs).

Clearly if a method M EX-identifies a set U of functions, it also BC-identifies U, so every set in EX is also in BC, that is, EX is a subclass of BC. It has been shown that EX is a proper subclass of BC (5,8). Hence, methods that are allowed to continue "tinkering" with their guesses are strictly more powerful than methods that must eventually stop tinkering. In fact, if a method could tell for sure whether its changes to its current guess preserved program equivalence, it could be modified to one that stopped tinkering. Thus, to achieve the increased power, methods that make changes without knowing whether equivalence is preserved must be tolerated.

It has been shown that there is no single inductive inference method that BC-identifies every total computable function. If "Tot" stands for the set of all total computable functions, then Tot ∉ BC. The proof is a kind of diagonalization in which a specific total computable function f_M is defined in such a way as to foil any particular inductive inference method M. Consequently, there is no "universal" method to BC-identify or EX-identify every total computable function. Specific methods identify particular subsets of Tot.
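The diagonal construction can be seen in miniature against a concrete (and deliberately weak) inference method; the method and helper names here are invented for illustration, not part of the proof itself.

```python
def constant_guesser(prefix):
    """A toy inference method: guess the constant function whose
    value is the most recent example seen (0 before any data)."""
    c = prefix[-1] if prefix else 0
    return lambda n: c

def diagonal_prefix(method, length):
    """Build a prefix of a total function that foils `method`: at each
    argument n, output one more than the method's current guess
    predicts at n, so every guess is wrong at the very next point."""
    prefix = []
    for n in range(length):
        guess = method(prefix)
        prefix.append(guess(n) + 1)
    return prefix
```

Against `constant_guesser` the diagonal function comes out as 1, 2, 3, 4, 5, . . . ; each guess is falsified immediately, so the method never converges to a correct program. The real proof plays the same trick against an arbitrary computable inference method.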
Identifying Classes of Languages

Gold's motivation in his foundational paper on inductive inference (9) was to study an abstract model of the process of learning the grammar of a natural language. Using Chomsky's formalism (10,11) of "grammar" allowed Gold to concentrate on the syntax of language without the complication of a semantic component. Grammatical inference is the name given to the resultant subarea of inductive inference. A formal language in this sense is any set of strings using symbols from some fixed alphabet. Roughly speaking, a formal grammar for a language L is a program for generating all and only the elements of L. For example, the set of all strings of 0s and 1s that contain an even number of 1s is a language, as is the set of all the words in a particular dictionary that are palindromes (read the same backward as forward). The first of these can be listed by a program that systematically goes through all the strings of 0s and 1s and outputs them if the number of 1s is even. The second language can be listed by a program that simply has all of the finitely many palindromes in the dictionary stored in a big table.

A language L could be represented as a function f_L from the set of all strings over the fixed alphabet to the values 0 and 1, with f_L(w) = 1 if and only if w is a string in L. This might suggest that grammatical inference is simply a special case of the inference of functions, but the concerns and results are somewhat different. For example, not all the explanatory classes (for language identification) are included in the smallest BC language identification class (12,13). One difference between grammatical and functional inference is that a language may be recursively enumerable but not recursive, that is, there are languages L such that there is a program to list all of the elements of L but such that the function f_L defined above is not computable. Another difference is that for a language L it is natural to distinguish between positive examples (elements of L) and negative examples (strings that are not elements of L) and to consider presentations that consist of positive examples only.

An inductive inference problem for formal languages is given by specifying a fixed alphabet of symbols, a class of languages over that alphabet, the set of legal sequences of examples of a given language, the types of inference methods allowed, and a criterion of successful identification. Most of the issues of function identification are applicable to the case of language identification, so only some of the considerations unique to language identification are sketched below.

If L is a language, then a positive example of L is an ordered pair (w, 1) such that w is a string in L, and a negative example is a pair (w, 0) such that w is a string not in L. A complete presentation of L is an infinite sequence of positive and negative examples of L such that every string eventually appears as the first member of a pair in the sequence. Thus, a complete presentation eventually classifies every string as to its membership in L. If L is not the empty language, then a positive presentation of L is an infinite sequence consisting of all and only the positive examples of L. Thus, a positive presentation eventually enumerates every element of L but does not give explicit information about strings not in L. In the case of a positive presentation, it is natural to drop the redundant 1s in the pairs. If L is the language consisting of strings of 0s and 1s with an even number of 1s, a complete presentation of L might begin

(11, 1), (0100, 0), (000, 1), (110110, 1), . . .

and a positive presentation might begin

1111, 0, 00100100, 00, 101, 1001, 00001111, 01010101, . . .

As in the case of functions, one could also consider methods with access to an oracle or informant to answer questions of the form "Is the string w in the unknown language?" Ignoring efficiency considerations, presentation by informant is equivalent to complete presentation. Another form of presentation that has been considered is stochastic presentation. Assume that each language has an associated probability distribution defined on its elements. Then a sequence of examples is obtained by repeatedly drawing strings from the language in independent experiments according to the probability distribution. The sequences of examples are the same as in the case of positive presentations, but there are now probabilities associated with them.

Gold (1) investigated the difference between positive and complete presentations of languages and found that positive presentations apparently constitute a very severe limitation. For the basic criterion of identification, very large classes of languages can be identified from complete presentations, including the regular languages, the context-free languages, and the context-sensitive languages. However, there is no machine to correctly identify in the limit even all the regular languages from positive presentations. If one believes that children learn language from what is essentially positive presentations, this suggests that it is inappropriate to model their learning as the identification of context-free grammars in the limit. [Wexler and Culicover (14) have addressed the question of a more reasonable formal model of language acquisition, as have Osherson, Stob, and Weinstein (15).] However, Gold's results concerning positive presentations must be qualified in various ways.
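For the even-number-of-1s language used in the examples above, the two kinds of presentation can be sketched as generators. The enumeration order (length-lexicographic) and the helper names are illustrative choices; any order that eventually covers every string would do.

```python
from itertools import count, islice

def strings():
    """All binary strings in length-lexicographic order, '' first."""
    for n in count(0):
        if n == 0:
            yield ''
        else:
            for i in range(2 ** n):
                yield format(i, '0{}b'.format(n))

def in_L(w):
    """L: strings over {0, 1} with an even number of 1s."""
    return w.count('1') % 2 == 0

def complete_presentation():
    """Pairs (w, 1) for members and (w, 0) for nonmembers; every
    string is eventually classified."""
    return ((w, 1 if in_L(w) else 0) for w in strings())

def positive_presentation():
    """All and only the members of L, with the redundant tags dropped."""
    return (w for w in strings() if in_L(w))

first_positives = list(islice(positive_presentation(), 8))
```

Both generators are infinite; `islice` is used to inspect a finite prefix, which is all an inference method ever sees at any one time.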
Gold also considered limiting the computational resources of the presentation of examples and showed that, under the assumption that the sequence of examples was being generated by a primitive recursive program (16), all recursively enumerable languages could be identified in the limit from positive presentations. In this case the problem is one of function identification, where the function to be identified is the enumerator of examples. (Roughly speaking, the technique used by Gold was to concentrate on how the source of examples works rather than on what language it is enumerating.) Horning (17) has shown that if the criterion of identification is relaxed to permit failure with probability zero in the limit, the stochastic context-free grammars can be successfully identified from stochastic presentations. Thus, complexity-bounded or stochastic presentations can be used to overcome some of the limitations of positive presentations. Angluin (18-20) has shown that there are several interesting nontraditional classes of formal languages that are identifiable in the limit from positive data. The limitations of positive data discovered by Gold may partly reflect the inappropriateness of the classes of the Chomsky hierarchy (9) for studying inductive inference.

Another issue in the identification of languages is how to interpret the guesses of the inductive inference method. They might be interpreted as grammars (enumerators) or decision procedures. For context-free languages the inference method might output grammars or parsers. In the case of context-free languages there is an effective and efficient translation between grammars and parsers, but for more general classes of languages there is no effective way to convert grammars (enumerators) into decision procedures, even when they exist. It has been shown that even for the case of identifying classes containing recursive languages (for which decision procedures exist), strictly more classes of languages can be identified when enumerators are allowed as guesses than if only decision procedures are allowed (12,21).

Search and Its Variants

A fundamental method of inductive inference is to search (qv) in some systematic way through the space of possible rules to find the first rule that is consistent with all the examples seen so far and to make that the current guess.
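This search method can be written down generically; `enumerate_rules` and `consistent` are placeholders that a particular domain must supply, and the toy domain at the end is invented for illustration.

```python
def search_guesses(enumerate_rules, consistent, examples):
    """After each new example, the current guess is the first rule in
    the enumeration consistent with all examples seen so far. For
    correct identification in the limit, every incorrect rule must
    eventually be ruled out by some finite set of examples."""
    seen = []
    for example in examples:
        seen.append(example)
        yield next(r for r in enumerate_rules()
                   if consistent(r, seen))

# Toy domain: rules are the constants 0..9; each example is a value
# of the unknown constant function.
guesses = list(search_guesses(
    lambda: iter(range(10)),
    lambda r, seen: all(v == r for v in seen),
    [7, 7, 7]))
```

Because the enumeration is rescanned from the start on every example, the guess can only move forward through the rule space, which is what forces convergence to the first correct rule.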
In order for this method to be computable, there must be a program to enumerate all possible rules and an algorithm for determining whether a given rule is consistent with a finite set of examples. For correct identification in the limit every incorrect rule must be inconsistent with some finite initial segment of every legal presentation of examples of the correct rule.

Search is applicable to the problem of inferring the context-free languages over some fixed alphabet from complete presentations. It is not difficult to think of a program that will list a context-free grammar for every context-free language over some fixed alphabet. To determine whether a context-free grammar is consistent with some initial segment of a complete presentation, just check that it generates each of the positive examples and none of the negative examples. (There are relatively efficient algorithms for determining whether a given string is generated by a given context-free grammar.) It is clear that if the current guess is a correct grammar for the language, it will never be found to be inconsistent. Also, if the current guess is incorrect, either it generates some negative example or it fails to generate some positive example in any complete presentation, so it is eventually discarded and never later reappears. Since a correct grammar occurs at some first location in the enumeration of grammars, and eventually all the preceding incorrect grammars are discarded, the search does correct identification in the limit of all the context-free languages over the given alphabet.
Search also gives us an alternative method of identifying the one-variable polynomials with integer coefficients from their values on 0, 1, 2, . . . , the problem described in our first example. There are a finite number of such polynomials of degree d whose coefficients are all bounded in absolute value by d, and it is easy to list them, increment d, and continue. To test whether a given polynomial is consistent with an initial segment of values, just calculate its values on 0, 1, 2, . . . , n. However, this search will in general take exponentially longer than the interpolation method first described since the number of polynomials preceding the first polynomial of degree d grows exponentially in d. Exponential growth of the rule space as a function of the "size" of the rule being identified is quite typical and is a severe limitation on the practicality of methods based on search. A section below describes some improvements that can be made in the efficiency of methods based on search, but in general the best that can be done is to make it "less exponential."

One of the advantages of search is that it is a very general method and does not depend on much domain-specific knowledge. For example, it is obvious how to modify the search method described above to allow for functions that are sums of polynomials and exponentials, but it is not at all clear how to modify the more efficient interpolation method for this case. Another advantage of search is that if the hypothesis space is searched in increasing order of size or complexity, the method converges to the smallest or least complex hypothesis consistent with the data.
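The enumeration scheme just described (for each d, all polynomials of degree at most d with coefficients bounded by d in absolute value) can be sketched directly; the function names are ours, and small-degree polynomials reappear at later stages with trailing zero coefficients, which is harmless for identification.

```python
from itertools import product

def polynomials():
    """For d = 0, 1, 2, ...: all coefficient tuples (lowest degree
    first) of length d + 1 with entries in [-d, d]. Every integer
    polynomial appears at some stage of the enumeration."""
    d = 0
    while True:
        for coeffs in product(range(-d, d + 1), repeat=d + 1):
            yield coeffs
        d += 1

def consistent(coeffs, values):
    """Does the polynomial match the given values on 0, 1, ..., n?"""
    return all(sum(c * x ** k for k, c in enumerate(coeffs)) == v
               for x, v in enumerate(values))

def search_identify(values):
    """Current guess: the first enumerated polynomial consistent with
    all examples seen so far."""
    return next(p for p in polynomials() if consistent(p, values))

# Identify x^2 + 3 from its first four values.
guess = search_identify([3, 4, 7, 12])
```

For x^2 + 3 the search must reach the d = 3 stage before a coefficient as large as 3 is even available, already scanning thousands of candidates, whereas the interpolation method needs only three stages; this is the exponential cost the text describes.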
In particular, atrY program that Iists programs must either include programs for nontotal functions (i.e., prosams that do not halt on someinputs) or must fail to output any program for some total functions. Given an enumeration that is guaranteed to contain only programs for total functions, it would be possible to check whether each prog1am is consistent with any initial segment of examples, but some total computable functions would not be on the list. On the other hand, if an enumeration that contains programs that may not halt on some inputs is used, there is no effective way of checking whether such an arbitrary program is consistent with the data. One approximate solution is to set somekind of bound, for examp\e, n3, and, when checking whether a program is consistent with some finite initial segment of examples, to discard the program if it is found to run for more than nBsteps on any input oi length n. Thus, the consistencycondition is that the progra- g"nerate all the examples in the initial segrnentand do * "quickly" (i.e.,within the given bound).This consistency condition is effectively decidablefor any program, so the list of proglams. -proglams can be all syntactically legal is to set some a priori condition this of form general tn. bound, which is a total computable function h(r), and to consider programs that have execution complexity at most h(x) on all but finitely many inputs r. l"Execution complexity" might be a measure of running time, spacerequired, or any measure satisfying Blum's axioms for a complexity measure (22).1Any function for which there is such a program is called h-easy. A slight refinement is required to deal with the possibility that
More Efficient Search

There are some general techniques to reduce the high cost of search for inductive inference. Consider the case of identification of languages. When the current hypothesis incorrectly generates some negative example, it is safe to ignore any hypothesis that generates a superset of the language of the current hypothesis, and, dually, if the current hypothesis incorrectly fails to generate some positive example, it is safe to ignore all hypotheses that represent subsets of the current language. To make use of this, there must be useful relationships between the syntax of hypotheses and the languages they generate.

In the case of deterministic finite state machines used as language recognizers, there is a natural generalization operation: merge two states of M and propagate the consequences of the merge to get a new finite state recognizer M'. The language recognized by M' is a superset of the language recognized by M, so this operation can be used in the "bottom-up" direction; that is, if the current hypothesis fails to generate some positive example, it may be replaced by a more general one by merging two of its states.

Another domain in which useful relationships between syntax and semantics exist is first-order logic. If A and B are atomic formulas, define A ≥ B provided that there is a substitution σ such that B = Aσ. For example, if A = P(x, f(y, g(x))) and B = P(a, f(g(x), g(a))), the substitution σ = {a/x, g(x)/y} shows that A ≥ B. This syntactic relation is reflected in the semantics: the universal closure of A implies the universal closure of B if and only if A ≥ B, that is, A is more general than B. Consequently, if B is found to be too general, all the atomic formulas A ≥ B may be discarded as well. Reynolds (24) has investigated this structure and shown that the atomic formulas (up to alphabetic variation) form a lattice under this ordering. Thus, for any set S of atomic formulas, there is a least upper bound, called the least common generalization of S, and a greatest lower bound, called the greatest common instance of S. The usual unification (qv) algorithm computes the greatest common instance of two atomic formulas, and the antiunification algorithm discovered by Reynolds and independently by Plotkin (25) computes the least common generalization of two atomic formulas. Plotkin has extended the notion of a least common generalization. Reynolds has also shown that the covering relation of the lattice of atomic formulas has a simple, easily computable form. This implies that all the immediate instances (or generalizations) of an atomic formula are easily computed.

Shapiro (26-29) terms this relation a refinement and extends it to a most general refinement for first-order Horn clauses, or PROLOG statements. Shapiro's refinement consists of three operations: unifying two variables; substituting a most general term for a variable; and adding the negation of a most general atom to the clause. He shows that any Horn clause over a fixed language can be generated by an appropriate sequence of these operations starting from some most general atom. Moreover, if a clause C' is obtained from a clause C by any of these operations, then C logically implies C'. Shapiro uses this refinement operator in his PROLOG program synthesis system to direct a search for correct candidate axioms from the most general toward the more specific. If a current clause is found to be false by comparison with the examples, its one-step refinements become new candidate axioms. Shapiro has also used a variety of specialized refinement operators to synthesize particular syntactic classes of PROLOG programs.

Laird (30) has abstracted the notion of refinement and shown that it applies in a variety of domains. He describes very general algorithms applicable to domains that possess (upward or downward) refinements and shows that they converge to a correct hypothesis. One of his examples is a most general refinement for general (not just Horn) clauses, making Shapiro's approach applicable in a wider domain.

Mitchell (31) has given a different abstract treatment of certain methods of using generalization and specialization operators. He assumes a general-to-specific ordering on the space
there may be finitely many exceptions to the complexity bound, but essentially the method describedabove,with h(x) instead of n3, successfullyidentifies in the limit all the h-easy functions. It has been shown that any searchmethod that uses an effectively enumerable list of programs for total computable functions identifies a subset of h-easyfunctions for some h, so this method is as powerful as any based on enumerating programs for total functions. One way to make a search method more powerful is to permit the execution complexity to depend not only on the input (as in the h-easycase)but also on the output. In general, a complexity bound that dependson the suzeof the output is not very useful in estimating the resource requirements of a program if only the input is known. However, when checking whether a program is consistent with an initial segmentf(0), . , f (n), not only is the input r known but also the f(l), correct value of the output f(x).If h(*,y) is a total computable function of two arguments, a function f(x) is called h-honestrf there exists a program to compute f@) such that for all but finitely many values of x, the execution complexity of the program on input r is at most h(x, f @)).A modification of the h' easy method that discards a program if it runs for more than h(x,y) stepswhen checking consistencywith the example(r, y) successfullyidentifies in the limit all the h-honest functions. Since there are classesof functions that are h-honest but not g-easyfor any g, these methods are strictly more powerful than the h-easymethods. In fact, Blum and Blum Qil have shown that any class of functions identified by an inference method that is reliable on all the partial computable functions is contained in an h-honest class of functions for some h. (Reliable methods are discussedbelow in Other Criteria of Identification.) 
Thus, h-honest methods are as powerful as any method that is reliable in this sense.However, the h-honest methods do not identify all the sets of functions that are identifiable; that is, there are sets of functions in EX that are not identified by any h-honest method. A still more powerful classof methodsdevisedby Blum and Blum is based not on an a priori estimate of the execution complexity of the programs under consideration but on an a posteriori comparison with competing hypotheses. Roughly, the idea is to discard a program while trying to check its consistency with the data if there is some other program that seemsto be getting the correct answers "much more quickly" and for "many more values." "Much more quickly" and "many more values" are quantified in a specificway for each method of this class.Blum and Blum have shown that these methods are as powerful as any that are reliable on all the total computable functions. However, even these methods do not get all identifiable sets of functions; there are sets of functions in EX that are not identified by any method in this class.
414
I N D U C T I V EI N F E R E N C E
ample pattern. Shinohara (4L,42) has given efficient methods for inferring other classes of pattern languages and has applied them to inferring patterns in a data entry system. A number of researchershave investigated the structure of LISP prograffis, data, and computations. They have found a rich source of information to use in the task of identifying LNP programs from samplesof their I/O behavior. Smith (43) gives an excellent survey of this work. Summ ers (44,45)describesa general approachto synthesizing LISP programs basedon a recursive schemeand a patternmatching procedure. For each pair consisting of an input list and an output list, the output is expressedas a compositionof Direct Methods the base functions cor, cdr, and cons and the input Iist in a unique way. (Certain restrictions are placed on the programs more polynomial interpolation, of case Sometimes,as in the to ensure uniqueness.)A matching procedure is invoked to direct and efficient methods than search have been found that yield good hypotheses from given data or a query-oracle for find a recunence relation among the expressionsfrom the difspecificdomains. In somecasesthe methods are simple enough ferent pairs. The relation is then synthesizedinto a recursive to analyze precisely; in many casesthey are heuristics whose program using the given scheme.Summers has shown that if a correct recurrence is found, I correct program is synthesized. performance is difficult to analyze. Biermann and Feldman (32) describe a heuristic method The implementation of his matching procedureused a heuristhey term k-tails for identifying nondeterministic finite state tic. Subsequent work by Guiho, Jouannaud, Kodratoff, and transducers from I/O behavior. 
The method has a user-con- others (46-.4U has generalized and refined this approach, trolled parameter k and builds a nondeterministic acceptorby characterizing the class of programs that are correctly synthemerging all points in the sample strings that exhibit the same sized and the IIO examples that are sufficient and proposing (observable)I/O behavior on strings of length k or less. The more powerful and more efficient matching algorithms. user's control of k allows some tuning of the method. It is shown that lf k is large enough, this method correctly con- Complexity Questions verges in the limit. Brayer and Fu (33) and Levine (34,35) Some identification methods are exhaustive searchesand run have generalized the method of /a-tailsto the problem of inferin time that grows exponentially with the "size" of the hypothring tree grammars" Miclet (36) proposes another heuristic esis to be identified, and others are more efficient, direct, simiof basedon state mergirg, with a more flexible criterion methods that run in polynomial time. There are some partial larity than ft-tails. Angluin (20) has also used the gener&Iizationstructure for results from the theory of computational complexity that shed finite autom ata, describedabove, to give a method that prov- tight on which problems should not be expectedto have efficient solutions. ably finds the smallest k-reuersibleregular language that inIn the case of finite-state machines used as recognizersof method The strings. positive example cludes a given set of Gold (49) has shown that the problem of finding a languag€s, linear nearly and k fixe d for every polynomial time runs in finite-state machine with a minimum number of states comtimewhen&:0. NP-hard. Crespi-Reghizzi and co-workers (37-40) have found a class patible with given positive and negative examplesis polynoin runs that is known algorithm no that implies This of algorithms for inferring context-free grammars from brackthat a implies it problem. 
Moreover, this to solve parse mial time to equivalent are eted samples.That is, the examples to be used problem could this for algorithm polynomial time all in which language, unknown the from trees for sentences of large collection for a algorithms polynomial time construct the labels of internal nodes have been erased. Equivalently, a finding for example, problems, optimtzatton difficult grammatiother the example sentencesare phrase marked, but no graph. weighted in a Tour Salesman Traveling minimum they The algorithms phrases. cal categoriesare assignedto the Thus, it seemsunlikely that even a heavily modified search consider assign internal labels (grammatical categories) acin polynomial time on aII cording to the "context" of the node in the parse tree and use method for this problem wiII run find a machine with the guaranteed to is it as long as the labels to construct grammatical productions. Depending cases the best one Realistically, of states. number possible minimum speciare on how the "context" is defined,different algorithms good "on methods or approximations is efficient for hope may Crespi-Reghizzi by investigated is fied. This general approach extended been have results Gold's and Mandrioli (39), who term it abstract profi'les. Particular average" for this problem. (50) to show that the problem remains NP-hard instances of this class of algorithms yield efficient polynomial by Angluin set is quite dense and that the related example the if even grammars and time methods for the free operator precedence regular expression of minimum length a finding problem of find methods These grammars. the free homogeneousk-profile positive and negative data is also NPgiven a grammar for the smallest free operator precedence(respec- .o*p.tible with hard. tively, free homogeneousk-profile) language that generatesall These results indicate that finding a hypothesis of miniof the given samPles. 
mum size compatible with given data is computationally diffia infers efficiently that (18) method given a has Angluin domains. In the case of general smallest one-uariablepattern langudge that includes a given cult even for very restricted difficult: Any algorithm is extremely problem the programs positive sample. A one-uariable pattern is a pattern like to be of minimum guaranteed all programs of set lists a patthat the from obtainable J4xxgxL2, which generatesall strings compute can only list a finite numthey functions for the size The r everywhere. for string nonnull a of tern by substitution programs. tChaitin (51) gives a quantitative treatstrings gALLgLLz,347767769776L2are generated by the ex- ber of such
of all possiblehypotheseswith certain properties and describes an algorithm that maintains two sets of hypotheses:one set of most general hypotheses that have not been contradicted by any negative example and another set of most specifichypothesesthat generate all the positive examples seen so far. The true hypothesis lies somewhere between these two sets; the two sets can be used to give partial information about the true hypothesis and also to chooseinformative examples for test. Mitchell gives conditionsunder which his algorithm will eventually converge to the correct hypothesis.
I N D U C T I V EI N F E R E N C E
ment of the relation between the size of a program and how many such programs it can list.l A somewhat different approach to quantifying the computational complexity of inductive inference problems has been taken by Daley and Smith 62). They have defined axioms for complexity measures for inductive inference machines in the spirit of Blum's treatment of the computational complexity of functions Q2). An inference complexity measure prescribes a functional for each inductive inference machine that can be used to determine in the limit the complexity of the machine on a given sequenceof inputs. The axioms cover existing measures, such as the number of mind changesmade by a machine enroute to a successfulidentification, and also the number of steps until correct convergence.From the axioms Daley and Smith are able to prove the existenceof sets of functions which are arbitrarily difficult to infer and sets for which there is no most efficient inference machine. Other Criteria of ldentification Identification in the limit is only one interpretation of the idea of successfulidentification. A number of restrictions and extensions of the basic criteria EX and BC have been studied, and someof them are describedin the first part of this section. The secondpart of this section coverssomeof the definitions of nonlimiting criteria of identification. Variationsof ldentificationin the Limit. A number of modifications of the basic criteria, EX and BC, of successfulidentification have been studied. The restriction that all the guesses be programs for total functions is termed Popperian and is discussedabove.Popperian inference has been studied by Case and Ngo-Manguelle (4). In addition to further restrictions on the behavior of machines, somemodifications take the form of relaxations of restrictions. Generally, both types of modifications are consideredin order to more accurately model particular situations or to render certain technical problems more tractable. 
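The EX criterion and the mind-change measure discussed above can be made concrete with a toy identification-by-enumeration learner. The sketch below is illustrative only (the candidate space and target function are hypothetical, not from this article): after each new example, the learner conjectures the first candidate on its list that is consistent with every example seen so far, so its guesses converge in the limit whenever the target is on the list.

```python
# Toy identification by enumeration (a sketch; candidates and target
# are hypothetical). The learner conjectures the first listed candidate
# consistent with all examples seen so far.

def enumerative_learner(candidates, examples):
    """Yield the index of the conjectured candidate after each example."""
    for n in range(1, len(examples) + 1):
        seen = examples[:n]
        for i, f in enumerate(candidates):
            if all(f(x) == y for x, y in seen):
                yield i
                break

# Candidate space: the constant functions 0..4, then f(x) = x + 1.
candidates = [lambda x, c=c: c for c in range(5)] + [lambda x: x + 1]
target = lambda x: x + 1
examples = [(x, target(x)) for x in range(6)]

guesses = list(enumerative_learner(candidates, examples))
# The sequence of guesses converges (EX-style) to the correct candidate;
# the number of mind changes is one simple measure of the inference.
mind_changes = sum(1 for a, b in zip(guesses, guesses[1:]) if a != b)
```

Here the learner first conjectures the constant function 1 (consistent with the single example (0, 1)) and switches once, to f(x) = x + 1, after the second example refutes it; the number of such switches is exactly the mind-change count used as a complexity measure above.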
Reliability is an additional restriction on EX-identification that requires that whenever an inductive inference method converges, it does so correctly. Thus, failure of identification will be signaled (in the limit) by an infinite number of changes of hypothesis. [Reliability has been studied by Minicozzi, who used the term strong identification (53).] Different types of reliability may be distinguished, depending on the class of inputs, the total recursive functions or the partial recursive functions, to which the requirement applies (23). Methods based on search are generally reliable, at least over the total recursive functions. Reliability makes it possible to combine two inference methods (into one that identifies the union of the classes they identify) by switching back and forth between them when there is a change of hypothesis. Although reliability makes certain problems easier, there are classes of functions that are EX-identifiable, but not reliably so. The class of sets of functions reliably identifiable is closed under union; this is not true of the classes EX or BC.

A set of functions is said to be EX-identified by a finite team of machines M1, M2, . . . , Mn provided that every function in the set is EX-identified by at least one of the machines Mi. Smith (54) has shown that there are sets of functions EX-identifiable by a team of k + 1 machines, but not by any team of k machines, even with respect to BC-identification. This particularly strong "critical mass" phenomenon indicates that diversity of
approach, more than any other factor, can enhance the likelihood of a successful inference.

Another way to relax the definition of identification is to consider probabilistic methods, that is, methods that have access to a random-number generator and may identify a given function for some but not all the possible random sequences. Such an inductive inference method EX-identifies a set of functions with probability p if, for every function in the set, the measure of random sequences that cause the machine to do a successful EX-identification is at least p. Pitt (55,56) has shown that any set of functions EX-identifiable by a team of k machines is EX-identifiable with probability 1/k; that any set of functions EX-identifiable with probability exceeding 1/(k + 1) is EX-identifiable by a team of k machines; and analogously for BC-identification. In particular, a set of functions identifiable with probability exceeding 1/2 is identifiable with certainty. Thus, the notions of teams and probability coincide. There is an additional notion of uncertainty, termed frequency identification, that Pitt has also shown to coincide with teams and probabilistic identification.

A different type of relaxation of restrictions is to allow "bugs" in the final guess, modeling the fact that complex programs and complex scientific hypotheses alike are almost certainly never "bug free." Case and Smith (5) have defined anomalies, that is, argument values where the final guess and the correct function disagree, and proved that for all k there are sets of functions EX-identifiable allowing k + 1 anomalies that are not EX-identifiable allowing only k anomalies, and analogously for BC-identification. BC-identification with any finite number of anomalies in the final guess is a relaxed enough identification criterion to admit the existence of an inductive inference method to identify all the total recursive functions.
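The anomaly counts used in these criteria are simply the argument values where the final guess and the correct function disagree. A minimal sketch (illustrative only; the two functions are hypothetical, and the count is restricted to a finite test domain rather than the full infinite domain used in the theory):

```python
# Counting anomalies of a final guess against the target function,
# restricted to a finite test domain for illustration.

def anomalies(guess, target, domain):
    """Number of points in `domain` where guess and target disagree."""
    return sum(1 for x in domain if guess(x) != target(x))

target = lambda x: x * x
buggy_guess = lambda x: 0 if x == 3 else x * x  # correct except at x = 3
```

A final guess with one such disagreement would be acceptable under EX-identification with one anomaly but not under the exact criterion.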
The inference of programs that are correct a certain percentage of the time (perhaps with infinitely many anomalies) has been investigated (57,58).

For EX-identification the number of times an inductive inference method changes its hypothesis before converging is one measure of the complexity of the inference process. One may fix a bound of k and consider inductive inference machines that make no more than k changes of hypothesis. In the case k = 0, the machine makes at most one guess; this is also termed finite identification. For every k there is a class of functions EX-identifiable with k + 1 changes of hypothesis that is not EX-identifiable with k changes of hypothesis by any machine.

Trade-offs have been considered between numbers of machines in a team, changes of hypothesis, probability, and anomalies. Smith (54) has shown an exact trade-off between the number of anomalies and the number of team members for EX-identification. Daley (59) has shown that the same trade-off also holds for BC-identification. Wiehagen, Freivalds, and Kinber (60) have investigated the relationship between the probability of identification and the number of changes of hypothesis for EX-identification by probabilistic methods.

Another modification of the criterion of EX-identification is to consider that a machine has successfully identified a function only when it converges to a "smallest" program for the function. The sets of functions identifiable under this criterion are dependent on the particular system chosen for expressing programs, unlike the other general results described herein. However, Chen (61,62) has shown that by interpreting "smallest" to be "no more than a recursive factor larger than the smallest program," the class is invariant under change of programming system. Chen has shown that not all the sets of functions in EX are identifiable if a nearly minimal program is required unless a finite but unbounded number of anomalies is also permitted.
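The team identification discussed above can be simulated in miniature. In the sketch below (a toy construction, not from this article; both candidate classes are hypothetical), each team member searches its own hypothesis class for a function consistent with the examples, and the team succeeds when at least one member does, so the team covers the union of the member classes:

```python
# Toy team identification: a team succeeds if any member finds a
# consistent hypothesis in its own (hypothetical) candidate class.

def learner(candidates):
    """Return a one-shot learner over a fixed finite candidate class."""
    def run(examples):
        for f in candidates:
            if all(f(x) == y for x, y in examples):
                return f
        return None
    return run

constants = [lambda x, c=c: c for c in range(10)]    # f(x) = c
linears = [lambda x, a=a: a * x for a in range(10)]  # f(x) = a * x
team = [learner(constants), learner(linears)]

def team_identifies(target, xs):
    """True if some team member fits all examples drawn from target."""
    examples = [(x, target(x)) for x in xs]
    return any(member(examples) is not None for member in team)
```

Neither member alone handles both the constant and the linear functions, but the two-member team handles their union; the theorems above show that, for full EX-identification, such k + 1 member teams are strictly more powerful than any k member team.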
Identification, Not Necessarily in the Limit. Identification in the limit places no constraint on the rate of convergence or on the quality of the intermediate hypotheses. Attempts to measure the quality of hypotheses with respect to a finite amount of data generally focus on the size, simplicity, or probability of the hypothesis and how well it fits or explains the data.

As a specific example, consider the identification of regular languages. If hypotheses are finite state acceptors, one measure of the size of a hypothesis might be the number of states of the acceptor. One strategy would be to ask for a smallest hypothesis compatible with given data. When the data are positive only, a smallest acceptor will be the one-state machine that accepts every string, so the criterion is trivial in this case. Since the regular sets cannot be identified in the limit from positive data, it is not surprising that this criterion is trivial. When the data include both positive and negative strings, the smallest compatible acceptor is a nontrivial answer (and would lead to identification in the limit), but the difficulty of finding it is NP-hard (49).

An alternative strategy would be to ask for a "good fit" to the data, that is, the acceptor that generates a minimal set containing the positive part of the sample. This will always be an acceptor for exactly the finite set of strings in the positive part of the sample, another trivial answer. However, restricting the domain to the k-reversible regular sets yields a nontrivial criterion that leads to correct identification in the limit. Moreover, there is a polynomial time algorithm to find a hypothesis satisfying this criterion (20).

One method of combining the notions of a "good" hypothesis and a "good" fit to the data is to use a Bayesian (qv) analysis. Consider the context-free languages. A stochastic context-free grammar has probabilities associated with alternative ways to rewrite each nonterminal and defines a probability distribution on the language that it generates. Given a stochastic context-free grammar G and a finite sequence of strings S, it is possible to calculate the probability that G would generate S in a sequence of independent trials, represented by Pr(S | G). If there is also a probability distribution defined on the space of all stochastic context-free grammars G, denoted Pr(G), then it is natural to ask for the "most probable" grammar given a finite sample S. That is, ask for a grammar G that maximizes Pr(G | S), which, by Bayes's theorem, is equivalent to finding G to maximize Pr(G)Pr(S | G). Horning (17) has formalized this setting, given a particular kind of probability distribution on grammars, and proved that a search algorithm to maximize Pr(G)Pr(S | G) converges in the limit with probability 1 to the correct grammar, assuming that the strings in the sample are generated by independent trials.

Solomonoff (63-65) has advocated the use of this kind of Bayesian approach, in which the a priori probabilities of the hypotheses are based on the theory of program size complexity. Cook, Rosenfeld, and Aronson (66) have investigated a hill-climbing approach to the inference of stochastic context-free grammars, in a setting similar to Horning's. The hill climbing avoids the exhaustive search of Horning's approach, but its effectiveness is very difficult to analyze. Van der Mude and Walker (67) have considered the Bayesian approach for inferring stochastic regular grammars. Other mixed measures have been considered by Maryanski and Booth (68) and Gaines (69,70). Feldman (71) and Feldman and Shields (72) have investigated an axiomatic approach to combinations of grammar or program complexity and derivational or computational complexity. The measure and algorithm of Horning, described above, are an instance of the general theory.

Valiant (73) has proposed another criterion of successful identification for stochastic languages, which might be termed probably approximately correct identification. The idea is that there is a parameter n related to the size of the unknown hypothesis, and after sampling the unknown hypothesis a polynomial (in n) number of times, the identification algorithm should conjecture a hypothesis that with "high" probability is "not too different" from the true hypothesis. "High" probability and the "difference" between the conjectured hypothesis and the true one are quantified using n. Valiant gives identification algorithms that succeed in this sense for three classes of propositional formulas using different information from oracles.

Two Applications

Shinohara (42) has applied his algorithms for the efficient identification of the regular pattern languages to detecting patterns in a data entry system. For example, if in entering a bibliography there is a fixed pattern to the keywords AUTHOR, TITLE, and so on, then his system can detect the pattern and automatically generate the fixed portions of the pattern, prompting the user to supply the variable portions.

Nix (74,75) has designed and implemented an editing-by-example facility in a general-purpose screen editor. It is based on detecting regularities in a sequence of I/O examples of a text transformation. The user confronted with the task of making a sequence of similar transformations to a series of text items may invoke the editing by example system and make two or more of the desired transformations. The system attempts to synthesize a program to make the intended transformations. The user may then run this program, optionally under the user's close control, or may alter the program directly or by giving additional examples of the desired transformations. Nix gives a careful analysis of the class of transformations (which he terms gap programs) that may be synthesized and algorithms and heuristics for synthesizing them.

Other systems that could be viewed as applications of the techniques of inductive inference, for example, META-DENDRAL (76), are treated fully in other entries.

BIBLIOGRAPHY

1. E. M. Gold, "Language identification in the limit," Inform. Contr. 10, 447-474 (1967).
2. H. Putnam, Probability and Confirmation, Cambridge University Press, New Rochelle, NY, 1975.
3. D. Angluin and C. Smith, "Inductive inference: Theory and methods," Comput. Surv. 15, 237-269 (1983).
4. R. Klette and R. Wiehagen, "Research in the theory of inductive inference by GDR mathematicians: A survey," Inform. Sci. 22, 149-169 (1980).
5. J. Case and C. Smith, "Comparison of identification criteria for machine inductive inference," Theor. Comp. Sci. 25, 193-220 (1983).
6. K. Popper, The Logic of Scientific Discovery, Harper Torch Books, New York, 1968.
7. J. Case and S. Ngo-Manguelle, Refinements of Inductive Inference by Popperian Machines, Technical Report, SUNY at Buffalo, Department of Computer Science, 1979.
8. J. M. Barzdin, "Two theorems on the limiting synthesis of functions," Latvii Gosudarst. Univ. Ucenye Zapiski 210, 82-88 (1974) (in Russian).
9. M. Harrison, Introduction to Formal Language Theory, Addison-Wesley, Reading, MA, 1978.
10. N. Chomsky, "On certain formal properties of grammars," Inform. Contr. 2, 137-167 (1959).
11. N. Chomsky, "Three models for the description of languages," IRE Trans. Inform. Theor. 2, 113-124 (1956).
12. J. Case and C. Lynes, Inductive Inference and Language Identification, Proceedings of the ICALP 82, Springer-Verlag, Berlin, pp. 107-115, 1982.
13. D. N. Osherson and S. Weinstein, "Criteria of language learning," Inform. Contr. 52, 123-138 (1982).
14. K. Wexler and P. Culicover, Formal Principles of Language Acquisition, MIT Press, Cambridge, MA, 1980.
15. D. Osherson, M. Stob, and S. Weinstein, Systems that Learn, MIT Press, Cambridge, MA, 1986.
16. M. Machtey and P. Young, An Introduction to the General Theory of Algorithms, North-Holland, Amsterdam, 1978.
17. J. J. Horning, A Study of Grammatical Inference, Ph.D. Thesis, Stanford University, Computer Science Department, 1969.
18. D. Angluin, "Finding patterns common to a set of strings," J. Comput. Sys. Sci. 21, 46-62 (1980).
19. D. Angluin, "Inductive inference of formal languages from positive data," Inform. Contr. 45, 117-135 (1980).
20. D. Angluin, "Inference of reversible languages," JACM 29, 741-765 (1982).
21. R. Wiehagen, "Identification of formal languages," Lect. Not. Comput. Sci. 53, 571-579 (1977).
22. M. Blum, "A machine-independent theory of the complexity of recursive functions," JACM 14, 322-336 (1967).
23. L. Blum and M. Blum, "Toward a mathematical theory of inductive inference," Inform. Contr.
28, 125-155 (1975).
24. J. C. Reynolds, "Transformational systems and the algebraic structure of atomic formulas," Machine Intell. 5, 135-151 (1970).
25. G. D. Plotkin, "A note on inductive generalization," Machine Intell. 5, 153-163 (1970).
26. E. Shapiro, Algorithmic Program Debugging, MIT Press, Cambridge, MA, 1983.
27. E. Shapiro, Algorithmic Program Diagnosis, Proceedings of the Ninth ACM Symposium on Principles of Programming Languages, Albuquerque, NM, pp. 299-308, 1982.
28. E. Shapiro, A General Incremental Algorithm that Infers Theories from Facts, Proceedings of the Seventh IJCAI, Vancouver, B.C., pp. 446-451, 1981.
29. E. Shapiro, Inductive Inference of Theories from Facts, Technical Report, Yale University Computer Science Department TR 192, 1981.
30. P. D. Laird, Inductive Inference by Refinement, Technical Report, Yale University Computer Science Department RR-376, 1985.
31. T. M. Mitchell, "Generalization as search," Art. Int. 18, 203-226 (1982).
32. A. W. Biermann and J. A. Feldman, "On the synthesis of finite-state machines from samples of their behavior," IEEE Trans. Comput. C-21, 592-597 (1972).
33. J. M. Brayer and K. S. Fu, "A note on the k-tail method of tree grammar inference," IEEE Trans. Sys. Man Cybernet. SMC-7, 293-300 (1977).
34. B. Levine, "Derivatives of tree sets with applications to grammatical inference," IEEE Trans. Patt. Anal. Machine Intell. PAMI-3, 285-293 (1981).
35. B. Levine, "The use of tree derivatives and a sample support parameter for inferring tree systems," IEEE Trans. Patt. Anal. Machine Intell. PAMI-4, 25-34 (1982).
36. L. Miclet, "Regular inference with a tail-clustering method," IEEE Trans. Sys. Man Cybernet. SMC-10, 737-743 (1980).
37. S. Crespi-Reghizzi, "An effective model for grammar inference," Inf. Proc. 71, 524-529 (1972).
38. S. Crespi-Reghizzi, G. Guida, and D. Mandrioli, "Noncounting context-free languages," JACM 25, 571-580 (1978).
39. S. Crespi-Reghizzi and D. Mandrioli, Abstract Profiles for Context-Free Languages, Technical Report No. 80-6, Istituto di Elettrotechnica ed Elettronica del Politechnico di Milano, Milano, Italy, 1980.
40. S. Crespi-Reghizzi and D. Mandrioli, Inferring Grammars by Means of Profiles: A Unifying View, Internal Report, Istituto di Elettrotechnica ed Elettronica del Politechnico di Milano, Milano, Italy, 1980.
41. T. Shinohara, Polynomial Time Inference of Extended Regular Pattern Languages, Proceedings, RIMS Symposium on Software Science and Engineering, Kyoto, 1982; Lecture Notes in Computer Science 147, 115-127 (1983).
42. T. Shinohara, Polynomial Time Inference of Pattern Languages and its Applications, Proceedings of the Seventh IBM Symposium on Mathematical Foundations of Computer Science, Hakone, Japan, pp. 191-209, 1982.
43. D. R. Smith, A Survey of the Synthesis of LISP Programs from Examples, in A. W. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, MacMillan, New York, 1982.
44. P. D. Summers, "A methodology for LISP program construction from examples," JACM 24, 161-175 (1977).
45. P. D. Summers, Program Construction from Examples, Ph.D. Thesis, Yale University Computer Science Department, 1976.
46. J. P. Jouannaud and G. Guiho, "Inference of functions with an interactive system," Machine Intell. 9, 227-250 (1979).
47. J. P. Jouannaud and Y. Kodratoff, "An automatic construction of LISP programs by transformations of functions synthesized from their input-output behavior," Int. J. Pol. Anal. Inform. Sys. 4, 331-358 (1980).
48. J. P. Jouannaud and Y. Kodratoff, "Characterization of a class of functions synthesized by a Summers-like method using a B.M.W.
matching technique," Proceedings of the Sixth IJCAI, Tokyo, Japan, pp. 440-447, 1979.
49. E. M. Gold, "Complexity of automaton identification from given data," Inform. Contr. 37, 302-320 (1978).
50. D. Angluin, "On the complexity of minimum inference of regular sets," Inform. Contr. 39, 337-350 (1978).
51. G. J. Chaitin, "Information-theoretic limitations of formal systems," JACM 21, 403-424 (1974).
52. R. Daley and C. Smith, "On the complexity of inductive inference," Inform. Contr. 69, 12-40 (1986).
53. E. Minicozzi, "Some natural properties of strong identification in inductive inference," Theor. Comput. Sci. 2, 345-360 (1976).
54. C. H. Smith, "The power of pluralism for automatic program synthesis," JACM 29, 1144-1165 (1982).
55. L. Pitt, A Characterization of Probabilistic Inference, Proceedings of the 25th Annual IEEE Symposium on Foundations of Computer Science, IEEE, New York, pp. 485-494, 1984.
56. L. Pitt, Probabilistic Inductive Inference, Ph.D. Thesis, Yale University, Computer Science Department, 1985.
57. J. Royer, A Note on Asymptotic Explanatory Inductive Inference, University of Chicago, 1984.
418
INFERENCE
reasoning (qv). Programs that perform in this way are often called inference engines and range from interactive, with the user guiding the generation of inferred statements, to fully autonomous. There are various kinds of inference depending on the methodology used and the requirements of the application. The most common types of inference methodologiesin Ar ., An such are logical inferences, to conclude B from Au that any interpretation that makes all of the A; true also makes B true. Examples are simple syllogish, modus ponens' and substitution for universally quantified variables (see Logic, propositional and Logic, predicate). An important subclass of logical inference is based on the resolution (qv) rule. 63. R. J. Solomonoff,"Complexity-basedinduction systems:Compari- Resolution works on disjunctions, called clauses,of simple forsons and convergencetheorems,"IEEE Trans.Inform. Theor. IT' mulas, called literals. In its simple Boolean form the rule aI24, 422-432 (1978). lows a program to conclude C V D from the two disjunctions -p V C and p V D. (In implicative form this is equivalent to 64. R. J. Solomonoff,"A formal theory of inductive inference," Inform. Contr. 7, L-22, 224-254 (1964). concluding -D + C from -D + p and p + C.) In the first-order 65. R. J. Solomonoff,"Inductive Inference Theory: A unified Approach logic case the program first renames the variables in the two to Problems in Pattern Recognition and Artificial Intelligence, clauses and then finds a most general substitution that proProceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 274-280, ducesa pair of opposite-signedliterals. For example,P(a. tr) V r975. -P(y, b) V R(y) lead to Q(a) V R(a) (seeResolution, Q(r) and 66. C. M. Cook, A. Rosenfeld,and A. R. Aronson, "Grammatical inferbinary for a more complete treatment of resolution). A similar enceby hill-climbing," Inform. Sci. 10, 59-80 (1976). inference rule, paramodulation, exists for reasoning about 67. A. Van der Mude and A. 
Walker, "On the inference of stochastic equality. As with resolution, two clausesare used. One clause -regular grammars," Inform. Contr. 38, 310-329 (1978). must contain a literal ofthe form s t,say s - f V C. A term in 68. F. J. Maryanski and T. L. Booth, "Inference of finite-state probabi- the other clause, say s' in D(s'), is matched with s after renamlistic grammars," IEEE Trans. Comput. C'26, 52L-536 Q977). ing of variables, and the correspondinginstance of C V D (t) is 69. B. R. Gaines, "Behavior/structure transformations under uncer- inferred (seeTheorem proving for a more completedescription tainty," Int. J. Man-Machine stud. 8, 337-365 (1976). of paramodulation). Other inference rules have been proposed 70. B. R. Gaines,"Maryanski's grammatical inferencer,"IEEE Trans. for special applications like set theory and higher-order logic Comput. C-28, 62-64 (1978). as well as other general-purpose methodologies like matrix A. Feldman, "Some decidability results in grammatical inferJ. 7L. reduction. (1972).
58. C. Smith and M. Velauthapillai, On the Inference of Approximate Programs, Technical Report, University of Maryland, TR L4'27, 1985. 59. R. Daley, "On the error correcting power of pluralism in inductive inferenca," Theor. Comput. Sci. 24r 95-104 (1983). 60. R. Wiehagen, R. Freivalds, and E. B. Kinber, "On the power of probabilistic strategies in inductive inferenee," Theor. Comput. Sci. 28, 111-133 (1984). 61. K. J. Chen, Tradeoffs in Machine Inductive Inference, Technical Report, SUNY at Buffalo, Department of Computer Science,No. 178,1981. 62. K. J. Chen, "Tradeoffs in the inductive inference of nearly minimal size programs,"Inform. Contr. 52,68-86 (1982).
ence," Inforrn. Contr. 2Or244-262 72. J. A. Feldman and P. Shields, "Total complexity and the inference
s,"Math. sys.Theor.ro, rer-rsi-ir sill'of bestprogram
73. L. G. Valiant, "A theory of the learnable," CACM 27, LL34-LI42 (1e84). 74. R. Nix, Editing by ExampLe,Proceedingsof the Eleuenth ACM Symposium on Principles of Programming Languages, ACM, New York, pp. 186-195, 1984. 75. R. Nix, Editing by Example, Ph.D. Thesis,Yale University Computer ScienceDePartment, 1983. 76. B. G. Buchanan and E. A. Feigenbaum, "Dendral and meta-dendral: Their applications dimension," Artif. Intell. tL, 5-24 (1978). D. ANct utlt Yale UniversitY C. H. Str,tttH University of MaryIand This work was partially funded by the National ScienceFoundation under grants numbered MCS -8404226and MCS-8301536and by the National Security Agency under grant number MDA-904-85-H-0002.
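The Boolean form of the resolution rule can be sketched in a few lines of Python. This is an illustrative reconstruction only, not code from any system discussed here; representing a clause as a frozenset of string literals, with a leading "-" marking negation, is an assumption made for the sketch.

```python
def resolve(clause1, clause2):
    """Return the resolvents of two propositional clauses.

    A clause is a frozenset of literals; a literal is a string,
    negated by a leading '-'.  For each complementary pair p, -p
    the rule infers the union of the remaining literals.
    """
    resolvents = []
    for lit in clause1:
        complement = lit[1:] if lit.startswith('-') else '-' + lit
        if complement in clause2:
            resolvents.append((clause1 - {lit}) | (clause2 - {complement}))
    return resolvents

# Conclude C v D from -p v C and p v D:
print(resolve(frozenset({'-p', 'C'}), frozenset({'p', 'D'})))
# one resolvent, containing exactly C and D
```

Resolving two complementary unit clauses, such as p and ¬p, yields the empty clause, which is how resolution-based theorem provers detect a contradiction.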
Inference Systems

In addition to a set of inference rules, inference systems based on logic require a control system, or set of strategies and heuristics (qv), to direct the choice of formulas to use for inferring new information. In most problems the number of possible inferences is vastly larger than the number of inferences actually used in the problem's solution.

Logical Inference. Logical inference systems have been used as the reasoning component for a variety of applications, including proving theorems, proving the correctness of programs, generating programs, designing electronic circuits, and many others. Resolution forms a basis for logic programming (qv), of which PROLOG is an example. Most expert systems and production systems use a type of inference that is very much like (and in some cases identical to) resolution (see Logic programming; Theorem proving; Expert systems; Production systems for further examples and discussion).
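A toy forward-chaining engine of the kind embodied in production systems might look as follows. This is an illustrative sketch only; the rule format (a set of premises paired with a single conclusion) is an assumption, not the representation of any particular system named above.

```python
def forward_chain(facts, rules):
    """A tiny forward-chaining inference engine: repeatedly fire
    rules (premises, conclusion) whose premises are all already
    established, until no new facts can be inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [({'p', 'q'}, 'r'), ({'r'}, 's')]
print(forward_chain({'p', 'q'}, rules))  # derives r, then s
```

Note that even this toy engine illustrates the control problem mentioned above: every rule is tried on every pass, and real systems spend much of their effort choosing which inferences to attempt.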
Nonclassical Logical Inference. Although the majority of inference systems used in AI are based on classical logical inference as above, this may not be adequate for many situations. Many researchers feel that classical logical inference, for example, does not address the issue of cause and effect in any reasonable way: p ∧ ¬p → q is true because the hypothesis is always false and not because q necessarily has anything to do with p. Various different logics, each with its own special rules of inference, have been proposed (see, e.g., Logic, modal). In addition, some more ad hoc systems have been proposed (see, e.g., Reasoning, causal).
Negative Information. An important recent type of inference rule is based on the handling of negative information. Using such a rule, a program can infer a negative statement if that negative statement is normally true and there is no evidence to indicate an unusual situation. An example is "In the desert one can assume (i.e., can infer) that it is not raining unless there is some evidence to the contrary." A special application of this kind of inference occurs in database settings, where it can often be assumed that data not in the database are not related. For example, if the tuple (100, CS102) is not in the enrolled-in relation in a university database, then it is to be expected that student number 100 is not in the class. In deductive databases this is called the closed-world assumption (CWA). By making such inferences, a system can avoid storing the myriad of negative facts and rules, an important consideration in applications like commonsense reasoning and databases, where normally the volume of negative information is orders of magnitude larger than the positive information. Care must be taken when assuming negative information that contradictory results are not obtained. For example, a system might know p ∨ q but not have enough evidence to conclude p nor enough evidence to conclude q. However, assuming the negation of both, that is, both ¬p and ¬q, leads to a contradiction (see Circumscription; Logic, nonmonotonic; Reasoning, default).
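The database reading of the closed-world assumption can be sketched as a query policy over a stored relation. This is an illustrative fragment following the article's enrolled-in example: the tuple (100, CS102) is deliberately absent and therefore assumed false; the other stored tuples are invented for the sketch.

```python
# Positive facts actually stored; under the CWA everything absent is false.
enrolled_in = {(101, 'CS102'), (102, 'CS240')}

def enrolled(student, course):
    """Closed-world query: absence from the database is treated
    as negation, so negative facts are never stored explicitly."""
    return (student, course) in enrolled_in

print(enrolled(101, 'CS102'))  # True: stored as a positive fact
print(enrolled(100, 'CS102'))  # False: inferred by the closed-world assumption
```

The space saving is exactly the point made above: the relation stores only the positive tuples, while the (far more numerous) negative facts are derived on demand.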
Exemplary Generalization. A quite different kind of inference is inductive inference, in which a program attempts to abstract from examples (see Inference, inductive). These inference mechanisms are often used in learning situations. For example, a program may attempt to learn to distinguish different shapes by being shown examples of the shapes and attempting to induce abstract properties of the different types.

Probabilistic and/or Statistical Inference. Yet another kind of reasoning involves probabilistic inference and/or statistical inference. Examples of one such kind of inference are described in the entry on Bayesian decision methods. Mathematical theories based on probability and statistics are used to accept or reject proposed hypotheses and to draw other kinds of conclusions. These theories are quite well developed and computable in a straightforward numerical way. Thus, it is the design of the hypotheses and the use to which the conclusions are put that has more to do with AI than the actual method of reaching the conclusion.

In addition to the articles in this work cited above, the reader may consult the general references for detailed studies of logical inference in AI and its applications.

General References

C.-L. Chang and R. C.-T. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York, 1971.
D. Loveland, Automated Theorem Proving, North-Holland, Amsterdam, 1978.
L. Wos, R. Overbeek, E. Lusk, and J. Boyle, Automated Reasoning: Introduction and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1984.

L. Henschen
Northwestern University

INFERENCE, GRAMMATICAL. See Inductive inference; Pattern recognition; Semantics.

INFORMATION RETRIEVAL

Information retrieval (IR) has not generally attracted the attention of workers in AI. But AI techniques should contribute to IR, if only in the long run, and IR offers very worthwhile problems as well as some techniques to AI.

IR Systems

Information retrieval sometimes refers to information management in general; it is used here, conventionally, for the retrieval of documents, typically papers rather than books, through index descriptions referring either to the full documents or to surrogates like abstracts (1-4). Index descriptions may in the limit be the documents or surrogates themselves or, more commonly, be extracted natural-language items, for example keywords or phrases, or items from a controlled vocabulary like a thesaurus or list of subject headings. Indexing may be done at file time or, retrospectively and perhaps more flexibly for the user, at search time; automation in particular allows query word or string matching against document or surrogate texts. Index term normalization ranges from word stemming to the replacement of given natural-language terms by preferred concept labels from a notionally autonomous (natural or artificial) indexing language. The function of indexing languages, which may have an explicit classificatory structure, is to overcome lexical and conceptual variation in the interests of descriptive consistency (5).

The basic object of indexing is to indicate what a document is about, given that life is too short for full-text or even surrogate reading. Requests have also to be indexed, and the matching problem of IR stems primarily from the difficulties of describing documents and needs for them. Document systems also operate over time; fields and users change, and with them both descriptions and the language of description. But the real problem of IR is the scale problem: IR is about retrieving the few relevant documents from the many nonrelevant ones (major services may hold millions of abstracts). Thus a valid description may still not be discriminating enough to promote precision, that is, the proportion of retrieved documents that are relevant to the user's need. The feature of IR not characteristic of AI is that statistical phenomena begin to count: a term legitimately describing 1 document may also, without the user realizing it, describe 5000 others (2,3), most of which are in fact irrelevant. Complex terms, that is, ones consisting of several units forming a whole, or combinations of terms are thus precision devices aimed at overcoming this problem.

The inverse of selection is grouping. A valid description may not be a predictable one, given the user's inevitably limited view of a large mass of documents. An index description assigns a document to a class of documents with the same description, or to several classes, depending on the way relations between its own description and those of other documents are exploited. Grouping is a recall device aimed at increasing the chance of retrieving relevant documents, that is, at increasing the proportion of relevant documents that are retrieved. Grouping may be achieved explicitly through the use of common index terms for different texts or implicitly through term associations and, as with precision devices, may be emphasized at document file time or at request search time.

From a processing point of view, therefore, an IR system can be thought of as involving several basic operations: document analysis, request analysis, request and document comparison, extraction of matching items, evaluation of output, and, possibly, request reformulation with further comparison and extraction. The analysis operations in particular may depend on a prior process, the construction of an indexing language, and the individual processes may vary greatly in complexity according to the retrieval strategy being adopted. These processes are essentially the same for manual and automatic systems: gains from automation come from the ability to handle large amounts of material and to carry out complex string manipulations, as in word-fragment matching, or numerical calculations, as in statistically based weighting.

Unfortunately, although retrieval systems aim at precision and recall, maximizing both at once appears to be an impossibility: extensive observation shows a trade-off. This seems to be an ineluctable consequence of the sloppiness of all the system's many components (requests, documents, descriptions, languages, user needs, user assessments, etc.) and hence of their interactions, especially as these are compounded by the size of the document set. Each component of a retrieval system has multiple aspects and relations, implying the existence of connections that are difficult to identify and control in indexing and searching. There will be much about the system the user does not know. More specifically, which documents are relevant to his need is by definition unknown. Research in IR over the last 25 years has established some techniques for improving both recall and precision against baseline strategies, and for controlling indexing and searching aimed at one or the other. For example, in relevance feedback, query terms can be discriminatingly weighted for further searching according to their relative distribution among relevant and nonrelevant documents in an initial evaluated document sample.
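The relevance feedback idea just described can be illustrated with one common style of term weight based on the term's distribution among judged relevant and nonrelevant documents. The exact formula below, including the 0.5 smoothing that avoids zero counts, is an assumption chosen for illustration; the article does not prescribe a particular weighting scheme.

```python
import math

def feedback_weight(term, relevant_docs, nonrelevant_docs):
    """Weight a query term by its relative distribution among
    judged relevant and nonrelevant documents (relevance
    feedback); each document is a set of index terms."""
    r = sum(term in d for d in relevant_docs)       # relevant docs with term
    n = sum(term in d for d in nonrelevant_docs)    # nonrelevant docs with term
    R, N = len(relevant_docs), len(nonrelevant_docs)
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n + 0.5) / (N - n + 0.5)))

rel = [{'grammar', 'inference'}, {'grammar', 'parsing'}]
nonrel = [{'cooking'}, {'travel', 'parsing'}]
print(feedback_weight('grammar', rel, nonrel) > 0)  # True: discriminating term
print(feedback_weight('parsing', rel, nonrel))      # 0.0: evenly spread, no help
```

A term concentrated in the relevant sample gets a large positive weight; a term spread evenly across both samples gets a weight near zero and contributes little to the reformulated search.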
However, it has proved difficult to demonstrate convincingly, and with clear implications for large systems, that any novel strategies, and particularly ones associated with automation, are especially effective. On the other hand, it also appears that there are no manifestly superior retrieval strategies and that simple strategies sensibly implemented are as good as more elaborate ones. Current practice reflects this situation. There are now many large document retrieval services (e.g., Lockheed's DIALOG) offering users rapid searches of substantial holdings. They apply conventional preautomation indexing techniques, for example, use of a thesaurus, or more modern but straightforward text-based ones, and formulate requests as Boolean expressions on terms. There is little application of research-derived operations like term weighting or coordination matching, which generates a document ranking by numbers of term hits. These services are very valuable, for good and obvious reasons, and have many satisfied users, although their performance on formal measures like recall may be poor. It is extremely difficult to conduct proper IR evaluation experiments, especially on any scale, but observation suggests that a typical system might well achieve no more than 40% precision for 20% recall.

There appear to be inherent limitations to IR systems, stemming from their averaging tendency: indexing and searching resources, whether derived from documents and requests or assigned to them, naturally reflect the typical rather than the eccentric. Classification hierarchies, keyword networks, term frequency lists, and samples of retrieved documents may be offered to the user to help in specifying or respecifying his/her request; term linking or weighting may also be applied automatically to emphasize the infrequent rather than the frequent. But either way the mass of detail in the system has to be boiled down to recurrent patterns if it is to be managed. Thus in document retrieval there is always a tension between the user as an individual and the system as a collective. There appears to be more mileage in concentrating effort on the particular request than on the set of documents as a whole, but the effectiveness of the request characterization is inevitably limited by that of the documents.

AI Techniques in IR

Three aspects of indexing and retrieval are most obviously fields for AI: individual document and request indexing, the characterization of collections as wholes, and support for the search process. The attempt to apply AI techniques may be justified either as automating human operations for which there are already automatic substitutes but ones that are, or are believed to be, not very satisfactory, for example, choosing index terms, or as automating human operations that are still largely outside the system, for example, expressing a need as a request.

Thus one might seek to use AI natural-language processing techniques (see Natural-language interfaces; Natural-language understanding) to identify and select content-bearing text items either directly constituting index terms or mapping onto them. The underlying problem in automatic language processing for indexing is the extent to which syntactic structure has to be indicated (or available) in the index description: the more complex the structure of the description, the less likely it is to match. Term coordination (i.e., Boolean conjunction) seems to represent the right level of constraint for descriptions as wholes, when used for automatic searching, but individual terms themselves may have to be complex objects explicitly or implicitly relating elements.
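Coordination matching, mentioned above as generating a document ranking by numbers of term hits, can be sketched as follows. This is illustrative code only; representing each index description as a set of terms is an assumption of the sketch.

```python
def coordination_rank(query_terms, documents):
    """Rank documents by coordination level, i.e., the number of
    query terms each document's index description shares."""
    scores = [(len(query_terms & doc_terms), doc_id)
              for doc_id, doc_terms in documents.items()]
    return sorted(scores, reverse=True)

docs = {'d1': {'grammar', 'inference', 'complexity'},
        'd2': {'grammar', 'retrieval'},
        'd3': {'cooking'}}
print(coordination_rank({'grammar', 'inference'}, docs))
# [(2, 'd1'), (1, 'd2'), (0, 'd3')]
```

Unlike a strict Boolean conjunction, which would retrieve only d1, coordination matching degrades gracefully: partial matches are still retrieved, just ranked lower.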
Such compound terms, which are essentially precision devices, are common in artificial, or controlled, indexing vocabularies and in manual keyword indexing. Their real value has never been clearly established, and it may be that the conjunction of simple terms forming a whole description achieves the same effect. Further, it may be possible to obtain compound terms by statistical techniques, by the crudest string segmentation, or by using simple syntactic techniques. But such techniques may not provide rich enough structural descriptions of terms to support mapping, or text transformations generating equivalence sets of strings for text searching. Observation of human language use might suggest the requirement of better and more fully characterized linguistic units, implying the application of the more powerful language-processing techniques involving semantics with which AI is concerned; and some beginnings have been made to use semantic techniques as a means of identifying well-founded term sources and hence motivated equivalence classes (6). The depth of text understanding required to identify terms and sources may not be very great, and the effort is much reduced if processing is confined to request texts. But vocabulary problems and those of handling compound nouns, for instance, in processing ranging over an extensive subject field and technical literature, are very great and are outside the scope of current processing techniques, which are effective at the cost of severe domain and task limitations.
Although these sentence-based techniques may identify terms, they contribute nothing to the selection of terms required to achieve the necessary reduced description of a whole document (or even abstract). Automatic term selection is currently based on simple text and collection statistics; one might look to discourse processing in the AI sense (see Discourse understanding) for more refined derivative selection from identified candidate terms. But current bottom-up processing is far from being able to handle extensive technical text. Frame- or script-based processing (see Frame theory; Scripts), that is, the use of object paradigms or even scenarios, is a potential top-down alternative to these bottom-up approaches, but for document as opposed to request processing. Though frame and script techniques have been shown to work (7), this has only been in a very limited sense and for short messages rather than for full paper sources. The formulation and application of a script set adequate for large sets of long texts is clearly an appalling problem, even if it is accepted that such scripts can do no more than produce rather stereotyped characterizations. Similar difficulties arise with the less aggressively normative approach embodied in New York University's text formatting, where sentence parsing allocates text elements to domain headings or case roles (8). Script-based techniques, as their use for summarizing shows, could, however, produce fully structured index descriptions of value to the human reader, if ones overspecialized for searching by current methods (see Scripts). Indeed, it may be allowed that the transformation of meaning structures sought by AI as necessary to language understanding could justify the use of much more elaborate document and request descriptions than current automatic matching can exploit. Script-based techniques also have implications for the treatment of collections as wholes.
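The "simple text and collection statistics" on which automatic term selection currently rests can be illustrated as follows. The tf-idf-like score below is chosen purely for illustration; the article does not prescribe any particular formula, and the sample counts are invented.

```python
import math
from collections import Counter

def select_terms(doc_words, doc_freq, n_docs, k=2):
    """Select index terms by a simple text/collection statistic:
    frequency within the document, discounted by how many
    documents in the collection contain the term."""
    tf = Counter(doc_words)
    def score(term):
        return tf[term] * math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))
    return sorted(tf, key=score, reverse=True)[:k]

doc = ['grammar', 'inference', 'grammar', 'the', 'the', 'the']
dfs = {'the': 1000, 'grammar': 12, 'inference': 40}
print(select_terms(doc, dfs, n_docs=1000))  # ['grammar', 'inference']
```

The point of the discounting factor is exactly the averaging problem discussed earlier: a word frequent in the document but present in every document of the collection ("the") discriminates nothing and scores zero.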
Collection characterizations are needed to provide context for individual document and request descriptions. Automatic techniques for constructing collection characterizations have been largely confined to the derivation of associative networks (see Associative memory; Semantic networks) or classifications, that is, relatively weak, lexically oriented structures offering a fairly superficial characterization of the document set. There would appear to be a case, if retrieval is to be made more effective, and indeed if questions are to be answered directly rather than indirectly by supplying documents, for associating document sets with propositional knowledge bases. SRI's Hepatitis Knowledge Base project (9) was designed to provide such knowledge-base support for question answering with documentary amplification, whereas Researcher (10) represents an initial attempt at the (assisted) construction of an explicit representation of the knowledge contained in texts. But integrating, or even just relating, explicit and implicit knowledge, as well as the summary and the detailed, is not easy.

This issue arises more generally in providing support for query formulation, currently supplied for large systems by human intermediaries. Attempts are being made to give the user not merely passive support, like thesaurus displays, but active support in the form of an expert system replacing the intermediary. Comparing the IR case with "classical" expert systems (qv) shows critical differences: documentary domains are very large and ill-bounded; the knowledge manipulated by the system is secondary, not primary (the user wants the knowledge in the documents, whereas the system deals with knowledge about the documents); and the way the documentary knowledge is expressed, in its linguistic text, is part of that knowledge. Expert system design in this area comes up against very complex and ill-understood knowledge. A straight expert system, moreover, will not be enough: user modeling will be required, implying the very challenging task of modeling the "anomalous state of knowledge," that is, the ill-defined lack of knowledge, that the user is seeking to remedy.

Finally, though simply implementing AI techniques in IR is hard enough, there is the further problem of establishing whether the results are of use. IR system performance evaluation calls for very demanding experiments, presenting many problems of sampling, variable control, and interpretation, and implying testing on a scale and with a rigor far beyond that ordinarily considered, let alone practiced, in AI (4).

AI has (long-term) potential for IR. But IR offers both methodological challenges and substantive applications to AI: the scattered and tentative work that has been done on multipurpose and integrated inquiry systems emphasizes the importance of dealing with radically different kinds of knowledge. Thus it is important to allow an information request to be simultaneously a request for knowledge, for data, and for documents and, in doing this, to recognize the genuine need of the user for access to texts he can read and not have others read for him. Again, helpful indexing descriptions can be produced without the aid of world knowledge or inference, and the statistical properties of word usage can be sufficiently informative to justify their use along with other forms of knowledge representation.
BIBLIOGRAPHY

1. F. W. Lancaster, Information Retrieval Systems: Characteristics, Testing and Evaluation, 2nd ed., Wiley, New York, 1979.
2. C. J. van Rijsbergen, Information Retrieval, 2nd ed., Butterworths, London, 1979.
3. G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983.
4. K. Sparck Jones (ed.), Information Retrieval Experiment, Butterworths, London, 1981.
5. W. J. Hutchins, Languages of Indexing and Classification, Peter Peregrinus, Stevenage, U.K., 1975.
6. K. Sparck Jones and J. I. Tait, "Automatic search term variant generation," J. Documentat. 40, 50-66 (1984).
7. G. DeJong, An Overview of the FRUMP System, in W. Lehnert and M. Ringle (eds.), Strategies for Natural Language Processing, Erlbaum, Hillsdale, NJ, 1982.
8. N. Sager, Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Database, in M. Yovits (ed.), Advances in Computers, Vol. 17, Academic Press, New York, 1978.
9. J. R. Hobbs, D. E. Walker, and R. A. Amsler, Natural Language Access to Structured Texts, in J. Horecky (ed.), COLING 82: Proceedings of the Ninth International Conference on Computational Linguistics, North-Holland, Amsterdam, pp. 127-132, 1982.
10. M. Lebowitz, Researcher: An Experimental Intelligent Information System, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, CA, pp. 858-862, 1985.

K. Sparck Jones
University of Cambridge
INHERITANCE HIERARCHY
Inheritance hierarchies are an outgrowth of the classical notion of taxonomic hierarchy as an organization for knowledge. Figure 1 shows a small hierarchy consisting of the class mammal, the superclass vertebrate, and the subclasses elephant, sheep, and dog. In such a representation it is not necessary to state that elephants, sheep, and dogs are vertebrates because that can be derived from the fact that all mammals are vertebrates. Such deductions, which are a form of syllogistic reasoning, are accomplished in AI programs using a specialized inference (qv) technique called inheritance.

Figure 1. A simple taxonomic hierarchy (vertebrate above mammal; elephant, sheep, and dog below mammal).

Taxonomy is only the beginning of inheritance reasoning. AI researchers have added machinery for representing properties of classes, exceptions to inherited properties, multiple superclasses, and "structured" concepts with specific relations among the structural elements. In addition, inheritance reasoning naturally leads to simple forms of default (qv) and nonmonotonic reasoning (qv) and can be used to reason about prototypes and typical instances of classes (1).

Today, inheritance hierarchies are the backbone of most LISP-based AI languages, such as FRL (qv) (2), KRL (3,4), KL-ONE (qv) (5,6), SRL (7), and Omega (8,9). They are also found in most semantic networks (qv) (10), including parallel ones such as NETL (11). Inheritance hierarchies have found their way into programming language design as well: in object-oriented programming languages (qv) such as SIMULA (qv) (12), SMALLTALK (qv) (13), and LOOPS (qv) (14), in the LISP Machine (qv) flavors system (15), and in the Ada-derived type facility (16). See Carnese (17) for a discussion of the role of inheritance in contemporary programming languages.

Simple Taxonomic Hierarchies

The simplest inheritance system, a pure taxonomic hierarchy, consists of classes linked by subclass-superclass relationships. Each class has at most one immediate superclass so that the hierarchy is a rooted tree. The links between classes are normally called IS-A links, as in "an elephant IS-A mammal," but there is no standard terminology for the components of an inheritance network, nor is there any standard semantics for the IS-A link (18). Other names for the IS-A relation are AKO (A Kind Of, used in FRL), SUPERC (Super Concept, used in KL-ONE), and VC (Virtual Copy, used in NETL). The most basic taxonomy question, "is an x a y," can be answered by any inheritance system by taking the transitive closure over the IS-A relation or its equivalent. Other types of query are possible, some of which require that IS-A links be traversed in the opposite direction. One example is "list all the mammals."

Some systems distinguish between classes (such as mammal) and individuals (such as particular mammals, like "Clyde the elephant"), although both types of object can inherit properties. FRL does not make this distinction, but NETL and KL-ONE provide different node types for classes and individuals. NETL uses the same link type for the two types of inheritance; a VC link is used to express both "Clyde is a mammal" and "elephants are mammals." KL-ONE prescribes a different link, the INSTANCE-OF link, for individuals to inherit from classes; the SUPERC link is used only from subclasses to superclasses. Few AI systems stop at simple hierarchies of classes and individuals; they usually include additional machinery such as slots, defaults, exceptions, and demons, with which they can perform rather sophisticated sorts of reasoning. But some recent systems, such as Krypton (19), use inheritance purely for efficient taxonomic discrimination and employ a theorem prover for the rest of their reasoning.

Inheritance of Properties and Slots

Most inheritance systems provide a way to associate properties with each class; the properties may then be inherited by the class's instances and subclasses. In frame-based inheritance systems these properties are called "slots" (20) (see Frame theory). To indicate that mammals are warm-blooded in a frame system, the mammal frame would be given a blood temperature slot with the value "warm" as its filler. If the value of elephant's blood temperature slot is sought, and this slot is empty, the frame interpreter will proceed up the IS-A chain from elephant to mammal and look in mammal's blood temperature slot. When it finds the value "warm" in that slot, it will conclude that elephants are warm-blooded. Queries about slot values could of course be answered by a conventional database system if all slot values were stored in the database explicitly. But unlike databases, AI knowledge bases use inference mechanisms, of which inheritance is the most important, so that they can avoid storing most of their knowledge explicitly. Inheritance obviates the need for an explicit blood temperature value for elephants, sheep, dogs, or any other subclass of mammal, thus saving a considerable amount of space. Another advantage of inheritance is that it can be used to generate reasonable assumptions in the case of incomplete information.
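The slot-lookup procedure just described, climbing the IS-A chain until a filler is found, can be sketched as follows. This is an illustrative reconstruction, not the actual machinery of FRL, KL-ONE, or any other system named here; the dictionary-based frame representation is an assumption of the sketch.

```python
frames = {
    'vertebrate': {'is_a': None, 'slots': {}},
    'mammal': {'is_a': 'vertebrate',
               'slots': {'blood_temperature': 'warm'}},
    'elephant': {'is_a': 'mammal', 'slots': {'color': 'gray'}},
}

def get_slot(frame_name, slot):
    """Look for a slot filler locally, then proceed up the IS-A
    chain so that subclasses inherit the properties of classes."""
    while frame_name is not None:
        frame = frames[frame_name]
        if slot in frame['slots']:
            return frame['slots'][slot]
        frame_name = frame['is_a']
    return None

print(get_slot('elephant', 'blood_temperature'))  # 'warm', inherited from mammal
print(get_slot('elephant', 'color'))              # 'gray', stored locally
```

Note the space saving the article describes: blood_temperature is stored once, on mammal, and every subclass answers the query by inheritance rather than by an explicitly stored value.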
A natural extension to tree-structured inheritance hierarchies allows objects to inherit from multiple superclasses, as shown in Figure 2. This is called multiple inheritance. In such a system Clyde might inherit properties from three classes called "elephant," "circus star," and "veteran of the Punic Wars." Multiple inheritance results in an inheritance graph that is a DAG (directed acyclic graph) rather than a tree. Such graphs are also called tangled hierarchies, a term popularized by Fahlman in the NETL system (11). Sometimes inheritance graphs are referred to as lattices, but they are rarely true lattices, since in a lattice every pair of nodes has a unique join (lowest common superior) and meet (highest common inferior), unless the join or meet is undefined for those two points. This uniqueness constraint does not apply to tangled hierarchies. For example, elephant and giraffe might have several lowest common superiors, such as mammal, herbivore, and jungle dweller.

Figure 2. A tangled hierarchy.

The search (qv) algorithm for multiple inheritance systems is more complex than for simple, tree-structured hierarchies. In a tree-structured hierarchy, to test whether x is a subclass of y, it suffices to start at x and follow IS-A links upward until reaching either y or the root of the tree. Multiple inheritance requires a more complex search strategy because at each node there may be several upward paths to consider. One may use depth-first, breadth-first, or some other search method on the inheritance graph. Search time grows at worst linearly with the number of nodes in a tree-structured hierarchy, but in a tangled hierarchy it can grow exponentially because the profusion of paths may result in nodes being searched more than once. For example, if "elephant" has IS-A links to both "mammal" and "herbivore," and both those classes have IS-A links to "animal," then "elephant" will have two paths to "animal." An inheritance algorithm based on depth-first search may search the animal node twice in looking for the value of some property of elephants. In practical applications one would not expect redundant searches to be a significant problem, unless the search includes invocation of demons (described below). One technique for avoiding redundant searches altogether is parallel marker propagation (see below).
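A minimal sketch of such a search in a tangled hierarchy, using the elephant example from the text. A visited set ensures that shared superiors such as "animal" are examined only once; the hierarchy table and function name are invented for illustration:

```python
from collections import deque

# Hypothetical tangled hierarchy: each class lists all immediate superclasses.
ISA = {
    "Elephant":  ["Mammal", "Herbivore"],
    "Mammal":    ["Animal"],
    "Herbivore": ["Animal"],
    "Animal":    [],
}

def is_subclass(x, y):
    """Breadth-first search upward from x; the visited set prevents
    re-searching nodes reachable by more than one path."""
    frontier, visited = deque([x]), {x}
    while frontier:
        node = frontier.popleft()
        if node == y:
            return True
        for sup in ISA.get(node, []):
            if sup not in visited:
                visited.add(sup)
                frontier.append(sup)
    return False
```

Without the visited set, "Animal" would be enqueued twice, once via each path; with it, search time stays linear in the number of nodes even in a tangled hierarchy.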
Inheritance and First-Order Logic

There is an ongoing rivalry between advocates of frame-based and semantic network representations and advocates of formal logic (qv) as a representation language (21). At one time it was argued that frame-based systems could express concepts outside the domain of logic and, also, that they were a more elegant and natural formalism than logic for organizing knowledge. These arguments have less force today.
But most inheritance reasoners suffer from a murky semantics: Their true meaning can be determined only by examining the code that implements their inference algorithms because there is no independent formal specification to refer to (22). Logic-based systems offer the twin advantages of a well-understood formal semantics and a universally accepted notation. Hayes (23,24) and Nilsson (25) argue that inheritance systems are merely notational variants of logic (qv) since their nodes and links can be translated in a straightforward, mechanical way into sentences in first-order logic. In their translations, classes are one-place predicates, inheritance links from individuals to classes are logical terms, and links from subclasses to superclasses are universally quantified implications. Properties (slot values) can be expressed either using two-place predicates or unary functions plus equality. To illustrate, below are three sentences in first-order logic that together encode the knowledge that Clyde is an elephant, elephants are mammals, and mammals have a blood temperature of "warm."

Elephant(Clyde).
Elephant(x) ⊃ Mammal(x).
Mammal(x) ⊃ BloodTemp(x, warm).

For real inheritance systems, especially frame systems, the situation is more complex than this straightforward logical translation suggests. Real systems include default values and permit exceptions to inherited defaults, but default reasoning cannot be done in first-order logic. Hayes and Nilsson express the belief that this aspect of inheritance is not beyond the reach of logic; one must simply switch to a nonstandard language, such as nonmonotonic logic (26) or default logic (27) (see Reasoning, nonmonotonic; Reasoning, default). But they do not attempt such a translation. None appeared until 1983, when Etherington and Reiter gave a translation for a small fragment of NETL in nonmonotonic logic (28).
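The mechanical character of this translation can be mimicked by a toy forward chainer over the three sentences above. This is a sketch; the fact and rule representation is invented for illustration, not a general theorem prover:

```python
# Facts are tuples; each rule maps a matched unary predicate to a new fact.
facts = {("Elephant", "Clyde")}
rules = [
    ("Elephant", lambda x: ("Mammal", x)),           # Elephant(x) ⊃ Mammal(x)
    ("Mammal",   lambda x: ("BloodTemp", x, "warm")),# Mammal(x) ⊃ BloodTemp(x, warm)
]

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pred, conclude in rules:
            for fact in list(derived):
                if fact[0] == pred:
                    new = conclude(fact[1])
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived

print(("BloodTemp", "Clyde", "warm") in forward_chain(facts, rules))  # True
```

The same conclusion the inheritance reasoner draws by climbing IS-A links is here reached by ordinary logical deduction over the translated sentences.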
Another reason why real frame systems are difficult to formalize is that the procedural knowledge (demons) found therein can be arbitrary pieces of LISP code capable of all sorts of reasoning strategies, some of which might not be formalizable in any presently understood form of logic, just as default reasoning was once unformalized. Although logical language is concise and has a well-defined meaning, standard logical notation may not be the best formalism for expressing even simple forms of knowledge. Knowledge in logic is expressed as a set of unordered, unconnected sentences. AI researchers have found it advantageous to group related facts into structures, such as frames. In addition, the organization of knowledge as a graph structure can lead to efficient inference algorithms. The proponents of logical representations point out, though, that the graph-structured organization found in semantic nets can be viewed as merely an efficient indexing scheme for retrieval of logical formulas, with no semantic significance. Inheritance systems of the future might be implemented as theorem provers but continue to use network notation and the network metaphor for knowledge as syntactic sugar for their user interface.
Simple Inheritance Exceptions

Knowledge about the real world is seldom expressed in absolutes; normally it consists of useful generalizations accompanied by exceptions. Mammals generally bear live young, but platypuses, which are mammals, do not; they lay eggs. Of course platypuses are rare. The necessity of reasoning in situations where knowledge is incomplete means programs must be able to make reasonable assumptions based on what they know is typically true. If one knows that x is a mammal but cannot prove it is not a platypus, one may find it profitable to assume that it is a typical mammal that bears live young unless one is told otherwise. This sort of reasoning cannot be expressed in first-order logic. Consider the following attempt at formalizing this knowledge about mammals and platypuses:

Mammal(x) ⊃ Reproduces(x, live birth).
Platypus(x) ⊃ Reproduces(x, egg laying).
Platypus(x) ⊃ Mammal(x).
Reproduces(x, y) and Reproduces(x, z) ⊃ y = z.

The last line above expresses the fact that an animal can have only a single method of reproduction. Unfortunately, the above axioms are inconsistent, since from Platypus(Penny) one can derive both Reproduces(Penny, egg laying) and Reproduces(Penny, live birth). This leads to the false conclusion that egg laying equals live birth. Suppose the rule that mammals normally bear live young is reformulated in order to take platypuses into account:

Mammal(x) and ¬Platypus(x) ⊃ Reproduces(x, live birth).

Now the only conclusion that can be drawn from Platypus(Penny) is Reproduces(Penny, egg laying), which is correct. But what can be inferred from, say, Mammal(Bertha)? Nothing, since Bertha must be proved a nonplatypus before one can conclude that she reproduces by live birth. There is no way in classical first-order logic to simply assume, in the absence of contrary evidence, that a particular mammal is not a platypus.

Due to this limitation, default reasoning cannot be done in first-order logic, but it can easily be demonstrated in inheritance systems. The trick of implementing defaults in inheritance systems is in the search algorithm. Suppose a mammal frame is created and given a reproduction slot with value live birth, and a platypus frame is also created whose reproduction slot has the value egg laying. Platypus has an IS-A link to mammal. Let Bertha be an instance of mammal, and Penny an instance of platypus. If one asks for the value of Bertha's reproduction slot, the search algorithm will travel from Bertha to mammal, where it finds the value live birth in the slot, and so it will return live birth. In Penny's case the search would stop at platypus, where it finds the value egg laying; therefore, it never reaches the mammal frame and never looks in mammal's reproduction slot. As long as the search algorithm works by ascending the IS-A hierarchy and stopping at the first slot value it finds, subclasses such as platypus can override the default values they inherit from superclasses, such as mammal, simply by specifying new values. Unfortunately, this simple search technique leads to counterintuitive results in multiple-inheritance systems.

Exceptions and Multiple Inheritance

The way exceptions are handled in multiple-inheritance systems depends on the multiple-inheritance algorithm. Some systems, when searching for a slot value for a frame, search upward from each of the frame's immediate superiors and return a list of the values they find. In FRL, for example, if the mammal and herbivore frames both have information in their metabolism slot, as shown in Figure 3, and elephant inherits from both frames, a request for the value of elephant's metabolism slot will return a list containing both values. But if a value is placed directly into elephant's metabolism slot, this value will be the only one returned because the search does not proceed any further. FRL's ability to handle exceptions comes from its policy of not searching a frame's superiors when any information is available locally.

Figure 3. Two sources of metabolism information. (Elephant is-a Mammal and Herbivore; the Metabolism slots of Mammal and Herbivore hold the fillers Item1 and Item2, respectively.)

Other choices of search algorithm are possible. One could use depth-first search to look for slot values and stop as soon as a single value is found. Or one could search all the frame's superiors in parallel, using breadth-first search, and stop when a single value is found. The latter technique is called shortest-path inheritance, since the slot value with the shortest path from the start of the search is the one that will be found first. Shortest-path inheritance is the technique NETL originally used to implement exceptions. In a tree-structured inheritance hierarchy all these search strategies are equivalent, but under multiple inheritance they give different results.

Touretzky showed that none of these simple search techniques is guaranteed to give intuitively correct results under multiple inheritance (29,30). Consider the hierarchy in Figure 4. Nixon inherits from two frames, Quaker and Republican, and Republican in turn inherits from Hawk. Quakers are typically pacifists, while Hawks are of course nonpacifists. Is Nixon a pacifist or not? The algorithm that searches all a frame's superiors would return both yes and no, which clearly is inconsistent. Depth-first search would arbitrarily return one value or the other, but one cannot predict which, because it depends on the order in which the algorithm examines the Nixon frame's superiors. Thus, whichever answer is received cannot be determined from the knowledge expressed in the network. Shortest-path search would decide that Nixon was a pacifist since the path to Quaker is shorter than the path to Hawk. But this distinction is not really pertinent; one can obtain the opposite result without changing the intuitive meaning of the network by inserting a few extra frames along the path from Nixon to Quaker so that the path to Hawk becomes the shorter of the two. Path length is simply the wrong criterion for choosing between possible inferences under multiple inheritance.

Figure 4. An ambiguous network. (Nixon is-a Quaker and Republican; Republican is-a Hawk; Quaker carries Pacifist: yes, while Hawk carries Pacifist: no.)

One solution to the dilemma posed by Figure 4 is to write off such networks as inconsistent. But there is a better solution: Figure 4 can be viewed as ambiguous rather than inconsistent. It simply has two interpretations that are both individually consistent, although mutually inconsistent. In one, Nixon is proved to be a pacifist because he is a Quaker. In the other, Nixon is proved to be a nonpacifist because he is a Hawk. But in first-order logic, sets of axioms always have unique theories; they simply cannot have more than one interpretation. Thus, in order to view Figure 4 as an ambiguous set of axioms, one must be willing to go outside first-order logic.

In a second class of examples Touretzky (29,30) introduced redundant IS-A links, as illustrated in Figure 5. Clyde is an elephant prince, hence a royal elephant, and hence an elephant. Royal elephants are not gray. The explicit statement that Clyde is an elephant, although redundant, is certainly true. But conventional inheritance search algorithms depend on a strict hierarchical ordering of classes in order to draw reasonable conclusions about exceptions. Redundant links interfere with this ordering by allowing the search algorithm to skip levels, for example, to go from Clyde to elephant directly without searching the intervening nodes.

Figure 5. A network with a redundant IS-A link. (Elephant has Color: gray; Royal-elephant is-a Elephant, with Color: white; Elephant-prince is-a Royal-elephant; Clyde is-a Elephant-prince and also, redundantly, is-a Elephant.)

A depth-first inheritance reasoner operating in Figure 5 could return either gray or white as the value of Clyde's color, depending on which of his two superiors it examined first. A shortest-path algorithm, on the other hand, would always conclude that Clyde was gray, clearly an error. In FRL, where values from both Clyde's superiors would be returned, the result would be that Clyde was both white and gray, which is also unsatisfactory. The obvious solution to this dilemma is to ban all redundant statements from inheritance systems. But since such statements are necessarily true, being among the system's own inferences, this would leave one in an awkward position in which it is very difficult to assign meaning to inheritance networks. Touretzky's view is that there is nothing wrong with redundant links; the problem lies with naive search algorithms. He proposes an "inferential distance ordering" as the proper approach to multiple-inheritance reasoning; this is an ordering on possible proof sequences (and hence inferences) rather than on classes and instances directly. The ordering can be shown to handle exceptions correctly even when redundant links are present, and it admits multiple consistent theories for networks such as Figure 4. Unfortunately, it appears that
inference algorithms that operate according to inferential distance are more expensive to implement than simpler strategies such as shortest-path reasoning.

Expressiveness of Inheritance Systems

Inheritance systems are not nearly as expressive as first-order logic even though they can do some things that first-order logic cannot. Most inheritance systems provide no way to make explicit negative statements. One can say that Clyde is an elephant, but one cannot say he is not a giraffe. Instead of admitting negation explicitly, most inheritance systems rely on an idea called negation as failure: if the system fails in an attempt to prove an assertion, it assumes the assertion is false. Negation as failure is useful for default reasoning (as when one assumes a particular instance of a mammal is not a platypus), but it is no substitute for an adequate representation for negation. Inheritance systems also lack a representation for disjunctions. There is no way to say things like "Clyde is either an elephant or a giraffe" or "Clyde's color is either gray or pink." And inheritance systems permit only a few forms of quantified statements; one cannot arbitrarily nest quantifiers as is possible in logic. These limitations on expressiveness are not without compensation. The simplicity and efficiency of inheritance reasoners are a direct result of them. As the representation language becomes more flexible, inheritance reasoners become more like theorem provers; the rise in computational complexity can be dramatic (31).
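Negation as failure, as used above, can be sketched in a few lines. This is a toy reasoner over an invented hierarchy, not any particular system: failure to find an IS-A path is simply treated as falsity.

```python
# Hypothetical hierarchy; Clyde has no path to Giraffe.
ISA = {
    "Clyde":    ["Elephant"],
    "Elephant": ["Mammal"],
    "Giraffe":  ["Mammal"],
    "Mammal":   [],
}

def provable_isa(x, y, seen=None):
    """True only if an explicit IS-A path from x to y exists."""
    seen = seen or set()
    if x == y:
        return True
    seen.add(x)
    return any(provable_isa(s, y, seen) for s in ISA.get(x, []) if s not in seen)

def assumed_not(x, y):
    """Negation as failure: failure to prove is taken as falsity."""
    return not provable_isa(x, y)

print(assumed_not("Clyde", "Giraffe"))  # True
```

Note that nothing in the knowledge base actually says Clyde is not a giraffe; the negative conclusion rests entirely on the search's failure, which is exactly why the technique is no substitute for genuine negation.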
Inheriting Slot Constraints

Slots in a frame system can normally have any type of filler, but sometimes it is desirable to constrain either the type or number of fillers. Suppose a frame describing instances of an action, such as ingestion, has agent and object slots. One may wish to restrict the agent slot to contain only animate beings and the object slot to contain only edible objects. Information about the potential fillers of a slot is a constraint on the slot that can be inherited and used in several ways. First, if a slot is filled with something that violates the constraints placed on it, the frame interpreter can generate an error message or invoke procedures called demons (qv) to handle the inconsistency. Second, if constraints are accessible, they can be referred to directly by any procedure that operates on the knowledge base. A story-understanding system (see Story analysis), for example, could use the known constraints on an ingestion frame's object slot to recognize that the sentence "John ate the cost of the inspection" does not refer to an instance of ingestion; costs are not edible objects. Since constraints are inherited along with the slot, frames for eating and drinking, being instances of ingestion, inherit the same slots and slot restrictions. But it is often useful to modify inherited constraints. The object slot of the drinking frame, for example, might be further restricted so that it could be filled only by an instance of a fluid.

The number of fillers a slot may have may also be constrained. A frame describing a bicycle as a type of wheeled vehicle would necessarily constrain the wheels slot to have exactly two fillers. In a recognition-through-matching application, if an object has just one wheel, or three, the bicycle frame cannot match it because a number constraint would be violated. In KL-ONE there are deliberate restrictions on how inherited constraints may be modified. A concept may replace an inherited number constraint only with one that is more restrictive, i.e., a subrange of the original range. Inherited value restrictions may not be overridden, but they too may be made more restrictive, for example, replacing "edible thing" with "edible liquid." These restrictions assure that there are no inheritance exceptions in KL-ONE. Its designers believe that an AI reasoner should treat exceptional cases elsewhere than in the taxonomic component (6).
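The ideas above can be sketched as follows. Each frame may refine an inherited value restriction and (min, max) number constraint; the nearest constraint up the IS-A chain wins. The table, names, and checking functions are invented for this sketch and are not KL-ONE's actual machinery:

```python
# Hypothetical constraint table: value restriction plus (min, max) count.
CONSTRAINTS = {
    "Ingestion": {"isa": None,
                  "slots": {"object": ("edible thing", (1, 1))}},
    "Drinking":  {"isa": "Ingestion",
                  "slots": {"object": ("edible liquid", (1, 1))}},  # refinement
    "Bicycle":   {"isa": None,
                  "slots": {"wheels": ("wheel", (2, 2))}},
}

def effective_constraint(frame, slot):
    """Walk up the IS-A chain to the nearest frame constraining the slot."""
    while frame is not None:
        found = CONSTRAINTS[frame]["slots"].get(slot)
        if found:
            return found
        frame = CONSTRAINTS[frame]["isa"]
    return None

def check_fillers(frame, slot, fillers, type_of):
    """Report violations of the value and number restrictions."""
    vr, (lo, hi) = effective_constraint(frame, slot)
    errors = []
    if not lo <= len(fillers) <= hi:
        errors.append("number constraint (%d, %d) violated" % (lo, hi))
    errors += ["%s is not a %s" % (f, vr) for f in fillers if type_of(f) != vr]
    return errors

# A one-wheeled object cannot fill Bicycle's wheels slot:
print(check_fillers("Bicycle", "wheels", ["w1"], lambda f: "wheel"))
```

Because the constraints live in an inspectable table rather than in code, other procedures (a matcher, a story understander) can consult them directly, the advantage of the declarative representation argued for later in the article.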
Structured Concepts

Certain semantic network systems, notably KL-ONE and NETL, treat concepts as objects with important internal structure, such as the objects' parts and attributes. Each element of the structure is represented by a node called a "role"; the role names, constraints, and the relationships among roles define the essence of the concept. For example, the KL-ONE concept of an Arch might include three roles called Post1, Post2, and Lintel, whose values are constrained to be blocks. Intrinsic to the definition of an arch are the constraints that each role must be filled by a different block, the posts must both support the lintel, and the posts must not touch each other. Figure 6a shows an instance of a blocks-world arch, and Figure 6b shows the concept of an arch in one version of KL-ONE. The RoleD links are role definitions, and the V/R (Value Restriction) links place value restrictions on the roles. The required relationships between the various roles are described in the portion of the diagram labeled Structural description. Figure 7 shows an instance of an arch in KL-ONE. The arch is called ARCH-31, and its Post1, Post2, and Lintel roles are filled by BLOCK-18, BLOCK-21, and BLOCK-25, respectively. The Individuates link between ARCH-31 and ARCH shows that ARCH-31 denotes an instance of the ARCH concept and therefore inherits its roles and structural description. The RoleF links connect ARCH-31 with its role fillers, and the Satisfies links show which ARCH role each ARCH-31 role satisfies. The links from ARCH-31 to its superior concept, namely the Individuates link and the various Satisfies links, form what in KL-ONE terminology is known as an "inheritance cable." This term calls attention to the fact that inheritance in KL-ONE is a multifaceted relationship, involving inheritance from concept to concept, role to role, and structural description to structural description.
Although roles are similar to the slots of a frame system, they are a more complex and more refined idea than slots. In addition to being inherited, roles can be differentiated into subroles, and subroles can be further constrained and placed in certain relationships. Figure 8 shows how, for the game hide-and-seek, the Players role for games can be differentiated into two subroles, Hiders and Seekers. Any assertions made about the players of games would be inherited by both hiders and seekers in games of hide-and-seek. Thus, there is an inherited value restriction on both hiders and seekers that they be animate.

Roles can act as pseudoindividuals in a way that slots do not because one can make assertions about the object that fills a role without saying what the object is. This is illustrated in Figure 9 using NETL notation. Working animals have an owner role, and circus animals, who are working animals, also have a trainer role. Clyde's trainer is also his owner. The nodes for owner and trainer in Figure 9 are called ROLE nodes in NETL because they define new roles. The nodes for circus animal's owner, Clyde's owner, and Clyde's trainer are called MAP nodes because they denote instances of inherited roles. Clyde is represented by an INDV node, indicating that he is an individual who exists in the world. Although ROLE, MAP, and INDV nodes are all drawn as open circles, they are distinct node types; of these, only INDV nodes assert the existence of real-world objects. The IS-A link from trainer to animal lover indicates that any individual who fills the trainer role of some animal may be inferred to be an animal lover. The double-headed arrow between Clyde's owner and Clyde's trainer, called an EQ link, indicates that the two nodes refer to the same object. Thus, although one may not know who Clyde's owner is, one can say that his owner is the same as his trainer, and so by inheritance we can deduce that Clyde's owner is an animal lover.

The notion that concepts contain internal structure intrinsic to their definition leads to a different type of inheritance reasoning, one based on matching. Given a hierarchy of concepts, each with its own set of roles and structural descriptions, the goal of a classifier algorithm is to take the description of an object as input and find the most specific concept that describes that object. Thus, if one has an object that is a vehicle, has two wheels, and is without a motor, the classifier algorithm will decide that it is not just an instance of a wheeled vehicle, but more specifically, it fits the description of a bicycle. Classifiers are a major use of inheritance in KL-ONE (32). Since roles, constraints, and structural descriptions are all inherited, this type of classifier is a form of inheritance reasoner.
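The bicycle example can be sketched with a toy classifier that returns the matching concept with the most constraints, a crude stand-in for "most specific." The concept table and attribute encoding are invented for illustration and greatly simplify KL-ONE's roles and structural descriptions:

```python
# Hypothetical concept definitions as attribute constraints.
CONCEPTS = {
    "Wheeled-vehicle": {"vehicle": True},
    "Bicycle":         {"vehicle": True, "wheels": 2, "motor": False},
    "Motorcycle":      {"vehicle": True, "wheels": 2, "motor": True},
}

def matches(description, constraints):
    """A concept matches if the description satisfies every constraint."""
    return all(description.get(k) == v for k, v in constraints.items())

def classify(description):
    """Return the matching concept with the most constraints."""
    candidates = [c for c, cons in CONCEPTS.items() if matches(description, cons)]
    return max(candidates, key=lambda c: len(CONCEPTS[c]), default=None)

print(classify({"vehicle": True, "wheels": 2, "motor": False}))  # Bicycle
```

The two-wheeled, motorless object matches both Wheeled-vehicle and Bicycle, and the classifier correctly prefers the more specific description, as described in the text.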
Figure 6. (a) A blocks-world arch; (b) an arch in KL-ONE. (The KL-ONE diagram shows three RoleD role definitions and a structural description.)

Figure 7. An instance of an arch. (ARCH-31 Individuates ARCH; RoleF links run to fillers such as BLOCK-25, and Satisfies links connect the roles.)

Splits and Partitioning

It is sometimes useful to split a class into disjoint subclasses, such as splitting living things into animals and plants. Then, if one knows that Clyde is an elephant and one tries to assert
Figure 8. Differentiating a role in KL-ONE. (In Hide-and-seek, the inherited Players role, value-restricted to Animate, is differentiated via RoleD links into "Hiders" and "Seekers" subroles.)
that Clyde is a cabbage,the SPLIT node in Figure 10 (NETL InheritableRelations notation) will complain. Disjointness constraints can easily Another extension to the inheritance idea allows one to define be implemented as extensionsto a basic inheritance reasoner. relations, such as "bigger than," between the members of two Another way to use this information is to figure out, by inhericlasses,and let them be inherited by the respectivesubclasses tance, which classesClyde cannot be a member of : since he is and instances (29). Thus, when it is asserted that elephants an elephant, he cannot be a plant or any subtype of plant. are bigger than rabbits, if one knows that Clyde is an elephant Parallel marker propagation (described below) is useful for and Joe is a rabbit, one may infer that Clyde is bigger than making these types of inferences rapidly. Joe. A split is called a partitioning if all the subclassesare disInheritable relations, like inheritable properties, are subjoint and their union covers the entire superclass.If cars are ject to exceptions.For example, if citizens dislike crooks,but partitioned into front-wheel drive, rear-wheel drive, and fourgullibl e cittzens do not dislike elected crooks, then this must wheel drive models,and one knows that Clyde'scar is neither be taken into account when reasoning about citizens and a front-wheel drive car nor a four-wheel drive car, one may crooks. Touretzky (33) argues that in order to represent relaconcludethat it is a rear-wheel drive car. Although partitioning depends on the inheritance hierarchy to determine class memberships, reasoning about partitionings is more complex than simply taking transitive closures.Given an object lying Livingthing somewhere in a partition, only when all but one of the subSplitnode classesmaking up the partition have been ruled out can the object be assigned to the remaining one. 
Partitionings illustrate how, as inheritance languages becomericher in expressive power, they becomemore like logic. Plant Animal lover
Working animal
Circus animal
Circus animal's owner
Elepha nt
Cabbage
Trainer Proposedlink
Clyde's trainer
Clyde
Figure 9. Example of roles in NETL.
Clyde
Figure 10. Splitting a class into disjoint subclassesin NETL.
HIERARCHY INHERITANCE tions and their exceptions properly, one must first have an explicit representation for negation. lnheritanceof Demons
of the interpreter. In some systeffis,such as KL-ONE, one can add machinery for expressing almost any sort of restriction in a declarative style, although the interpreter may then become quite complex. In other systemsthe procedural approachmust be used for all but the simplest constraints; yet this is no hindrance since such proceduresare easily written (in LISP) and inexpensiveto execute.In the late 1970sthere was considerable debate over the advantages of each approachto knowledgerepresentation. Winograd offers a gooddiscussionof what was then called "the procedural/declarativecontroversy"(34).
In frame systems the most general form of slot constraint is the demon, or procedural attachment, which is usually just a piece of LISP code. Like other slot information, demons are inherited. The two most common types are IF-NEEDED and IF-ADDED demons.An IF-NEEDED demonwill computeslot values as they are needed;this is useful when the values are expensiveto computebecausevalues that are not neededwill Propagation Parallel-Marker not be computed. Once the demon computesa slot value, it might store it in the slot so that next time the value can be Parallel-marker propagation is an inference technique espefound without invoking the demon. But if the value is excially effectivein inheritance applications.Historically, it is a pected to change often, the demon might refrain from storing refinement of the spreading activation idea introduced by it, so that the demon will be reinvoked every time a request is Quillian in his semantic memory system, one of the earliest made for the slot's value. semanticnetworks (35).Although there has been somediscusIf one is looking for a value for one of the elephant frame's sion of building special-purposehardware for parallel-marker slots and there is nothing at elephant, mammal, vertebrate, or propagation(36),at this point the main attractions of the techany higher frame, then one may begin invoking IF-NEEDED nique are simplicity and concisenessin expressing inference demons, starting with those stored at elephant, until one of algorithms. them returns a value. The order in which demons at various A parallel-marker propagation machine (PMPM) consists levels of the hierarchy are invoked normally follows the same of a collection of simple processorsthat play the roles of nodes rules as the algorithm for locating slot values: Start at the and links in a semantic network. Each processoris provided bottom and work up the IS-A tree. with a small amount of local memory called marker bits. 
ProIF-ADDED demons are useful for checking for constraint cessorscan set or clear these bits in responseto commands violations by slot fillers and for triggering inference or advireceived over a broadcast bus. In addition, a processorthat sory procedures associatedwith a frame. For example, the link can copy the state of one of its tail node'smarker acts a as bicycle frame might have an IF-ADDED demon attached to the wheels slot to check the number of fillers; if there are not bits to the correspondingbit in its head node, or vice versa. Thus, links serve to prop agate markers through the network. exactly two wheels, the demon could proposea switch to anOne of the strengths of the PMPM is that it can compute very other frame such as unicycle or tricycle. fast transitive closures over a given link type, such as is-a Often there is a choice between expressing knowledge prolinks, becauseall the processorsexecutethe same commands cedurally, through demons, or expressingit in a declarative parallel. in fashion. Consider the number constraint on the bicycle frame's Figure 11 illustrates how a particular type of marker propwheels slot. In KL-ONE and SRL, where number constraints agation algorithm, called an activation scan, works. Suppose are expressedby lists of form (min man), the constraint on the one wants to find out whether Clyde is a mammal. First one wheels slot would be encodedas the list Q 2). Since number tells all the processorsto clear their marker 1 bit. Then the constraints are handled directly by the interpreter, it is a trivClyde node is told to set its marker 1 bit, and the result is ial operation to add a number constraint to a slot. Furtherin Figure LLa. Marker 1 is called Clyde's activation shown more, the declarative representation of number constraints makes them an accessiblepart of the knowledge base, avail- mark becauseit will be used to activate his description in the able for inspection and modification by user-written proce- network. 
Now a command is broadcastto propagatemarker 1 upward acrossIS-A links, and the result is Figure 11b. When dures. If number constraints were implementedas IF-ADDED the command is repeated,the result is Figure 11c.Repeating demons,one would incur two penalties.One would be forcedto the command one more time doesnot mark any new nodes,so create a separatenumber constraint demon for every slot to be constrained, which is wasteful of spaceand fails to capture a the transitive closureis complete.Note that the mammal node significant generalization about number constraints. And, now bears Clyde's activation mark. This indicates that Clyde since the min and maxcvalues of each number constraint is indeed a mammal. The important thing about marker propagation scanssuch would be embeddedin a piece of LISP code, the constraint information itself would not be directly accessible;it could as the activation scan is that they run in time proportional to the depth of the inheritance Saph, independentof the number only be used by the demon in which it was embedded. In other situations procedural representationsare to be pre- of nodesin the graph or the fan-out of any particular node. No ferred. Consider a frame for representing right triangles. The matter how many facts are learned about Clyde and about lengths of the three sides of such a triangle are constrained to elephants, or how many elephants there are in the knowledge obey the Pythagorean theorem: the square of the hypotenuse base,as long as the depth of the graph doesnot increase,the must equal the sum of the squaresof the other two sides.Such knowledge base can be searchedin constant time. 
a constraint would be impossible to express in declarative fashion without extending the slot-restriction language to handle algebraic expressions involving the values of several slots and adding some notation for expressing equality; this would bring with it a corresponding increase in complexity.

Inheritance in Programming Languages

Object-oriented programming languages (qv) such as Simula (12), SMALLTALK (13), LOOPS (14), and the LISP Machine
INHERITANCE HIERARCHY
Figure 11. Three steps in the activation scan of Clyde. [Panels (a), (b), and (c) show the marked nodes, including Clyde and Elephant, at successive propagation steps.]

flavor system (15) normally organize their object types into a hierarchy. An object type, or "class," is essentially a record structure with a set of attached procedures known as "methods." Two forms of inheritance take place in these systems. First, a subtype inherits all the components of its parent type's record structure, just as frames inherit all the slots of their parent frame. Second, a subtype inherits all the methods of its parent type, but it can override these methods by supplying methods of its own.

Inheritance in object-oriented programming languages is quite similar to inheritance in frame systems. There is multiple inheritance and a need to represent exceptions, as when a subtype replaces a method of its parent type with one of its own methods. Carnese (17) gives a detailed account of how these issues are handled in four contemporary programming languages, and Cardelli (37) gives a more theoretical account of multiple inheritance in type systems. There are two aspects, though, in which programming language type systems differ from AI inheritance systems. In the latter, although there may be a distinction between classes and individuals, the data structures for the two are similar, as are the primitives for manipulating them. But in object-oriented programming languages there is a sharp distinction between classes and instances. Classes describe things; instances are the things; and the internal representations of the two are entirely different. In AI systems classes are valid objects of discourse; e.g., one can ask such a system how many legs an elephant has without creating any elephants. In programming languages class definitions have no use except for creating instances. Some languages, such as Simula, do not even provide a way to create or manipulate classes dynamically; the class hierarchy is established by declarations at compile time and remains fixed during the life of the program.

The second aspect in which AI inheritance systems differ from programming language type systems is that only the former are used for default reasoning (qv). The inheritance system is part of the specification of the AI system in which it resides; the behavior of an AI system that does default reasoning depends on the multiple-inheritance algorithm provided and the way exceptions are treated. In contrast, the characteristics of a programming language's type system have no influence on the specifications or behavior of programs written in that language; they are of concern only to the programmer. The inferences made by a programming language's type system merely relieve the programmer from some redundant coding.

BIBLIOGRAPHY

1. T. Winograd, "Extended inference modes in reasoning by computer systems," Artif. Intell. 13(1 and 2), 5-26 (April 1980).
2. R. B. Roberts and I. P. Goldstein, The FRL Manual, Technical Report AI Memo 409, MIT Artificial Intelligence Laboratory, June 1977.
3. D. G. Bobrow and T. Winograd, "An overview of KRL, a knowledge representation language," Cog. Sci. 1(1), 3-46 (January 1977).
4. D. G. Bobrow, T. Winograd, and the KRL research group, Experience with KRL-0: One Cycle of a Knowledge Representation Language, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 213-222, 1977.
5. R. J. Brachman, A Structural Paradigm for Representing Knowledge, Ablex, Norwood, NJ, 1987.
6. R. J. Brachman and J. G. Schmolze, "An overview of the KL-ONE knowledge representation system," Cog. Sci. 9(2), 171-216 (April 1985).
7. J. M. Wright and M. S. Fox, SRL/1.5 User Manual, Technical Report, Robotics Institute, Carnegie-Mellon University, Pittsburgh, PA, 1983.
8. G. Attardi and M. Simi, Semantics of Inheritance and Attributions in the Description System Omega, Technical Report AI Memo 642, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1982.
9. C. Hewitt, G. Attardi, and M. Simi, Knowledge Embedding in the Description System OMEGA, Proceedings of the First National Conference on Artificial Intelligence, Stanford, CA, pp. 157-164, 1980.
10. N. V. Findler (ed.), Associative Networks: Representation and Use of Knowledge by Computers, Academic Press, New York, 1979.
11. S. E. Fahlman, NETL: A System for Representing and Using Real-World Knowledge, MIT Press, Cambridge, MA, 1979.
12. O. Dahl, Simula 67 Common Base Language, Technical Report, Norwegian Computing Center, Oslo, 1968.
13. A. H. Borning and D. H. H. Ingalls, Multiple Inheritance in Smalltalk-80, Proceedings of the Second AAAI, Pittsburgh, PA, August 1982, pp. 234-237.
14. D. G. Bobrow and M. Stefik, The LOOPS Manual (preliminary version), Working Paper KB-VLSI-81-13, Xerox Palo Alto Research Center, August 1981.
15. D. Weinreb and D. Moon, Lisp Machine Manual, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1981.
16. Department of Defense, Reference Manual for the Ada Programming Language, Ada Joint Program Office, Washington, DC, 1982.
17. D. J. Carnese, Multiple Inheritance in Contemporary Programming Languages, Ph.D. Thesis, MIT Laboratory for Computer Science, Report MIT LCS TR-328, Cambridge, MA, 1984.
18. R. J. Brachman, "What IS-A is and isn't: An analysis of taxonomic links in semantic networks," IEEE Comput. 16(10), 30-36 (October 1983) (special issue on knowledge representation).
19. R. J. Brachman, R. E. Fikes, and H. Levesque, "KRYPTON: A functional approach to knowledge representation," IEEE Comput. 16(10), 67-73 (October 1983).
20. M. L. Minsky, A Framework for Representing Knowledge, in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975, Chapter 6.
21. D. Israel, "The role of logic in knowledge representation," IEEE Comput. 16(10), 37-41 (October 1983).
22. R. J. Brachman, On the Epistemological Status of Semantic Networks, in N. V. Findler (ed.), Associative Networks: Representation and Use of Knowledge by Computers, Academic Press, New York, pp. 3-50, 1979.
23. P. J. Hayes, In Defence of Logic, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 559-565, 1977.
24. P. J. Hayes, The Logic of Frames, in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, pp. 451-458, 1979.
25. N.
J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
26. D. V. McDermott and J. Doyle, "Non-monotonic logic," Artif. Intell. 13(1 and 2), 41-72 (April 1980).
27. R. Reiter, "A logic for default reasoning," Artif. Intell. 13(1 and 2), 81-132 (April 1980).
28. D. Etherington and R. Reiter, On Inheritance Hierarchies with Exceptions, Proceedings of the Third AAAI, Washington, DC, pp. 104-108, 1983.
29. D. S. Touretzky, The Mathematics of Inheritance Systems, Morgan Kaufmann Publishers Inc., Los Altos, CA, 1986.
30. D. S. Touretzky, Implicit Ordering of Defaults in Inheritance Systems, Proceedings of the Fourth AAAI, Austin, TX, pp. 322-325, 1984.
31. R. J. Brachman and H. J. Levesque, The Tractability of Subsumption in Frame-Based Description Languages, Proceedings of the Fourth AAAI, Austin, TX, pp. 34-37, 1984.
32. J. F. Schmolze and T. A. Lipkis, Classification in the KL-ONE Knowledge Representation System, Proceedings of the Eighth IJCAI, Karlsruhe, FRG, pp. 330-332, 1983.
33. D. S. Touretzky, Inheritable Relations: A Natural Extension to Inheritance Hierarchies, Proceedings of the CSCSI/SCEIO Workshop on Theoretical Approaches to Natural Language Understanding, Halifax, Nova Scotia, May 1985, pp. 55-60.
34. T. Winograd, Frame Representations and the Declarative-Procedural Controversy, in D. G. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 185-210, 1975.
35. M. R. Quillian, Semantic Memory, in M. L. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 227-270, 1968.
36. S. E. Fahlman, Design Sketch for a Million-Element NETL Machine, Proceedings of the First National Conference on Artificial Intelligence, Stanford, CA, August 1980, pp. 249-252.
37. L. Cardelli, A Semantics of Multiple Inheritance, in G. Kahn, D. B. MacQueen, and G. Plotkin (eds.), Semantics of Data Types, Lecture Notes in Computer Science, Vol. 173, Springer-Verlag, New York, pp. 51-67, 1984.

General References

R. J. Brachman, On the Epistemological Status of Semantic Networks, in R. J. Brachman and H. J. Levesque (eds.), Readings in Knowledge Representation, Morgan Kaufmann Publishers, Inc., Los Altos, CA, 1985, pp. 191-215.
J. G. Carbonell, "Default reasoning and inheritance mechanisms on type hierarchies," in Proceedings of the Workshop on Data Abstraction, Databases and Conceptual Modelling, Pingree Park, CO, June 1980, pp. 107-109. Published as SIGART Newslett. 74 and SIGPLAN Notices 16(1), 107-109 (January 1981), by the Association for Computing Machinery.
M. L. Minsky, A Framework for Representing Knowledge, in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975, Chapter 6.

D. S. Touretzky
Carnegie-Mellon University

INTELLECT

The INTELLECT system became originally known under the name ROBOT (see L. R. Harris, "ROBOT: A high performance natural language processor for Data Base Query," SIGART Newslett. 61, 39-40 (February 1977); L. R. Harris, "ROBOT: A High Performance Natural Language Data Base Query System," Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 903-904, 1977). INTELLECT is a commercially available English (natural-language) interface to database management systems (DBMS). It uses an ATN parser that can handle a variety of sentence fragments. The database itself is used as a semantic store and helps to disambiguate input sentences. INTELLECT is considered to be the first commercially successful AI product. It is marketed by the Artificial Intelligence Corporation, Waltham, Massachusetts.

J. Geller
SUNY at Buffalo

INTELLIGENCE

Modern, systematic study of intelligence began in the mid-1800s. The first such work was conducted by Sir Francis Galton. Galton's view of intelligence was that it distinguished those individuals who had genius (e.g., demonstrated by making contributions to science, literature, art) from normal individuals. His thesis was that men of genius had a sense of insight, a better command of knowledge, and so on. Given an
assumption that knowledge must be processed by the senses (such as by sight, hearing), those individuals demonstrating genius must have more refined sensory and motor faculties. Thus, Galton argued, intelligence could be measured by assessing constructs such as visual acuity, reaction time, pitch discrimination, and the like. However, even though a great volume of data was collected on the psychophysical abilities of individuals, no evidence for a general association of genius with those abilities was found.

A great amount of attention [something on the order of 7000 articles and books were published on intelligence as of 1968 (1)] has been given to defining, describing, predicting, and understanding human intelligence in the hundred years since Galton's early investigation of the concept. Early research in this century was primarily devoted to examining intelligence as a single broad construct. More recent study has focused on particular facets and components of intelligence. Several threads of thought have consistently remained central to these investigations. These fundamental issues and findings are discussed in detail below.

Defining Intelligence

Intelligence can be viewed as composed of many facets or can be defined as a global concept. For example, one theorist (2) suggested simply that intelligence can be defined as "the ability to learn." On the other hand, other theorists have claimed that intelligence is more complex than such a restricted definition. Binet and Simon (3), for example (the developers of the first modern test of intelligence), suggested that intelligence represented "judgement, otherwise called good sense, practical sense, initiative, the faculty of adapting one's self to circumstances. . . ." The key concept for these and most other definitions regards adaptation of the individual to the demands of the environment. From a broad perspective, then, approaches to defining and measuring intelligence are concerned with assessing the mechanisms for learning (qv) and the results of learning (whether in terms of declarative knowledge, i.e., knowledge about things, or procedural knowledge, i.e., knowledge of how to do things). Below, a few general approaches to defining intelligence are reviewed.

Global Theories of Intelligence. Humphreys (4) has given a broad definition of intelligence that adequately summarizes the character of the construct. He states: "Intelligence is the resultant of the processes of acquiring, storing in memory, retrieving, combining, comparing, and using in new contexts information and conceptual skills; it is an abstraction." In this framework intelligence is seen as a product of, rather than the mechanisms for, the processes of learning. In a sense, Humphreys has described the individual's "repertoire" for reasoned, purposeful thought and action. Intelligence from this perspective is the foundation upon which all new information is sensed, perceived, integrated, and ultimately acted upon. However, it is important to keep in mind a perspective of intelligence that is somewhat relativistic. As Jensen (5) points out (at least in regard to the measurement of intelligence), cultural and historical influences put constraints on our view of what intelligence is. Jensen states that if people lived in a hunting-based society, intelligence might be defined as involving "visual acuity" and "running speed, rather than vocabulary and symbol manipulation." Societal influences help focus our concept of what represents intelligence and intelligent behavior. For example, current views of intelligence in Western civilization tend to emphasize structured reasoning over more flexible, less accessible concepts of creativity (qv) and innovation. Perhaps 50 or 100 years from now, the emphasis may change. From this perspective the definition of intelligence represents a cultural consensus. Certainly, in everyday life, attributions that people make about the intelligence of others reflect such generalizations (6).

More specificity regarding a definition of intelligence follows from the pragmatic outlook provided by Boring (7) some 60 years ago. Boring stated: "Intelligence as a measurable capacity must at the start be defined as the capacity to do well on an intelligence test." On the surface, this definition may sound entirely circular; however, this is not the case. In fact, Boring's definition may be a nearly optimal way of demonstrating just what intelligence involves. That is, this view gives impetus to discussing the pragmatics of what intelligence does consist of. Omnibus intelligence tests have changed little (at least in terms of general forms or content) in the 80 or so years since the original measures were developed by Binet and Simon (3). For example, one major test [the Wechsler Adult Intelligence Scale (8)] of intellectual abilities contains the following components:

Information. For this component the examinee must answer questions regarding the environment he or she lives in, including questions about history, geography, and well-known pieces of literature.

Picture completion. In this subtest examinees must examine a series of figures and determine what component must be added to each figure to make the figure complete (a veridical representation of some object).

Digit span. This is a test of memory facility. Examinees are given series of numbers (of different lengths) they are required to repeat back to the examiner. Two methods are used in determining a person's span of memory. The first method is to repeat back the numbers in the order they are given. The second method requires the examinee to repeat back the numbers in a reversed-order sequence.

Picture arrangement. This subtest requires the examinee to infer, from a set of separate animated panels, what the correct (i.e., logical and consistent) temporal order of the panels should be.

Vocabulary. This is a free-form test of an examinee's command of word meanings. The examiner calls out words, and the examinee gives definitions of the words.

Block design. In this subtest an examinee must manipulate a set of patterned blocks into configurations designated by given figures. The test taps both spatial visualization and spatial manipulation.

Arithmetic. This subtest contains word problems involving mental addition, subtraction, multiplication, and division calculations.

Object assembly. In this subtest several puzzle pieces must be put together to complete a figure.

Comprehension. Here, questions regarding relatively universal concepts must be answered. Some of these questions are given so as to require interpretation of common sayings.

Digit symbol. In this subtest a person must remember (or look up) an arbitrary set of associations between a series of digits and novel symbols. The examinee fills in a table with the symbols associated with the digits as fast as possible.

Similarities. In this test examinees are required to determine the underlying constructs that are common to two objects or concepts.

Without attending to the actual structure of intelligence (i.e., the variety of broad and specific intellectual abilities, which are discussed in detail below), this set of subtests should illustrate the dependence on assessing information retrieval (qv), reasoning (qv), word meanings, concept understanding, facility with memory, spatial visualization, and so on that takes place in most intelligence tests. In this respect, the positions of Humphreys, Boring and Jensen, and many others are in agreement regarding a definition of intelligence from both theoretical and pragmatic perspectives.

Structure of Intelligence

Although most theorists agree on the defining characteristics of the global construct of intelligence, historically there has been a great deal of conceptual dispute regarding the structure of intelligence. At one end of the continuum some theorists claim that intelligence is a single amorphous construct. At the other end of the continuum of thought are theorists that maintain that there are as many as 120 different and independent sources of intellectual abilities. Any general depiction of intelligence requires a discussion of these schools of thought, along with a comparison among the various divergent theories.
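The forward and backward digit-span procedures described above reduce to a simple scoring rule. A minimal sketch follows; the function name and trial data are invented for illustration and are not part of the WAIS itself:

```python
def digit_span_correct(presented, response, backward=False):
    """Score one digit-span trial: the examinee repeats the digits
    in presentation order (forward span) or in reversed order
    (backward span). Returns True if the response is correct."""
    target = list(reversed(presented)) if backward else list(presented)
    return list(response) == target

# A forward trial and a backward trial on the same digit series.
print(digit_span_correct([3, 8, 1, 9], [3, 8, 1, 9]))                 # True
print(digit_span_correct([3, 8, 1, 9], [9, 1, 8, 3], backward=True))  # True
```

A person's span is then estimated as the longest series length at which such trials are passed reliably.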
General Intelligence and Primary Mental Abilities. The first comprehensive theory of intelligence was put forth by Spearman in 1904 (9). On the basis of patterns of individual differences on a variety of intellectual ability tests (such as those described above), Spearman maintained that two underlying aspects of intelligence could be found. The first aspect was general intelligence (g). General intelligence was implied from examination of universally positive correlations (i.e., similar rank orderings of individuals across tests) among tests requiring mental processing. That is, the "factor" (or construct) of general intelligence represented the "common variance" shared by different mental tests. [The amount of common variance shared by two measures is described by the magnitude of Pearson product-moment correlations (10). When it is possible to predict, to some degree, a person's relative standing on one test from knowledge about the person's relative standing on another test, the two tests will have a nonzero correlation, and the tests will be said to have common variance.] The fact that all such mental tests revealed positive intercorrelations provided the justification for the construct of general intelligence. For Spearman, the direct inference from these intercorrelations is that the concept of intelligence represents some type of "mental energy" that determines performance across a wide variety of tasks that are ostensibly dependent on mental operations (to one degree or another). The positive correlations among tests, though, were not perfect (i.e., proportion of shared variance of less than 1.0). This feature of the data led Spearman to maintain that there are specific abilities in addition to general intelligence that are identified with the unique variance of individual tests. Spearman thus posited that any mental test could be decomposed into variance associated with general intelligence and variance associated with that test's specific ability.

Although this description of the structure of intelligence was elegant, there were problems in terms of agreement with the data, especially with respect to tests (of similar content or format) that correlated with each other more than would be predicted on the basis of each test's correlation with general intelligence. Thurstone (11) found that different, common, intellectual abilities could be found to coalesce from a wide sampling of mental tests. That is, rather than a single mental engine as the sole nonunique determining factor of mental test scores, there appeared to be a small set of "Primary Mental Abilities." Through many empirical investigations of human intelligence, Thurstone identified several primary abilities (or factors) that were revealed by common variance shared by mental tests. These abilities were identified as number, visualizing, memory, word fluency, verbal relations, perceptual speed, induction, and deduction. The initial thrust of this work by Thurstone and his colleagues was on the relatively independent nature of these primary abilities. That is, rather than inferring that individuals either had more or less general intelligence, Thurstone inferred that a profile (or complete view of the relative strengths and weaknesses) of individuals revealed a more precise picture of intelligence. In this view intelligence was not a single entity but could be manifest with very different capabilities in various domains. For example, a person could be relatively gifted in number facility but also have average ability levels in terms of verbal relations or perceptual speed. Further research by Thurstone (12) demonstrated that general intelligence could be ultimately revealed by examination of the correlations among the primary mental abilities. That is, there is common variance shared by the abilities that implies a general intellectual ability at a higher level of abstraction. This finding, then, indicated that there is some aspect of intelligence that shows that individuals high or low on any particular ability should show relative levels on other abilities that are also high or low, respectively. However, the fact that the correlations between these primary mental abilities are moderate in magnitude leads one to a conclusion that intelligence is not uniquely an amorphous "mental engine" but may be thought of as a capability that may be expressed in specific ways that do not directly imply superiority on all things intellectual.
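The notion of common variance underlying these arguments can be made concrete. The sketch below (the scores are hypothetical, not data from the cited studies) computes a Pearson product-moment correlation r between two test-score lists; r squared is then the proportion of variance the two tests share:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for six examinees on two mental tests.
vocabulary = [55, 62, 70, 48, 66, 59]
arithmetic = [50, 60, 73, 45, 68, 55]

r = pearson_r(vocabulary, arithmetic)
print(round(r, 3))      # correlation between the two tests
print(round(r * r, 3))  # proportion of common (shared) variance
```

A positive r of this kind across many pairs of mental tests is the pattern from which Spearman inferred g; the variance not shared (1 - r squared) corresponds to what he attributed to specific abilities and error.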
Facet and Hierarchical Theories. The discovery of several common intellectual abilities (i.e., common to several test measures) led to other developments regarding the mapping out of the structure of intelligence. On the one hand, efforts by Guilford and his colleagues (13) brought about a fragmented model of intelligence that contains over 120 different abilities. On the other hand, efforts by a variety of researchers such as Burt (14), Vernon (15), Horn and Cattell (16), and Humphreys (17) have focused on describing the various interrelations between abilities of varying levels of generality/specificity. Each line of thought is important to understanding what the structure of intelligence is; and each is discussed in turn.

The descriptive framework of intelligence put forth by Guilford has been called the Structure-of-Intellect model. Guilford has maintained (18) that the basic nature of any specific intellectual ability can be represented by placement in a three-dimensional array of facets. The dimensions considered fundamental (along with the categories in each dimension) are as follows (19):
Operations—what the respondent does. These include cognition, memory, divergent production (prominent in creative activity), convergent production, and evaluation.

Contents—the nature of the materials or information on which operations are performed. These include figural, symbolic (e.g., letters, numbers), semantic (e.g., words), and behavioral (information about other persons' behavior, attitudes, needs, etc.).

Products—the form in which information is processed by the respondent. Products are classified into units, classes, relations, systems, transformations, and implications.

A spatial illustration of the Structure-of-Intellect model is presented in Figure 1. Illustrations of the products of intellectual processes are presented in Figure 2.

Figure 1. Guilford's Structure-of-Intellect model (from Ref. 13). Courtesy of McGraw-Hill.

Given the array defined by the five operations, five contents (after a split of figural content into visual and auditory contents), and six products, there are a full 150 different facets of intelligence according to Guilford's model (20). However, even this model cannot be said to be a complete representation of all of the important aspects of intelligence. More recently, Humphreys has proposed that other dimensions should be added to the model, such as speed (i.e., requirements for speed of mental processing), sensory modality (e.g., visual, auditory, tactile), and others. Each addition of a dimension multiplicatively increases the total number of intellectual abilities specified by the model. It should be obvious that a taxonomy of intelligence (a periodic table of intellectual abilities, if you will) becomes quite unworkable very rapidly. Also, at least at a basic level, the implied orthogonality (i.e., independence) of these structure dimensions is at odds with the data that imply the existence of general intelligence. Guilford has acknowledged that, indeed, these dimensions are intercorrelated. No longer can this model be tightly defined as independent dimensions. The problems related to the nonexhaustive nature of the model suggest that this perspective is not plausible as a genuine all-inclusive representation of intelligence. The ultimate utility of the Structure-of-Intellect model, though, comes from the specification of what the important dimensions and categories of intellectual operations may be rather than through provision of a method for determining each and every facet of intelligence.

Two other sets of representations offer greater agreement with the corpus of data on intellectual abilities as well as provide the open structure necessary to the domain. The first such representations of the structure of intelligence are hierarchical organizations (see Fig. 3). Explicit in these theories [variously specified by Burt (14), Vernon (15), Horn and Cattell (16), and Humphreys (17)] is a general intellectual ability factor (or g).
Figure 2. Examples of six different products from Guilford's Structure of Intellect with figural content (from Ref. 13). Courtesy of McGraw-Hill.

Figure 3. Hierarchical structure of intellectual abilities as designated by Snow et al.: G = general intelligence, Gc = crystallized intelligence, Gf = fluid intelligence, Gv = visualization; see text for discussion of these terms (from Ref. 21). Courtesy of Ablex Publishing Corporation.

Although the terminology for this general ability is the same as that of Spearman, the construct here is quite different. These theories agree that other ability factors are important components of intelligence. The g ability factor represents the highest node in a hierarchy of ability factors. The influence of such a factor has been estimated by Vernon as reflecting anywhere from roughly 20 to 40% of the variance in a population of "all human abilities." The different theories in this class diverge when it comes to identification of factors that constitute the nodes below g. However, all theories appear to be in agreement about the
nature of the hierarchy. That is, the general factor represents the broadest ability, and factors at the next level represent broad or major group factors [e.g., verbal:educational and practical:mechanical, as in Vernon's theory]. Each of the abilities at this broad group factor node may be fragmented to reveal their constituent abilities. For example, at the next node the verbal ability factor might fragment into vocabulary, reading comprehension, associational fluency, and so forth. These lower ability nodes may, in turn, be further subdivided to allow representation of the different test formats for assessing the specific abilities, and so on.

A project in this domain was conducted by Lohman (22). Lohman reviewed substantial data and found the following structure underlying spatial ability. Three major factors are subsumed under the class denoted as spatial ability. These are spatial relations, spatial orientation, and visualization. At lower nodes factors such as closure speed, perceptual speed, visual memory, and kinesthetic speed were found that implicate the basic processes of encoding, remembering, transforming, and matching spatial stimuli (23).

These theories maintain generality by allowing flexibility in the determination of specific abilities. In addition, the actual level of description (or degree of fragmentation) of the ability nodes in the hierarchy is often left indeterminate. Thus, researchers may differentially define the broadness of specific nodes to reflect the level of test content analysis of concern. The key to these hierarchical theories is that they purport to describe general intelligence and broad content domains as representing the communality that exists (to varying degrees) at all levels of intellectual tests and tasks [see Humphreys (17)].
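A hierarchical ability structure of the kind just described is naturally represented as a tree, with g at the root and progressively more specific abilities at lower nodes. The sketch below is illustrative only: the labels loosely follow Vernon's verbal:educational and practical:mechanical groupings, and the particular tree is invented, not data from any of the cited studies:

```python
# An illustrative, invented encoding of a Vernon-style ability
# hierarchy: g -> major group factors -> group factors -> specific
# abilities. Empty dicts mark the most specific abilities shown.
hierarchy = {
    "g": {
        "verbal:educational": {
            "verbal": {"vocabulary": {}, "reading comprehension": {}},
        },
        "practical:mechanical": {
            "spatial": {"spatial relations": {}, "visualization": {}},
            "numerical": {},
        },
    }
}

def leaf_abilities(tree):
    """Collect the most specific abilities under a node."""
    leaves = []
    for name, sub in tree.items():
        leaves.extend(leaf_abilities(sub) if sub else [name])
    return leaves

print(leaf_abilities(hierarchy["g"]))
# -> ['vocabulary', 'reading comprehension',
#     'spatial relations', 'visualization', 'numerical']
```

The indeterminate degree of fragmentation noted in the text corresponds to the freedom to stop this tree at any depth, or to subdivide a leaf further (e.g., by test format) without disturbing the levels above it.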
The theory put forth by Cattell and Horn has elements common to the other hierarchical theories but also contains additions and deviations (16,24). The deviations of particular interest regard the major ability groups and the role of learning in the structure of intellectual abilities. The main unique feature of this theory is the exposition of "fluid" and "crystallized" classes of ability groups. The distinction is that these are "two major factors, one associated primarily with physiologically-based influences [fluid intelligence] and the one associated with educational, experiential influences [crystallized intelligence] . . ." (25). Regarding the fluid intelligence-crystallized intelligence distinction, Horn states that individual differences on fluid intelligence are implied as influential when subjects are confronted with tasks that require the rapid "learning and unlearning" of information or other processes such as educing relations, logical reasoning, and so on. Crystallized intelligence is associated with processing speed and efficiency when tasks contain familiar formats, require use or minor restructuring of previously formed production systems, or retrieval of information that is stored in long-term memory. As is discussed further below, the fluid and crystallized types of intelligence in the Cattell and Horn theory are posited to have differential associations with learning on a broad developmental scale.

A different view of the structure of intelligence has been provided by Snow et al. (21). Although mathematically equivalent to the hierarchical representations discussed above, this approach provides a conceptually divergent portrayal of intelligence. In this theory, depicted in Figure 4, Snow et al. describe the structure of intelligence in terms of a circle (i.e., two
[Figure 4 here: a circular map with complexity increasing toward the center and peripheral regions labeled Specific figural, Specific verbal, and Specific numerical.]
Figure 4. Alternate structure-of-intelligence map proposed by Snow et al. (from Ref. 21). Courtesy of Ablex Publishing Corporation.

dimensions). General intelligence is defined as the center of the circle, with complexity (which is not distinguished from generality/specificity per se) defined in terms of distance from the center of the circle. Different regions of the circle correspond to the major content abilities (e.g., verbal, numerical, spatial). The importance of this representation of intelligence is that attention is focused on the degree of similarity/dissimilarity of various types of intellectual processes in terms of content and complexity. From this perspective, the integrated nature of intelligence should be more apparent than from the hierarchical structures or Guilford's Structure of Intellect.

Information Processing and Intelligence. In the last decade an explosion of research has converged on determining more basic information-processing elements of intelligence. In terms of the hierarchical representations presented above, these research programs seek to provide explicit description of intelligence at nodes far removed from general intelligence and major content abilities. The desire is to find the general building blocks for intelligence, whether they be revealed by evoked brain potentials or speed of stimulus recognition or discrimination. In this sense the classical psychometric approach has been top down, from general intelligence to specific abilities. The information-processing approach in experimental psychology has been essentially bottom up, moving from the most basic processes to more complex levels of thought. Of course, as with any reductionist scientific enterprise, distinctions can be made in terms of the level of analysis in this area as well. A few illustrative paradigms and findings are described. Sternberg (26) identifies four major sources of research in the recent applications of information processing to the study of intelligence.
A listing of these methods here serves to describe these avenues of investigation, although in-depth treatments may be found elsewhere (27).

The cognitive-correlates method. In this paradigm basic information-processing tasks are given to subjects of high and low intellectual abilities (tasks such as lexical access, short-term memory, choice reaction time). The associations between task performance differences and intelligence differences are used to develop theories relating information processing to intelligence. The work by Hunt and his colleagues falls in this category (28).
INTELLIGENCE
The cognitive-components method. This is essentially a top-down approach to link intelligence with information processing. In this paradigm the information-processing components of specific intelligence tests are isolated and measured separately. Research results using this method often have a clearer connection to global intelligence than with a bottom-up approach. [For a landmark example of this type of study, see Sternberg (29).]

The cognitive-training method. Within this paradigm the malleability of cognitive intellectual processes is studied via specific training programs. Research from this approach has been quite promising in the delineation of cognitive "skills" from intellectual "abilities." Recent papers by Frederiksen et al. (30) and Pellegrino (31) are restructuring traditional conceptions of adult intelligence as fixed by creating training programs that can remediate deficits often viewed as "intellectual."

The cognitive-content method. This is one of the approaches traditionally used in AI research as applied to the study of intelligence. That is, expert-novice differences are evaluated for establishing how intellectual abilities are implicated in the acquisition of expertise. Factors such as knowledge structures and selection/use of strategies fall into this domain. Prototypical work in this area has been done by Chi et al. (32).
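The logic of the cognitive-correlates method, the first of the four approaches listed above, can be made concrete with a short sketch. The data below are invented purely for illustration: the method simply correlates individual differences on a basic task measure (here, choice reaction time) with a global intelligence score, and the squared correlation is the "shared variance" statistic used in this literature.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data (for illustration only): choice reaction times in ms
# and scores on a global intelligence measure for eight subjects.
choice_rt = [310, 295, 350, 280, 330, 270, 360, 300]
iq_score  = [108, 115, 96, 122, 101, 125, 94, 112]

r = pearson_r(choice_rt, iq_score)
print(f"r = {r:.2f}, shared variance r^2 = {r * r:.2f}")
```

With these invented numbers the correlation is strongly negative (faster reactions go with higher scores), illustrating the kind of association the method looks for; real correlations in this literature are far more modest.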
Cognitive-correlates research by Hunt and his colleagues (e.g., Ref. 33) has led to identification of several sources of information-processing speed and efficiency that are associated with verbal aspects of intelligence. At a more basic level, work by Lunneborg (34) and Jensen (35) has assessed the degree of association between the most elementary types of information processing, "simple reaction time" and "choice reaction time." Simple reaction time is a measure of the speed with which a person can encode and make a response to the onset of some stimulus (usually a single light stimulus and a response of a key depression). Choice reaction time is the next higher level of information processing; it involves a discrimination of two or more stimuli (e.g., two different lights) and usually unique key responses to each stimulus. Even at this elemental level of operation, positive correlations are found between speed and accuracy of reactions and global measures of intelligence (for details, see Ref. 36).

A recent paper by Kyllonen (37) has demonstrated further how a more comprehensive approach may be used to determine the relations between basic information-processing mechanisms and intelligence. In a series of studies Kyllonen examined information processes with three different methods: a whole-task analysis, where underlying factors of cognitive tasks were determined; a stage analysis, where intelligence-information-processing relations were examined for various stages of processing within tasks; and a coding analysis procedure, in which relations between intelligence and information processing were examined where the type of information in memory was manipulated. Findings from this series of studies were as follows:

1. Whole-task approach. Abilities were found to be associated with task difficulty level (consistent with earlier theories of intelligence) and also with speed of mental operations (a construct not separately dealt with in previous theories of intelligence). There appears to be a complex relationship between these two types of abilities within a general conception of intelligence.
2. Stage analysis approach. Abilities found at these low levels of cognitive operation were consistently found to be associated with both process and content of information. Thus, this particular approach gives more validity to the claim that hierarchical theories of intelligence can proceed consistently to the lower nodes of increased specificity.
3. Code analysis approach. Additional ability infrastructure was put forth through this study. Low-level abilities were found that relate to amount of perceptual processing required and level of processing (e.g., from physical feature analysis, to identity of stimulus name, to semantic meaning).

Another major compendium of information-processing abilities and some associated intellectual abilities from both task analysis and ability factor perspectives has been put forth by Carroll (for details, see Ref. 38).

Learning and Intelligence

Given that learning (qv) is a major aspect of any theoretical conception of intelligence, it is no surprise that substantial attention has been devoted over the years to establishing empirical connections between the two constructs. Three questions are central to providing an understanding of learning and intelligence. These are: How does intelligence come about, how does intelligence influence learning, and how is intelligence related to efficacy of information processing subsequent to learning? Much research has been devoted to the developmental aspects of intelligence acquisition. An overview of some important theories and findings of this approach is provided below.

Development of Intelligence. How does acquisition of intelligence come about? Historically, psychologists have argued about the relative importance of genetic endowment (i.e., a predisposition to acquiring intelligence) and environmental influences (e.g., child-rearing practices and educational opportunities). In brief, this controversy covers questions that concern the relative standing of individuals with respect to measures of intelligence, not specifically how individuals develop intelligence. Detailed discussion of this controversy is beyond the scope of this entry, but the interested reader might consult Anastasi (19) or Block and Dworkin (39). Rather, the concern here is exactly what role the environment has on the development of intelligence (regardless of which individuals are predestined to be more or less well endowed). Intelligence in terms of thinking abstractly, learning complex concepts, reasoning, and developing language skills is certainly not innate, that is, present at birth. Case studies of feral or severely deprived children most strikingly demonstrate that such intellectual faculties as language are not present in the individual in the absence of particular types of environmental experiences. Certainly, environmental influences are crucial to any expression of intellectual development. Many research programs specifically explore the various factors that appear to strongly impact development of intelligence. Details of such programs can be found in Refs. 40 and 41. For present purposes two avenues of theory and research seem especially relevant. First and foremost is the pattern of accretion (and plasticity) of intelligence. The second issue pertains to the possibility of differentiation of intellectual abilities with maturation. Each is discussed in turn.

Two tracks of research have provided notable information about how intelligence is acquired. One track is strictly psychological; it involves analysis of changing patterns of intelligence measures for individuals from infancy to adulthood. The other track includes combined work in neuropsychology and developmental psychology; it involves examination of concurrent changes in brain physiology with development of intellectual abilities. It is important to note first that evidence from the study of intelligence measures can be somewhat problematic when one is concerned with infants and young children. This state of affairs comes about because language development appears to be a critical ingredient for reliable measurement of intelligence. Assessments of intelligence in prelanguage infants must, by definition, involve criteria other than vocabulary, verbal reasoning, and so forth. As a result, such measures tend to capitalize on perceptual/motor-related intellectual abilities. However, when performance on similar types of test items is examined in detail in longitudinal studies, some understanding of the accretion of intelligence is afforded. With research as early as 1928, Thurstone (42) and others have shown that intellectual development can be represented by a function that has positive acceleration at birth, but the function shows rapid deceleration with maturity (e.g., infants at the age of 4 appear to have developed roughly 50% of measured adult intelligence). Such findings illustrate how important early experience and stimulation may be toward development of intelligence.
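The growth pattern described above can be caricatured with a simple negatively accelerated curve. The model below is an assumption for illustration only: it is not Thurstone's fitted function, and it ignores the brief positively accelerated phase near birth. The rate constant is chosen so that half the adult asymptote is reached at age 4, matching the finding cited above.

```python
# Toy growth model (illustrative assumption, not Thurstone's actual
# function): measured intelligence as a proportion of the adult
# asymptote, calibrated so that 50% of the adult level is reached at
# age 4, as in the finding cited above.

def proportion_of_adult_intelligence(age_years, half_age=4.0):
    """Fraction of asymptotic adult intelligence attained by a given age."""
    return 1.0 - 2.0 ** (-age_years / half_age)

for age in (1, 4, 8, 16):
    print(f"age {age:2d}: {proportion_of_adult_intelligence(age):.0%}")
```

Such a curve rises steeply in early childhood and flattens thereafter, which is the qualitative point of the longitudinal findings: most of the measurable gain occurs very early in life.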
By approximately age 6, children have developed the necessary intellectual ability foundation for reasoning, reading, talking, and so on that allows them to build declarative knowledge and procedural production systems in memory that ultimately provide for adult operations of thought and action. Formalized structures of the development of intelligence in stages have been put forth by Piaget (43) and others (see especially Ref. 44). Piaget's theory of intellectual development states that children proceed through several stages of thought processes, denoted as sensorimotor, preoperational, concrete operational, and formal operational. Each of these stages of development is characterized by certain styles of thinking that imply whether or not a child can understand various concepts (such as object permanence, causality, and object relations in space). In a sense, these stages parallel development of more traditional conceptions of the development of intellectual abilities. In fact, instruments measuring the stages of intellectual development from this framework have shown high levels of agreement with more global measures of intelligence (45). The track of research dependent on neuropsychological investigation has shown marked correspondence between intellectual plasticity and neurological development. In one line of work the process of canalization of the brain is seen to parallel the initial development and negative acceleration of intelligence accretion. Other studies (focusing on language development, a clearly important aspect of intellectual abilities) have demonstrated that there may be similar critical periods for language acquisition. If specific brain damage occurs prior to age 12, individuals may still show development of language, attributable to the fact that connections between neurons are still being established (46). Subsequent to this age, similar types of brain damage will vastly limit the degree of recovery (and further acquisition) of verbal skills shown at earlier ages.
Entirely analogous findings have been reported that demonstrate limitations of even extreme changes in environment that effect changes in relative levels of intelligence (40). That is, environmental influences on the development of intelligence are maximally effective immediately after birth and up to age 3 or 4. When environmental changes occur after this point, substantially less change in intelligence (i.e., relative to the age norm) is found. The failure of the Head Start programs is further illustration of the fact that the fundamental mechanisms for intelligence are formed quite early in life. One other line of thought may have ramifications for how one views the development of intelligence. Researchers such as Garrett (47) have proposed that intelligence for infants and young children is essentially undifferentiated. However, as the individual matures and receives varied educational instruction and other experiences in the environment, intelligence becomes differentiated or specialized. Adults are thus expected to show much wider varieties of intellectual abilities (from verbal abilities of vocabulary, to spatial abilities of visualization, to perceptual/motor abilities such as scanning speed and accuracy, etc.). Such theorizing is consistent in many ways with the previously discussed structure of intelligence proposed by Cattell and Horn (16). That is, in early infancy fluid abilities essentially constitute intelligence. As the individual matures, fluid intelligence is used for building crystallized intelligence. In a sense, the more specialized abilities are thought to "break out" from the higher node, more general intellectual abilities. Whether this phenomenon may be primarily determined by the educational system (where specialization often is associated with high school, trade school, and college curricula) or be a result of interests and extramural experiences is not directly specified by the theorists.
The data regarding the veridicality of the differentiation hypothesis (as it is known in the psychology literature), though, have been equivocal (15,48). Although several studies have tentatively shown that general intelligence declines in its dominance as age increases, other studies have failed to find such changes (49). The problems in evaluating this hypothesis seem to be encountered in choosing assessment instruments that tap the same abilities (and that are equally appropriate) for individuals in different age groups. For the present purposes, though, the lack of resolution of this hypothesis is less important than the possibilities it raises for those interested in the patterns of intelligence acquisition.
Intelligence and Learning. For adults, though, intelligence (whether general or specific) is seen as the foundation upon which all new information is processed by any given individual. As Ferguson (50) has pointed out, the key to understanding learning is that the process is accumulative, where each new fact or procedure incorporated is partly a result of prior and present knowledge. In fact, Ferguson maintains (quite reasonably) that transfer is implied in nearly all learning. (Transfer, whether of knowledge, skills, prior training, processing mechanisms, and so on, is the concept of building upon a structure provided by earlier learning experiences.) That is, learning in the absence of some aspect of transfer is an extremely rare occurrence (such as only with the neonate). Given this fundamental implication of intelligence influencing future learning, it is sensible to ask specifically how intelligence interacts with learning.

There are many cases where demonstrative associations between intelligence and learning are found in the literature (e.g., see Refs. 51 and 52). Unfortunately, in many situations measurement (i.e., statistical) problems are associated with assessing amount of learning independent of initial level of performance (see Ref. 53). What has been determined to date is that intelligence is strongly associated with initial performance on most skill acquisition tasks (54). However, more striking is the fact that on many learning tasks the influence of intelligence on task performance seems to markedly attenuate as time-on-task increases. Ackerman (55,56) has theorized that a complex, but tractable, relationship is found between intelligence and performance under learning conditions. This theory states that intelligence is crucial in situations of learning, especially when the task is novel; that is, less potential for transfer is present when novel conditions are encountered. The importance of intelligence in learning, though, is predominantly with respect to selection of appropriate task strategies, enabling of previously learned information-processing structures, memory capacity, and establishing new production systems. However, once the appropriate strategies are selected, production systems established, and so on, the influence of intelligence on future learning is vastly diminished when the task is one with consistent information-processing requirements. Essentially, when the consistent rules and procedures are internalized by the individual, further learning (to levels of asymptotic performance) is associated with specific abilities. In terms of the structure of intelligence discussed earlier, a predominantly novel task will involve general intelligence (the top node in the hierarchical theories, the central location in the Snow et al. theory). As individuals get more experience on the learning task, specific abilities (the lower nodes of the hierarchies, the peripheral areas of the Snow et al.
theory) determine ultimate performance. In fact, when a task becomes "automatic" (see Refs. 55 and 57), such as in aspects of driving a car, moving chess pieces, solving geometry proofs, and so on, the influence of intelligence on performance is expected to be substantially attenuated. On the other hand, should a task contain inconsistent information-processing components (such that productions to perform the task must constantly be altered or reversed), learning per se fails to occur. In such cases transfer is also excluded, and intelligence is expected to determine performance from initially novel situations to ones in which individuals are given substantial opportunity for practice/learning. Recent empirical demonstrations of these phenomena support this hypothesized role of intelligence in learning. Figure 5 illustrates how consistency (or lack thereof) of a simple task of memory and verbal categorization moderates the association between general intelligence and performance during learning. The task with inconsistent information-processing requirements shows stable dependence on intelligence for performance. The consistent task, which is also initially novel, shows reduced dependence on intelligence as learning progresses with practice. Findings such as these indicate how intelligence is used during learning but also show that intelligence is not directly involved in many previously learned information-processing operations.

Figure 5. Shared variance between intelligence and performance on information-processing tasks, one with consistent components, the other with inconsistent information-processing components. Data collected over 2.5 h of task practice. (From P. L. Ackerman, Individual Differences in Learning and Cognitive Abilities, paper presented at the Office of Naval Research "Action and Attention" Contractors' Meeting, Harvard University, 1985.)

Although it is perhaps not immediately obvious, intelligence is required when new tasks are incompatible with previously learned production systems. For example, when controls are reversed (as is common when adjusting between the right-hand controls on American autos and left-hand controls on European autos; or when a programmer changes from one computer keyboard layout to another, different layout), old production systems must be undone while new productions are established. For both situations, extinguishing the old and putting together the new, intelligence is demanded. The key concept is that intelligence determines the efficiency and accuracy of both novel and inconsistent (or incompatible) types of information processing.

Intelligence and Information Processing after Learning. That intelligence is implicated during learning is not surprising. However, the fact that intelligence is seen as the foundation for building production systems, but not in the postlearning phases of operation, is an important aspect of the human information-processing system. As James (58), Whitehead (59), and many others have pointed out, normal human functioning involves very little that may be described as "intellectual," "requiring thinking," and so forth. Instead, much of normal human mental activity is more like a series of "flywheels" (58,60). Series of production systems are established, tuned, and ultimately unitized so that a long series of productions may be triggered by a single stimulus or internally generated intention (such as reaching for a fork at the dinner table or putting on one's socks). Intelligence is the stuff of which these production systems are created (or modified from previous uses). Subsequent to the lengthy process of learning, intelligence (or attention, if you will) is only needed when series of production systems must be modified (such as driving an unfamiliar car) or added to (as when an electronics troubleshooter encounters a more complex system) or when greater-than-normal accuracy is needed (e.g., taking apart a clock vs. taking apart the same clock when it is connected to a time bomb).
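The "flywheel" notion can be made concrete with a toy production system. In the sketch below the rules are invented purely for illustration: each production is a condition-action pair over a working-memory set, and once the chain has been unitized through learning, the single stimulus "dinner served" fires the whole series without further deliberation.

```python
# Minimal production-system sketch (illustrative; the rules are invented).
# Each production is a (conditions, action) pair over working memory. A
# "unitized" chain means one triggering stimulus fires the entire learned
# sequence, with no intelligence (attention) required along the way.

productions = [
    ({"dinner served"}, "reach for fork"),
    ({"dinner served", "reach for fork"}, "grasp fork"),
    ({"grasp fork"}, "lift food"),
]

def run(working_memory, rules):
    """Fire matching productions until quiescence; return actions in order."""
    fired = []
    changed = True
    while changed:
        changed = False
        for conditions, action in rules:
            if conditions <= working_memory and action not in working_memory:
                working_memory.add(action)  # the action's result becomes a cue
                fired.append(action)
                changed = True
    return fired

print(run({"dinner served"}, productions))
# the single stimulus triggers the whole learned chain:
# ['reach for fork', 'grasp fork', 'lift food']
```

On this picture, "modifying a production system" (driving an unfamiliar car, switching keyboard layouts) amounts to rewriting or deleting rules in the list, and that editing step is where intelligence is demanded.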
In such cases intelligence is used to provide additional checking on the information-processing system in order to preserve stricter tolerances.
Summary
Intelligence has been described in many ways, though almost all definitions involve learning as a major component of the construct (i.e., adapting to one's surroundings). Ultimately, the domains of knowledge, thought, and action are implicated as making up the framework for intelligence. The concept is consensual, but overlap among different theories is the prominent characteristic of varied definitions. Regarding structure and the purpose of the inquiry, the construct of intelligence can be segmented into a hierarchy, with general intelligence at the top node and more specific abilities at lower nodes. Similarly, the structure can be described as a circle, with general intelligence at the center and less complex abilities (and components of abilities) moving away from the center of the circle, like spokes of a wheel. Other views provide interesting causal implications about the development and structure of intelligence. Lastly, for taxonomic purposes, intellectual abilities may be described categorically in three or more dimensions (as in Ref. 20). Although at odds with empirical data, this latter model serves a valuable heuristic purpose in seeking out different types of intellectual processes. Information-processing research and theory developed over the last decade has allowed for the examination of basic building blocks for intelligence. Experiments have been described that concern aspects of speed of mental operations, accuracy of information processing, contents of processing, and codes of processing (including low-level pattern detection of stimulus features to higher level semantic codes present in long-term memory), all of which appear to represent fundamental facets of broader, more general conceptualizations of intelligence. Acquisition of intelligence appears to parallel neurophysiological development. At early ages intelligence is plastic and can be influenced in many ways.
As the individual ages, though, intelligence (or at least the foundation for intelligence) seems to be less malleable, such that by the time a child enters the school system, the basis for that individual's intelligence is relatively stable. Whether intelligence becomes more specialized (i.e., differentiated) with age is not yet established; however, the possibility exists that infant intelligence is more like a general problem-solving system and less similar to adultlike expert systems. Current research is focused on these issues, and answers seem forthcoming in the near future. Such findings about the development of intelligence will have major implications for discussions of mental architecture. In the domain of human information processing and learning, not all operations require intelligence. Rather, intelligence is required when tasks involve processing of novel or inconsistent information. From a perspective of production systems, intelligence seems to be necessary for establishing production systems, selecting optimal arrangements of productions, modifying previously built production systems for related uses, and undoing established production systems. Although there are many unanswered questions about the structural characteristics of intelligence, the causal factors determining the development of intelligence, and the specific relations between learning and intelligence, the scientific discipline studying intelligence has shown great progress in the past 100 years of investigation. Future developments are especially likely to be found in applications of neuropsychological and cognitive, information-processing paradigms to the study of intelligence.
Bibliogra' 1. U.S. Department of Health, Education, and Welfare. phy of Human Intelligence. u.s. Government Printing office, Washington, DC, 1968. A sympo2. B. R. Buckingham, "Intelligence and its measurement: sium," J. Ed. Psychol.!2,27L-275 (1921)' g. A. Binet and r. simon, The Development of Intelligence in chilLevel of dren. New Methods for the Diagnorir of the Intellectual in J. J' Subnormals.The Developmentof Intelligence in the Child, DifferIndiuidual (eds') of Studies Paterson , Jenkins and D. G. ences:The Search for Intelligence, Appleton-Century-Crofts,East Norwalk, CT, PP.81-111, 1961' Intelli' 4. L. G. Humphreys, "The construct of general intelligence," gence3, 105-120 (1979)' A reply to 5. A. R. Jensen, Race and the Genetics of Intelligence: (eds.), ControIQ The Dworkin G. Lewontin, in N. J. Block and 1976' pp. 93-106, York, New Pantheon, Read,ings, critical uersy: 6. R. J. Sternberg, Beyond,IQ: A Triarchic Theory of HurrLanIntelli' gence,cambridge university Press, cambridge, MA, 1985. 7. E. G. Boring, Intelligence as the Tests Measure It, in J. J. Jenkins and D. G. Paterson (eds.), Studiesin Indiuidual Differences:The Search for Intelligence, Appleton-Century-Crofts, East Norwalk, CT, pp. 2L0-2L4, L96L. g. D. Wechsler, Manual for the WechslerAdult Intelligence Scale, PsychologicalCorporation, New York, 1955' g. C. Spearman, "General intelligence objectively determined and measur€d,"Am. J. Psychol. 15, 201-293 (1904). 10. Q. McNemar, Psychological statistics, 4th ed., wil"y, New York, 1969. 11. L. L. Thurstone,Primary Mental Abilities, University of Chicago Press,Chicago,IL, 1938. tZ. L. L. Thurstone, "Psychological implications of factor analysis," Am. Psychol. 3, 402-408 (1948). 18. J. P. Guilfor d,, The Nature of Human Intelligence, McGraw-Hill, New York, 1967. 14. C. Burt, "The structure of the mind, o review of the results of factor analysis,"Br. J. Ed. Psychol.19,100-111, 176-199 (1949). 1b. P. E. 
Vernon , The structure of Hurnan Abilities, Wiley, New York, 1961. 16. J. L. Horn, and R. B. Cattell, "Refinement and test of the theory of fluid and crystall tzed,general intelligences," J. Ed. Psychol. 57, ZEB*270(1966); see also R. B. Cattell, "Theory of fluid and crystallized intelligence:A critical experiment,"J. Ed. Psychol.54,122 (Lg6B)and J. L. Horn, "Organization of abilities and the devel(1968)' opment of intelligence," Psychol.Reu. 75, 242-259 Intelliintelligence," general of construct L7. L. G. Humphreys, "The gence3, 105-120 (1979)18. J. P. Guilford, "Cognitive psychology's ambiguities: Some suggestedremedi€s,"Psychol.Reu.89, 48-59 (1982)' 19. A. Anastasi, Psychological Testing, 5th €d., Macmillan, New York, pp. 269-370, L982. 20. J. P. Guilford, The Structure-of-Intellect Model, in B. B. Wolman (ed.;, Hand,book of Intelligence, Wiley, New York, pp. 225-266, 1985. 2L. R. E. Snow, P. C. Kyllonen, and B. Marshalek, "The topographyof ability and learning correlations," in R. J. Sternberg (ed.), Ad' uancesin the Psychologyof Human Intelligence,Vol. 2, Erlbaum' Hillsdale,NJ, pp.47-103, 1984;B. Marshalek,D. F. Lohman,and R. E. Snow, "The complexity continuum in the radex and hierarchical models of intelligence," Intelligence 7, I07 -L27 o983). ZZ. D. Lohman, Spo tial ability: A Reuiewand Reanalysisof the Corcelational Literatltre, Technical Report No. 8, Aptitude Research Project, Stanford University, School of Education, Stanford, CA, 1979.
INTERNIST 23. Reference22, pp. 188-189. uelopment:A Multidisciplinary Approach, Vol. 2, Academicpress. 24. R. B. Cattell, "Theory of fluid and crystallized,intelligence: A New York, 1975. critical experiment," J. Ed. Psychol. 54, I-ZZ (1968). 47. H. E. Garrett, "A developmentaltheory of intelligence,"Am. Psy25. J. L. Horn, Fluid and Crystallized Intetligence:A Factor Analytic chol. L, 372-378 (1946). Study of the Structure among Primary Mentol Abitities, Univer48. R. Atkin, R. Bray, M. Davison, S. Herzberger, L. G. Humphreys, sity Microfilms, Ann Arbor, MI, 196b. and U. Selzer, "Ability factor differentiation, grades 5 through 26. R. J. Sternberg,Human Abilities: An Information-ProcessingApLL:' Appl. Psychol.Meas. l, 6b-7G (Lg7T. proach, W. H. Freeman, New york, 198b. 49. J. W. Pellegrino and R. Kail, Human Intettigence: Perspectiues 27. R. J. Sternberg,Beyond IQ: A Triarchic Theory of Human Intelliand Prospecfs,W. H. Freeman, New York, 198b. gence,cambridge university Press, New york, 1ggb. 50. G. A. Ferguson, "on transfer and the abilities of man ," Can. J. 28. E. Hunt, Verbal Ability, In R. J. Sternberg (ed.),Human Abilities: Psychol. LO,L2I-131 (1956). An Information-ProcessingApproach,w. H. Freeman, New york, 51. W. K. Estes, Learning, Memoty, and Intelligence, in R. J. Sternpp. 31-58, 1985. berg (ed.;, Handbook of Human Intelligence, Cambridge Univer29. R. J. Sternberg,Intelligence,Information Processing,and Analog sity Press,New York, pp. 170-224, L982. ical Reasoning: The Componential Analysis of Human Abilities, 52. D. Zeaman and B. J. House, The Relation of IQ and Learning, In Erlbaum, Hillsdale, NJ, 1977. R. M. Gagn6 (ed.),Learning and Indiuidual Dffirences, Charles 30. J. R. Frederiksen, P. A. Weaver, B. M. Warren, H. p. Gillotte, Merrill, Columbus, OH, pp. 192-2I2, L}GT. A. S. Rosebery,B. Freeman, and L. Goodmatr,A Componential 53. L. J. Cronbach and L. Furby, "How we should measure"ch&nge"Approach to Training Reading Skills, Report No. 
529b, Final Reor should we?"Psychol.Bull.70, 68-80 (1920). port, Bolt, Beranek, and Newman, cambridge, MA, 1gg3. 54. E. A. Fleishman, "On the relation betweenabilities, learnirg, and 31. J. W. Pellegrino,Indiuidu:al Dffirences in Spatiat Abitity: The human performance,"Am. Psychol. 27, 1012-1082 (1972). Effects of Practice on Componentsof Processingand,ReferenceTest 55. P. L. Ackerman, "Individual differences in information processScores,Paper presentedat American Educational ResearchAssoing: An investigation of intellectual abilities and task perforciation Meetings, Montreal, Canada, 1988. mance during practice,"Intelligence10, 101-139 (1986). 32. M. T. H. Chi, R. Glaser, and E. Rees,Expertise in Problem Solv56. P. L. Ackerman and W. Schneider,Individual Differencesin Autoing, in R. J. Sternberg (ed.),Aduancesin the psychologyof Human matic and Controlled Information Processing,in R. F. Dillon (ed.), Intelligence,vol. 1, Erlbauh, Hillsdale, NJ, pp.7-76, 1992. Indiuidual differencesin cognition, Vol. 2, AcademicPress, New 33. E. Hunt, N. Frost, and C. Lunneborg, Individual Differences in York, pp. 35-66, 1985. Cognition: A New Approach to Intelligence, in G. Bower (ed.), 57. W. Schneider and R. M. Shiffrin, "Controlled and automatic huPsychologyof Learning and Motiuation, Vol. 7, Academic Press, man information processing:I. Detection, search,and attention," New York, pp. 87-L22, 1973. Psychol.Reu. 84, L-66 (L977). 34. C. E. Lunneborg, "Some information-processingcomelatesof mea58. W. James, Principles of Psychology,Holt, New York, 1890. sures of intelligence,"Multiuar. Behau. Res.18,158-161 (19?g). 59. A. N. Whitehead, as cited in M. R. Cohen and E. Nag el,An Intro35. A. R. Jensen and E. Munro, "Reaction time, movement time, and duction to Logic and Scientific Method, Harcourt, Brace, New intelligence," Intelligence 3, I?I-LZG (19?g). York, pp. 43L-432, 1934. 36. L. E. Longstreth, "Jensen'sreaction-time investigations of intelli60. J. Reason and K. 
P. L. Ackerman
University of Minnesota

INTERACTIVE PROGRAM. See Natural-language interfaces.

INTERLISP. See LISP.

INTERNIST

A medical consultation system for diagnosis of internal medicine, INTERNIST was written in 1975 by H. E. Pople and J.
Myers at Carnegie-Mellon University and is now called CADUCEUS (see Medical advice systems) [see H. E. Pople, Heuristic Methods for Imposing Structure on Ill-structured Problems: The Structuring of Medical Diagnostics, in P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview Press, Boulder, CO, pp. 119-185, 1981].

M. Tam
SUNY at Buffalo
KNOWLEDGE ACQUISITION. See Metaknowledge, metarules, and metareasoning.
KAISSA

Based on an older program, KAISSA was rewritten by Mikhail Donskoy along with nine other Soviet scientists. KAISSA won the World Computer Championship (chess) in 1974 at Stockholm by winning all four of its matches (see Computer chess methods). It uses the alpha-beta algorithm and searches all moves to a specified depth. It also keeps a classification of moves and uses a method called "best-move service" to search optimal moves first [see G. M. Adelson-Velsky, V. L. Arlazarov, and M. V. Donskoy, "On some methods of chess play programming," Artif. Intell. 6, 361-371 (1975)].

J. Rosenberg
SUNY at Buffalo
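As a sketch of the technique the entry names (not KAISSA's actual code), fixed-depth alpha-beta search on a toy game tree might look like the following; the tree, its values, and the depth are invented for illustration. A real chess program would order moves at each node, which is what a "best-move service" improves.

```python
# A minimal alpha-beta search sketch: fixed-depth search in which a
# branch is cut off as soon as it cannot affect the final choice.
# The toy game tree (nested lists with leaf evaluations) is invented.

def alpha_beta(node, depth, alpha, beta, maximizing):
    if isinstance(node, (int, float)):   # leaf: static evaluation
        return node
    if depth == 0:
        return 0                         # a real program would evaluate here
    if maximizing:
        value = float("-inf")
        for child in node:               # move ordering would go here
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # cutoff: remaining moves cannot matter
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alpha_beta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value

tree = [[3, 5], [6, [9, 1]], [1, 2]]
print(alpha_beta(tree, 3, float("-inf"), float("inf"), True))   # prints 6
```

Note that with good move ordering the cutoffs come earlier, which is why searching likely-best moves first pays off.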
KL-ONE

A frame-based language for knowledge representation (qv) in the procedural semantics (qv) approach, KL-ONE was developed by Brachman in 1978 at BBN [see R. Brachman, A Structural Paradigm for Representing Knowledge, Report No. 3605, Bolt Beranek and Newman, Inc., Cambridge, MA, 1978, and R. Brachman, "What's in a Concept: Structural Foundations for Semantic Networks," in N. Findler (ed.), Associative Networks: The Representation and Use of Knowledge by Computers, Academic Press, New York, pp. 3-50, 1979].

A. Hanyong Yuhan
SUNY at Buffalo

KNOWLEDGE-BASED SYSTEM. See Control structures; Expert systems; Inference; Rule-based systems.

KNOWLEDGE ENGINEER. See Expert systems; Rule-based systems.

KNOWLEDGE REPRESENTATION. See Representation, knowledge. See also Epistemology; Frame theory; Inheritance hierarchy; Metaknowledge, metarules, and metareasoning; Semantic networks.

KRL

A frame-based language for knowledge representation (qv), KRL was developed by Bobrow and Winograd in 1977 [see D. Bobrow and T. Winograd, "An overview of KRL, a knowledge representation language," Cog. Sci. 1, 3-46 (1977), and T. Winograd, Frame Representations and the Declarative/Procedural Controversy, in D. Bobrow and A. Collins (eds.), Representation and Understanding: Studies in Cognitive Science, Academic Press, New York].

A. Hanyong Yuhan
SUNY at Buffalo

LAMBDA CALCULUS

The lambda calculus grew out of efforts by logicians in the 1920s and 1930s to understand the notion of a mathematical function. The traditional view of a function as a set of ordered pairs with a fixed domain and range did not adequately describe the behavior of functions like the identity function, which works on any input whatsoever. The first key discovery, made by Frege in 1893 and rediscovered by Schönfinkel in 1924, was that it sufficed to study functions of a single argument: any function of the form f: A × B → C could be replaced by a function f': A → (B → C), which, given its first argument, produced another function willing to accept a second argument. The name lambda calculus is de-
rived from Church's notation for such functions (see Church's thesis). Let x be a variable, and let M be an expression. A function F may be defined such that for any value of x, F(x) = M. Church (1) proposed that this function be denoted λx.M. Church, Curry, Kleene, and Rosser established the basic theoretical properties of this formalism in the 1930s. In the late 1950s McCarthy used the lambda calculus as the basis for the notation of procedures in LISP (qv). This gave it a wide exposure among computer scientists. In the late 1960s and early 1970s Strachey used the lambda calculus as a tool for specifying programming languages, and motivated by Strachey's work, Scott further developed the foundations of the lambda calculus, leading to a renewed research interest that continues to this day (2). A lively history of the lambda calculus may be found in Ref. 3.
Foundations

Notation. Since the lambda calculus involves functions whose results are functions, one often needs expressions such as ((f(a))(b))(c). To avoid excessive parenthesization, it is standard to write fa for the application of f to a and to assume that application associates to the left. Thus the complicated expression above would be written fabc. Parentheses may still be used for grouping.

Basics. If λx.M denotes the function F such that for any value of x, F(x) = M, then the value of F can be computed on an argument N by substituting N into the defining equation. This leads to the basic axiom of the lambda calculus, the β rule:

(λx.M)N = M[N/x]

where M[N/x] denotes the term M with N substituted for each free occurrence of x [and with bound variables renamed to avoid capture of free variables (1-6)]. This axiom can be regarded as a rewriting rule, and an attempt made to reduce any λ-term to a term in which the rule no longer applies. Such a term is said to be in normal form. In general, there may be many ways to reduce a term. However, the Church-Rosser theorem (4,5) states that if a term M reduces, by different paths, to two terms N1 and N2, then N1 and N2 may be reduced to a common term P. This implies that if a term has a normal form, that normal form is unique (up to renaming of bound variables). This also implies that the formal system of β-reduction is not degenerate; that is, it is false that all terms are provably equal under the β rule. For example, the two terms

S ≡ λx.(λy.(λz.xz(yz)))
K ≡ λx.(λy.x)
are distinct terms in normal form and therefore cannot be proven equal.

Definability. An extremely useful λ-term is the fixed-point combinator

Y ≡ λf.(λx.f(xx))(λx.f(xx))

It can be shown that for any term M, M(YM) = YM. This property of Y can be used to write recursive definitions. Boolean values can be defined by

T ≡ λx.(λy.x)
F ≡ λx.(λy.y)
With these truth values, a conditional operator can be defined by (if M then N else P) ≡ MNP. Another useful term is Church's pairing operator:

[M, N] ≡ λx.xMN

With this operator, the first element of a pair p can be retrieved by the term pT and the second element by the term pF. For discussing the expressive power of the lambda calculus, it is useful to have a copy of the integers. The nth numeral n can be defined by

0 ≡ [T, T]
n + 1 ≡ [F, n]
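These encodings can be sketched directly as closures. This is an illustrative transcription, not part of the article; the Python names (pair, succ, is_zero, pred) are invented labels for the terms defined in the text.

```python
# Church-style encodings from the text, written as Python closures.
# T selects its first argument, F its second.
T = lambda x: lambda y: x
F = lambda x: lambda y: y

def pair(m, n):
    """[M, N] = λx.xMN; projections are p(T) and p(F)."""
    return lambda x: x(m)(n)

def zero():
    return pair(T, T)          # 0 = [T, T]

def succ(n):
    return pair(F, n)          # n+1 = [F, n]

def is_zero(n):
    return n(T)                # the zero test λx.xT

def pred(n):
    return n(F)                # the predecessor λx.xF (second component)

p = pair("first", "second")
assert p(T) == "first" and p(F) == "second"

two = succ(succ(zero()))
assert is_zero(zero()) is T
assert is_zero(two) is F
assert is_zero(pred(pred(two))) is T
```

The assertions check that the projections and the numeral operations behave as the defining equations require.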
With these operations and the fixed-point operation, one can define functions on the numerals. It can be shown that the lambda-definable operations on the numerals are precisely the partial recursive functions (3-5). Another useful set of operations concerns combinatory terms. Define a combinatory term to be a term built up from variables, S, and K by application alone, without any additional use of lambda. Then any lambda term is equal to some combinatory term. To do this, replace each lambda abstraction λx.M (where M is already a combinatory term) by a term [x]M defined as follows:
[x]M ≡ KM    if x does not occur free in M
[x]x ≡ SKK
[x](M1 M2) ≡ S([x]M1)([x]M2)

Observing that SKK = λx.x, it is easy to show that [x]M = λx.M. This algorithm is called bracket abstraction and has been used as a basis for hardware implementation of reduction.
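The bracket-abstraction rules above can be sketched symbolically. The representation (strings for variables and combinators, 2-tuples for application, and ("lam", x, body) for λx.body) is invented for illustration, and the small normalizer exists only to check the translations.

```python
# A sketch of bracket abstraction: [x]M over symbolic combinatory terms.

def free_in(x, t):
    if isinstance(t, str):
        return t == x
    if t[0] == "lam":
        return t[1] != x and free_in(x, t[2])
    return free_in(x, t[0]) or free_in(x, t[1])

def bracket(x, m):
    """Compute [x]M by the three rules in the text."""
    if not free_in(x, m):
        return ("K", m)                          # [x]M = KM
    if m == x:
        return (("S", "K"), "K")                 # [x]x = SKK
    if m[0] == "lam":                            # abstract the body first
        return bracket(x, bracket(m[1], m[2]))
    return (("S", bracket(x, m[0])), bracket(x, m[1]))

def nf(t):
    """Normalize a pure combinatory term using Kab -> a, Sfgx -> fx(gx)."""
    spine = []
    while isinstance(t, tuple):                  # unwind the application spine
        spine.append(t[1])
        t = t[0]
    args = list(reversed(spine))                 # arguments in application order
    if t == "K" and len(args) >= 2:
        new = args[0]
        for a in args[2:]:
            new = (new, a)
        return nf(new)
    if t == "S" and len(args) >= 3:
        new = ((args[0], args[2]), (args[1], args[2]))
        for a in args[3:]:
            new = (new, a)
        return nf(new)
    for a in args:                               # head is inert: normalize args
        t = (t, nf(a))
    return t

assert bracket("x", "x") == (("S", "K"), "K")
assert nf((bracket("x", "x"), "a")) == "a"            # SKKa reduces to a
assert nf((bracket("x", ("f", "x")), "a")) == ("f", "a")
```

The last two checks confirm the property the text states: ([x]M)N reduces like (λx.M)N.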
Applications to Artificial Intelligence

The primary relevance of the lambda calculus to AI is through the medium of LISP (qv). McCarthy used the lambda calculus as the basis of LISP's notation for procedures. Since that time, however, other programming languages have used the lambda calculus in a more pervasive way. Scheme, for example, uses the lexical scoping rule of the lambda calculus rather than LISP's dynamic scoping and fully integrates the concept of functions as first-class values in the language. Church's pairing combinator serves as the basis for procedural data types like Hewitt's actors (see Actor formalisms) or Smalltalk's objects, which respond to messages in order to communicate. In combination with the concept of functions as first-class citizens, this leads to the paradigm of object-oriented programming (qv), in which all data are encapsulated in functional capabilities. Because the lambda calculus is a convenient formalism for describing complex functions that take functions as arguments and return functions as results, it has been used to model domains in which such functions arise. One such application is the semantics of Montague grammars for natural language (7). Another such application is as a basis for the description of the semantics of programming languages (3). These semantic definitions can be used for description, for building interpreters, and for building compilers.
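The idea of procedural data that respond to messages can be sketched with closures. The counter example and its message names are invented for illustration; they are not drawn from actor or Smalltalk code.

```python
# A sketch of "objects as procedures": a counter is a closure that
# encapsulates its state and responds to messages. The messages
# ("inc", "get") are invented names for this illustration.

def make_counter():
    count = [0]                      # mutable state captured by the closure
    def dispatch(message):
        if message == "inc":
            count[0] += 1
            return dispatch          # returning dispatch allows chaining
        if message == "get":
            return count[0]
        raise ValueError("unknown message: " + message)
    return dispatch

c = make_counter()
c("inc")("inc")
print(c("get"))   # prints 2
```

The state is reachable only through the dispatch procedure, which is the encapsulation property the paragraph describes.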
BIBLIOGRAPHY

1. A. Church, "A set of postulates for the foundations of logic," Ann. Math. 33(2), 346-366 (1932).
2. J. E. Stoy, Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory, MIT Press, Cambridge, MA, 1977.
3. J. B. Rosser, "Highlights of the history of the lambda-calculus," Ann. Hist. Comput. 6, 337-349 (1984).
4. H. P. Barendregt, The Type Free Lambda Calculus, in J. Barwise (ed.), Handbook of Mathematical Logic, North-Holland, Amsterdam, pp. 1091-1132, 1977. A good short introduction.
With this representation, the successor function can be defined on numerals as λx.[F, x], the predecessor function as λx.xF (taking the second component), and the zero test as λx.xT.

5. H. B. Curry and R. Feys, Combinatory Logic, Vol. 1, North-Holland, Amsterdam, 1958.
6. H. P. Barendregt, The Lambda Calculus: Its Syntax and Semantics, North-Holland, Amsterdam, 1981 (revised ed., 1984). The standard reference in the field.
7. D. S. Warren, Using Lambda-Calculus to Represent Meanings in Logic Grammars, Proceedings of the Twenty-First Annual Meeting of the Association for Computational Linguistics, pp. 51-56, 1983.

M. Wand
Northeastern University

LANGUAGE ACQUISITION

A Taxonomy of AI Models of Language Acquisition

Artificial Intelligence models of language acquisition are a proper subset of AI learning systems and rely heavily on work in computational linguistics. AI models of language acquisition fall naturally into two main subdivisions: theoretical models that learn formal languages and cognitive models that embody theories about the way in which children learn a language. Formal models have been very influential in defining the complexity of the task of learning language. By contrast, cognitive models are used as tools for linguists and psychologists in the task of exploring and contrasting alternative theories about the way in which children learn their native language. Cognitive scientists who build computational models to embody psychological theories do so because they believe that such models enable them to achieve a degree of explicitness that is virtually impossible to attain when working only with pencil and paper. Such models encourage a comparison of the results of the theory to actual data, which aids in refining hypotheses and evaluating them. The discipline of building theoretical models that embody psychological theories should not be confused with crude comparisons between brain and computer. The development and use of models of language acquisition is becoming more prevalent, and as more and more powerful models are developed, it is generally expected that they will make a major contribution to our understanding of the fascinating question of how children learn language.

Historical Notes. Models of language acquisition are a relatively new development in AI. One early precursor was the Teachable Language Comprehender developed by Quillian (1). This model learned to understand English text but was developed as a theory of language understanding rather than as a model of the processes of language acquisition. In 1974 Harris, a computer scientist, developed a language-learning program for a simulated robot (2,3). His rationale was that since it would be an almost impossible task to predict in advance all the language capacity that might be desirable in a robotics system, it would be useful to provide the system the ability to learn language from a teacher. Harris made no claims about the cognitive validity of his system, and it does not fall easily into either of the two classes of language acquisition systems defined above; nevertheless, Harris's simulated robot did acquire a subset of English from examples and was one of the earliest systems to acquire language.

Also in 1974 Anderson, a psychologist, built a Language Acquisition System (LAS) that was a psychological model of human language processing (4). LAS learned both to generate sentences and to understand them. Input to the model consisted of strings of words that were treated as sentences and scene descriptions that were encoded in associative networks. The associative network structure may be thought of as an encoding of a picture, so the learning process is meant to represent language learning from sentence-picture pairs. The grammar is represented in an augmented transition network (5) (ATNs are described below). LAS obeyed commands to speak, understand, and learn. LAS "understood" by encoding the meaning of an input sentence in the associative network; LAS "spoke" by receiving such a network and encoding it in a sentence; and LAS "learned" by using a sentence and an encoding of the sentence meaning to derive changes in the current ATN grammar. Though Anderson built LAS as a cognitive model, he pointed out that LAS did not learn language as a child does. Its learning paradigm was more similar to that of an adult learning a second language. LAS was not based on developmental data of the way that children learn language.

One large difference between the child and LAS is that all the concepts must exist in the model before the mapping between words and concepts can occur. This was true of Harris's system also, and the same problem can be observed in many current language acquisition models. Neither Anderson's nor Harris's system proceeds in such a way as to make the same errors as children and to correct them in the manner that children come to correct their errors. This is a very important criterion for judging systems that purport to acquire language as the child acquires a first language.

Even earlier than these two models is that of Kelley, who for his Ph.D. dissertation in 1967 wrote the first computer simulation of language acquisition (6). Kelley approached his model from the linguist's point of view. His model was a stage model and proceeded through the stages of one-word, two-word, and three-word utterances. Kelley's model focused on attaining a hierarchical representation of the three-word stage. The basic syntactic categories were not learned by the system but were given. The system relied on a comparator module that indicated whether a guessed sentence structure was correct or incorrect. This is a problematic strategy, as is discussed below. One important aspect of this model was a weighting system that allowed guesses that were not confirmed to die out through absence of confirmation. This model succeeded in learning a simple grammar through experience even if noisy data were given, and it was intelligent enough to ignore sentences that it did not understand and to select those sentences to which it could respond.

Innatists versus Empiricists. Since Chomsky published Aspects of the Theory of Syntax (7) in 1965 the dialogue between innatists and empiricists has fundamentally influenced the course of research in language acquisition. The bias of an individual or research group toward the innatist or empiricist position profoundly influences the questions asked and the manner in which they are posed. Chomsky's innatist position is based on the fact that language is simply too complex to be learned in the sense that one learns mathematics, for example, or to play chess. Yet all normal children the world over succeed in acquiring a native language by the time they are around five years old. Moreover, language is productive. The child is creative in the use of language. Every child produces countless original sentences. By what processes can these astonishing facts be explained? Chomsky enumerated five requirements for a child to learn language:
1. a technique for representing signals,
2. a way of representing structural information about these signals,
3. some initial delimitation of a class of possible hypotheses about language structure,
4. a method for determining what each such hypothesis implies with respect to each sentence, and
5. a method for selecting one of the (presumably infinitely many) hypotheses that are allowed by requirement (3) and are compatible with the given linguistic data.

Since 1965 Chomsky has modified these criteria somewhat. The need for requirement 5 is deemphasized by assuming language universals and a set of parameters that narrow the hypothesis space. A universal grammar is defined to be a system of principles that characterize the class of biologically possible grammars. Emphasizing the biological foundations of language, Chomsky likens the "growing" of language to the growing of any other organ of the body. The child will hear the language of his or her environment, and discovering, for example, that the child's language uses subject-verb-object word order, this fact might act as a trigger for a set of related assumptions, such as that the child's language is not an inflected one. Universal grammar then has highly restricted options and a few parametric variations. A given language would be acquired by adding rules to this universal grammar (8-11). A very important aspect of this innatist theory for language acquisition is that Chomsky continues to define his approach as one in which only the moment of acquisition of the correct grammar is considered. Another way of characterizing the innatist approach to language acquisition is that it proceeds from a characterization of adult grammar and works backward to see how the child might arrive at this characterization. The alternative approach, that of the empiricists, is to work forward from the evidence that the child provides toward some characterization of the adult language.
The empiricist's approach is sometimes confused with the stimulus-response work of Skinner. In a volume published in 1957, Skinner (12) suggested that language might be viewed as behavior taught in the stimulus-response paradigm. No student of language accepts this suggestion today, and it is a mistake to confound empiricism with Skinner and the stimulus-response paradigm. Most empiricists view language development from a Piagetian point of view (13). They emphasize the diversity of the world's languages and view this diversity as weakening the arguments for a universal grammar. Piaget regarded language as only one aspect of the developing cognition of the child, and so the empiricists tend to view the acquisition of language in a larger cognitive sense than do the innatists. The Piagetian approach is to view the child, motivated by an innate desire to communicate, as actively constructing language, aided by innate cognitive schemas and mediated by the perceptive apparatus through which all humans perceive the world. Piattelli-Palmarini (14) offers a clear exposition of these two contrasting approaches to language acquisition. Of course, these positions are not necessarily antithetical. Most scientists believe that the truth about language acquisition will be shown to contain both empiricist and innatist aspects. Ultimately, everyone, innatists and empiricists alike, believes that language acquisition requires some innate sche-
mas. The as-yet unresolved question is to determine exactly what is innate. Models of language acquisition are very well suited to contribute to the search for some answers to this question.

Theoretical Issues in Language Acquisition

Linguistic Theory. For a fuller discussion of linguistic theory see Computational linguistics. In order to clarify the discussion that follows, a few terms must be defined here. The Chomsky hierarchy of formal languages (15) ranges from the regular languages (type-3), which are defined by grammar rules of the most restrictive form, through context-free (type-2) languages and context-sensitive (type-1) languages, up to the type-0 languages, which have no restrictions whatsoever placed on the format of the rules contained in their grammars. Most attention has focused on the context-free languages. Many models of language acquisition find that context-free rules are the easiest to learn. It is, however, generally believed that unaugmented context-free rules lack sufficient power to represent the natural languages. A context-free language is defined as one in which all the rules are of the form (α → β), where α is a single variable and β is any string. An example of such a rule is (S → NP VP) or (NP → ART N). These are sometimes termed production rules, sometimes phrase structure rules. The first rule states that a sentence, S, may be composed of a noun phrase, NP, followed by a verb phrase, VP. The second states that a noun phrase, NP, may be composed of an article, ART, followed by a noun, N. It would take many such rules to represent a natural-seeming subset of English, but the key restriction on the form of these context-free rules lies in the definition that the symbol to the left of the arrow may be expanded at any time into the symbols to the right of the arrow.
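Production rules of this restricted form can be sketched as a tiny recursive recognizer. The grammar table extends the two rules in the text with a VP rule, and the lexicon is invented for illustration.

```python
# A sketch of context-free production rules as a recursive recognizer.
# The extra VP rules and the mini-lexicon are invented for illustration.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"ART": {"the", "a"}, "N": {"boy", "toy"}, "V": {"sees"}}

def parse(symbol, words, i):
    """Return all positions j such that symbol derives words[i:j]."""
    if symbol in LEXICON:       # preterminal: match one word
        return [i + 1] if i < len(words) and words[i] in LEXICON[symbol] else []
    results = []
    for rhs in GRAMMAR.get(symbol, []):
        positions = [i]         # expand each right-hand-side symbol in turn
        for part in rhs:
            positions = [j2 for j in positions for j2 in parse(part, words, j)]
        results.extend(positions)
    return results

def recognize(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)

print(recognize("the boy sees a toy"))   # True
print(recognize("boy the sees"))         # False
```

Because each rule simply rewrites its left-hand symbol regardless of context, this recognizer embodies exactly the context-free restriction described above.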
If these rules are qualified in such a way as to say that NP could be expanded into ART N only when the NP is preceded or followed by a given symbol, the rules will yield a more powerful context-sensitive language. A transformational theory of language defines a normal form and a set of transformations over this normal form. Both the normal form and the transformed form are said to represent the same deep structure or underlying meaning. Thus, for example, a sentence such as "Daddy gave the boy a toy" might be regarded as an English sentence in normal form, and the sentence "The toy was given to the boy by Daddy" might be regarded as an English sentence that has undergone a transformation. Transformational languages are at least as powerful as the context-sensitive languages. An augmented transition network is a formalism used to describe grammar rules. The example context-free rules above could be encoded in two transition nets such as those pictured in Figure 1. Networks encode expectations. This net says that expecting an S causes one to look for an NP followed by a VP. The arcs between nodes represent word classes that enable a transition from one node to another; the node names are arbitrary. Such networks as these are very familiar to computer scientists and linguists because they occur in many different formalisms. Networks can become far more complicated of course, and the augmentation of the ATN takes the form of memory stacks that make it possible to keep track of additional information including context. The network of Figure 2 permits a sentence to be encoded in two alternative fashions, either as a noun phrase followed by
Figure 1. Transition networks for the rules (S → NP VP) and (NP → ART N).
an intransitive verb or as a noun followed by a transitive verb and a noun phrase. The use of memory stacks gives the ATN all of the power of a Turing machine, which means that it can do any computation at all. An ATN need not be restricted to parsing only context-free or even context-sensitive languages. This is important because most linguists believe that natural language requires more power than the context-free formalism, though recently there have been attempts to show that English may indeed be classified as a context-free language. The July-December 1984 issue of the Journal for Computational Linguistics (16) is devoted to discussions of this issue. There is always a variety of different formalisms of equivalent power for representing a language. The choice of a formalism becomes especially important, however, when a learning model is to be constructed, since some formalisms may facilitate learning. It can be argued that ease of learning is one criterion by which a formalism should be judged. All the cognitive models that are discussed assume that the investigation of language is tractable to modular treatment. This assumption, that the rules of grammar (the syntax of language) can be represented separately from the representation of meaning (the semantics of language) and are mediated by customs of usage such as social customs and language appropriate to particular situations (the pragmatics of language), is an essential assumption to all computational models of language. Another method of characterizing the models of language acquisition is by whether the processing is primarily semantic with syntax checking secondary, primarily syntactic with semantic checking secondary, or whether an attempt is made to treat the two aspects in parallel.

Learnability Models. Learnability theory addresses the formal problem of determining the conditions under which cer-
Figure 2. A transition network showing alternate paths.
tain types of languages can be learned. Pinker (18) offers an overview of work in this framework. The importance of learnability models lies in their clear demonstration of the need for constraints on the possible grammatical systems that can be learned. Gold (19) was a mathematician who in 1967 posed the formal problem of learnability and demonstrated that the defining set of rules for a grammar for any nontrivial language cannot be learned from positive examples alone. Since children are given no explicit examples of sentences that are not in their language, learnability theory seeks constraints that may be assumed to be innate in humans, which would rule out most of the otherwise computationally intractable sets of languages to be learned. Wexler and Culicover (10) took up the challenge of the learnability question posed by Gold. In their book Formal Principles of Language Acquisition, Wexler and Culicover have assumed a formulation of language in a traditional transformational framework and have formally demonstrated the need for a set of learnability restrictions or constraints on the operation of transformational rules. They assumed that the child can derive the meaning of adult utterances from extralinguistic information and that the child can derive the deep structures of a transformational grammar from these meanings. Then, given a pairing of word strings and deep structures, the learning device tries to find a set of transformations in its current hypothesized grammar that will map from the deep structure to the word string. Failure to find such a set results in a revision of the grammar. The work of Wexler and Culicover is doubly interesting since restrictions derived from a mathematical approach to learnability theory coincide in many instances with constraints noticed quite separately by linguists working in a traditional framework.
Note that no one, certainly not Wexler and Culicover themselves, would claim that their model learns language in the way that children do. The work of Wexler and Culicover falls in the area of mathematical inquiry, since their theories for the most part have not been implemented on a computer. These are models that could in theory be implemented, and some theoretical work by scientists in AI is closely related to their work. Berwick has developed a LISP program that embodies some of the theories of transformational grammar (20) and, by examining example sentences, is able to modify the rules of a grammar. Berwick shows that learning syntactic transformations can be easy, provided the program begins with the right initial computational structure. His model learns to parse in a fashion based on Marcus's deterministic parser (21). Marcus's parser consists of a structured working memory and a set of productions that add items at specified points in the memory, move the items around within the memory, and add descriptive labels to them. Berwick's learning system assumes that a limited set of productions are innate and proceeds to learn the transformations, that is, the rules for moving items and adding descriptive labels to them. Given an initial set of abilities, an interpreter, a lexicon, a limited set of phrase structure rules, and selectional restrictions, his model is able to glean a large set of complex rules from simple example sentences. The success of his program is based on constraints. Given a rich set of features and the ability to categorize lexical items as nouns, verbs, or "other," the model discovers a variety of word classes within the other category. Learning takes place whenever the parser is unable to parse a sentence. If in the midst of a parse no known grammar rules can trigger, the system tries to build
a new grammar rule. It has four possible actions and tries each in turn. These involve attaching an item in a parse tree, switching items, and inserting items. For example, if the system has rules to parse the sentence "Daddy gave the boy the toy," then if the model encounters the sentence "Did Daddy give the boy the toy?" it is at a loss, since did cannot be a noun phrase. However, by means of the switch rule, did and Daddy can be switched, and the rest of the parse is successful. Then this new grammar rule is added to the model's repertoire. It is important to note that this model starts with a large body of assumptions about the nature of language and of parsing. Given these assumptions and the computational constraints, the system is able to infer many complex and high-level rules. Berwick's model does not use negative information but learns as the child does from language experience. There is no provision for the system to make and correct errors. Though included here because of its formal antecedents, the Berwick model may also be viewed as a cognitive model in that the model does reproduce certain child language developmental data. A very important hypothesis that this model suggests is that the constraints necessary for efficient parsing may turn out to be the very same constraints that are necessary for learnability. Wolff (22) has developed an algorithm that finds the distinct words in pieces of text from which all the spaces have been removed. This algorithm for segmenting input to the model scans for commonly occurring adjacent strings. This work provides concrete illustrations of how at least part of the task of segmenting sound into word units might be dealt with. This work has been expanded by Wolff into a model that shows how a similar approach can be implemented to segment text into phrasal segments (23).
Wolff makes no claim that his models learn as the child does, but as he points out, valuable insights can be gained by developing and testing computer models of theoretical proposals and observing their strengths and weaknesses.
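The segmentation idea described for Wolff's algorithm can be sketched in a few lines. This is an illustrative reconstruction, not Wolff's program: repeatedly merge the most frequently occurring adjacent pair of units so that recurring sequences coalesce into word-like chunks.

```python
# Illustrative sketch: segment unspaced text by repeatedly merging the
# most frequently occurring adjacent pair of units.
from collections import Counter

def segment(text, merges=10):
    units = list(text)              # start with individual characters
    for _ in range(merges):
        pairs = Counter(zip(units, units[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:
            break                   # nothing recurs; stop merging
        merged, i = [], 0
        while i < len(units):
            if i + 1 < len(units) and units[i] == a and units[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(units[i])
                i += 1
        units = merged
    return units

print(segment("thecatthedogthecatthedog", merges=2))
```

After two merges on this text, the recurring string "the" has already coalesced into a single unit; further merges grow larger chunks such as "thecat" and "thedog".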
Connectionist Models. An entirely different kind of language acquisition model should be mentioned here. Connectionist models may be characterized by the fact that they do not encode explicit rules at all in the sense described here. The January-March 1985 issue of Cognitive Science (24) was devoted to connectionist models and their applications. Inspired by the parallelism of the brain, these models function by connecting a large number of very small processing units. These units are embodied in networks. A connectionist model embodies the idea of interactively activating subsections of the network. Each of the small units in the network is connected to a large number of others. Each unit samples its input connections from other processing units and modifies its outputs, which are also connections to other processing units. The sum total of these excitatory and inhibitory effects causes the network to converge on a decision about a hypothesis. Thus, conspiracies of mental agents permit generalization about lawful behavior. McClelland and Rumelhart (25) have implemented a connectionist model to explain the way in which children form the past tense of verbs in English. Consider the verb go. It is an empirical fact that children at the earliest stage of language acquisition typically learn the word went and use it correctly. One may assume that these forms have been learned by rote. Then at a subsequent stage of development the child will start to use the word goed. Presumably this is because the child has formed a general schema for forming the past tense of verbs. Eventually, of course, children learn that go is an irregular verb and does not obey the general rule in the form of its past tense. But the puzzle is that for a period of time, sometimes for years, both forms exist in the child's vocabulary. How can this period of imbalance between the erroneous and the correct forms be explained? This behavior cannot be explained if the language mechanism is expressed in terms of explicit rules the child either knows or does not know. McClelland and Rumelhart's model learned the past tense of some 420 verbs in English, some regular and some irregular. By means of this connectionist model McClelland and Rumelhart can explain the period of instability, and they have found a rough correlation between the difficulty of learning in their model and in Bybee and Slobin's observations (26) on the course of learning in the child. How hard a word form is to learn in the model depends on the corpus as a whole. Connectionist models have yet to be explored to any extent in the field of language acquisition, but they are an excellent example of the way in which a different kind of AI formalism may modify something so basic as one's idea of what a grammar is. It will be very exciting to explore further the strengths and weaknesses of such models for language acquisition.
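The unit-level dynamics described above, in which each unit repeatedly sums weighted excitatory and inhibitory inputs until the network settles on a decision, can be sketched schematically. The two-unit network, weights, and update rule below are illustrative assumptions, not McClelland and Rumelhart's model:

```python
# Schematic sketch of interactive activation: each unit sums weighted
# excitatory (positive) and inhibitory (negative) inputs plus external
# evidence; repeated updates let one hypothesis unit win the competition.

def settle(weights, external, steps=50, rate=0.2):
    n = len(external)
    act = [0.0] * n
    for _ in range(steps):
        nxt = []
        for i in range(n):
            net = external[i] + sum(weights[i][j] * act[j] for j in range(n))
            a = act[i] + rate * (net - act[i])   # relax toward net input
            nxt.append(min(1.0, max(0.0, a)))    # clamp activation to [0, 1]
        act = nxt
    return act

# Two hypothesis units that inhibit each other; unit 0 receives more
# external evidence, so it should dominate when the network settles.
W = [[0.0, -0.5],
     [-0.5, 0.0]]
acts = settle(W, external=[0.6, 0.4])
print(acts[0] > acts[1])  # True: the better-supported hypothesis wins
```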
Psycholinguistic Studies. There exists a wealth of psycholinguistic studies concerning the acquisition of language by children from all over the world. Although the literature abounds with collections of language acquisition data, there are a great variety of ways of explaining and interpreting the data (27-32). There are many methods that psychologists and linguists have used to collect data, including diary records (33), elicited production tasks (34) in which the child may be asked to describe a picture or a situation, or tests of understanding in which the child is asked to act out with toys sentences presented by the researcher (35,36). Explanations are sought to account for the progress from one-word utterances (37) to short utterances that include only content words to the acquisition of morphemes that have been shown to appear in all children in a particular order (27). The stage in which children use short utterances containing only content words has been called telegraphic speech because of its resemblance to the cryptic kinds of messages used in telegrams. These brief utterances encode a set of relations found in early child speech in all languages. Particular attention has been paid to the errors that children make since child speech that differs from adult speech yields clues concerning the processes that the child is using to understand and produce speech. An equally important clue to the child's processes are those errors the child typically does not make. One frequently noted phenomenon is the overgeneralization of the plural of nouns and of the past tense of verbs in English, as was discussed in the preceding section. Such phenomena need explanation.

Characterizing Cognitive Models of Language Acquisition

Cognitive models of language acquisition not only ask under what conditions a language can be learned by a computer model but also if the model learns the language in the same way that the child does. As a consequence, these models may be distinguished by their attention to the psycholinguistic data on the manner in which children learn their native tongues. The models described here all concern themselves with the acquisition of syntax, which is, of course, only a small part of the language acquisition task. A cognitive model of language acquisition therefore must not only learn to understand and generate sentences of ever greater complexity, as does the child, but, to provide a satisfactory explanation of the course of language acquisition, the model must also make the same kinds of error that the child makes and eventually correct the errors after further learning has occurred.

Four different exemplars of the cognitive model of language acquisition are described in the following sections. The CHILD model of Selfridge focuses mainly on the use of semantics in the acquisition of language. The first version of Selfridge's model learned to understand commands (38). A later version learned to generate language as well (39). The models of Langley (40), Hill (41-43), and MacWhinney (44) focus mainly on the acquisition of syntax. Langley's AMBER and MacWhinney's competition model learn to generate language, and Hill's model learns both to understand and to generate language at the level of the two-year-old.
Figure 4. Components of Langley's AMBER model.
Overview of Four Cognitive Models. It is characteristic of cognitive models of language acquisition that the learning that takes place is more important than the output of the system, so the models must be described in terms of the knowledge structures that are built as the model acquires language.

Selfridge's model has as its input adult sentences together with simulated visual input. The output of the model is the child's response. In the first version of CHILD this response is an action; in the subsequent version, in which a language generator was added, the response may be verbal. Figure 3 is a diagrammatic representation of the model. In Figures 3-6 data structures are represented in a large box, and the inherent processes that act on these data structures are represented in a smaller box above. The series of diagrams highlights certain similarities of structure between the various models. It should be emphasized that the grammars and processes employed by each of these models are quite dissimilar. Selfridge assumes that concepts exist before language is learned. The knowledge structures that are built up consist of attaching lexical knowledge to the concepts that are represented in the conceptual dependency formalism of Schank (45). The conceptual dependency framework is used to build a dictionary of words and meanings together with slots to be filled and positional information related to where slot fillers appear in the sentence with respect to the verb. The model makes use of mechanisms for focusing attention and a set of learning rules and rules of inference. Initially the model has no language, and eventually it learns to understand commands and to respond to them. The system was based on psycholinguistic data collected from a child, Joshua, and models the way that Joshua proceeded from very little language knowledge to the point where he could understand even unlikely commands such as to "get on the tape recorder." The adult sentences given the model are taken from the recorded interaction between adults and Joshua. Understanding takes the form of establishing a mapping between the conceptual dependency frames and the lexical information. The model has successfully learned subsets of both English and Japanese.

Figure 3. Components of Selfridge's CHILD model.

Langley's AMBER (40) is a model of language acquisition through error recovery. Langley's system accounts for the gradual learning of language over time and also for the order in which morphemes are mastered. Figure 4 illustrates the components of the model. Input to the model consists of randomly generated adult sentences paired with meaning representations and a representation of the main topic of the sentence. Output of the model is a sentence that the system generates. AMBER gets a proposition, predicts a sentence, and then compares the system-generated sentence to the adult sentence it received and adjusts the system to account for a discrepancy between the two. The goal of AMBER's generated sentence is to describe the main topic of the sentence. The model is implemented in the computer language PRISM (46), which accounts for the gradual learning process. Meaning is represented in a tree structure employing a small set of relations such as agent, action, object, size, and color. In addition, features such as singular, plural, present, past, and so on are defined. The system is given the capacity to understand via the meaning representation; it learns to produce language.

Hill's model (41) of language acquisition in the two-year-old is based on data collected from a two-year-old child, Claire. The structure of the model is depicted in Figure 5. Input to the model consists of adult sentences taken from the transcribed sessions. Output of the model consists of a childlike sentence repeating or responding to the adult input in accordance with the current state of the model's grammar. The internal representation of the model consists of dynamic data structures encoding the child's grammar, the conceptual knowledge of the child, and the physical context of the dialogue. The model is given a basic lexicon and a set of concepts with a mapping between the two. No assumptions are made about the ultimate form of the adult grammar, nor about what
Figure 5. Components of Hill's model of language acquisition in the two-year-old.
must be built in to the model, but the model is developed step by step to represent the progress of the child, Claire, over very fine time slices. Processes attend to the adult input and use rules of salience to focus on examples within the adult data in order to form word classes and build a grammar. The model is written in LISP using the semantic net language GRASPER (47). The world knowledge is encoded in a semantic net, as are the grammar templates and the lexicon (48). The model uses its language experience to place words in word classes and to build a grammar that is at first a flat template grammar but eventually evolves into a grammar that can best be described by a set of recursive context-free phrase structure rules.

MacWhinney's competition model for the acquisition of syntax (44) builds on an earlier model of MacWhinney's of the acquisition of morphophonology (49). In this earlier work MacWhinney separated out three processes (rote, analogy, and combination) that are joined into a single model in the newer work. The acquisition of syntax model employs a strength-based conflict resolution paradigm to induce a set of lexical structures. Input data is in the form of adult utterances and is taken from the mother-child interactions that appear in Sachs's detailed study of her daughter Naomi (50). These transcripts are available through the child language data exchange system (51). A diagram of the model appears in Figure 6. Although the model is being built to account specifically for the Sachs data, linguistic data from many languages has been taken into account in building the model. Each adult utterance is paired
with a propositional representation of the event structure it represents. Output from the model is a childlike sentence. The model employs a paradigm of parallel interaction and competition between data structures to induce a lexical functional grammar. The model is being implemented in Franz LISP. Meaning representation in the model takes the form of complex mappings between meaning and utterances. It is assumed that the child attempts to learn words for meanings that the child wants to express. The child therefore is granted a form of mental representation that includes the propositional structure, which is the basis for semantic interpretation. What is learned is a set of lexical structures that embody rules of syntax in English.

Systems of Grammar. Each of the cognitive models of language acquisition under discussion learns a different representation of the rules of syntax in English. Selfridge's conceptual dependency frame representation (38) embodies a representation based on meaning that associates slots to be filled with each concept. In this paradigm learning language is essentially learning to fill slots and learning positional information about where slot fillers go in a sentence in relation to other words in the sentence. The system assumes that the child at age 1 possesses knowledge of actions, objects, and spatial relations and has the ability to discriminate between these classes. Lexical class information given the model consists of only these three classes. In a sentence such as "Daddy gave the boy a toy" the conceptual dependency framework would encode the information that the action of giving requires a specification of what is given, to whom it is given, and who is the giver. Positional information would be encoded to capture the fact that in this sentence the giver precedes the verb, the recipient follows the verb, and the object given follows the recipient.
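As an illustration of the slot-and-position encoding just described, a conceptual-dependency-style frame for give might be sketched as follows. The structure and names here are hypothetical, not Selfridge's actual representation:

```python
# Hypothetical conceptual-dependency-style frame: the action "give"
# requires a giver, a recipient, and an object, each with positional
# information relative to the verb.
GIVE_FRAME = {
    "action": "give",
    "slots": ["giver", "recipient", "object"],
    "positions": {"giver": "precedes-verb",
                  "recipient": "follows-verb",
                  "object": "follows-recipient"},
}

def fill_slots(words, frame):
    """Fill the frame's slots from a flat giver-verb-recipient-object
    word sequence, using the positional information above."""
    verb_at = words.index(frame["action"])
    return {"giver": words[verb_at - 1],
            "recipient": words[verb_at + 1],
            "object": words[verb_at + 2]}

print(fill_slots(["Daddy", "give", "boy", "toy"], GIVE_FRAME))
# {'giver': 'Daddy', 'recipient': 'boy', 'object': 'toy'}
```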
Langley's model employs a process grammar in which the grammar is actually encoded as a set of actions (see Natural-language generation). Figure 7 illustrates a tree structure to represent meaning. The following agent-object rule would be constructed to produce the utterance Daddy toy: If you want to describe node 1 and node 2 is the agent of node 1 and you have described node 2 and node 3 is the object of node 1, then describe node 3 (40).
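The quoted agent-object rule can be rendered as a small production over such a meaning tree. The node names follow the quotation; the data structures are illustrative, not AMBER's:

```python
# Sketch of the agent-object production: if the goal node's agent has
# already been described and the goal has an undescribed object, the
# rule says to describe the object next.

def agent_object_rule(goal, relations, described):
    agent = relations.get((goal, "agent"))
    obj = relations.get((goal, "object"))
    if agent in described and obj is not None and obj not in described:
        return obj
    return None

# node1 = the giving event, node2 = Daddy (agent), node3 = toy (object)
relations = {("node1", "agent"): "node2", ("node1", "object"): "node3"}
described = {"node2"}  # "Daddy" has been uttered
print(agent_object_rule("node1", relations, described))  # node3, i.e., say "toy"
```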
Figure 6. Components of the competition model for the acquisition of syntax, by MacWhinney and Anderson.
Figure 7. A tree structure representing meaning in the AMBER model.
Langley gives his model lexical information in the form of the case frames of words. The model constructs rules such as the above for producing utterances to express these case frames. It can be assumed that were this model expanded to include rules for understanding, the rules for understanding might look quite different from the rules for production. Different sets of rules for understanding and producing language are appealing when grammar is viewed as a set of dynamic processes rather than as a static set of rules. This is an issue worthy of further exploration. Langley's model acquires rules not only for telegraphic speech but subsequently also acquires rules for production of morphemes.

In Hill's model a flat template grammar is used that consists initially of specific examples drawn from the input data. From an adult sentence such as "Daddy gave the toy to the boy" the model would initially respond with a single word such as toy. A subsequent presentation of the same sentence might cause the model to acquire a template for gave toy where gave would be classified as a relation word and toy as a slot filler. Yet another presentation of the sentence might cause the model to learn the template Daddy gave where Daddy was a slot filler, and eventually the template (slot-1 gave slot-2) would be learned for Daddy gave toy. What was learned in each presentation of the sentence would depend on the language experience of the model and what had been learned so far. No information is given the model about word classes, but hearing sentences such as "Mommy gave the toy," "John gave the book," and "Sue gave the piece of cake" would eventually cause the model to put Daddy, Mommy, John, and Sue all together in a word class meaning words that stand for agents of the relation word gave. Note that it would not matter if the input sentences were far more complex than those used for illustration.
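The template and word-class learning just described can be caricatured in a few lines. This is an illustrative sketch only; Hill's model is actually written in LISP using GRASPER semantic nets:

```python
# Sketch of flat-template learning around a relation word: record a
# (slot-1 gave slot-2) template and collect the words filling the
# pre-verbal slot into one word class (the agents of "gave").

def learn_from(sentences, relation="gave"):
    agent_class, templates = set(), set()
    for sent in sentences:
        words = sent.lower().rstrip(".").split()
        if relation in words:
            i = words.index(relation)
            if i > 0:
                agent_class.add(words[i - 1])       # word before "gave"
            if 0 < i < len(words) - 1:
                templates.add(("slot-1", relation, "slot-2"))
    return agent_class, templates

agents, templates = learn_from([
    "Mommy gave the toy.",
    "John gave the book.",
    "Sue gave the piece of cake.",
    "Daddy gave the toy to the boy.",
])
print(sorted(agents))  # ['daddy', 'john', 'mommy', 'sue']
print(templates)       # {('slot-1', 'gave', 'slot-2')}
```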
If the model is focusing on the word gave, a sentence such as "Mommy gave the toy to Sue while she went into the store to buy groceries" would have just the same effect as the short sentences used above for illustration. By this process word classes are derived from the child's own ability to produce language. The process results in a multiplicity of overlapping and intersecting word classes. Processes of generalization eventually also permit the classifying of relation words, allowing, for example, giving and bringing to be classed together as relation words that have similar syntactic properties. Successive reorganizations of the grammar and the lexicon occur as learning takes place. Eventually, with the addition of the concept of coordination, this process of generalization causes the initial flat template grammar to evolve into a recursive process grammar which might be described as context-free.

MacWhinney's model learns a lexicalist grammar much like the lexical functional grammar of Bresnan (17). In the MacWhinney model the lexicon is the sole repository of grammatical knowledge. For MacWhinney, like Hill, a goal of the model is to formulate early lexically based patterns in such a way that the formulations survive into adult language. MacWhinney, like Selfridge, assumes that the concepts of objects and actions/processes are fundamental to thinking, and so their word classes build on this fundamental assumption.

Learning, Making Errors, and Correcting Them. All four models learn from language experience. All four models assume that the child has previously acquired some concepts and world knowledge before language learning begins. All four
models assume that certain learning abilities are innate. Using a set of inference and learning rules, Selfridge's model takes words from the input sentences, retrieves appropriate conceptual dependency frames, and matches the words against the conceptual dependency requirements for filling slots. The matching is based on word meaning and word categories. Ultimately, CHILD learns to understand simple commands and to produce simple declarative sentences. In the process the model makes overgeneralizations similar to those children make
(39).

Langley's AMBER starts with one-word utterances and ultimately learns to include appropriate morphemes, suffixes, and prefixes in the utterances produced. Because it is an adaptive production system that attaches weights to rules to indicate the relative strength or weakness of the rule, AMBER learns very slowly. The same rule must be learned many times before the rule is strong enough to mask a previously learned rule. In this respect AMBER's learning paradigm is more cognitively valid than CHILD's. The more complex the conditions on the morpheme use, the longer it will take AMBER to master the correct use of the morpheme. AMBER offers an explanation through its learning paradigm of the fact observed by Brown (27) that it takes the child longer to master certain morphemes than others. One very powerful aspect of the system is that the same rules learned for producing word combinations can be used at a more advanced level to learn to produce embedded constructions.

Hill's model learns by selecting appropriate examples from the language experience according to rules that encode those things that are salient to the young child. Like Selfridge's, Hill's model relies on word order and on encoding relations. The learning processes in Hill's model emphasize the interaction of cognition and language learning. The learning is highly dynamic in that the same body of input presented to the model a second time causes a different set of grammar rules and additional lexical class information to be learned.

MacWhinney and Anderson spell out nine learning strategies: amalgamation acquisition, component analysis, learning syntactic rules, strengthening, generalization, discrimination, proceduralization, composition, and inference (52). MacWhinney's model also relies on attention to word order and on characterizing relative word positions as, for example, preverbal or postverbal.
The model learns strategies for parsing in the sense that learning a new verb leads to opening up new slots to be filled. In this model all learning is targeted toward the acquisition of lexical entries.

All four models make errors of overgeneralization, just as the child does. None of the models rely on overt correction of errors since it is generally agreed that explicit overt correction of syntax plays no role in language learning in the child. It is in the area of correcting errors that the models differ. Langley, Hill, and MacWhinney all employ some system of weighting factors to differentiate between strong and weak hypotheses. Selfridge's CHILD lacks such a system of weights or confidence factors and so learns immediately rather than gradually. The unstable period in which both erroneous and correct rules exist simultaneously is not represented in the Selfridge model. The strength-based learning paradigms of the other three models enable the models to focus on rules that were reinforced or strengthened through use and also to give greater attention to newly learned rules than to older ones. Whereas Selfridge's and Hill's models rely on generalization,
Langley's model employs discrimination, and MacWhinney's model both generalizes and discriminates. Generalizing is characterized as the process of going from specific rules to a more general rule. The specific rule is still available, but if it is not used, it will fail to be reinforced and will eventually decay. Discrimination is the opposite process of starting with overly general rules and finding and adding missing conditions.
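The strength-based reinforcement and decay described here can be sketched with a simple update rule. The numbers and the update scheme are illustrative assumptions, not taken from any of the models:

```python
# Sketch of strength-based rule competition: rules gain strength when
# used successfully and decay when unused, so a newly learned general
# rule can gradually mask an older specific one.

def update_strengths(strengths, used, gain=0.3, decay=0.9):
    return {rule: (s + gain if rule in used else s * decay)
            for rule, s in strengths.items()}

strengths = {"specific-rule": 1.0, "general-rule": 0.2}
for _ in range(10):  # the general rule keeps being used
    strengths = update_strengths(strengths, used={"general-rule"})
print(strengths["general-rule"] > strengths["specific-rule"])  # True
```

During the first few updates both rules have comparable strength, which mirrors the unstable period in which erroneous and correct forms coexist.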
Evaluating the Present Models and Looking Toward the Future

There is clearly much work still to be done in the modeling of language acquisition. These models focus mainly on low-level morphosyntactic effects. Much of the rich and detailed patterns of actual language development is as yet virtually untouched. There is a vast amount of data collected on the child's learning of anaphora, pronouns, question forms, and the like, which have not been addressed by any cognitive model. It is hoped that future models will be built that will provide ease of experimentation for linguists and psycholinguists. Today's models unfortunately must be built by skilled practitioners of AI techniques and are not easily modified or embellished by any others than the authors and their students. The models are important in that they are prototypes for more extensive models of the future.

Berwick's and Wolff's models deal with questions of learnability, and future models of this ilk will continue to yield important results in learnability theory concerning the kinds of languages that can be learned and the constraints necessary to make learning possible.

Selfridge's model starts where the child starts, with some concepts in place but no language skills. The model starts as a poor language user, gains experience, makes appropriate errors and learns to correct them, and gets progressively better at understanding. Having learned to understand first just words, it eventually learns to understand syntactic information and to associate syntactic information with words. The ability to learn to talk is driven by the ability to learn to understand, which is inherently satisfying. The model can learn languages other than English, and Selfridge predicts that the model will learn language up to the adult level. Since the present model can deal only with simple commands and declarative sentences, it remains to be seen what modifications the current rules may need in order to attain the complexity of adult language.

Langley's model begins with single-word utterances and gradually learns to utter sentences of the complexity of the adult. Unlike Hill's model, Langley's model learns suffixes, prefixes, and morphemes. It does not learn to understand but is given a meaning representation for adult utterances that it eventually learns to mimic in all their complexity. The system has little need of complex world knowledge since it does not try to answer questions. The idea of a grammar as embodied in a system of processes for understanding or producing language is cognitively appealing.

Hill's model starts with very little information and learns to understand and produce language at the level of the two-year-old. The model progressively acquires more and better language skills. It can repeat adult sentences in a childlike manner and answer questions as well. The paradigm suggests a manner in which language based initially on cognitive knowledge may grow into a syntactic system that will be increasingly independent of its semantic and cognitive origins, thus suggesting a way that child language may evolve into that of the adult. The model, however, never progresses beyond the level of the two-year-old. This model more than the others is offered as a theoretical tool for linguistic experimentation and contains a great deal of flexibility including a variety of parameters and alternative modules that can be added or omitted in order that the differing results may be observed.

MacWhinney and Sokolov's model, like the others, begins with single-word utterances and proceeds to acquire more and more complex language skills. This model is an ambitious project that is not at present fully implemented. The model is given a careful and extensive meaning representation to accompany the input sentences. As a consequence of the detailed information given, it is anticipated that the model will eventually acquire the capacity to deal even with such tough problems as center embedding and relative clauses.

Though none of the cognitive models assume a specific set of linguistic universals, none are inconsistent with a theory of linguistic universals, and all may be seen to embody certain innate processes. Since it is incumbent upon a computational model to be precisely explicit about what is built in and what is learned, it will be possible in the future, as more and better models are constructed, to clarify in a fashion heretofore impossible exactly what may be learned given a set of innate schemas and a set of processes for acting upon those schemas.

BIBLIOGRAPHY

1. M. R. Quillian, "The teachable language comprehender: A simulation program and theory of language," CACM 12, 459-476 (1969).
2. L. R. Harris, Natural Language Acquisition by Robot, Technical Report TR74-1, Department of Mathematics, Dartmouth College, Hanover, NH, October 1974.
3. L. R. Harris, "A system for primitive natural language acquisition," Int. J. Man-Mach. Stud. 9, 153-206 (1977).
4. J. R. Anderson, "Induction of augmented transition networks," Cog. Sci. 1, 125-157 (1977).
5. W. Woods, "Transition network grammars for natural language analysis," CACM 13, 591-606 (1970).
6. K. L. Kelley, Early Syntactic Acquisition, Ph.D. Dissertation, University of California at Los Angeles, 1967; also published as Report No. P-3719, The Rand Corporation, Santa Monica, CA, November 1967.
7. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
8. N. Chomsky, Language and Responsibility, Pantheon Books, New York, 1977.
9. N. Chomsky and H. Lasnik, "Filters and control," Ling. Inq. 8, 425-504 (Summer 1977).
10. K. Wexler and P. Culicover, Formal Principles of Language Acquisition, MIT Press, Cambridge, MA, 1980.
11. R. Berwick, Learning Structural Descriptions of Grammar Rules from Examples, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 56-58, 1979.
12. B. F. Skinner, Verbal Behavior, Appleton-Century-Crofts, New York, 1957.
13. J. Piaget, Child's Conception of the World, J. Tomlinson and A. Tomlinson (trans.), Littlefield Adams and Co., Totowa, NJ, 1979. See J. H. Flavell, The Developmental Psychology of Jean Piaget, Van Nostrand, New York, 1963.
14. M. Piattelli-Palmarini (ed.), Language and Learning: The Debate
Between Jean Piaget and Noam Chomsky, Harvard University Press, Cambridge, MA, 1980.
15. N. Chomsky, Formal Properties of Grammars, in R. D. Luce, R. R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 2, Wiley, New York, 1963.
16. C. R. Perrault (ed.), "Special issue on mathematical properties of grammatical formalisms," Computat. Ling. 10(3-4) (July-December 1984).
17. J. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1982.
18. S. Pinker, "Formal models of language learning," Cognition 7, 217-283 (1979).
19. M. E. Gold, "Language identification in the limit," Inf. and Control 10, 447-474 (1967).
20. R. Berwick, Computational Analogues of Constraints on Grammars: A Model of Syntax Acquisition, in Proceedings of the 18th Annual Meeting of the Association for Computational Linguistics and Parasession on Topics in Interactive Discourse, University of Pennsylvania, Philadelphia, June 1980.
21. M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
22. J. G. Wolff, "Language acquisition, data compression and generalization," Lang. Commun. 2(1), 57-89 (1982).
23. J. G. Wolff, "Language acquisition and the discovery of phrase structure," Lang. Speech 23, 255-269 (1980).
24. D. Waltz (ed.), "Special issue: Connectionist models and their applications," Cog. Sci. 9(1), 1-120 (January-March 1985).
25. J. McClelland and D. Rumelhart, A Parallel Distributed Processing Model of Aspects of Language Learning, in B. MacWhinney (ed.), Mechanisms of Language Acquisition, Lawrence Erlbaum, Hillsdale, NJ, 1987.
26. J. L. Bybee and D. I. Slobin, "Rules and schemas in the development and use of the English past tense," Language 58, 265-289 (1982).
27. R. A. Brown, A First Language: The Early Stages, Harvard University Press, Cambridge, MA, 1973.
28. J. G. deVilliers and P. A. deVilliers, Language Acquisition, Harvard University Press, Cambridge, MA, 1978.
29. M. D. S. Braine, "Children's first word combinations," Monogr. Soc. Res. Child Devel. 41(1), 1-104 (1976).
30. K. Nelson, "Structure and strategy in learning to talk," Monogr. Soc. Res. Child Devel. 38(1-2), Serial No. 149 (February-April 1973).
31. D. Slobin, Crosslinguistic Evidence for the Language-Making Capacity, in D. Slobin (ed.), The Crosslinguistic Study of Language Acquisition, Lawrence Erlbaum, Hillsdale, NJ, 1984.
32. M. Bowerman, Early Syntactic Development: A Cross-Linguistic Study with Special Reference to Finnish, Cambridge Studies in Linguistics, No. 11, Cambridge University Press, London, 1973.
33. M. Bowerman, Learning the Structure of Causative Verbs: A Study in the Relationship of Cognitive, Semantic, and Syntactic Development, in E. Clark (ed.), Papers and Reports on Child Language Development, No. 8, Stanford University Committee on Linguistics, Stanford, CA, 1974.
34. H. Tager-Flusberg, J. deVilliers, and K. Hakuta, The Development of Sentence Coordination, in S. Kuczaj II (ed.), Language Development: Problems, Theories and Controversies, Vol. 1, Syntax and Semantics, Lawrence Erlbaum, Hillsdale, NJ, 1982.
35. E. Matthei, The Acquisition of Prenominal Modifier Sequences: Stalking the Second Green Ball, Ph.D. Dissertation, Department of Linguistics, University of Massachusetts at Amherst, 1979.
36. L. Solan and T. Roeper, Children's Use of Syntactic Structure in Interpreting Relative Clauses, in H. Goodluck and L. Solan (eds.), Papers in the Structure and Development of Child Language, Vol. 4, University of Massachusetts Occasional Papers in Linguistics, Amherst, pp. 105-126, 1978.
37. L. Bloom, One Word at a Time: The Use of Single Word Utterances Before Syntax, Mouton, The Hague, 1973.
38. M. Selfridge, Inference and Learning in a Computer Model of the Development of Language Comprehension in a Young Child, in W. Lehnert and M. H. Ringle (eds.), Strategies for Natural Language Processing, Lawrence Erlbaum, Hillsdale, NJ, pp. 299-326, 1982.
39. M. Selfridge, Why Do Children Say 'Goed'?: A Computer Model of Child Generation, Proceedings of the Third Annual Conference of the Cognitive Science Society, Berkeley, CA, August 1981.
40. P. Langley, "Language acquisition through error recovery," Cog. Brain Theor. 5, 211-255 (1982).
41. J. C. Hill, "A computational model of language acquisition in the two-year-old," Cog. Brain Theor. 6(3), 287-312 (1983).
42. M. A. Arbib, E. J. Conklin, and J. C. Hill, From Schema Theory to Language, Oxford University Press, New York, 1986.
43. J. C. Hill, Using a Computational Model of Language Acquisition to Address Questions in Linguistic Inquiry, Proceedings of the Seventh Annual Conference of the Cognitive Science Society, University of California, Irvine, August 1985.
44. B. MacWhinney, "Competition," in B. MacWhinney (ed.), Mechanisms of Language Acquisition, Lawrence Erlbaum, Hillsdale, NJ, 1987.
45. R. C. Schank, Identification of Conceptualizations Underlying Natural Language, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman and Co., San Francisco, CA, pp. 182-248, 1973.
46. P. Langley and R. T. Neches, PRISM User's Manual, Technical Report, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, 1981.
47. J. Lowrance, GRASPER 1.0 Reference Manual, COINS Technical Report 78-20, University of Massachusetts at Amherst, December 1978.
48' W. Woods,What's in a Link: Foundations of Semantic Networks, in D. Bobrow and A. Collins (eds.),Representationand. (Jnd.er_ standing: Studies in Cognitiue Scienci, Academic, New york, L975. 49. B. Macwhinney, "The acquisition of morphophonolory,,, Monogr. soc. Res-child Deuel. 4s(L-z), t-rzg (192s). 50' J' Sachsand L. Truswell, "Comprehensionof two-word instruction by children in the one-wordstage,"J . Chitd Lang . 5, L7 -24 (Lg7g) . 51' B. MacWhinney and C. Snow, "The child data exchange system,,, J. Child Lang. t2, 27I-296 (198b). 52' B' MacWhinney and J. Anderson, The Acquisition of Grammar, in I. Gopnik and M. Gopnik (eds.),From Mod,elsto Mod,ules: Studies in cognitiue sciences,Ablex, Norwood, NJ, 19g6. General References Models of Language Acquisition J. R. Anderson, A Theory of Language Acquisition basedon General Learning Principles, Proceedings of the Seuenth International conferenceon Artificial Intelligence,vancouver, 8.c., pp. gz_10g, 'LL Augustlg8l E. Bates and B. MacWhinney, Functionalist Approachesto Grammar, in E. Wanner and L. Gleitman (eds.),Language Acquisition: The State of the Art, Cambridge University Press, London, 19g2, pp. L75-2L8. R. Berwick, The Acquisition of SyntacticKnowled.ge,MlT Press,Cambridge, MA, 198b. J. c. Hill and M. A. Arbib, "schemas, computation, and language
452
TANGUAGES,OBfECTORTENTED
acquisition,"Human Deuel.27, 282-296 (1984),an expositionof schematheory and its relevance to language acquisition. J. Moulton and G. Robinson, The Organization of Language, Cambridge University Press,London,1981,an expositionof the empiricists' position. S. A. Pinker, A Theory of the Acquisition of Lexical-Interpretive Grammars, in J. Bresnan (ed.;, The Mental Representationof Grammatical Relations, MIT Press, Cambridge, MA, 7982, pp. 665-726. Readings in Psycholinguistic Studies of Language Acquisition M. D. S. Braine, The Acquisition of Language in Infant and Child, in C. Reed (ed.), The Learning of Language, Appleton-CenturyCrofts, New York, I97L, pp. 7-95. E. V. Clark, What's in a Word? On the Child's Acquisition of Semantics in His First Language,in T. E. Moore (ed.),CognitiueDeuelopment and the Acquisition of Language, Academic, New York, L973,pp. 65-110. A. M. Peters, Language Segmentation: Operating Principles for the Perception and Analysis of Language, in D. Slobin (ed.), The Crosslinguistic Study of Language Acquisition, Lawrence Erlbaum, Hillsdale, NJ, 1984,pp. 1-80. S. L. Tavakolian (ed.),Language Acquisition and Linguistic Theor!, MIT Press,Cambridg", MA, 1981. E. Wanner and L. Gleitman (eds.),Language Acquisition: The State of the Art, CambridgeUniversity Press,London,1982,pp. 173-2L8. Readings in Linguistic Theory J. Bresnan, A Realistic Transformational Grammar, in G. Miller, J. Bresnan, and M. Halle (eds.),Linguistic Theory and Psychological Reality, MIT Press,Cambridg", MA, 1978,pp. 1-59. N. Chomsky,"A review of B. F. Skinner'sVerbalBehauior,"Language 35(1), 26-58 (1959), first published in 1957, Chomsky's wellknown responseto Skinner's position on language. Readings in AI Learning Systems E. L. Rissland, Examples and Learning Systems,in O. Selfridge,E. Rissland, and M. A. Arbib (eds.),Adaptiue Control of Ill-Defined Plenum Press,New York, 1984,pp. 149-163. System.s, J. C. Hnl Smith College
LANGUAGES, OBJECT-ORIENTED

Object. Noun. 1. Something physical or mental of which a subject is cognitively aware. 2. Something which arouses emotion in an observer.
Object-oriented programming is a style of programming that is based on directly representing physical objects and mental concepts in the machine. The goal is to make the machine cognitively aware of the physical world and able to reason about it using mental representations. Since this goal is at the heart of AI, object-oriented programming is recognized among many researchers as the most appropriate vehicle for intelligent computation. But the second dictionary definition applies to object-oriented programming as well. The debate
between adherents of object-oriented programming and advocates of alternative styles often arouses strong emotions.

Object-Oriented Languages Provide Building Blocks for Constructing Intelligent Programs

It is the task of AI programmers to bridge the seemingly unfathomable gap between the flexible, creative, unpredictable performance of humans in problem solving and the rigid, mechanical, step-by-step nature of contemporary computers. Obviously, it is too difficult to try to impart intelligence to the machine all at once. Instead, an intelligent program is built out of parts, each of which is slightly less intelligent. Each of these parts is further broken down into smaller fragments, each of which relies on less knowledge and expertise. Finally, there are the least intelligent parts, which behave mechanistically, blindly following rules just like the robots portrayed in cartoons.

A good programming language for AI must support this kind of program construction, building a problem solver out of parts, each of which can act like a little problem solver itself. That is how object-oriented programming works. An AI program is implemented as a collection of active objects, or actors. Each object represents a specialized part of a problem solver and has its own local knowledge and expertise. Each object is defined in terms of other objects to which it can send messages to solve problems for it. Each object is defined by its behavior in response to messages received from other objects.

What kinds of problem-solving (qv) capabilities should intelligent building blocks for AI programs have?

Each Object Has the Ability To Store Information. A problem solver must have memory to remember information it learns, which may be of use in the future. In an object-oriented language each object has a set of private data, called its acquaintances, or instance variables. Giving each object the ability to store information means that any object in the system can be made to learn over time.
Each Object Has the Ability To Process Information. Thinking requires processing power. Rather than concentrating processing power in a centralized interpreter, object-oriented programming distributes the ability to process information among the collection of objects that make up a system (see Logic; Inference). Each object is defined by its behavior, expressed in a program called a script, which says what action an object takes when it receives a message. Procedures specific to a particular subset of messages are called methods. In conventional languages procedures are active, but data like numbers and arrays are passive. In object-oriented languages even numbers and arrays can potentially be active, taking actions in response to messages. A simple array object could respond to a message asking it to look up an element by indexing into a table, but a more complex object could perhaps perform logical deduction or apply heuristics to respond to a similar message.

Each Object Has the Ability To Communicate. A problem solver may need to ask other problem solvers for information necessary for its work and help out other problem solvers with the results of its efforts. Object-oriented languages perform computation by sending messages between objects. Giving each object the ability to communicate means that knowledge can be partitioned into independent areas of expertise. Objects can interact with each other to solve problems the way different people in a society cooperate to accomplish goals (1,2).
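These three capabilities can be made concrete in a short sketch. It is written here in Python for compactness rather than in any of the object-oriented languages discussed below, and the Counter object and its increment message are invented purely for illustration:

```python
class Counter:
    """A tiny 'problem solver' that stores, processes, and communicates."""

    def __init__(self):
        self.count = 0          # storage: a private instance variable

    def receive(self, message):
        # Behavior: the action the object takes when it receives a message.
        if message == "increment":
            self.count += 1     # processing: update local state
            return self.count   # communication: reply to the sender
        raise ValueError("unknown message: " + message)

c = Counter()
c.receive("increment")
result = c.receive("increment")   # the object replies 2
```

Storage corresponds to the instance variable, processing to the action taken on receipt of a message, and communication to the reply returned to the sender.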
Each Object Has the Ability To Create New Information. Problem solving may generate new information or new solution methods in the course of processing. The amount and kind of information generated often cannot be predicted in advance. With object-oriented programming, an object may create new objects dynamically as the problem-solving process proceeds. This may mean just creating new instances or copies of previously introduced objects. It may also imply the creation of completely new kinds of objects, with behavior that abstracts, generalizes, specializes, or combines that of others.

Object-oriented programming endows each object with all of the fundamental capabilities mentioned above: storage, processing power, and communication. In conventional languages these capabilities are fragmented. Each basic data type has only a fixed and very limited repertoire of behavior, allowing only some of the fundamental problem-solving capabilities outlined above. Only procedures have the ability to compute; data structures like numbers, strings, and lists cannot be programmed to take action like procedures can. Only mutable variables, arrays, and list cells have the ability to store information. Only I/O primitives have the ability to communicate. Integrating all of these capabilities is what gives object-oriented programming its power.

Because software for AI evolves rapidly, it is difficult to foresee what capabilities one will need in representing a concept. One might start by representing a block in a blocks-world program with the use of a list of position and color. A later decision to change the format of the list or reimplement the block as a procedure would normally require changing all the programs that access the block. Implementing the block as an active object means that the representation of the block concept can always be "made smarter" without altering programs that use it.

A Medical Example Illustrates Object-Oriented Programming in AI Applications

The best way to write an object-oriented program is to anthropomorphize the concepts that make up the program. For each important concept the programmer can imagine an intelligent expert whose job it is to deal with that concept. The expert stores knowledge about that particular concept, can solve problems regarding it, and can answer questions about it. Each such expert is implemented as an object.

The process of writing an object-oriented program consists of deciding what kinds of objects are needed, what kinds of messages they will respond to, and what kinds of actions the objects will take in response to the messages. Describing the program using a "procedural English" notation will avoid the distraction of discussing the syntax of particular object-oriented languages. Names of particular objects and messages are capitalized, and the descriptions are indented to show the structure of the program.

How would object-oriented programming apply in an example AI application? An example from the domain of medical diagnosis serves to illustrate how the programming techniques are used (see Medical advice systems). An apology to medical experts: The examples here have been drastically simplified for expository purposes. They are meant to be suggestive rather than accurate.

First one introduces an object that serves as the computer's representation of the patient. The messages to this object represent the interactions between the medical system and the human patient. The patient object can receive a message asking for symptoms, and of course, each symptom, such as fever, is itself an object. Each fever object may carry with it particular information such as its temperature and duration. A simple model of diagnosis is to associate with each symptom a set of possible causes and investigate each hypothesized cause in order of its likelihood. The fragment of the program dealing with fevers might read as follows (again, keep in mind that these examples are oversimplified):

Define a Fever object as a kind of Symptom object.
Each Fever object has:
    A Temperature, by default 100 degrees F.
    A Duration, for example, 2 days.

If I'm a Fever object, and I get a message asking for my Most-Likely-Cause,
    I ask my Temperature whether it is greater than 100, and ask my Duration whether it is greater than 3 days.
    If so, I reply with a Serious-Infection.
    Otherwise I reply with the Flu.

Diseases like the flu, of course, are represented by objects in the same way. A disease might, in turn, store a list enumerating its most probable symptoms. Thus, a very simple top-level control structure for a diagnosis program might read as follows:

Send a message to the Patient object asking for Symptoms.
Pick a Symptom object, and ask the Symptom for its Likely-Cause, returning a Disease object.
Ask the Disease object for its typical Symptoms, and see if the Patient has any of these other symptoms as well.
If the symptoms of the Disease are all the same as the Symptoms the Patient has,
    send a message to the Disease object asking for its Treatment.
    Suggest the Treatment for the Patient.

A treatment is another kind of object. Specific kinds of treatments can be built out of the object for treatment by the technique of inheritance or delegation, discussed below. Other kinds of treatments might include physical therapy or surgery. Drug treatment objects may respond to messages asking for their recommended dosage or contraindications by accessing an on-line version of the Physician's Desk Reference.

By defining an object to represent a category of drugs, such as Antibiotics, general information about antibiotics need not be repeated in the objects representing particular drugs, such as Penicillin. When an object like Penicillin receives a message, it first tries to reply based on its own local knowledge and expertise. If the knowledge specific to penicillin is not sufficient, the message is delegated to the object for its drug family, such as antibiotics. If this does not suffice, the object for all drugs is called upon, and so on.

This inventory of objects for medical diagnosis programs illustrates a fundamental principle of organization for object-oriented programs. Knowledge about how to use each individual concept in a domain resides in the object for that concept. A large part of becoming an expert in a domain is assimilating technical concepts. These are useful because the knowledge for
each one is relatively independent of the others, and they can interact with each other in predictable ways. Objects provide a means for directly representing the kinds of technical concepts that give human experts their problem-solving power.

Contrast this with the alternative rule-based style of AI programming (see Rule-based systems). A rule-based program consists of a large number of independent rules, in an if-then form, of which the following might be typical:

If the patient has a fever,
    and the fever is above 100 degrees F,
    and the fever lasts for more than 3 days,
Then the patient probably has a serious infection
    [with Probability 75%].

Instead of being organized around objects which represent the important concepts of the problem domain, a rule-based program is a uniform collection of rules, each of which describes a certain situation that can occur. The chief advantage of rules is that it is easy to add a new rule to the collection. The corresponding disadvantage is that knowledge about particular concepts such as fevers or penicillin is scattered throughout the program. Adding a new rule regarding fevers to a large set of rules may cause surprising interactions that can be difficult to debug. If one wants to say what to do if the fever is above 100°F but of less than 3 days' duration, the conditions in the rule must be repeated, making it harder to share knowledge about similar situations. Object-oriented programs keep knowledge associated with a particular concept more localized.

In addition to an association-oriented approach to diagnosis, which simply tries to match diseases and symptoms that are commonly found together, a more advanced approach might try to reason about the functionality of parts of the body to trace causes. Object-oriented programming becomes useful here in modeling organs, blood vessels, and other body parts as objects.
Each part keeps track of other parts to which it is connected. Messages sent from one part to another model the actions that each part performs.

Define a Lung object to be a kind of Organ object,
    With Entering-Blood-Vessels and Leaving-Blood-Vessels.

If I'm a Lung and I get a Breathe message, with a Blood-Flow,
    I send the Entering-Blood-Vessels a message to Decrease the Carbon-Dioxide content of the Blood-Flow.
    I send the Leaving-Blood-Vessels a message to Increase the Oxygen content of the Blood-Flow.

Thus, an important component of a medical program can be a functional simulation of the body's various systems. Knowing that a disease affects the lungs might lead one to hypothesize that there will be an adverse effect on the oxygen content of the blood. The top-level control in such a simulation consists of sending messages to objects representing all the body systems in parallel, telling each one to begin performing its specific function. Object-oriented programming is especially well suited to this kind of simulation. Since object-oriented programs can have many objects active simultaneously, sending and receiving messages, the kind of parallelism that takes place in the body, where all the various systems are active simultaneously, can be naturally modeled.
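The Lung fragment above can be rendered runnable. The sketch below uses Python for concreteness; the BloodFlow object and the unit step sizes are invented assumptions, and only the message names follow the procedural English:

```python
class BloodFlow:
    """Carries the gas content that the body parts act upon."""
    def __init__(self, oxygen, carbon_dioxide):
        self.oxygen = oxygen
        self.carbon_dioxide = carbon_dioxide

class BloodVessel:
    # Receives messages from connected parts and acts on the blood flow.
    def decrease_co2(self, flow, amount=1):
        flow.carbon_dioxide -= amount

    def increase_o2(self, flow, amount=1):
        flow.oxygen += amount

class Lung:
    def __init__(self, entering, leaving):
        self.entering = entering    # the Entering-Blood-Vessels
        self.leaving = leaving      # the Leaving-Blood-Vessels

    def breathe(self, flow):
        # Messages from one part to another model the actions each performs.
        self.entering.decrease_co2(flow)
        self.leaving.increase_o2(flow)

flow = BloodFlow(oxygen=5, carbon_dioxide=5)
Lung(BloodVessel(), BloodVessel()).breathe(flow)
```

A top-level simulation would simply send Breathe (and the analogous messages of other organs) to every body-system object on each cycle.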
Defining Objects by Their Message-Passing Behavior Yields Modularity and Extensibility

Why is it so important for AI to represent concepts as active objects and computation by message-passing behavior? Although simple scientific and business applications can get away with representing concepts in only a single way, complex AI programs may need more than one representation for a concept. Conventional programming usually requires choosing specific storage formats for data, and users of that data are dependent on knowledge of the storage format. Having multiple representations of a concept in a traditional program usually means that the users must know which particular representation is being used at any given time. Because objects are defined solely by their message-passing behavior, each object is free to implement its response in a unique way. A program can make use of a concept without knowing the exact details of its internal representation. Thus object-oriented programming provides better support for programs that make frequent use of multiple representations. Computation in AI is so diverse that the flexibility of changing implementation details of a component of a program without affecting users of that component is essential.

In the medical example, symptom objects might have many different ways of responding to a message asking for their possible causes. A symptom might merely store a list of possibilities and deliver that list whenever it is asked the question. In this case it would be playing the role of a "data structure" in conventional languages. In another case determining the possible causes of a symptom might require running a functional simulation, so that the symptom's role would be more akin to that of a procedure in conventional languages. If users of the symptom object interact with it only via messages, the resulting program is insensitive to the details of the implementation of the symptom.
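A short sketch can make this point concrete. Both objects below answer the same possible-causes message, one as a "data structure" and one as a "procedure," and a caller cannot tell which it is using. The names are illustrative, not drawn from any particular object-oriented language:

```python
class StoredSymptom:
    # Plays the role of a "data structure": it merely stores a list.
    def __init__(self, causes):
        self._causes = list(causes)

    def possible_causes(self):
        return self._causes

class ComputedSymptom:
    # Plays the role of a "procedure": it derives its answer on demand,
    # for example by running a functional simulation.
    def __init__(self, simulate):
        self._simulate = simulate

    def possible_causes(self):
        return self._simulate()

def diagnose(symptom):
    # The user interacts only via the message, so either representation works.
    return symptom.possible_causes()

a = StoredSymptom(["Flu"])
b = ComputedSymptom(lambda: ["Flu"])
```

Swapping one implementation for the other requires no change to any program that sends the message.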
Object-Oriented Programming Permits Sharing Knowledge Between Related Groups of Objects

All knowledge pertaining to drugs in general should be grouped in a single object, and this knowledge extended to particular drugs, such as penicillin and digitalis, by simply defining the unique characteristics of each drug without repeating common information. There are two major mechanisms for this, and different object-oriented languages take different positions on this issue.

One is inheritance (see Inheritance hierarchy), which involves defining a class to represent a set of objects and to be used as a blueprint for making instance objects representing particular members of that set. The class is given a list of names for instance variables, and each instance can supply its own values for these variables. The class defines methods, procedures for responding to particular kinds of messages. Every instance uses its class's methods to decide how to respond to messages. Subclasses can build upon the knowledge of previous classes, adding more instance variables and methods, so that antibiotic is defined as a subclass of drug. Each instance object gets a new copy of each of the variables of its class, the superclass of that class, and so on. When a message is sent to an instance, first the methods of the class are tried, then the methods of its superclass, and so on, going up the class chain.
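The class-chain lookup just described might be sketched as follows; the particular methods and dosages are invented for illustration:

```python
class Drug:
    def contraindications(self):
        return ["allergy to this drug"]     # knowledge shared by all drugs

class Antibiotic(Drug):                     # antibiotic is a subclass of drug
    def dosage(self):
        return "250 mg"                     # shared by the antibiotic family

class Penicillin(Antibiotic):
    def contraindications(self):
        # Knowledge specific to penicillin, building on the general answer.
        return ["penicillin allergy"] + Drug.contraindications(self)

# An instance; message lookup climbs Penicillin -> Antibiotic -> Drug.
p = Penicillin()
```

Sending the dosage message to p finds no method in Penicillin, so the lookup proceeds up the chain to Antibiotic, exactly as the text describes.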
TANGUAGES, OBfECTORIENTED An alternative stratery is delegation. Rather than divide the world into classesand instances, each object serves as a rototype. A prototypical object can form new objectsby making copiesof itself (perhaps with modifications) or by creating new objects that have additional behavior and forward the messageto the prototype in the event that the additional behavior is not appropriate. This kind of forwarding messagesis called delegation, analogous to the way a specialist physician might delegate responsibility for a patient to a physician in another speciality if the patient's malady proves outside the original physician's area of expertise. A penicillin object can be made by copying a prototypical antibiotic objectand adding additional information. When a messageis sent to the penicillin object,it can either respondbasedon its own local information and expertise or delegatethe messageto the more general antibiotic object.
Object-Oriented Languages Are Well-Suited for Parallelism
AI programs have traditionally had a voracious appetite for computation power. The changing technology and economics of computer systems in the near future indicate that the primary route to increased performance will lie in the exploitation of parallelism in multiprocessor computer networks. Thus, AI applications must move increasingly toward parallel programming techniques. Because neurons in the brain are very slow compared to computer hardware, it is clear that the brain must rely on parallelism for its tremendous computation power. Modeling the parallelism of the mind requires computer languages that can exploit parallelism effectively.

One of the primary reasons AI researchers should be interested in object-oriented programming is that these languages are among the best for exploiting parallelism. Because knowledge in object-oriented programs is localized, each object containing its own local knowledge and expertise, different processors can work on different objects at the same time. Object-oriented programs do not rely on having global state, which can become a bottleneck in parallel systems. For example, the blackboard architecture (see Blackboard systems) adopted by some AI programs can clog up if many processors attempt to access the single blackboard simultaneously unless special care is taken to prevent this situation.

Parallelism brings with it the problem of synchronization, and object-oriented languages can contribute by providing objects that manage access to shared resources. Objects such as the serializers in Actor languages (3), which protect actors with changeable internal state, can receive messages from many processes simultaneously but queue up requests so that changes to the internal state happen serially, avoiding inconsistencies. In languages that are wholly object oriented, where nearly all computation happens by message passing, objects can be implemented that provide parallel computation and synchronization transparently. Users of parallel resources need not be concerned with the prevention of low-level timing errors, allowing parallel problem solvers to be programmed as easily as their serial counterparts.

Because objects are created dynamically, recovered by garbage collection, and dependent only on their message-passing behavior, objects can move from processor to processor in a parallel system in a flexible way. The number of processes need not be fixed in advance, and the allocation of problem-solving processes to physical processors can be made dynamically (4).

Object-Oriented Languages Have a Long History

It is nearly impossible to say who "invented" object-oriented programming; the ideas have developed out of a diverse group of programming cultures to which many have contributed. The roots of object-oriented programming come from two strands: from the community of LISP programmers in AI and the community of Simula programmers working on simulation.

Even early programs written in LISP, the second oldest programming language, made use of LISP's symbols to represent concepts in a manner reminiscent of today's object-oriented languages. Each symbol has a property list that can store information retrieved using another symbol as a key. The key acts like a message sent to the symbol representing an object. Because programs can be used as data in LISP, programs could be stored on property lists. Running programs retrieved from property lists was a popular technique for doing object-oriented programming. LISP's CONS primitive and garbage collection permitted creation of lists dynamically, so the essential ability to have dynamic objects was present. S. Papert's similar Logo (5) language was taught to children with a model that explicitly talked about programming in terms of objects.

Simula (6) pioneered the idea of classes, used to implement general objects that have knowledge to be shared by several instances. Subclasses could build on the behavior of previously defined classes, introducing the idea of inheritance among objects. The simulation applications for which Simula was used required conceptual parallelism, having many objects active at the same time, but Simula provided only the pseudoparallelism of coroutines and a global scheduler.

At roughly the same time, Kay at Xerox (7) and Hewitt at MIT (8) realized that these techniques found in LISP and Simula could form the basis of a fundamentally different way of looking at computation and created the first object-oriented programming languages. Kay's Smalltalk (9-11) was designed as a language for people other than computer scientists to use for general information-handling needs. Hewitt's Actors were developed for AI applications (12) and for understanding the theoretical nature of computation (13).

Object-Oriented Languages Have Great Diversity

It is difficult to say which current languages are "truly object-oriented" since the ideas appear in different forms in different languages and are supported to varying extents. Object-oriented programming is more a frame of mind than a particular language. It is possible to write object-oriented programs in nearly any language, even machine languages. Nevertheless, having explicit representations for objects, messages, and responses to messages can make a big difference in the convenience with which object-oriented programming can be used.

Object-oriented languages fall into two major categories: the uniform object-oriented languages and extensions to traditional languages. The uniform languages represent all computation in terms of objects and message passing. In these systems there are no passive data; every object can be active and receive messages. This is the most radical approach. To date, only Smalltalk and the Actor languages are really uniform object-oriented languages in this sense.
The language extensions take an existing language with passive data and active procedures and add a new data type to represent objects that can respond to messages. Traditional data types in the language, like numbers and arrays, are not themselves objects. LISP has been extended in this way in several different implementations, as have Algol-derived languages such as Pascal, C, and Ada, though the Algol family of languages are hampered by the lack of truly dynamic storage allocation with garbage collection. Examples of these are Flavors (14), Loops (15), and Director (16) on the LISP side, Intermission (17) in Prolog, Traits (18) and CLU (19) based on Algol-like languages, and Objective-C and Apple's Object Pascal.

The advantage of a uniform representation is greater modularity and extensibility. Uniformity allows building new kinds of numbers and lists or other system data types as easily as any other sort of object. It obviates the need to determine that an expression yields a user-defined object before attempting to send it a message. The disadvantage is that uniform implementations can be less efficient on conventional machines since conventional optimizations may involve knowledge of specific storage formats in violation of the message-passing protocol. The emergence of new machine architectures that specifically support object-oriented programming holds out the promise that object-oriented languages will be no less efficient than procedural languages. Parallelism holds out the promise that object-oriented languages will be a more effective way to harness increased processing power for AI applications. Object-oriented languages will no doubt win an increasingly prominent place in the tool kit of the AI programmer.
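Python, a language later than those surveyed here, happens to illustrate the uniform point: built-in numbers and lists are themselves objects, so a user-defined object can stand in for one wherever the appropriate messages are understood. The WrappedNumber class below is invented for illustration:

```python
class WrappedNumber:
    # A user-defined object that can stand in for a built-in number,
    # because Python dispatches "+" as a message (here, __radd__).
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        return other + self.value

# Built-in and user-defined objects mix freely in one expression; sum()
# never needs to check which kind of object it is adding.
total = sum([1, 2, WrappedNumber(3)])
```

This is exactly the property the paragraph above describes: the sender need not determine in advance whether an expression yields a built-in or a user-defined object.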
BIBLIOGRAPHY

1. W. A. Kornfeld and C. Hewitt, "The scientific community metaphor," IEEE Trans. Sys. Man Cybernet. SMC-11(1), 24-33 (January 1981).
2. M. Minsky, The Society of Mind, Simon and Schuster, New York, 1987.
3. C. Hewitt, G. Attardi, and H. Lieberman, Specifying and Proving Properties of Guardians for Distributed Systems, Conference on Semantics of Concurrent Computing, Evian, France, Springer-Verlag, Berlin, pp. 316-336, 1979.
4. H. Lieberman, Expecting the Unpredictable: When Computers Can Think in Parallel, in L. Vaina (ed.), Matters of Intelligence, D. Reidel, Amsterdam, 1986.
5. S. Papert, Mindstorms, Basic Books, New York, 1981.
6. G. M. Birtwistle, O.-J. Dahl, B. Myhrhaug, and K. Nygaard, Simula Begin, Van Nostrand Reinhold, New York, 1973.
7. A. Kay and A. Goldberg, "Personal dynamic media," IEEE Computer 10(3), 31-39 (March 1977).
8. C. Hewitt, Viewing Control Structures as Patterns of Passing Messages, in P. Winston and R. Brown (eds.), Artificial Intelligence: An MIT Perspective, MIT Press, Cambridge, MA, pp. 433-465, 1979.
9. A. Goldberg and D. Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, Reading, MA, 1983.
10. A. Goldberg, Smalltalk-80: The Interactive Programming Environment, Addison-Wesley, Reading, MA, 1984.
11. G. Krasner (ed.), Smalltalk-80: Bits of History and Words of Advice, Addison-Wesley, New York, 1984.
12. C. Hewitt, G. Attardi, and M. Simi, Knowledge Embedding with a Description System, Proceedings of the First National Annual Conference on Artificial Intelligence, American Association for Artificial Intelligence, Stanford, CA, pp. 157-164, August 1980.
13. C. Hewitt and H. Baker, "Laws for Communicating Parallel Processes," 1977 IFIP Congress Proceedings, Toronto, Ontario, pp. 987-992, 1977.
14. D. Moon, D. Weinreb, et al., Lisp Machine Manual, Symbolics and MIT Press, Cambridge, MA, 1984.
15. D. Bobrow and M. Stefik, "Knowledge programming in Loops," AI Mag. 4(3), 3-13 (August 1983).
16. K. Kahn, Dynamic Graphics Using Quasi-Parallelism, ACM SigGraph Conference, Atlanta, GA, pp. 357-362, 1978.
17. K. Kahn, Intermission: Actors in Prolog, in Logic Programming, Academic Press, New York, pp. 213-230, 1982.
18. G. Curry, L. Baer, D. Lipkie, and B. Lee, Traits: An Approach to Multiple-Inheritance Subclassing, Conference on Office Information Systems, Philadelphia, PA, ACM SIGOA, pp. 1-9, June 1982.
19. B. Liskov, A. Snyder, R. Atkinson, and C. Schaffert, "Abstraction mechanisms in CLU," CACM 20(8), 564-576 (1977).

General References

H. Baker and C. Hewitt, The Incremental Garbage Collection of Processes, Conference on Artificial Intelligence and Programming Languages, ACM, Rochester, NY, pp. 55-60, August 1977.
A. Borning, ThingLab: An Object-Oriented System for Building Simulations Using Constraints, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 497-498, August 1977.
R. J. Byrd, S. E. Smith, and S. P. deJong, An Actor-Based Programming System, Conference on Office Information Systems, Philadelphia, PA, ACM SIGOA, pp. 67-78, June 1982.
C. Hewitt and H. Lieberman, Design Issues in Parallel Systems for Artificial Intelligence, Proceedings of the CompCon-84 Conference, IEEE, San Francisco, CA, pp. 418-423, March 1984.
K. Kahn, Uniform: A Language Based Upon Unification which Unifies (much of) Lisp, Prolog, and Act 1, Technical Report, University of Uppsala, Uppsala, Sweden, March 1981.
H. Lieberman, A Preview of Act 1, AI Memo 625, MIT, Cambridge, MA, 1981.
H. Lieberman, "Machine tongues: Object oriented programming," Comput. Mus. J. 6(3), 8-21 (Fall 1982).
H. Lieberman, An Object Oriented Simulator for the Apiary, National Conference on Artificial Intelligence, American Association for Artificial Intelligence, Washington, DC, pp. 241-246, August 1983.

H. LIEBERMAN
MIT

LAW APPLICATIONS

Since the earliest days of computing, there have been lawyers who were excited by the prospect of intelligent machines. Well before the term artificial intelligence was invented, there had been proposals to use computers for retrieving legal source materials (1) and for analyzing the leeways available to a judge in deciding a new case (2). There were also, of course, many ideas about less ambitious uses of computers in the law (3,4).

There were good reasons for lawyers to envision more than routine data-processing applications of computers. The reasons stem from the philosophy of law, or jurisprudence, which in the United States has dealt mainly with the problem of
judicial decision making (5). The underlying concern is a practical one. Given that judicial decisions may be highly controversial and may affect everyone, what is it for a decision to be rationally justified? From this question, it is only a short step to the questions of how far legal reasoning is mechanizable and how far legal decisions are, or ought to be, computable. The relevant writers include Holmes (6), Cardozo (7), Levi (8), Llewellyn (9,10), Hart (11), and Dworkin (12). A concise survey of the issues appears in Hart (13). Gilmore (14) provides a salutary historical perspective.

Early thinking about intelligent programs in law (15-17) was concurrent with the early development of AI; there was not much interaction. The actual programs were of two main types. One group of programs was described as concerned with the prediction of judicial decisions or, more generally, with the analysis of judicial behavior. In these programs the data represented the fact patterns in a large number of somewhat similar cases. The universe of possible fact patterns was defined by a predetermined set of propositions, some subset of which (or their negations) was taken to represent the facts of any particular case. The outcome of each case was viewed as a mathematical function of its facts. The general problem was to determine a good function, which could then be used to predict the results in other cases. For surveys of this work see Refs. 3 and 18. There are also some related recent projects (19-21).

Second, there was the problem of information retrieval or, more accurately, document retrieval. After many experiments and much debate (22,29) the approach to this task generally settled down to a common one. It involved storing essentially the full text of the statutes, court decisions, or other documents to be retrieved; the user queried the database by giving a Boolean combination of keywords that were expected to appear in the relevant documents.
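The Boolean keyword approach amounts to evaluating a Boolean query over the set of words occurring in each stored document. A minimal sketch, with invented documents and query:

```python
# Sketch of Boolean full-text keyword retrieval (documents invented).
documents = {
    "Smith v. Jones": "the defendant's negligence caused the collision",
    "Penal Code s. 12": "assault with a deadly weapon is a felony",
    "Doe v. Roe": "no negligence was shown and the collision claim failed",
}

def matches(text, query):
    """query is a keyword or a nested tuple: ('and', q1, q2, ...),
    ('or', q1, q2, ...), or ('not', q)."""
    words = set(text.split())
    if isinstance(query, str):
        return query in words
    op, *args = query
    if op == "and":
        return all(matches(text, a) for a in args)
    if op == "or":
        return any(matches(text, a) for a in args)
    if op == "not":
        return not matches(text, args[0])
    raise ValueError(f"unknown operator: {op}")

query = ("and", "negligence", "collision")
print([title for title, text in documents.items() if matches(text, query)])
# -> ['Smith v. Jones', 'Doe v. Roe']
```

Note that retrieval succeeds or fails purely on word occurrence: a relevant document that happens to use different vocabulary is simply missed, which bears on the recall problems discussed later in this article.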
This approach has been refined and made the basis of the major commercial retrieval systems (23,24).

A third area of early work, not so immediately focused on computer implementation, was the application of formal logic (qv) to the law. This topic is associated primarily with the work of Allen (e.g., Refs. 25-27). The aim was to represent statutes and other legal documents in a way that avoided syntactic ambiguity, in particular, ambiguity about the scope of logical connectives and about the mapping from English words like if and unless to these connectives. There was also interest in deontic logic, the logic of statements about what is required, permitted, or prohibited (e.g., Ref. 28).

By the early 1970s the initial efforts in all these areas had lost much of their intellectual momentum. In the area of predicting judicial decisions the premises and the significance of the research had been criticized as, at best, unclear (29,30). In information retrieval a number of observers felt that the keyword approach had reached its limits (31,32), and in Canada a report prepared for the government recommended against continued public funding (33). Additionally, the translation of law into formulas of logic was subject to two difficulties. First, the work was limited to the propositional level; problems of intrapropositional structure and the meaning of the nonlogical vocabulary remained untouched. Second, when real ambiguities were found, the formalism pressed the translator to make a choice among readings, even if it was an open question what reading the courts would eventually adopt. For this reason, the development of a normal form for statutes seemed much better adapted for use in the initial drafting of statutes than for representation of existing ones.
Early AI Work in Law

The beginning of AI work proper in law can be dated to 1970, with a paper by Buchanan and Headrick titled "Some Speculation about Artificial Intelligence and Legal Reasoning" (34); but see also Mehl (35). The lawyer's task, as Buchanan and Headrick saw it, was always goal directed, on behalf of a client. Two general situations were envisioned. In one the relevant events had already occurred; the lawyer's task was to advise the client about his rights and liabilities and to construct an argument why the client should win his case. Suggesting that clear yes-or-no answers are rare in the law, the authors emphasized the problem of argument construction. In the second situation the client sought advice about future actions to achieve his goals, including business and other goals as well as legal ones. Here, the ideal was to find a plan in which the client's actions closely matched some favorable, prototypical legal situation and which also minimized the risks to the client's other goals.

In the remainder of their paper Buchanan and Headrick tried to identify some of the lawyer's thought processes and to map these onto some of the then-current work in AI. The thought processes mentioned are finding conceptual linkages in pursuing goals; recognizing and characterizing relevant facts; resolving rule conflicts, by finding or constructing other rules; and finding and using analogies. The AI work discussed includes Heuristic DENDRAL (36), the General Problem Solver (37), and Evans's analogy program (38). From the perspective of 15 years later, the paper shows an interesting tension between the complexities of legal reasoning, then and now far from fully analyzed, and the relative simplicity of the early AI techniques and the well-formed problems to which they had been applied.
The importance of DENDRAL was that it succeeded in reducing a significant real-world problem to a well-formed one within the grasp of AI and yet produced results of interest to chemists who were concerned with the original problem, not with the AI version. The question raised by Buchanan and Headrick was whether the same could be done for law. AI researchers are still exploring that question.

The first implemented AI programs in law were, of necessity, much narrower in scope than the one Buchanan and Headrick envisioned. Maggs and deBessonet (39) looked toward developing "a generalized formal language approach to the analysis of systems of legal rules" that would permit "questions of an extremely specific nature" to be answered (40). These authors used a set of rules based on statutes and expressed in a normalized propositional calculus (see Logic, propositional). The main element of their program was a theorem prover (see Theorem proving) implemented in LISP and using a British Museum algorithm (41). Suggested applications of the program included determining whether the rules were consistent and nonredundant, answering questions about liability, and generating questions to be asked in a client interview.

Popp and Schlink's program, JUDITH (42), was an interactive consultation system patterned on MYCIN (43). Its rule base, which apparently dealt mostly with the law of negligence, was drawn from the German Civil Code. The program's dialogue with the user proceeded top down, asking yes-no questions in an attempt to establish a basis for liability. A user response meaning "I don't know" was permitted; its effect was to invoke the next lower level of rules. The authors suggested
that if the rules bottomed out and the user still did not know, the program should refer him to an information retrieval system.

Both JUDITH and the Maggs and deBessonet program represented legal rules at the propositional level. There was no separate representation for the facts of a case; the situation to which the rules were being applied could be described only in terms of whether these legal propositions were true. This left a heavy burden of legal expertise on the system user. From another viewpoint, it meant that the approach captured only one small aspect of legal knowledge.

The next steps were taken by McCarty, in the program TAXMAN I (44), and Meldman, in a doctoral dissertation at MIT (45). In each of these projects the input was a statement of the facts of a case; the output, some conclusions about their legal import. Both the facts and the law were now represented with predicate-argument structures, not unanalyzed propositions. In McCarty's case the language used was a modified version of Microplanner (46); in Meldman's, a notation meant to be converted to the not-yet-implemented language OWL (47).

In TAXMAN I the subject matter was the taxation of corporate reorganizations under the Internal Revenue Code. The relevant code provisions define several types of tax-free reorganizations. Given a representation of facts such as "Phellis owns 250 shares of the common stock of the Delaware corporation" and "the Delaware corporation transferred its assets to the New Jersey corporation," TAXMAN I determined whether, according to its definitions, any of three different types of tax-free corporate reorganizations had taken place. As McCarty pointed out, TAXMAN I embodied much too narrow a picture of legal reasoning to be accurate. The program reached its conclusions deductively, leaving no space within which opposing lawyers might argue about whether a given reorganization qualified as tax free.
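The deductive style being criticized can be illustrated schematically: facts as predicate-argument tuples, and a legal concept applied as a strict definition over them. This is not McCarty's Microplanner code; the predicates and the definition are invented for illustration.

```python
# Illustration (not TAXMAN I itself): facts as predicate-argument
# tuples, with a legal concept defined as a conjunction over them.
facts = {
    ("owns", "Phellis", "shares-of-DE-corp"),
    ("transferred-assets", "DE-corp", "NJ-corp"),
    ("acquired-stock-for-voting-stock", "NJ-corp", "DE-corp"),
}

def tax_free_reorganization(facts, acquirer, acquired):
    """Hypothetical conjunctive definition. The conclusion follows
    deductively whenever every defining condition is present, which
    is exactly what leaves no room for opposing lawyers to argue."""
    return (("acquired-stock-for-voting-stock", acquirer, acquired) in facts
            and ("transferred-assets", acquired, acquirer) in facts)

print(tax_free_reorganization(facts, "NJ-corp", "DE-corp"))  # -> True
```

On this picture a concept either applies or it does not; nothing in the representation can express a court's judgment that the literal definition is satisfied yet "some other element" is missing.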
Further, although the axioms defining a tax-free reorganization were based on the definitions in the Internal Revenue Code, there had been three Supreme Court cases in which the literal definition was satisfied but in which, the Court held, the reorganization was taxable because some other element was missing. How could such a move be legitimate, and how could a program have found the move? These questions were the impetus for the TAXMAN II project described below.

In Meldman's dissertation (45) the subject matter was the tort law of assault and battery. Unlike the statute law of the projects mentioned above, assault and battery is primarily a case law area, that is, an area in which the rules have developed out of judicial decisions over the years, not by discrete legislative enactment. In this respect the law of torts is typical of the subjects traditionally stressed in American legal education and jurisprudence. But given the primacy of judicial decisions and, at the same time, the practice of describing the law in terms of general rules, Meldman had to treat the relationship between the rules and the precedents as well as the bearing of both on the analysis of a new case. Thus, his was the first AI project to represent more than one source of legal knowledge.

For Meldman the dichotomy between specific cases and general rules was in part a distinction between primary and secondary authority. The most general rules in the knowledge base were attributed to secondary authority, namely, to a fictitious encyclopedia called Corpus Juris Mechanicum. With the cases as primary legal authority, the question arose, as it always does in studies of legal reasoning, of how a case can be authority for anything beyond its own particular facts. Meldman managed the answer in a very simple way: The case was itself represented as a rule. The content of the rule differed from the rules of secondary authority only in that it was "often more specific"; the form differed in that it was attributed to a particular decision, whose specific facts were also stored but not used in the reasoning process (48).

The input to Meldman's system was a representation of a simple hypothetical case, like the following: "With the purpose of frightening Gordon Good, Howard Hood visibly points a Saturday-night special at him and grabs the umbrella that he is holding. The Saturday-night special is not loaded" (49). The top-level goal of the system was to find an instance of assault, or battery, or both. Subgoals could succeed by the application of rules based on secondary authority, by matching to the rules of the cases, and, if all else failed, by asking the user. The use of cases differed from the use of other rules in that a case could match either exactly, with the help of an abstraction hierarchy, or by analogy.

The version of analogy used was too simple, as Meldman recognized. In effect, it simply permitted the rule of a case to be temporarily generalized by replacing all its predicates by their parents in the abstraction hierarchy. A more basic problem, however, was the way cases were used even for exact (nonanalogical) matching. Meldman assumed a functional correspondence between cases and the rules they stood for; in doing so, he relied centrally on the concept of the holding or ratio decidendi of a case. This concept is a standard one in the law, but it quickly breaks down under examination: a finite amount of data, which can be described in many different ways, does not determine a unique general rule (for discussion, see Ref. 50).
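The generalization step just described, replacing each predicate of a case rule by its parent in an abstraction hierarchy, can be sketched as follows. The hierarchy and predicate names are invented; this is the spirit of Meldman's scheme, not his implementation.

```python
# Sketch of Meldman-style analogical matching (invented predicates):
# a case rule matches by analogy when it matches after every predicate
# is replaced by its parent in the abstraction hierarchy.
parent = {
    "points-gun": "threatens-with-weapon",
    "swings-club": "threatens-with-weapon",
    "threatens-with-weapon": "threatening-act",
}

def generalize(predicates):
    """Replace each predicate by its parent (if it has one)."""
    return {parent.get(p, p) for p in predicates}

case_rule = {"points-gun"}    # rule drawn from a precedent case
new_facts = {"swings-club"}   # the new case to be analyzed

exact = case_rule <= new_facts
by_analogy = generalize(case_rule) <= generalize(new_facts)
print(exact, by_analogy)  # -> False True
```

The sketch also makes the weakness visible: whether the analogy succeeds depends entirely on how the hierarchy happens to be drawn, one symptom of the deeper problem that a case does not determine a unique general rule.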
The question of how best to represent and use cases in a legal reasoning program is still a research problem.

Recent Development of Legal Analysis Programs

McCarty's TAXMAN II. A question that runs throughout legal analysis programs is how one should think about the connection between the general words or concepts employed in legal rules and the situations to which the rules may be applied. Clearly, one always works with a description of the situation, not the situation itself. It is assumed, in most research, that this description is fixed and undisputed. But the situation description itself must use general language. The question becomes how that language connects with the general language of the rule.

In TAXMAN I, the connections were treated as definitional. For instance, one kind of corporate reorganization, a B-REORGANIZATION, was defined in terms of ACQUISITION, EXCHANGE, and CONTROL; an ACQUISITION in terms of a list of EXCHANGES; and an EXCHANGE in terms of a pair of TRANSFERS. Further, the definitions were thought of as creating a conceptual hierarchy whose elements were connected by relations of abstraction and expansion. Thus, a legal concept like B-REORGANIZATION was seen as a straightforward abstraction from more concrete descriptions of situations.

Some legal concepts, McCarty observed, do not have this tidy structure. The initial examples were the extra requirements for a tax-free reorganization, which had been imposed by the Supreme Court although they were not mentioned in the statute. The concepts the Court used, which go under the names continuity of interest, business purpose, and step transactions, were seen as amorphous concepts, which TAXMAN I was unable to represent. Understanding the nature and use of amorphous concepts is the major goal of TAXMAN II (51-53).

In this work, now using the representation language AIMDS (54), two styles of conceptual representation are provided. The simpler, which provides the capabilities of TAXMAN I, is called a logical template representation. For concepts treated as amorphous, the representation is in terms of prototypes, an idea whose background in the literature includes Wittgenstein's family resemblances (55), Hart's open texture (11), Putnam's stereotypes (56), and Minsky's similarity networks of frames (57). The representation proposed, called a prototype-and-deformation model of conceptual structure, is described as having three elements: (a) optionally, an invariant that states necessary, but not sufficient, conditions for an instance of the concept; (b) a set of exemplars of the concept; and, to make the exemplars into more than a disjunctive definition, (c) a set of transformations stating how one exemplar can be mapped into another (58). Algorithms for using the prototype-and-deformation structures are still under development (59); there is also related work on analogy using the assumed purposes of legal rules (60).

So far, the TAXMAN II work has focused on a different tax problem, namely, on the meaning of income in the 1913 constitutional amendment authorizing a federal income tax. In particular, McCarty and Sridharan consider a 1920 case in which the Supreme Court had to decide whether a stock dividend is income (53). Use of the representation is directed not to computing an answer to this question but to modeling the arguments made in the majority and dissenting opinions. One important aspect of the argument involved being able to say what it is to own common stock or other securities in terms of the rights and obligations that ownership entails.
Largely for this reason, McCarty has recently turned his attention to the logic of these concepts, deontic logic (61).
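The three elements of the prototype-and-deformation model can be rendered as a data structure. The field names and the example content below are invented for illustration; they are not McCarty's AIMDS representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class AmorphousConcept:
    """Sketch of the prototype-and-deformation model (invented names):
    (a) an optional invariant giving necessary but not sufficient
        conditions for an instance;
    (b) a set of exemplars of the concept;
    (c) transformations mapping one exemplar into another, so the
        exemplars amount to more than a disjunctive definition."""
    invariant: Optional[Callable[[dict], bool]]
    exemplars: list = field(default_factory=list)
    transformations: list = field(default_factory=list)

# Hypothetical rendering of the 'income' dispute discussed above.
income = AmorphousConcept(
    invariant=lambda case: case.get("gain-realized", False),
    exemplars=[{"kind": "wages", "gain-realized": True},
               {"kind": "cash-dividend", "gain-realized": True}],
    transformations=[lambda ex: {**ex, "kind": "stock-dividend"}],
)

# The invariant can rule a case out, but never settles membership:
print(income.invariant({"kind": "stock-dividend"}))  # -> False
```

Whether a transformation from an agreed exemplar (say, a cash dividend) to the disputed case (a stock dividend) is legitimate is exactly what the opposing opinions argue about; the structure represents the argument space rather than computing an answer.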
Gardner's Dissertation. In the work described above, two very different approaches to legal reasoning have been taken. One side, including the propositional logic programs and TAXMAN I, represents the law as a set of rules that can be applied deductively to reach a conclusion about liability in a given case. The other side, including the early prediction programs and the prototype structures of TAXMAN II, treats legal knowledge as knowledge of cases, from which we reason by comparison to other cases. Meldman's thesis encompassed both sides, but only by making drastic oversimplifications. TAXMAN II allows the two sides to coexist but says nothing about how they fit together. A starting point for Gardner's project (62,63) is the need for a unified framework that can account for both these aspects of legal reasoning.

A second focus of Gardner's work concerns the problem of validation for a legal reasoning program or, viewed another way, the allocation of responsibility between the providers of such programs and their hypothetical users. To the extent that a program reaches definite conclusions (as opposed to finding analogies, for instance), there is an implicit claim that a user can rely on them. On some legal issues no such claim can be made, as in McCarty's stock dividend case, on which the justices of the Supreme Court disagreed. The question then arises, on what kinds of issues (if any) is it appropriate for a program to produce an answer? A program's conclusion about a legal issue must always be understood as only a default conclusion; any case may present special circumstances that were not anticipated in the knowledge base but that, once they occur, can be argued to override the computed result. Nevertheless, there are many points in each case that a legal reasoning program needs to determine but that lawyers would not even call "issues," because it is somehow obvious to the parties on both sides that, with respect to these points, the case does not present any special circumstances. A framework for legal reasoning programs is proposed that has two main phases: first, heuristically settling the obvious points and identifying the serious issues, the "hard questions" in the case; and second, finding arguments on both sides of the hard questions. Reasoning from precedent cases, as providing analogies, prototypes, and the like, is viewed as part of the argument of hard questions, to be undertaken only on selected points in a case in a context of other findings that seem not to be problematical.

In the dissertation Gardner applies this framework to problems involving the formation of contracts by offer and acceptance. The program, written in LISP and using the representation language MRS (64), takes as input a representation of a problem from a law school or bar examination. The output is a data structure similar to a decision tree, in which the decision points correspond to the "hard questions" that would need to be resolved in order to decide the case. Two kinds of legal questions can be recognized. One kind asks what rule of law is to be applied; such questions reflect disagreement among courts or commentators about what the rules are or ought to be. The other, more common kind asks about the meaning of some word or phrase within a legal rule, that is, whether the case at hand presents an instance of some legal predicate. The need to raise such questions arises from a crucial feature of legal language: Its meaning is not fully fixed in advance of its use in application to particular cases. This feature, often referred to as open texture, must be allowed for in any realistic legal reasoning program. In McCarty's terminology, all concepts are potentially amorphous concepts.

The problem then arises of how legal words can ever be applied confidently to situations described in nontechnical language. In other words, how can legal questions ever have clear answers, and how can a program recognize those that do? Gardner uses some initial heuristics based on the cases, in particular, on knowledge of situations that legal predicates have standardly been used to cover. For the further development of these heuristics, a novel kind of study of precedent cases is proposed. In the context of a dispute involving a described fact situation and some applicable legal rules, what points did all sides find so obvious as not to be worth discussing? How do these situations, presenting clear cases on certain points, compare with cases in which those points were the important issues? Such a study may prove illuminating with respect to the general problem of commonsense knowledge (see Reasoning, commonsense), which is a central problem for current AI (65,66).

Other Current Projects. Two other representative projects are those of Rissland at the University of Massachusetts and Waterman and Peterson at the Rand Corporation. This pair of projects continues the contrast, introduced above, between case-based and rule-based approaches to legal reasoning.

Rissland's special interest is the use of examples in reasoning. Originally concerned with the elements of mathematical
knowledge, in which examples figured prominently (67), she has turned to law as another domain where cases have an important role. The current work (68-71) deals largely with hypothetical cases, especially as they may be used in law teaching and in developing arguments in preparation for litigation. The principal domain is the law of trade secrets as applied to the protection of software. Each case, hypothetical or actual, is represented in a frame-based format (see Frame theory), where the slots are chosen to correspond to classes of facts, called dimensions, that have been identified as legally significant in the secondary literature. Unlike most projects, which take as fixed the description of the case at hand, Rissland's HYPO program treats the description as a hypothetical situation that is to be modified, along specified dimensions, to produce a new hypothetical case that is stronger for a specified party. The modifications are guided by comparison with cases stored in the knowledge base; they are to be used for purposes such as developing an argument from the precedents, assessing the strength of one's case, and coping with uncertainty about what will be proved at trial. Another aspect of the project involves the classification of detailed moves in argumentation, particularly the various purposes that can be served by introducing new facts or deleting facts from one's description of a situation.

Waterman and Peterson's project (72,73), unlike Rissland's, follows the paradigm of a rule-based system (qv). Noteworthy in this project is the kind of expert behavior to be modeled.
Given a description of a personal injury claim, the question for the system is not how the case should be argued or decided but what dollar amount represents the "worth" or settlement value of the case. Among experienced lawyers and insurance claims adjusters, Waterman and Peterson find that this amount depends not just on the law but also on matters such as conventions for assigning a dollar value to pain and suffering, the nearness of the trial date, and the skill of the attorneys on both sides. Originally, the knowledge engineering approach to determining case worth was described as a way of doing empirical research into out-of-court-settlement practices (72). Currently, however, the authors are developing a system to estimate settlement values for asbestos personal injury claims, and this system is reportedly to be used in litigation pending in the U.S. District Court (74).

Finally, there are several other projects for legal analysis that have narrower goals, different methods, or both. Welch and Finan (75,76) report a program, LAWGICAL, which is similar to JUDITH (42) but with a different user interface. Hellawell (77,78) has written BASIC programs for applying particularly complicated provisions of the tax law; the most recent (78) makes some use of AI search methods. Several writers (79-83) have looked at PROLOG as a language for legal analysis programs. A serious question here, only beginning to be addressed, is PROLOG's treatment of failure as negation, given that legal rules may or may not be given this interpretation and that statements of legal problems may or may not satisfy the closed-world assumption (84). Most ambitious is Stamper's LEGOL project (85-88), whose goals include creating a language in which all legal rules can be stated and automating the application of the rules in routine situations. However, the author explicitly dissociates himself from AI, favoring instead an "information systems approach" (89).
The result is described as a high-level system specification language intended for database applications (87).
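The negation-as-failure issue raised above can be illustrated without PROLOG itself. The fact base and predicates below are invented; the point is only the inference pattern.

```python
# Negation as failure: "not P" succeeds whenever P cannot be proved.
# This is sound only under the closed-world assumption, i.e., only if
# the fact base is taken to list everything that is true.
facts = {("minor", "alice")}  # invented fact base

def provable(goal):
    return goal in facts

def naf(goal):
    """Negation as failure: unprovable is treated as false."""
    return not provable(goal)

# Under the closed-world assumption this is a safe inference:
print(naf(("minor", "bob")))  # -> True: Bob is treated as not a minor

# But if the fact base is merely incomplete (an open world), the same
# call silently turns "unknown" into "false" -- which a legal rule
# may or may not license.
```

Whether a statute's "unless the party is a minor" should behave this way, defaulting to "not a minor" when nothing is known, is precisely the interpretive question the text says is only beginning to be addressed.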
Tasks for Law Applications of AI

In the application of AI to law, the projects described above represent the most extensive line of work so far. Although there is considerable variation among them, each assumes that there is a present situation, often a dispute, with respect to which the program is to suggest issues, arguments, or an outcome. Other legal tasks are possible: Some AI work has been done on planning, drafting, and text retrieval. In addition, a few projects have been concerned not so much with a generic task as with providing assistance to government officials in administering particular provisions of the law (90,91). McCarty (92) has surveyed some of this work in terms of Hart's distinction between deep and surface systems (93).

In the planning (qv) area Michaelson (94,95) has developed a program called Taxadvisor, which consults on income tax and transfer tax planning for individuals. Its recommendations cover topics such as retirement income, tax-sheltered investments, gifts, and will provisions. Written in EMYCIN (96), the program is described as a rather straightforward application of current expert system technology.

The task of drafting legal documents has been addressed by several writers, including Sprowl (97,98) and Boyd and Saxon (99,100). The documents considered include wills, divorce complaints, and security agreements; in general, they are those that can be composed largely of standardized passages, or "boilerplate," with some tailoring for the particular client. Both the selection of passages and the tailoring (as by filling in blanks) are computed on the basis of the program's representation of the law and on user-supplied information, some of which requires legal judgment.
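Drafting by boilerplate amounts to template selection plus slot filling. A minimal sketch, with an invented two-clause library (none of the cited systems worked in Python, of course):

```python
from string import Template

# Invented clause library: standardized passages with blanks to fill.
clauses = {
    "revocation": Template("I, $testator, revoke all prior wills."),
    "bequest": Template("I give $item to $beneficiary."),
}

def draft_will(client_info):
    """Select passages and fill their blanks from user-supplied
    information; which clauses are selected depends on the facts."""
    parts = [clauses["revocation"].substitute(testator=client_info["testator"])]
    for gift in client_info["gifts"]:
        parts.append(clauses["bequest"].substitute(**gift))
    return "\n".join(parts)

print(draft_will({"testator": "J. Doe",
                  "gifts": [{"item": "my library", "beneficiary": "R. Roe"}]}))
```

What the sketch leaves out is exactly what the text identifies as hard: deciding which clauses a particular client's situation calls for, and filling blanks whose correct contents require legal judgment rather than mere lookup.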
Boyd and Saxon conclude that to aid the attorney in exercising this judgment, the system should provide help messages and warnings from an extensive database, including statutory provisions, decisions, and related commentary, and amounting, for the domain of secured transactions, to "an informational filing system that is arranged around the transaction it governs" (101).

With this proposal Boyd and Saxon touch on the traditional problem of document retrieval. Here AI researchers have dealt primarily with the question of how to represent the content of statutes, decisions, and other source materials. Hafner (102) has produced a semantic network (qv) representation for part of the law of negotiable instruments; Karlgren and Walker, in a project called Polytext (103,104), have experimented with three different representations for a small set of rules regarding arbitration; and deBessonet (105) is working on a representation to support conceptual retrieval of the provisions of the Civil Code of Louisiana.

Blair and Maron (106) have recently shown the need for improved representations to be even greater than is generally supposed. Studying standard full-text retrieval from a large database of documents to be used in the defense of a single lawsuit (a "litigation support" system), they found that queries retrieved only about 20% of the relevant documents even though the lawyers and paralegals using the system believed the rate was over 75%. Karlgren and Walker, who also discuss problems with current retrieval systems, propose desiderata for a better design, most significantly, that the system should be "a computer conference system that is endowed with more intelligence," not an oracle (107).
Law and CognitiveProcesses
461
Newell and Simon, in their discussion of human problem solving, make a distinction useful for AI and law: between the demands of the task environment and the psychology of the problem-solving subject (108). Most of the projects described above are concerned primarily with the task environment. Stated differently, most are concerned with what moves are legitimate in legal reasoning, not with how any particular individual comes to find them. In a more psychological mode, protocol analysis has been proposed by Johnson, Johnson, and Little (109) for comparing expert and novice legal performance and by Stratman (110) for studying lawyers' composition of arguments to an appellate court as well as judges' deliberation and opinion writing.

Another line of work is concerned not so much with the professional's thinking as with aspects of the law that are familiar to everyone. Schank and Carbonell (111), looking at newspaper headlines like "Catawba Indians land claim supported" and "Burma appeals to UN to settle border dispute with Thailand," propose a vocabulary of "basic social acts" that includes notions such as disputes, petitions to an authority, governmental decisions, and resolution of disputes through other means (see Scripts). Using this vocabulary, Schank and Carbonell suggest representations for sentences including "the Supreme Court decided segregation is illegal"; "the cop gave the speeder a ticket"; and "Nader brought suit against GM, but the matter was settled out of court." Dyer, in an ambitious project for natural-language story understanding (112) (see Natural-language understanding; Story analysis), concentrated on stories about divorce. Understanding them required some knowledge, in the form of "memory organization packets" (qv) or MOPs, of things like marital contracts and legal disputes, as well as an ability to represent and recognize the characters' chains of reasoning, sometimes faulty, about the outcome of a divorce case.

Finally, law may be used as a source of illustrations of various kinds of cognitive processing. Rissland's work on examples, in law and mathematics (67-71), could well be placed in this category. In current work at UCLA Dyer and Flowers (113) emphasize the richness of law as a domain for studying cognitive processes, including natural-language use, memory organization and retrieval, learning, analogy, argumentation, and the commonsense background of legal expertise. Another such project is that of Bain (114) at Yale, whose theme is the importance of a reasoner's own goals in interpreting the actions of others. Bain uses the context of a mock plea-bargaining session, with a prosecutor, a defense attorney, and a judge as participants, to examine reasoning about whether a criminal defendant's actions were justified, for instance by self-defense.

Further Reading

Interest in law and AI has increased rapidly in the past few years, and several new collections of papers can be expected to appear soon. At the University of Houston Law Center, conferences on law and technology were held in 1984 (115) and 1985 and are expected to become an annual event. A panel on legal reasoning was conducted at IJCAI-85, the biennial International Joint Conference on Artificial Intelligence. In Europe a 1981 conference on logic, informatics, and law produced a large collection of papers (116,117); a second such conference took place in Florence in September 1985. Readers of the European work should be aware that continental law and Anglo-American law stem from distinct traditions. Merryman (118) provides a readable introduction to their differences.

BIBLIOGRAPHY

1. L. O. Kelso, "Does the law need a technological revolution?" Rocky Mt. Law Rev. 18, 378-392 (1946).
2. J. Frank, Courts on Trial, Princeton University Press, Princeton, NJ, pp. 206-208, 1949.
3. C. Tapper, Computers and the Law, Weidenfeld and Nicolson, London, 1973.
4. R. P. Bigelow (ed.), Computers and the Law: An Introductory Handbook, 3rd ed., Commerce Clearing House, Chicago, IL, 1981.
5. H. L. A. Hart, "American jurisprudence through English eyes: The nightmare and the noble dream," Ga. Law Rev. 11, 969-989 (1977). Reprinted in Hart, Essays in Jurisprudence and Philosophy, Clarendon Press, Oxford, 1983.
6. O. W. Holmes, "The path of the law," Harvard Law Rev. 10, 457-478 (1897). Reprinted in Holmes, Collected Legal Papers, Harcourt Brace, New York, 1921.
7. B. N. Cardozo, The Nature of the Judicial Process, Yale University Press, New Haven, CT, 1921.
8. E. H. Levi, An Introduction to Legal Reasoning, University of Chicago Press, Chicago, IL, 1949.
9. K. N. Llewellyn, The Bramble Bush: On Our Law and Its Study, Oceana Publications, Dobbs Ferry, NY, 1960 (originally published 1930).
10. K. N. Llewellyn, The Common Law Tradition: Deciding Appeals, Little, Brown, Boston, 1960.
11. H. L. A. Hart, The Concept of Law, Clarendon Press, Oxford, 1961.
12. R. Dworkin, Taking Rights Seriously, Harvard University Press, Cambridge, MA, 1977.
13. H. L. A. Hart, Problems of Philosophy of Law, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vol. 6, Macmillan and The Free Press, New York, pp. 264-276, 1967. Reprinted in Hart, Essays in Jurisprudence and Philosophy, Clarendon Press, Oxford, 1983.
14. G. Gilmore, The Ages of American Law, Yale University Press, New Haven, CT, 1977.
15. E. A. Jones, Jr. (ed.), Law and Electronics: The Challenge of a New Era, Proceedings of the First National Law and Electronics Conference, Lake Arrowhead, CA, October 21-23, 1960, Matthew Bender, New York, 1962.
16. H. W. Baade (ed.), Jurimetrics, Basic Books, New York, 1963. Originally published in Law Contemp. Probl. 28, 1-270 (1963).
17. L. E. Allen and M. E. Caldwell (eds.), Communication Sciences and Law: Reflections from the Jurimetrics Conference, Bobbs-Merrill, Indianapolis, IN, 1965.
18. E. Mackaay and P. Robillard, "Predicting judicial decisions: The nearest neighbor rule and visual representation of case patterns," Datenverarbeitung im Recht 3, 302-331 (1974).
19. C. M. Haar, J. P. Sawyer, Jr., and S. J. Cummings, "Computer power and legal reasoning: A case study of judicial decision prediction in zoning amendment cases," Am. Bar Found. Res. J. 1977, 651-769.
20. J. Bing, Legal Norms, Discretionary Rules and Computer Programs, in B. Niblett (ed.), Computer Science and Law, Cambridge University Press, Cambridge, U.K., pp. 119-186, 1980.
21. M. Borchgrevink and J. Hansen, SARA: A System for the Analysis of Legal Decisions, in J. Bing and K. S. Selmer (eds.), A Decade of Computers and Law, Universitetsforlaget, Oslo, pp. 342-375, 1980.
22. A. S. Fraenkel, Legal Information Retrieval, in F. L. Alt and M. Rubinoff (eds.), Advances in Computers, Vol. 9, Academic Press, New York, pp. 113-178, 1968.
23. J. Bing and T. Harvold, Legal Decisions and Information Systems, Universitetsforlaget, Oslo, 1977.
24. J. A. Sprowl, A Manual for Computer-Assisted Legal Research, American Bar Foundation, Chicago, IL, 1976.
25. L. E. Allen, "Symbolic logic: A razor-edged tool for drafting and interpreting legal documents," Yale Law J. 66, 833-879 (1957).
26. L. E. Allen, "Beyond document retrieval toward information retrieval," Minn. Law Rev. 47, 713-767 (1963).
27. L. E. Allen, Language, Law and Logic: Plain Drafting for the Electronic Age, in B. Niblett (ed.), Computer Science and Law, Cambridge University Press, Cambridge, U.K., pp. 75-100, 1980.
28. A. R. Anderson, "The logic of norms," Log. et Anal. 1, 84-91 (1958). Reprinted in Ref. 17.
29. J. Stone, Law and the Social Sciences in the Second Half Century, Lecture 3, University of Minnesota Press, Minneapolis, 1966. Another version is Stone, "Man and machine in the search for justice," Stanford Law Rev. 16, 515-560 (1964).
30. L. L. Fuller, "Science and the judicial process," Harvard Law Rev. 79, 1604-1628 (1966).
31. W. E. Boyd, "Law in computers and computers in law: A lawyer's view of the state of the art," Ariz. Law Rev. 14, 267-311 (1972).
32. J. E. Leininger and B. Gilchrist (eds.), Proceedings of the AFIPS/Stanford Conference on Computers, Society and Law: The Role of Legal Education, Montvale, NJ, June 25-27, 1973.
33. P. Slayton, Electronic Legal Retrieval: A Report Prepared for the Department of Communications of the Government of Canada, Information Canada, Ottawa, 1974.
34. B. G. Buchanan and T. E. Headrick, "Some speculation about artificial intelligence and legal reasoning," Stanford Law Rev. 23, 40-62 (1970).
35. L. Mehl, Automation in the Legal World: From the Machine Processing of Legal Information to the "Law Machine," in Mechanisation of Thought Processes, National Physical Laboratory Symposium No. 10, November 1958, Her Majesty's Stationery Office, London, pp. 757-787, 1959.
36. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill, New York, 1980.
37. A. Newell and H. A. Simon, GPS, a Program That Simulates Human Thought, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 279-293, 1963.
38. T. G. Evans, A Heuristic Program to Solve Geometric Analogy Problems, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968.
39. P. B. Maggs and C. G. deBessonet, "Automated logical analysis of systems of legal rules," Jurimetr. J. 12, 158-169 (1972).
40. Reference 39, pp. 159-160.
41. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, p. 108, 1972.
42. W. G. Popp and B. Schlink, "JUDITH, a computer program to advise lawyers in reasoning a case," Jurimetr. J. 15, 303-314 (1975).
43. E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, 1976.
44. L. T. McCarty, "Reflections on TAXMAN: An experiment in artificial intelligence and legal reasoning," Harvard Law Rev. 90, 837-893 (1977).
45. J. A. Meldman, A Preliminary Study in Computer-Aided Legal Analysis, MAC-TR-157, MIT. A condensed version is Meldman, "A structural model for computer-aided legal analysis," Rutgers J. Comput. Law 6, 27-71 (1977).
46. G. Sussman, T. Winograd, and E. Charniak, Micro-Planner Reference Manual (revised), A.I. Memo 203A, Artificial Intelligence Laboratory, MIT, 1971.
47. P. Szolovits, L. B. Hawkinson, and W. A. Martin, An Overview of OWL, a Language for Knowledge Representation, MIT/LCS/TM-86, Laboratory for Computer Science, MIT, Cambridge, MA, 1977.
48. Reference 45, pp. 132-133.
49. Reference 45, p. 16.
50. J. Stone, Legal System and Lawyers' Reasonings, Stanford University Press, Stanford, CA, 1964.
51. L. T. McCarty, N. S. Sridharan, and B. C. Sangster, The Implementation of TAXMAN II: An Experiment in Artificial Intelligence and Legal Reasoning, LRP-TR-2, Laboratory for Computer Science Research, Rutgers University, New Brunswick, NJ, 1979.
52. L. T. McCarty and N. S. Sridharan, The Representation of Conceptual Structures in TAXMAN II: Part One: Logical Templates, LRP-TR-4, Laboratory for Computer Science Research, Rutgers University, 1980. A shorter version is McCarty and Sridharan, The Representation of an Evolving System of Legal Concepts: I. Logical Templates, Proceedings of the Third Biennial Conference of the Canadian Society for Computational Studies of Intelligence, Victoria, B.C., pp. 304-311, 1980.
53. L. T. McCarty and N. S. Sridharan, A Computational Theory of Legal Argument, LRP-TR-13, Laboratory for Computer Science Research, Rutgers University. A shorter version is McCarty and Sridharan, The Representation of an Evolving System of Legal Concepts: II. Prototypes and Deformations, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., pp. 246-253, 1981.
54. N. S. Sridharan, AIMDS User Manual, Version 2, CBM-TR-89, Department of Computer Science, Rutgers University, New Brunswick, NJ, 1978.
55. L. Wittgenstein, Philosophical Investigations, 3rd ed., G. E. M. Anscombe (trans.), Macmillan, New York, 1958.
56. H. Putnam, The Meaning of 'Meaning', in K. Gunderson (ed.), Language, Mind, and Knowledge, Minnesota Studies in the Philosophy of Science, Vol. 7, University of Minnesota Press, Minneapolis, pp. 131-193, 1975. Reprinted in Putnam, Philosophical Papers, Vol. 2, Mind, Language and Reality, Cambridge University Press, Cambridge, U.K., pp. 215-271, 1975.
57. M. Minsky, A Framework for Representing Knowledge, in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, pp. 211-277, 1975.
58. Reference 53, p. 7.
59. D. Nagel, Concept Learning by Building and Applying Transformations between Object Descriptions, LRP-TR-15, Laboratory for Computer Science Research, Rutgers University, New Brunswick, NJ, 1983.
60. S. Kedar-Cabelli, Analogy with Purpose in Legal Reasoning from Precedents: A Dissertation Proposal, LRP-TR-17, Laboratory for Computer Science Research, Rutgers University, New Brunswick, NJ, 1984.
61. L. T. McCarty, Permissions and Obligations, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 287-294, 1983.
62. A. v.d.L. Gardner, An Artificial Intelligence Approach to Legal Reasoning, Bradford Books/The MIT Press, Cambridge, MA, 1987.
63. A. v.d.L. Gardner, Overview of an Artificial Intelligence Approach to Legal Reasoning, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 247-274, 1985.
64. M. R. Genesereth, R. Greiner, M. R. Grinberg, and D. E. Smith, The MRS Dictionary, Memo HPP-80-24, Stanford Heuristic Programming Project, Stanford University, December 1980, revised January 1984.
65. J. McCarthy, Some Expert Systems Need Common Sense, in H. R. Pagels (ed.), Computer Culture: The Scientific, Intellectual, and Social Impact of the Computer, New York Academy of Sciences, New York, pp. 129-137, 1984.
66. J. McCarthy, What Is Common Sense? AAAI Presidential Address, National Conference on Artificial Intelligence, Austin, TX, 1984.
67. E. R. Michener, "Understanding understanding mathematics," Cog. Sci. 2, 361-383 (1978).
68. E. L. Rissland, Examples in Legal Reasoning: Legal Hypotheticals, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 90-93, 1983.
69. E. L. Rissland, E. M. Valcarce, and K. D. Ashley, Explaining and Arguing with Examples, Proceedings of the Fourth National Conference on Artificial Intelligence, Austin, TX, pp. 288-294, 1984.
70. E. L. Rissland, Argument Moves and Hypotheticals, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 129-143, 1985.
71. K. D. Ashley, Reasoning by Analogy: A Survey of Selected A.I. Research with Implications for Legal Expert Systems, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 105-127, 1985.
72. D. A. Waterman and M. A. Peterson, Models of Legal Decisionmaking, Report R-2717-ICJ, Institute for Civil Justice, Rand Corporation, 1981.
73. D. A. Waterman and M. A. Peterson, "Evaluating civil claims: An expert systems approach," Expert Sys. 1, 65-76 (1984).
74. M. A. Peterson, "New research tools to watch for," Calif. Law. 5(3), 19-21 (1985).
75. J. T. Welch, "LAWGICAL: An approach to computer-aided legal analysis," Akron Law Rev. 15, 655-673 (1982).
76. J. P. Finan, "LAWGICAL: Jurisprudential and logical considerations," Akron Law Rev. 15, 675-711 (1982).
77. R. Hellawell, "A computer program for legal planning and analysis: Taxation of stock redemptions," Columbia Law Rev. 80, 1363-1398 (1980).
78. R. Hellawell, "SEARCH: A computer program for legal problem solving," Akron Law Rev. 15, 635-653 (1982).
79. A. Hustler, Programming Law in Logic, Research Report CS-8213, Department of Computer Science, University of Waterloo, Ontario, 1982.
80. M. Sergot, Prospects for Representing the Law as Logic Programs, in K. L. Clark and S.-A. Tarnlund (eds.), Logic Programming, Academic Press, London and New York, pp. 33-42, 1982.
80a. M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, and H. T. Cory, "The British Nationality Act as a logic program," CACM 29, 370-386 (1986).
81. T. F. Gordon, Object-Oriented Predicate Logic and Its Role in Representing Legal Knowledge, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 163-203, 1985.
82. D. A. Schlobohm, TA: A Prolog Program which Analyzes Income Tax Issues under Section 318(a) of the Internal Revenue Code, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 765-815, 1985.
83. C. D. MacRae, User Control Knowledge in a Tax Consulting System, in L. F. Pau (ed.), Artificial Intelligence in Economics and Management, North-Holland, Amsterdam, 1986.
84. R. Reiter, On Reasoning by Default, in TINLAP-2: Theoretical Issues in Natural Language Processing-2, Urbana, IL, pp. 210-218, 1978.
85. R. K. Stamper, The Automation of Legal Reasoning: Problems and Prospects, in J. Madey (ed.), Selected Topics in Information Processing: IFIP-INFOPOL-76, Proceedings of the IFIP-INFOPOL Conference on Information Processing, Warsaw, Poland, March 22-27, 1976, North-Holland, Amsterdam, pp. 433-447.
86. R. K. Stamper, "The LEGOL 1 prototype system and language," Comput. J. 20, 102-108 (1977).
87. S. Jones, P. Mason, and R. Stamper, "LEGOL 2.0: A relational specification language for complex rules," Inf. Sys. 4, 293-305 (1979).
88. R. Stamper, LEGOL: Modelling Legal Rules by Computer, in B. Niblett (ed.), Computer Science and Law, Cambridge University Press, Cambridge, U.K., pp. 45-71, 1980.
89. Reference 85, p. 445.
90. J. R. Buchanan and R. D. Fennell, An Intelligent Information System for Criminal Case Management in the Federal Courts, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 901-902, 1977.
91. J. L. Feinstein, A Knowledge-Based Expert System Used to Prevent the Disclosure of Sensitive Information at the United States Environmental Protection Agency, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 661-697, 1985.
92. L. T. McCarty, "Intelligent legal information systems: Problems and prospects," Rutgers Comput. Technol. Law J. 9, 265-294 (1983). Also in C. Campbell (ed.), Data Processing and the Law, Sweet & Maxwell, London, 1984.
93. P. E. Hart, "Directions for AI in the eighties," SIGART Newsl. (79), 11-16 (1982).
94. R. H. Michaelson, A Knowledge-Based System for Individual Income and Transfer Tax Planning, Ph.D. Dissertation, Department of Accountancy, University of Illinois at Urbana-Champaign, 1982.
95. R. H. Michaelson, "An expert system for federal tax planning," Expert Sys. 1, 149-167 (1984).
96. W. Van Melle, A. C. Scott, J. S. Bennett, and M. A. S. Peairs, The EMYCIN Manual, Report No. STAN-CS-81-885, Department of Computer Science, Stanford University, 1981.
97. J. A. Sprowl, "Automating the legal reasoning process: A computer that uses regulations and statutes to draft legal documents," Am. Bar Found. Res. J. 1979, 1-81.
98. J. A. Sprowl and R. W. Staudt, "Computerizing client services in the law school teaching clinic: An experiment in law office automation," Am. Bar Found. Res. J. 1981, 699-751.
99. W. E. Boyd and C. S. Saxon, "The A-9: A program for drafting security agreements under Article 9 of the Uniform Commercial Code," Am. Bar Found. Res. J. 1981, 637-669.
100. C. S. Saxon, "Computer-aided drafting of legal documents," Am. Bar Found. Res. J. 1982, 685-754.
101. Reference 99, p. 656.
102. C. D. Hafner, An Information Retrieval System Based on a Computer Model of Legal Knowledge, UMI Research, Ann Arbor, MI, 1981.
103. H. Karlgren and D. E. Walker, The Polytext System: A New Design for a Text Retrieval System, in F. Kiefer (ed.), Questions and Answers, D. Reidel, Dordrecht, The Netherlands, pp. 278-294, 1983.
104. D. E. Walker, "The organization and use of information: Contributions of information science, computational linguistics and artificial intelligence," J. Am. Soc. Inf. Sci. 32, 347-363 (1981). Another version of the paper is Walker, Computational Strategies for Analyzing the Organization and Use of Information, in S. A. Ward and L. J. Reed (eds.), Knowledge Structure and Use: Implications for Synthesis and Interpretation, Temple University Press, Philadelphia, pp. 229-284, 1983.
105. C. G. deBessonet and G. R. Cross, Representation of Some Aspects of Legal Causality, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 205-214, 1985.
106. D. C. Blair and M. E. Maron, "An evaluation of retrieval effectiveness for a full-text document-retrieval system," CACM 28, 289-299 (1985).
107. Reference 103, p. 288.
108. Reference 41, p. 55.
109. P. E. Johnson, M. G. Johnson, and R. K. Little, "Expertise in trial advocacy: Some considerations for inquiry into its nature and development," Campbell Law Rev. 7, 119-143 (1984).
110. J. F. Stratman, "Studying the appellate brief and opinion composing process: A window on legal thinking," Juris 19(1), 9-14 (Fall 1984); (2), 12-19 (Winter 1985).
111. R. C. Schank and J. G. Carbonell, Jr., Re: the Gettysburg Address-Representing Social and Political Acts, in N. V. Findler (ed.), Associative Networks: Representation and Use of Knowledge in Computers, Academic Press, New York, pp. 327-362, 1979.
112. M. G. Dyer, In-Depth Understanding: A Computer Model of Integrated Processing for Narrative Comprehension, MIT Press, Cambridge, MA, 1983.
113. M. G. Dyer and M. Flowers, Toward Automating Legal Expertise, in C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, pp. 49-68, 1985.
114. W. M. Bain, Toward a Model of Subjective Interpretation, YALEU/CSD/RR No. 324, Department of Computer Science, Yale University, New Haven, CT, 1984.
115. C. Walter (ed.), Computing Power and Legal Reasoning, West Publishing, St. Paul, MN, 1985.
116. C. Ciampi (ed.), Artificial Intelligence and Legal Information Systems, edited versions of selected papers from the International Conference on "Logic, Informatics, Law," Florence, Italy, April 1981, Vol. 1, North-Holland, Amsterdam, 1982.
117. A. A. Martino (ed.), Deontic Logic, Computational Linguistics, and Legal Information Systems, edited versions of selected papers from the International Conference on "Logic, Informatics, Law," Florence, Italy, April 1981, Vol. 2, North-Holland, Amsterdam, 1982.
118. J. H. Merryman, The Civil Law Tradition: An Introduction to the Legal Systems of Western Europe and Latin America, Stanford University Press, Stanford, CA, 1969.

A. v.d.L. Gardner
Stanford University

LEARNING, MACHINE

The field of machine learning studies computational methods for acquiring new knowledge, new skills, and new ways to organize existing knowledge. In this entry some of the basic techniques and principles that underlie AI research in learning are presented, including methods for learning from examples, learning in problem solving, learning by analogy, grammar acquisition, and machine discovery. In each case the techniques are illustrated with paradigmatic examples.

Reasons for Studying Machine Learning

One of the defining features of intelligence is the ability to learn. Thus, machine learning is a central concern of the field of AI. Upon closer inspection, three clear reasons for this concern become apparent. The first of these revolves around expert systems (qv) that, despite their success, often require man-years to construct and perfect. The bulk of the work goes into developing and debugging extensive domain-specific knowledge bases. A better understanding of the learning process might allow automation of the construction of expert systems, and this in turn would greatly speed the development of applied AI systems.

The second reason for studying machine learning is more theoretical. Many AI researchers find expert systems unattractive because they lack the generality that science requires of its theories and explanations. On this dimension, the study of learning may reveal general principles that apply across many different domains. Artificial intelligence has already discovered fairly general principles in problem solving (qv) and search (qv), and machine learning holds the potential for similar results.

A third research goal involves modeling human learning mechanisms. Generality is a central concern in this endeavor as well since all humans share a basic cognitive architecture but behave quite differently in similar circumstances. A major determinant of such variation is the experience and level of knowledge of each individual. Thus, understanding human learning mechanisms provides one path toward explaining the invariant features of the human information-processing system. However, useful applications would also emerge from a deeper understanding of human learning since this would provide insights into the educational process, leading to better design of both classical teaching materials and intelligent automated tutoring systems.
A Brief History of Machine Learning

The interest in computational approaches to learning dates back to the mid-1950s and the beginnings of AI. However, early learning techniques tended to focus on numerical encodings and parameter-tuning techniques. This contrasted with AI's growing emphasis on symbolic representations and heuristic methods, and in fact the early research on machine learning was more closely affiliated with the field of pattern recognition (qv) than it was with AI itself. Learning researchers were especially concerned with issues of generality and attempted to construct systems that learned with very little initial knowledge. This stage continued until the mid-1960s, when AI researchers first began to shift their attention to purely symbolic systems and knowledge-intensive approaches. In this period most researchers avoided issues of learning while they attempted to understand the role of knowledge in constraining search. However, some work in machine learning continued in the background, this time borrowing the symbolic representations and heuristic methods that had become central to AI. It was during this stage that the first significant work on concept
learning (qv) and language acquisition (qv) was carried out, which laid the basis for later efforts. In the late 1970s a new interest in machine learning emerged within AI and grew rapidly over the course of a few years. Research in concept learning and language acquisition continued, but this was joined by work on learning in the context of problem solving (qv), as well as work on taxonomy formation, analogical reasoning, and machine discovery. Well-established methods were used to aid the construction of expert systems, and new methods were constantly being formulated and tested. A substantial fraction of learning researchers had always been concerned with human learning, and this undercurrent continued into the new period. The number of published papers on machine learning increased dramatically, and the trend continues unabated.

With this brief history as context, consider the problems and methods of machine learning, which are discussed below in more detail. The entry is organized according to five categorical tasks that have been addressed in the machine-learning literature: learning from examples, learning search heuristics, learning by analogy, grammar acquisition, and learning by discovery. Taken together, these problem classes cover the vast majority of research that has been carried out in machine learning. Unfortunately, there is not enough space to describe all approaches to learning, only the most widely applied symbolic methods. The omitted work includes genetic-learning algorithms (1,2), connectionist models of learning (3), chunking and macro operator acquisition (4,5), learning from instruction (6,7), and knowledge-acquisition aids for expert systems (8). In each case the learning task is described, the main methods that have been employed are considered, and some open problems in the area are identified. The reader is encouraged to peruse other reviews of machine learning such as papers by Mitchell (9), Dietterich and Michalski (10), and Carbonell, Michalski, and Mitchell (11). In addition, there are two collected volumes of machine-learning research (12,13) and Ref. 14, which present recent results in this active area.

Learning Concepts from Examples

Figure 1. Positive and negative instances of "arch."
The task of learning concepts from examples is the most widely studied problem in machine learning (see Concept learning). Concept formation appears straightforward: Given examples and counterexamples of some concept, generate an intensional definition of that concept. This definition should cover all the examples but none of the counterexamples, and it should correctly classify future instances. Despite its apparent simplicity, there are hidden complexities and multiple approaches; the primary ones are considered below. First, however, an example clarifies the process and points out some of the problems.

Example: Learning the ARCH Concept. Perhaps the best known research on learning from examples is Winston's (15) work on the "arch" concept. Figure 1 presents one example (positive instance) of this concept and one counterexample (negative instance). Given these instances, one might conclude that "An ARCH consists of two nontouching vertical blocks and one horizontal block." This intensional definition covers the positive instance and excludes the negative instance. Of course, one could define ARCH extensionally, as the union of all positive examples of ARCH ever encountered. However, the concept should be as simple as possible, and should predict the classes of new instances. Although the initial definition given above is almost certainly incorrect, there is hope that it will eventually converge on the correct description of the concept.

Consider the two instances shown in Figure 2. Upon consideration of the positive instance, one realizes that the above concept of arch is too restrictive since it excludes this instance. Therefore, the concept is revised to "An ARCH consists of two nontouching vertical blocks and one horizontal object." However, this new hypothesis also covers the new negative instance, suggesting that it is overly general in some respect.
Revising the definition to exclude this nonexample, one might get "An ARCH consists of two nontouching vertical blocks and a horizontal object that rests atop both blocks." One can continue along these lines, gradually refining the concept to include all positive instances but none of the negatives.

Figure 2. Additional positive and negative examples of "arch."

New positive instances that are not covered by the current hypothesis (errors of omission) point out that the concept being formulated is overly specific, whereas new negative examples that are covered by the hypothesis (errors of commission) indicate it to be overly general. We have not been very specific about how the learner responds to these two situations, but some of the alternatives are considered below. Most systems that learn from examples employ these two types of information, though one will see that they use them in quite different ways.

Learning from Examples as Search. As Mitchell (9) and Dietterich and Michalski (10) have pointed out, all AI systems that learn from examples can be viewed as carrying out search (qv) through a space of possible concepts (see Concept learning). However, such "hypothesis spaces" are unusual along a number of dimensions, and they are worth considering in more detail. For this purpose, another example is used that learns to distinguish between diseased cells and healthy ones (see Medical-advice systems). This example is isomorphic to one constructed by Mitchell (9); his features have simply been replaced with biological ones.

Figure 3 presents five sample cells, three of which are identified with disease X (P1, P2, P3) and two of which are not (N1, N2). Note that each cell contains two bodies, and that each of these bodies has three attributes: number of nuclei (one or two), number of tails (one or two), and its color (light or dark). The fact that each cell contains two bodies will prove important in the discussions below. Although a graphical representation has been used in the figure, one can also represent the instances in propositional terms. For example, one might represent P1 as {(two two dark) (one one light)}, where the first term in each list stands for the number of nuclei, the second represents the number of tails, and the final term stands for the color.
The two propositions are enclosed in curly brackets to indicate that order is unimportant. Thus, the negative instance N1 could be represented as either {(two one dark) (one one dark)} or as {(one one dark) (two one dark)}. Now that these instances have been examined, one can turn to the representation of concept descriptions (or hypotheses). Figure 4 presents two possible descriptions in the same graphical format as the instances. Note that only some of the features are present; this indicates that the missing features are considered irrelevant. Thus, hypothesis (a) states that for a cell to predict disease X, it must have one body with two nuclei and another body with one tail and a light color. In propositional terms, this description can be represented by {(two ? ?) (? one light)}, where ? means that the value occupying that position is irrelevant. Now examine the bottom description (b). This has one less feature than hypothesis (a) and can be represented as {(two ? ?) (? ? light)}. As a result, the intensional definition (b) covers more instances than (a). In such cases it can be said that (b) is more general than (a), and that (a) is more specific than (b). Consider the overall structure of this space of hypotheses. Figure 5 shows a number of states in the description space, with the most specific hypotheses at the top and the most general one at the bottom. Note that the most specific descriptions correspond to instances since they have all features specified. In contrast, the most general hypothesis (c) has no features given and can be represented as {(? ? ?) (? ? ?)}. The most important thing to notice about this space is that the generality ordering is only partial. That is, although some hypotheses
are related along the generality-specificity dimension, others are unrelated. For instance, hypotheses (a) and (b) in Figure 5 are both more specific than hypothesis (c), but neither is more specific than the other. It is this partial ordering that leads to search through a sizable lattice of potential concept descriptions in learning from examples.

Figure 3. Positive and negative instances of diseased cells.

Figure 4. The generality ordering on concept descriptions.

The generality dimension also suggests two classes of operators for moving through this problem space: one can make an existing hypothesis more general or one can make it more specific. These options also suggest two basic schemes for searching the space of concept descriptions. In the first, one begins with the most general hypothesis, and as new instances are encountered, more specific descriptions are produced. In the other, one begins with a very specific hypothesis, moving to more general descriptions as new data are observed. Both approaches take advantage of the partial ordering on hypotheses to constrain search, and most machine-learning research has employed one or the other of these methods. Of course, one can combine both search directions, moving toward more general hypotheses in some cases and toward more specific ones at other times. Whereas most concept-learning systems have organized search in the manner outlined above, a notable exception is the genetic-algorithm approach (2), which does not use the partial ordering.

Taken together, the representation for states and the set of operators define a problem space (16), and the structure of this space has been examined for the task of learning from examples. However, one must still search (qv) the resulting space in an effective manner, and again a number of possibilities emerge. Some researchers (15) have carried out a depth-first search (qv) through the concept space, and others (9) have used breadth-first search. Such exhaustive search methods are guaranteed to find the optimal concept definition but may prove prohibitively expensive. Other researchers have used numeric evaluation functions to direct a heuristic search; for example, Michalski (17) has used a beam-search (qv) method.

Specific-to-General Methods. As shown above, one can search the space of concept descriptions in two alternative directions: from specific descriptions to more general ones or from general hypotheses to specific ones. Of course, there are many different ways to instantiate these basic methods, but this entry works with breadth-first versions since they are the simplest for tutorial purposes. Beginning with the specific-to-general method, it is assumed that the learner is presented with the instances from Figure 3 in an incremental fashion, and the active hypotheses are examined after each instance has been processed. In the specific-to-general scheme, the initial hypothesis is initialized to the first positive instance encountered by the learner.
Thus, after observing instance P1 from Figure 3, the system would create the initial hypothesis H1 shown in Figure 6. This description is very specific and covers only the instance
Figure 5. The partial ordering of the hypothesis space.
Figure 6. Searching from specific to general hypotheses.
on which it was based. When the next positive instance (P2) is encountered, the system notes that H1 fails to match, suggesting that this hypothesis is overly specific. Accordingly, the learner removes H1 and replaces it with more general hypotheses that cover both P1 and P2. Note that in this case, there are two ways to make H1 general enough to cover both instances, the first (H2) ignoring the number of nuclei and the second (H3) ignoring both the color and the number of tails. This only occurs for specific-to-general methods when two or more objects are involved since this opens the possibility for multiple mappings between objects. For example, one can map the left object in H1 onto the left object in P2 and the right object in H1 onto the right object in P2. However, one can also map the left object in H1 onto the right object in P2 and the right object in H1 onto the left object in P2. Each such mapping can lead to different hypotheses. In contrast, if only a single object is involved, only one such mapping will be possible, eliminating the need for search. Also note that the new hypotheses are no more general than they need to be to cover the instances. That is, the minimally general hypotheses required to account for all the positive data are preferred over more general hypotheses that may later prove unwarranted. Most AI learning systems incorporate this principle of conservatism when generating hypotheses.

Although positive instances lead the system to formulate more general hypotheses, negative instances let it eliminate competitors. For example, the fact that H3 covers the nonexample N1 suggests that this hypothesis is overly general in at least one respect. Since only movement toward more general descriptions is allowed, the only option is to remove H3. Since H2 does not cover N1, it is retained for further expansion. Note that this strategy relies heavily on the assumption that all instances are correctly labeled.
Most machine-learning methods (but not all) rely on accurate data, making it difficult for them to handle noise. After processing the first three instances, the learning system next encounters P3. Since the only remaining hypothesis (H2) fails to cover this positive instance, the system generates more general descriptions. As before, two new hypotheses are created, the first (H4) referring only to the tails of the cell bodies and the second (H5) referring only to their colors. When a second negative instance (N2) is observed, the system notes that H4 incorrectly covers the cell. As a result, it removes this hypothesis from consideration but retains H5, since the latter correctly fails to cover the instance. One must also check to ensure that new hypotheses do not cover earlier negative instances, but H5 fares well on this count as well. The learner would continue in this mode, producing more general descriptions when positive instances require this action and eliminating hypotheses when negative instances are incorrectly matched. No learning occurs when correct predictions are made since the existing hypotheses are performing as desired. Note that the system has no explicit means for deciding when it has acquired the final concept definition, but at each point it will have in memory the most specific hypotheses that account for all the data. Table 1 summarizes this specific-to-general method for learning from examples.
Table 1. Searching from Specific to General Hypotheses

Let H be the current set of hypotheses. Initialize H to the first positive instance p, and let the set of observed negative instances N = { }.

If p is the next positive instance, then:
1. For each hypothesis h ∈ H that does not match p, replace h with the most specific generalization(s) of h that will match p.
2. Remove from consideration all hypotheses that are more general than some other hypothesis in H.
3. Remove from H all hypotheses that match a previously observed negative instance n ∈ N since they are overly general.
Step 1 is usually accomplished by finding all mappings between p and h.

If n is a new negative instance, then:
1. Add n to the set of negative instances N.
2. Remove from H all hypotheses that match n since they are overly general.
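The procedure of Table 1 can be rendered as a short Python sketch. The P2 shown below is an assumed reading of Figure 3 (the entry states only P1 and N1 explicitly), and the text's instance order P1, P2, N1 is compressed by passing the known negatives in a single call:

```python
# Table 1, sketched for the two-body cell domain.  '?' marks an
# irrelevant feature; a hypothesis or instance is a pair of
# (nuclei, tails, color) triples.
def body_matches(h, i):
    return all(a == b or a == "?" for a, b in zip(h, i))

def matches(hyp, inst):
    (h1, h2), (i1, i2) = hyp, inst
    return (body_matches(h1, i1) and body_matches(h2, i2)) or \
           (body_matches(h1, i2) and body_matches(h2, i1))

def gen_body(h, p):
    # Minimal generalization of one body: keep agreeing features,
    # replace the rest with '?'.
    return tuple(a if a == b else "?" for a, b in zip(h, p))

def generalize(hyp, p):
    # Step 1: one candidate per mapping of hypothesis bodies onto
    # instance bodies.
    (h1, h2), (p1, p2) = hyp, p
    return [(gen_body(h1, p1), gen_body(h2, p2)),
            (gen_body(h1, p2), gen_body(h2, p1))]

def process_positive(H, p, negatives):
    new_H = []
    for h in H:
        cands = [h] if matches(h, p) else generalize(h, p)
        # Step 3: drop candidates that cover a known negative instance.
        new_H.extend(c for c in cands
                     if not any(matches(c, n) for n in negatives))
    return new_H   # (step 2, pruning redundant hypotheses, is omitted)

P1 = (("two", "two", "dark"), ("one", "one", "light"))  # from the text
N1 = (("two", "one", "dark"), ("one", "one", "dark"))   # from the text
P2 = (("one", "two", "dark"), ("two", "one", "light"))  # assumed reading

H = [P1]                      # initialize to the first positive instance
H = process_positive(H, P2, [N1])
# The surviving hypothesis ignores the number of nuclei (H2 in the text);
# the candidate ignoring color and tails (H3) matched N1 and was dropped.
```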
General-to-Specific Methods. Although the general-to-specific method differs from its alternative in the direction of search, the basic structures of the two methods are quite similar. As with the specific-to-general method, new instances sometimes lead to new hypotheses and sometimes lead to the elimination of existing descriptions. However, the roles played by positive and negative instances are reversed in the general-to-specific approach, as shown in the example below. For this example, the set of instances are shown graphically in Figure 7, and the development of the search tree is traced in Figure 8. In this approach one begins with the most general hypothesis possible. This is simply a cell with two bodies and no additional features and is shown as hypothesis H1 at the bottom of Figure 8. After this initialization the system processes the first object in Figure 7, which happens to be the positive instance P1. Since H1 correctly covers this instance, no learning takes place at this point. However, it is useful to begin with a positive instance since this will constrain search in successive steps.

Figure 7. Additional positive and negative instances of diseased cells.
Table 2. Searching from General to Specific Hypotheses

Let H be the current set of hypotheses. Initialize H to the most general possible hypothesis, and let the set of observed positive instances P = {p}, where p is the first positive instance.

If n is a negative instance, then:
1. For each hypothesis h ∈ H that matches n, replace h with the most general specialization(s) of h that will not match n.
2. Remove from consideration all hypotheses that are more specific than some other hypothesis in H.
3. Remove from H all hypotheses that fail to match a previously observed positive instance p ∈ P, since they are overly specific.
Step 1 is usually accomplished by finding all differences between n and some p that is associated with h.

If p is a positive instance, then:
1. Add p to the set of positive instances P.
2. Remove from H all hypotheses that fail to match p, since they are overly specific.

Figure 8. Searching from general to specific hypotheses.

The next instance (N1) is negative. The initial hypothesis is general enough to match this object (indeed, the initial H1 will match any object), so it must be made more specific. There are two ways to accomplish this and still cover positive instance P1. (In some cases there will exist only one way in which to generate a more specific hypothesis. Winston (15) has used the term "near misses" to describe negative instances that lead to this situation, and he has emphasized their role in reducing the search for concept descriptions. However, one cannot usually rely on their occurrence, so most learning systems have the ability to learn from "far misses" as well, albeit converging on the ultimate concept description more slowly.) The first involves adding the features of "one tail" and "two nuclei" to the different cell bodies; the other involves adding "one tail" and "one nucleus" features to the same cell body. These new hypotheses are labeled as H2 and H3 in the figure. Note that in both cases it is necessary to add two features in order to rule out matches against N1. Also note that these hypotheses are no more specific than necessary for this purpose; this is the principle of conservatism in operation again.

Suppose the system next encounters the negative instance N2 shown in Figure 7. Since both H2 and H3 incorrectly cover this instance, both must be made more specific. In both cases there is only one way to accomplish this, each involving the addition of a single feature. The resulting concept descriptions are shown as H4 and H5 in Figure 8. Note that both of these cover the first positive instance P1, which has been retained in memory as a constraint. Finally, the system encounters the new positive instance P2, which matches against H5 but not against H4. This suggests that H4 is overly specific; and since one can only move toward more specific descriptions, it is removed from consideration. This leaves H5 as the only hypothesis, though the system would continue to make revisions as new data were gathered. As before, the method has no means to know when it has
finished learning. Table 2 summarizes this general-to-specific method for learning from examples. Observe that this approach will generate the most general possible descriptions that cover the data, whereas the specific-to-general method will produce the most specific possible descriptions. Thus, one may want to employ one method or the other, depending on whether one desires optimistic or pessimistic rules. Given a sufficiently rich sampling of the instance space, however, both methods eventually converge on the same concept description.

Combining the Approaches. As mentioned earlier, one can combine the specific-to-general and the general-to-specific approaches for searching the space of concept descriptions. The combination provides some advantages that neither method exhibits in isolation. For example, Anderson and Klein (18) have employed a combined approach in which one begins with specific rules or hypotheses, generates more general descriptions as new positive instances are observed, and then produces more specific hypotheses when these prove to be overly general. This scheme provides a form of backtracking (qv) without the need for memory of the search path, and they have also used it to generate disjunctive descriptions.

Mitchell's version-space method (9) combines the two techniques in a quite different manner. This method retains two sets of hypotheses: the most specific set of descriptions that cover the data (S) and the most general set of such descriptions (G). When new positive instances are encountered that are not covered by any element of S, the method uses the first algorithm described above to transform the S set into more general descriptions. Similarly, when new negative instances covered by some element in G come into play, the second algorithm above leads to more specific versions of the G set. One notable difference is that one no longer need retain either positive or negative instances.
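The mutual pruning of the S and G sets can be sketched in miniature. This toy uses single bodies rather than body pairs, and the instances below are hypothetical, but it shows one positive update and one negative update of the two boundary sets:

```python
# Candidate elimination in miniature: maintain the specific boundary S
# and the general boundary G over single (nuclei, tails, color) triples.
# (A toy, single-body variant; the instances below are hypothetical.)
def matches(h, x):
    return all(a == b or a == "?" for a, b in zip(h, x))

def generalize(h, x):
    # Minimal generalization of h that also covers x.
    return tuple(a if a == b else "?" for a, b in zip(h, x))

def specialize(g, x, values):
    # Minimal specializations of g that exclude x: fix one '?' slot to
    # any legal value other than x's.
    out = []
    for i, v in enumerate(g):
        if v == "?":
            out += [g[:i] + (val,) + g[i + 1:]
                    for val in values[i] if val != x[i]]
    return out

values = [("one", "two"), ("one", "two"), ("light", "dark")]
S = [("two", "one", "light")]   # seeded with the first positive instance
G = [("?", "?", "?")]           # the most general hypothesis

pos = ("two", "two", "light")               # a positive instance
S = [generalize(s, pos) for s in S]         # ...generalizes S
G = [g for g in G if matches(g, pos)]       # ...and prunes G

neg = ("one", "two", "dark")                # a negative instance
G = [g2 for g in G for g2 in specialize(g, neg, values)
     if all(matches(g2, s) for s in S)]     # ...specializes G
S = [s for s in S if not matches(s, neg)]   # ...and prunes S
```

After these two updates S holds {(two ? light)} while G holds {(two ? ?)} and {(? ? light)}; further instances would squeeze the boundaries together until they met.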
The S set summarizes the positive data and is used to eliminate overly specific members of the G set, and the G set summarizes negative instances and is used to detect overly general members of the S set. The version-space approach has two interesting features. First, one knows when the learning task has been completed; this occurs when the S and G sets converge on a single concept description. Second, although the members of S and G have identical forms to the hypotheses discussed above, their interpretation is somewhat different. Rather than representing hypotheses about the concept itself, the members of S and G act as boundaries on the space of descriptions that are consistent with the data. As more instances are gathered, these boundaries become more constraining, until eventually they eliminate all but one concept description. Basically, the version-space method employs a constraint-satisfaction (qv) approach to learning from examples, in contrast to the simpler search-based methods described above.

Constructing Decision Trees. Most research on learning from examples has employed some variant on the approaches described above. However, it would not be fair to leave this topic without mentioning another quite different class of systems. Quinlan (19) has called these TDIDT systems, which stands for top-down induction of decision trees. As the name suggests, these learning systems do not represent concepts as conjunctions of conditions but rather as decision trees. In addition, these systems construct their trees in a top-down fashion, and they are nonincremental in that they require all instances to be present at the outset.

The earliest work in the TDIDT tradition was carried out by Hunt, Marin, and Stone (20), but Quinlan's ID3 (19,21) is the best known of these systems. The input to ID3 is a list of positive and negative instances of some concept, with each instance represented as a list of attribute-value pairs, like those shown in Table 3 (the ID3 system is limited to handling attribute-value representations, since it relies on knowledge of attributes and their values in generating its decision tree). The output is a decision tree like that shown in Figure 9, with tests at each node for sorting instances down alternative branches. Terminal nodes contain the class of objects that have been sorted by all earlier decisions in the tree, and which are not further discriminable.

Table 3. Sample Data for ID3 (a)

Outlook    Temperature  Humidity  Windy  Class
Sunny      Hot          High      False  -
Sunny      Hot          High      True   -
Overcast   Hot          High      False  +
Rain       Mild         High      False  +
Rain       Cool         Normal    False  +
Rain       Cool         Normal    True   -
Overcast   Cool         Normal    True   +
Sunny      Mild         High      False  -
Sunny      Cool         Normal    False  +
Rain       Mild         Normal    False  +
Sunny      Mild         Normal    True   +
Overcast   Mild         High      True   +
Overcast   Hot          Normal    False  +
Rain       Mild         High      True   -

(a) Data similar to that reported in Ref. 19.

Quinlan's system begins with only the top node of a network and grows its decision tree in a top-down manner, one branch at a time. At each point ID3 uses an information theoretic evaluation function to determine the most discriminating attribute; this score is based on the numbers of positive and negative instances associated with the values of each attribute. For example, the instances in Table 3 contain four attributes: outlook, temperature, humidity, and presence of wind. Upon inspecting these instances, ID3 decides that the outlook attribute does the best job of distinguishing between the positives and negatives. As a result, the system creates three
Figure 9. A sample ID3 decision tree, with tests on outlook, humidity, and windy.
branches at the top level of its decision tree, one for each value of the outlook attribute. The instances are then sorted down the appropriate branches, and the system checks to see whether all instances at a given node are positive. If so, this node is marked as terminal and labeled as leading only to positive instances. An analogous step is taken when all instances sent to a node are negative. In Figure 9 this occurs with the overcast value of the outlook attribute, which contains only examples of the concept. However, if both positive and negative instances are shipped to a node, the tree-building process is applied recursively to this subset of the data. This occurs with both the sunny and rain values in the example. For the sunny subset the next most discriminating value is the humidity attribute, whereas the windy attribute has the best score for the rain subset. In both cases the resulting sets contain only one type of instance, causing all nodes to be labeled as terminal and thus halting the tree-building process.

The TDIDT approach differs from the methods described earlier along a number of dimensions. First, decision trees can easily represent many forms of disjunction, and systems like ID3 have no trouble acquiring such concepts despite the difficulty experienced by other methods examined. Second, the TDIDT scheme can be easily adapted to handle noise, and Quinlan (19) has run extensive experiments that show this capability. This results from the nonincremental nature of the algorithms, which let one compute statistical measures (such as χ²). Another advantage is that ID3 and its relatives carry out very little search, relying instead on an evaluation function (i.e., the information theoretic discriminant measure in TDIDT) to select the best attribute at each point. This is in marked contrast to the breadth-first searches carried out by Mitchell's version-space method (9) and similar techniques.
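The tree-growing procedure described above can be sketched compactly. The data are those of Table 3, and an entropy-based information gain stands in for Quinlan's information theoretic measure:

```python
import math
from collections import Counter

# ID3-style top-down induction on the weather data of Table 3.  Each
# instance is (outlook, temperature, humidity, windy, class).
ATTRS = ["outlook", "temp", "humidity", "windy"]
DATA = [
    ("sunny",    "hot",  "high",   "false", "-"),
    ("sunny",    "hot",  "high",   "true",  "-"),
    ("overcast", "hot",  "high",   "false", "+"),
    ("rain",     "mild", "high",   "false", "+"),
    ("rain",     "cool", "normal", "false", "+"),
    ("rain",     "cool", "normal", "true",  "-"),
    ("overcast", "cool", "normal", "true",  "+"),
    ("sunny",    "mild", "high",   "false", "-"),
    ("sunny",    "cool", "normal", "false", "+"),
    ("rain",     "mild", "normal", "false", "+"),
    ("sunny",    "mild", "normal", "true",  "+"),
    ("overcast", "mild", "high",   "true",  "+"),
    ("overcast", "hot",  "normal", "false", "+"),
    ("rain",     "mild", "high",   "true",  "-"),
]

def entropy(rows):
    total = len(rows)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(r[-1] for r in rows).values())

def gain(rows, i):
    # Information gain of splitting on attribute index i.
    splits = {}
    for r in rows:
        splits.setdefault(r[i], []).append(r)
    rem = sum(len(s) / len(rows) * entropy(s) for s in splits.values())
    return entropy(rows) - rem

def build(rows, attrs):
    classes = {r[-1] for r in rows}
    if len(classes) == 1 or not attrs:        # terminal node
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda i: gain(rows, i))
    branches = {val: build([r for r in rows if r[best] == val],
                           [a for a in attrs if a != best])
                for val in {r[best] for r in rows}}
    return (ATTRS[best], branches)

tree = build(DATA, [0, 1, 2, 3])
print(tree[0])  # prints "outlook", the most discriminating attribute
```

Running this reproduces the structure of Figure 9: outlook at the root, the overcast branch terminating immediately in a positive node, and the sunny and rain branches testing humidity and windy, respectively.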
As might be expected, the TDIDT approach suffers from a different set of limitations. For instance, it is limited to attribute-value representations, whereas the other schemes examined can deal with relational descriptions. Also, the nonincremental nature of TDIDT systems makes them quite inefficient at incorporating new instances since they must recompute their trees from scratch when new data are encountered. In other words, trade-offs exist between the two approaches, and neither is superior to the other in any absolute sense.

The Aq Algorithm. The TDIDT approach is not the only nonincremental method for learning from examples. Michalski
and his colleagues have developed a family of programs based on the Aq algorithm, on which a variety of learning tasks (30,31) have been tested, including some involving noisy data. Instead of producing a decision tree, the algorithm generates a concept description stated as a disjunction of conjunctions, which is equivalent to a set of production rules. The Aq method starts by selecting some seed object from the positive instances and finding an optimal concept description that covers this instance but none of the negative instances. The algorithm then removes all positive instances that are covered by the description, selects a new seed from the remaining set, and generates another concept description based on the new seed. This process continues until all positive instances have been covered by at least one of the generated descriptions. Michalski's algorithm carries out a beam search (qv) through the space of descriptions and, like Quinlan's method, it uses an evaluation function to constrain its search. The default-preference criterion favors conjunctive descriptions that cover the maximum number of positive instances. Therefore, if the algorithm cannot find a single conjunctive concept description, it generates a disjunctive description with a small number of conjunctions. The Aq method carries out considerably more search than ID3 and its relatives, but it can also handle relational descriptions, and Michalski and his collaborators claim that its concept descriptions are easier to understand than decision trees. They have also described incremental versions of the Aq algorithm (31,32).

Analytic Approaches to Concept Learning. Nearly all research on learning from examples has focused on structural definitions of concepts, but humans clearly also employ knowledge of the different functions that objects and actions might assume in recognition and learning. A brief return to the ARCH example clarifies this point.
Figures 1 and 2 present positive and negative instances of the ARCH concept, and it was fairly easy to formulate a structural description of ARCH from these examples. But now consider the positive and negative instances in Figure 10. Structurally, neither object has much in common with the earlier ones, but in functional terms the object on the left makes a fine ARCH, whereas the rightmost object fails miserably. For a system to grasp such categorical distinctions, it must have knowledge about how arches are used and about why one might build them. Recent research in machine learning has started to address
Figure 10. More instances of "arch": the need for functional definitions.
this problem, examining ways to transform functional definitions into structural ones using a minimum of instances. This approach has been called analytical learning by some researchers, though it has been labeled explanation-based learning by others. Learning methods are termed analytical in the sense that they use considerable domain knowledge to reason about why an object or event is a positive instance of a concept. They are explanation-based in the sense that they construct explanations of why an object satisfies a functional definition, and use these explanations to construct a structural description.

The analytic approach requires a slight redefinition of the task of learning from examples. Although the output is still some intensional description of the concept, only a single positive instance is given as input. In place of the data used by empirical methods, the learner requires some domain theory, stated as a set of domain axioms and rules of inference that can be used to explain how an instance satisfies the concept. In addition, the learning system needs some test for determining when an operational definition has been formulated. Usually, this involves restating an initial functional definition in terms of structural features given in the training example.

Table 4 presents an instantiation of this task in which the concept to be learned is CUP; this example has been borrowed from Mitchell, Keller, and Kedar-Cabelli (22). Note that the positive instance contains a number of irrelevant features, such as Sam being the owner; these are not retained in the final concept description. Also note that the domain theory contains at least one rule with CUP in the right side and that it contains rules that mention features from the training example in the left side. Both are necessary in order to generate an operational definition of the concept.

Table 4. Example of the Analytic Approach

Given:
  Goal concept: The concept of a "cup."
  Training example:
    Part-of(object-1, concavity-1)
    Color(object-1, red)
    Concavity(concavity-1)
    Upward-pointing(concavity-1)
    Owner(object-1, Sam)
    Part-of(object-1, bottom-1)
    Bottom(bottom-1)
    Flat(bottom-1)
    Light(object-1)
    Part-of(object-1, handle-1)
    Handle(handle-1)
    Length(handle-1, 5)
  Domain theory:
    Open-vessel(o) & stable(o) & liftable(o) → cup(o)
    Part-of(o,b) & bottom(b) & flat(b) → stable(o)
    Part-of(o,c) & concavity(c) & upward-pointing(c) → open-vessel(o)
    Part-of(o,h) & handle(h) & light(o) → liftable(o)
  Operationality criterion: The concept must be defined in terms of predicates used in the example.
Find: An operational description of the goal concept that covers the training example.

The basic approach involves a two-step process. First, one uses the domain theory to construct an explanation ("explanation" is really used to mean "proof" of correctness of a specific solution to a problem given the axioms of the domain theory) that proves the training example is a positive instance of the goal concept. The terminal nodes of the resulting explanation tree must be operational. Figure 11 presents such an explanation for the positive instance of CUP given in Table 4. Note that the top node in the explanation tree refers to CUP, whereas each of the terminal nodes refers to propositions in the training example. The second step involves transforming these terminal nodes into a set of sufficient conditions under which the explanation will hold, i.e., finding the most general version of the existing proof consistent with the domain theory. This can be accomplished by regressing the goal concept through the explanation tree, replacing constants with variables when appropriate and unifying variables that are required to match against the same term. Table 5 presents the operational definition that would be generated in this case. This is an especially simple example since the explanation is only two levels deep, but more complex cases are normally generated.

The analytic approach to learning from examples has a number of advantages over empirical methods. First, it provides a logical justification for the concept description that is formed, whereas empirical approaches make "unjustified" inductive leaps from the data. Second, explanation-based methods can learn from a single positive instance; negative instances are not required at all. Analytic methods also handle disjunctive concepts, since they find only sufficient (rather than necessary) conditions on the concept. A new explanation will be required when another disjunct is encountered (23), but there is no assumption that the concept can be described conjunctively. Finally, the approach handles noisy data since the explanation process will be unable to explain misclassified instances and will thus discard them.

However, the analytic approach to learning from examples also has some disadvantages compared to empirical methods. For one, it requires significant domain knowledge and thus can be applied in fewer situations. Whereas analytic methods require little search through the space of concept descriptions, they require significant search through the space of explanations. There are clear trade-offs between empirical and analytical learning schemes. One direction for future research is the development of combined empirical/analytical methods, which may give the best of both worlds.

Before closing this discussion of learning from examples, yet another difference between the two methods is pointed out.
CUP(OBJECT-1)
  OPEN-VESSEL(OBJECT-1)
    PART-OF(OBJECT-1, CONCAVITY-1), CONCAVITY(CONCAVITY-1), UPWARD-POINTING(CONCAVITY-1)
  STABLE(OBJECT-1)
    PART-OF(OBJECT-1, BOTTOM-1), BOTTOM(BOTTOM-1), FLAT(BOTTOM-1)
  LIFTABLE(OBJECT-1)
    PART-OF(OBJECT-1, HANDLE-1), HANDLE(HANDLE-1), LIGHT(OBJECT-1)

Figure 11. Explanation for an instance of "cup."
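The first step of the analytic method can be illustrated in miniature for the CUP example. The sketch below collapses Table 4's relational details (part-of, bottom, handle) into single predicates, so the names are illustrative simplifications rather than the entry's actual representation:

```python
# Step 1 of the analytic method for the CUP example: expand the goal
# through the domain theory until only operational predicates (those
# stated directly in the training example) remain.  The predicate names
# here are simplified stand-ins for Table 4's relational literals.
RULES = {
    "cup": ["open-vessel", "stable", "liftable"],
    "open-vessel": ["has-upward-concavity"],
    "stable": ["has-flat-bottom"],
    "liftable": ["has-handle", "light"],
}

def explain(goal):
    """Return the operational leaves of the explanation tree for goal."""
    if goal not in RULES:          # operational: no rule concludes it
        return [goal]
    leaves = []
    for subgoal in RULES[goal]:
        leaves += explain(subgoal)
    return leaves

print(explain("cup"))
# -> ['has-upward-concavity', 'has-flat-bottom', 'has-handle', 'light']
```

The second step, regressing the goal to replace OBJECT-1 with a variable, is what turns these leaves into the general operational definition referred to as Table 5.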
Empirical learning methods move from specific data to some general rule or description, and in this sense they are clearly doing induction (see Inductive inference), whether they search the description space in a general-to-specific or a specific-to-general direction. In contrast, analytic learning methods transform some general description (e.g., a functional definition) into some other general description (e.g., an operational one). Most analytic techniques use a training example to focus their attention and limit the search for explanations, but this is not required in principle. Thus, analytic methods can be viewed as learning by deduction rather than by induction. This is an important distinction, but all of its implications are not yet understood.

Some Open Problems in Learning from Examples. A number of problems remain to be addressed with respect to learning from examples. Most of these relate to simplifying assumptions that have typically been made about the concept-learning task. For instance, many researchers have assumed that all instances are correctly classified (i.e., that no noise is present). However, there are many real-world situations in which no rule has perfect predictive power, and heuristic rules that are only usually correct must be employed. Some simple statistically buffered learning methods [such as Quinlan's (21)] can be adapted to deal with noisy data sets, whereas the incremental methods (such as the version-space method) seem less adaptable. Biological systems, on the other hand, appear to learn incrementally in the presence of considerable noise, but it remains an open problem to develop tractable computational methods for accomplishing both objectives simultaneously. Trade-offs exist between an ability to deal with noise and the number of instances required for learning, but it would be useful to know the exact nature of such relationships.

A related simplification is the assumption that the correct language for representing the concept is known in advance. If a learning system employs an incorrect or incomplete representation for its concepts, it may be searching a rule space that does not contain the desired concept. One approach is to construct as good a rule as possible with the representation given; a system that can deal with noise can handle incomplete representations in this manner. A more interesting approach is one in which the system may gradually improve its representation language. This is equivalent to changing the space of rules one is searching and, on the surface at least, appears to be a much more challenging problem. Little work has been done in this area, but Utgoff (24) and Lenat (25) have made an interesting start on the problem.

Another simplifying assumption that nearly all concept-learning researchers make is that the concept to be acquired is all or none. In other words, an instance either is an example of the concept or is not; there is no middle ground. However, almost none of one's everyday concepts are like this. Some birds fit one's bird stereotype better than others, and some chairs are nearer to the prototypical chair than others. (Is a dodo a bird? Is a platypus a better bird? If a person sits on a log, is it a chair? Is it a better chair if stubby legs are added and a second log is used as a backrest?) Unfortunately, all of the existing concept-learning systems rely fairly heavily on the sharp and unequivocal distinction between positive and negative instances, and it is not clear how they might be modified to deal with fuzzily defined concepts such as birds and chairs. This is clearly a challenging direction for future research in machine learning, one where the functional approach to concept classification enjoys some advantages over the purely structural approach.

Learning and Problem Solving

The recognition and formation of concepts often plays an instrumental role in many crucial aspects of intelligence, such as the ability to solve problems and plan actions to achieve goals. As humans gain experience in a domain, they improve their ability to solve problems in that domain, and one would like machine-learning systems that also improve their problem-solving skills. Most AI research on problem solving (qv) has focused on methods for searching some problem space, and within this framework there are three clear roles for learning. All three rely on the notion of an operator for moving through states toward a goal in a problem space. The first approach involves learning heuristic conditions on operators in order to direct the search process. The second involves acquiring macro operators in order to increase the size of the steps taken through the problem space. The third approach entails analogical transfer of expertise across similar problem domains. First, we focus on the heuristics-learning task, and in the next section we address learning by analogy.

Task of Heuristics Learning. In order to understand the nature of heuristics (qv) and how they may be learned, recall that search involves states and operators. A problem is defined in terms of an initial state and a goal, and operators are used to transform the initial state into one that satisfies the goal. Search arises when more than one operator (or more than one instantiation of an operator) can be applied to a given state, requiring consideration of different alternatives. Some constraints are given in terms of the legal conditions on each operator, but these constraints are seldom sufficient to reduce search to tractable proportions for interesting problems. In order to accomplish this, the learner must also acquire heuristic relevance conditions on the operators.

Table 5 states the heuristics-learning task for the domain of symbolic integration, which has been studied by Mitchell, Utgoff, and Banerji (26) and by Porter and Kibler (27). Given a

Table 5. Learning Heuristics for Symbolic Integration

The task of heuristics learning for symbolic integration may be stated as follows:

Given: A problem space for integration:
  Initial states: ∫ x² sin(x) dx, ∫ cos⁵(x) dx
  A set of operators:
    op1: ∫ r f(x) dx → r ∫ f(x) dx
    op2: ∫ u dv → uv − ∫ v du
    op3: 1 · f(x) → f(x)
    op4: ∫ sin(x) dx → −cos(x)
  Test for goals: No integral sign in the expression
  A search strategy: Breadth-first search
Find: Heuristic conditions for each integration operator.
problem space for integration, one must find heuristic conditions on each integration operator. Consider the sample problem

∫ x² sin(x) dx

and its solution as shown in Figure 12. This problem involves two applications of integration by parts, an especially nasty operator that always has at least two instantiations. The optimal solution path for this problem proceeds down the page, and undesirable steps are shown to the side. An ideal set of heuristics would guide the problem solver down the optimal path, ignoring the side paths.

Assigning Credit and Blame. The heuristics-learning task is simplified by the nature of problem-solving operators. In general, the operators tend to be independent of each other, and this suggests the problem-reduction (qv) approach to heuristics learning:

1. Divide the task into a number of subproblems, one for each operator.
2. Formulate the heuristic conditions that determine the circumstances when an operator is useful.
3. Recombine the rules into a complete heuristic search system.

Nearly all work on heuristics learning has taken this basic approach. Note that this scheme transforms the heuristics-learning task into a number of learning-from-examples tasks,
Optimal solution path (down the page):

∫ x² sin(x) dx
  → −x² cos(x) + ∫ 2x cos(x) dx              (op2)
  → −x² cos(x) + 2 ∫ x cos(x) dx             (op1)
  → −x² cos(x) + 2(x sin(x) − ∫ sin(x) dx)   (op2)
  → −x² cos(x) + 2x sin(x) − 2 ∫ sin(x) dx
  → −x² cos(x) + 2x sin(x) + 2 cos(x)        (op4)

Undesirable side step (alternative instantiation of op2):

∫ x² sin(x) dx → (x³/3) sin(x) − ∫ (x³/3) cos(x) dx

Integration operators:
op1: ∫ r f(x) dx → r ∫ f(x) dx
op2: ∫ u dv → uv − ∫ v du
op3: 1 · f(x) → f(x)
op4: ∫ sin(x) dx → −cos(x)

Figure 12. A solution path for an integration problem.
one for each operator. Thus, the approach requires positive and negative instances for each operator, and these must be computed in some fashion since typically there is no tutor to provide them to the system. In fact, the subtask of generating positive and negative instances for each operator is closely related to the credit-assignment problem, a classically difficult problem in machine learning from the early days of Samuel and his numerical parameter-tuning method for learning to play better checkers (28) (see Checkers-playing programs). The problem occurs in situations in which the learner receives feedback only after it has taken a sequence of actions. In order to improve its performance, the learner must assign credit to desirable actions and blame to undesirable ones, but this may not be easy. For instance, if one loses a chess game, the final move is seldom responsible for the checkmate; usually some other (much earlier) move or set of moves led to this state, but identifying the responsible move(s) may be very difficult (see Computer chess methods). In any case, actions deserving credit can be viewed as positive instances of an operator, and actions deserving blame can be viewed as negative instances.

Given this framework, learning from examples is easily seen as an idealized case of heuristics learning in which a single operator is involved and for which the solution path is but one step long. No true search control is necessary for the performance component, since feedback occurs as soon as a single "move" has been made. Credit assignment is trivialized, since the responsible component is easily identified as the rule suggesting the "move." However, the general problem is a significant one, and learning from examples can be viewed as an artificial domain designed for studying the characterization problem in isolation from other aspects of the learning process.
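As an illustration, the Table 5 task and this credit-assignment scheme can be sketched together: a toy problem space searched breadth-first, with moves along the solution path labeled as positive instances of the responsible operator and one-step deviations labeled as negative instances. The nested-tuple expression encoding and the simplified analogues of op1 and op4 are assumptions made for the sketch, not the article's representation.

```python
from collections import deque

# Toy problem space in the sense of Table 5.  Expressions are nested
# tuples; ('int', e) stands for the integral of e.  op1 and op4 are loose
# analogues of the text's operators (names and encoding are invented).

def op1(e):
    # op1: Int(r * f) => r * Int(f)  -- factor a constant out of an integral
    if e[0] == 'int' and isinstance(e[1], tuple) and e[1][0] == '*':
        r, f = e[1][1], e[1][2]
        return ('*', r, ('int', f))

def op4(e):
    # op4: Int(sin(x)) => -cos(x)
    if e == ('int', ('sin', 'x')):
        return ('neg', ('cos', 'x'))

def successors(state, ops):
    """Apply every operator at every subterm, yielding whole new states."""
    def rewrites(e):
        if not isinstance(e, tuple):
            return
        for op in ops:
            new = op(e)
            if new is not None:
                yield new
        for i, sub in enumerate(e):
            for new_sub in rewrites(sub):
                yield e[:i] + (new_sub,) + e[i + 1:]
    return list(rewrites(state))

def solved(e):
    # Goal test: no integral sign left anywhere in the expression.
    return not isinstance(e, tuple) or (e[0] != 'int'
                                        and all(solved(s) for s in e))

def bfs(initial, ops):
    # Breadth-first search, as prescribed by Table 5's search strategy.
    frontier, seen = deque([(initial, [initial])]), {initial}
    while frontier:
        state, path = frontier.popleft()
        if solved(state):
            return path
        for nxt in successors(state, ops):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))

def label_moves(path, ops):
    # Credit assignment from a complete solution path: moves along the
    # path are positive instances; moves leading one step off the path
    # are negative instances of the responsible operator.
    on_path = set(path)
    pos = list(zip(path, path[1:]))
    neg = [(s, t) for s in path[:-1]
           for t in successors(s, ops) if t not in on_path]
    return pos, neg

path = bfs(('int', ('*', 2, ('sin', 'x'))), [op1, op4])
pos, neg = label_moves(path, [op1, op4])
```

For ∫ 2 sin(x) dx the sketch finds the two-step path (factor out the constant, then integrate the sine) and labels both moves as positive instances; in this tiny space no off-path moves exist, so no negative instances arise.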
Heuristics learning is considerably more difficult than learning from examples since the learner must generate its own positive and negative instances, the credit-assignment problem is nontrivial, and multiple concepts (one per operator) are acquired concurrently. Within the problem-reduction approach to heuristics learning, three basic solutions to the credit-assignment problem have been explored: learning from solution paths, learning while doing, and the learning-apprentice approach. Consider each of them in turn.

The first approach relies on waiting until a complete solution path to some problem has been found. Moves along the solution path are desirable since they lead the system toward the goal, and moves branching off the solution path are undesirable since they lead away from the goal. Thus, the method of learning from solution paths involves two steps: marking every move along the solution path as a positive instance of the responsible operator and marking every move leading directly off the solution path as a negative instance of the responsible operator. Note that this second decision is risky since it is possible that a side path may lead to the goal by another route. For this reason, most systems that learn from complete solution paths rely on breadth-first search or some equivalent scheme to ensure that side paths do not also lead to the goal or at least do not do so as efficiently. Also note that moves two steps off the solution path are ignored in this approach; the blame lies with the operator that led off the path in the first place, not with operators that applied afterward.

For example, consider the solution path for the integration problem shown in Figure 12. Moves along the solution path have been marked with a plus sign, and moves leading one
step off this path have an associated minus sign. It is important to realize that the same operator may sometimes apply to a given state in multiple ways. Each of these is called an instantiation, and it is quite possible for one operator instantiation to be labeled as desirable and another to be marked as undesirable. Instantiations depend on the arguments to which an operator is applied. In chess, for instance, moving a rook requires stating which rook to move and what its destination square should be; one instantiation may be correct and another disastrous. Thus, the integration-by-parts operator can be instantiated against state 1 in two different ways. One of these leads to the goal state, and the other leads away from it. (Actually, the negative instance also leads to the goal state, but by a more circuitous path. If one desires to learn heuristics for efficient integration, it makes sense to label the instantiation leading down this inefficient path as a negative instance.) Other positive and negative instances of an operator may occur elsewhere along the path; in this problem there are two other instances of the integration-by-parts operator, both applied to the expression −x² cos(x) + 2 ∫ x cos(x) dx. Research in this tradition has been carried out by Anzai (29), Mitchell, Utgoff, and Banerji (26), and Langley (30).

One limitation of the "learning-from-solution-paths" approach is that it encounters difficulty in domains involving very long solution paths and extensive problem spaces. Obviously, one cannot afford to search exhaustively in a domain such as chess. In response, some researchers have explored methods that assign credit and blame while the search process is still under way. These techniques for learning while doing include schemes for noting loops, unnecessarily long paths, dead ends, and failure to progress toward the goal. For example, instantiating the operator for integration by parts can eventually lead one back to an earlier state, and this provides a clear opportunity for learning to avoid unproductive actions even before the goal has been achieved. Systems that incorporate such "learning-while-doing" methods include Anzai's HAPS (29), Ohlsson's UPL (31), Langley's SAGE.2 (32), and Minton et al.'s PRODIGY (23). Ironically, with the exception of PRODIGY, these systems have all been tested in simple puzzle-solving domains, where the "learning-from-solution-paths" method is perfectly adequate. Therefore, a promising research direction would involve applying these and other methods to more complex domains with long solutions and extensive search spaces.

The third credit-assignment method involves observing an expert and using his actions to distinguish desirable moves from undesirable ones. Mitchell et al. (22) have called this the learning-apprentice approach, and it has natural applications to the semiautomated construction of expert systems (qv). The advantage of this scheme is that it lets the learning system avoid excessive search and at the same time provides immediate feedback about the desirability of moves. The disadvantage is that the learning system must rely on a tutor to lead it down the optimal solution path. In many ways the learning-apprentice approach transforms the heuristics-learning task back into the simpler task of learning from examples, but this is sometimes useful. Other work in this paradigm has been carried out by Brazdil (33), Neves (34), Kibler and Porter (35), and Minton (23).

From Instances to Heuristics. As shown above, the process of assigning credit and blame to moves made during the search process is equivalent to labeling these moves as positive or negative instances of the responsible operators. Once these moves have been labeled, they can be used to determine heuristic conditions on each of the operators. The task is reduced to the problem of learning from examples. Thus, there will exist a space of "concept descriptions" for each operator, in which the concept to be learned is "those states under which the operator should be applied." This space is partially ordered according to generality, and one can search the space using any of the methods described above. Once the conditions for each operator have been identified, they can be used to direct search so the performance system prefers desirable moves to undesirable ones.

However, the task of heuristics learning does place some constraints on the method that is employed. In particular, the learning system must be able to generate both positive and negative instances of its operators. This poses no problem for general-to-specific systems since they begin with overly general heuristics that lead naturally to search. Neither does any problem arise for bidirectional approaches such as Mitchell's version-space method (13), since these can use the general boundary in proposing moves. In contrast, specific-to-general methods are naturally conservative, preferring to make errors of omission rather than errors of commission. Such an approach works well if a tutor is present to provide positive and negative instances, but it encounters difficulties if a system must generate its own behavior. In essence, a problem solver must attempt to solve new challenging problems to drive the learning, but these new problems do not arise by spontaneous generation. A general-to-specific method can compose problems from different instantiations of overly general operator conditions, but a specific-to-general method can only compose problems that the system can already solve well with the existing operators, unless it incorporates a means of generating more general hypotheses with data justification for the problem-composition phase. Ohlsson (31) has reported a mixed approach in which specific rules are preferred, but very general move-proposing rules are retained and used in cases in which none of the specific rules are matched. However, in their pure form, specific-to-general methods do not seem appropriate for heuristics learning.

The majority of research on heuristics learning has focused on empirical methods, but there has also been recent work on analytic approaches to this problem. In this framework one still requires a solution path from which to generate positive instances for each operator, but analytic methods are used to identify the heuristic conditions. Interestingly, one need not construct an explanation in these cases: the solution path itself suffices as the proof that a move was desirable (22,23,36) since it lies along the path to the goal state. Nor does one need additional domain rules since the operators themselves play this role. One must only reason backward from the goal state, using the legal constraints on each operator to determine the features of each previous state that allowed the final operator in the sequence to apply. This process is applied to each operator along the solution path, generating a macro operator that is guaranteed to lead to the goal state. The method is very similar to that employed by Fikes, Hart, and Nilsson (37) in their early STRIPS system. An added attraction is that one need not worry about misclassifying side paths that actually lead to the goal by another route; analytic methods do not use negative instances in generating concept descriptions, so this is not a problem. Mitchell et al. (22) have applied this approach to
learning search heuristics for symbolic integration, whereas Minton (36) has applied it to game-playing domains. Carbonell has explored a related approach in his work on problem solving by analogy (38). During its attempt to solve a problem, Carbonell's system retains information not only about the operators it has applied but also about the reasons they were applied. Upon coming to a new problem, the system determines if similar reasons hold there, and if so, attempts to solve the current problem by analogy with the previous one. Mitchell's, Minton's, and Carbonell's methods all analyze the solution path in order to take maximum advantage of the available information. In the next section, analogical methods are discussed in more depth.

Open Problems in Heuristics Learning. Heuristics learning can be viewed as the general case of learning from examples, and many of the open problems in this area are closely related to those for concept learning. For instance, one can imagine complex domains for which no perfect rules exist to direct the search process. In such cases one might still be able to learn probabilistic rules that will lead search down the optimum path in most cases. This situation is closely related to the task of learning concepts from noisy data. Similarly, one can imagine attempting to learn search heuristics with an incorrect or incomplete knowledge-representation language; some work in this direction has been pioneered by Lenat's RLL language and EURISKO system (25). There are many problem-solving domains in which some moves are better than others but for which no absolutely good or bad moves exist. As with learning from examples, most of the existing heuristics-learning systems assume that "all-or-none" rules exist for operator relevance conditions. Thus, even if one could modify the credit-assignment methods to deal with such continuous classifications, it is not clear how one would alter the methods of encoding relevance conditions for operators in these systems. The learning-apprentice systems are starting to work toward such complex realistic learning situations, but much of the research remains to be done.

Learning by Analogy
A problem once solved can provide useful guidelines and suggestions for solving related problems. The previous section showed how search heuristics could be abstracted, but here past problem solutions are used directly to guide the construction of new problem solutions. Analogical problem solving emulates the human ability to exploit past experience or to follow the solution of a worked-out example problem to expedite problem solving in new but closely related situations (39).

Transformational Analogy. Consider first the most direct method of transferring information from past solutions to the new problem: the transformational-analogy method. When a new problem is encountered, the method requires the following:

1. Search episodic memory (qv) of past problem instances for one or more that match closely the current problem description.
2. Recall the solution associated with that problem description (or, if more than one, the set of alternative solutions).
3. Transform the recalled solution by an incremental process of directed perturbations on the recalled solution, reducing the difference between that which the solution accomplishes and that which the new problem requires. The process is typically guided by standard means-ends analysis (qv) (16) in the space of solutions rather than in the space of possible world states (see Ref. 40 for a much more detailed exposition).
4. If the transformation proves impossible, perhaps due to an insurmountable difference between the new and old problems, select a new candidate analog problem or abandon the analogy process in favor of direct problem-solving methods.

As illustrated in Figure 13, the transformational-analogy process exploits past experience, a process of great utility if the types of problems solved earlier are any portent of new problems likely to be encountered.

A partial mapping relates the previously solved problem to the new one; the old solution is transformed into a solution to the new problem.
1. Match old problem similar to new one.
2. Recall final solution to the old problem.
3. Transform recalled solution to satisfy the constraints of the new problem.
   - Use the match to guide the transformation.
   - Ignore the solution procedure (derivation).

Figure 13. The transformational-analogy process.

Now, consider an example in which transformational analogy has proven useful. The problems illustrated in Figure 14 have been borrowed from the work of Anderson and Klein (18), but a new experiment was run with high school sophomores at the very start of their geometry course. Each student was given several problems to solve, the first always being the "RONY" problem in Figure 14: Given that the points R, O, N, and Y are collinear, and RO = NY, prove that RN = OY.

RO = NY              (given)
ON = ON              (reflexive)
RO + ON = ON + NY
RN = OY

Figure 14. Analogical reasoning in geometry.

Among the set of other problems given to each student (sometimes immediately following RONY, sometimes toward the end) was the analog problem with angles: Given that angle BAC is congruent to angle DAE, prove that angles BAD and CAE are also congruent. Analogical transfer required noticing the structural similarity, converting line segments to angles, and confirming that line-segment addition could be replaced by the (equally sound) angle addition in the proof structure. And what were the results? Students took 10-12 min to solve the RONY problem, but the 70% who noticed the analogy required only 2-3 min to solve the angle analog. The other 30% took once again 10-12 min.

In a later experiment students were given a variant of the angle problem (after solving the RONY problem) in which the large angles BAD and CAE were given as congruent and they were required to prove that the outer angles BAC and DAE were congruent. Here, less than 30% of the students noticed the analogy, but those who did exhibited the same five-fold speedup in problem solving. However, when students were given both the RONY and the first angle problem to solve prior to the new angle problem, up to 80% established the analogy. From such experiments one learns that analogy makes human problem solving more effective but that the problems must be fairly close to each other to have a reasonable expectation that an analogy will be established. Computer implementations of transformational analogy exhibit similar behavior, though one can trade speed of finding an analogy for breadth of search, enabling the system to find more remote analogies.
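The match-recall-transform loop of steps 1-3, applied to the segment-to-angle example above, can be sketched as follows. The feature sets, object names, and solution encoding are invented for illustration; they are not taken from Anderson and Klein's materials.

```python
# Transformational analogy, steps 1-3: retrieve the most similar solved
# problem, recall its solution, then perturb that solution by mapping
# the old problem's objects onto the new one's.  All encodings here are
# invented for illustration.

def similarity(p, q):
    # Step 1: match by overlap of problem features.
    return len(p['features'] & q['features'])

def transform(solution, mapping):
    # Step 3: directed perturbation by substitution of mapped terms.
    return [tuple(mapping.get(term, term) for term in step)
            for step in solution]

library = [{                      # one previously solved problem (RONY)
    'features': {'congruent-parts', 'shared-middle-part', 'prove-sums-equal'},
    'objects': {'part': 'segment', 'addition': 'segment-addition'},
    'solution': [('given', 'segment', 'RO = NY'),
                 ('reflexive', 'segment', 'ON = ON'),
                 ('segment-addition', 'segment', 'RN = OY')],
}]

new_problem = {                   # the angle analog
    'features': {'congruent-parts', 'shared-middle-part', 'prove-sums-equal'},
    'objects': {'part': 'angle', 'addition': 'angle-addition'},
}

best = max(library, key=lambda p: similarity(p, new_problem))   # step 1
recalled = best['solution']                                     # step 2
mapping = {best['objects'][k]: new_problem['objects'][k]
           for k in best['objects']}
new_solution = transform(recalled, mapping)                     # step 3
```

The substitution replaces "segment" with "angle" and segment addition with angle addition throughout the recalled proof, mirroring the transfer the successful students performed. Note that, per the figure, the derivation itself is ignored; only the final solution is transformed.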
The Derivational-Analogy Method. In formulating plans and solving problems, a considerable amount of intermediate information is produced in addition to the resultant plan or specific solution. For instance, formulation of subgoal structures, generation and subsequent rejection of alternatives, and access to various knowledge structures all typically take place in the problem-solving process. But the solution-transformation method outlined above ignores such intermediate information, focusing only on the resultant sequence of instantiated operators corresponding to external actions, and disregarding, among other things, the reasons for selecting those actions.

The general idea of derivational analogy is depicted in Figure 15 and examined in greater detail in Ref. 38. The derivational process enables one to draw more distant analogies without violating essential aspects of a problem, as the justifications for each step in the solution process are preserved, and only if the same justifications hold in the new situation is that step proposed as part of the derivational transfer. The extra bookkeeping work in maintaining the justification structure [essentially a small TMS (41)] is rewarded by the higher quality analogies the system is able to draw.

A Case for Case-Based Reasoning. The vast majority of present-day expert systems encode their knowledge as a large, amorphous set of domain-specific rules (42-46) (see Rule-based systems). The "knowledge-engineering" task is defined as one of extracting from the human expert the set of rules that comprise his or her expertise in a particular, well-defined domain. The task is by no means easy. Quite the contrary, it can take years of laborious effort by teams of domain experts and AI researchers in an iterative process of formulating, evaluating, reformulating, discarding, and refining a set of rules to
Partial mappings relate previous problems to the new one, and their derivations are "replayed" to construct the new solution.
- The derivation is mapped and replayed, not just the solution.
- Multiple past problem-solving episodes can be integrated.
- Recall of previous problems entails partial match or (partial or exact) match of an initial segment of the derivation.
- Learn strategies, not just generalized plans.

Figure 15. The derivational-analogy process.
develop the knowledge base of a particular expert system. Observing this phenomenon, Edward Feigenbaum uttered his now famous proclamation "In the knowledge lies the power." How right he was! Fortunately, however, the tacit assumption that domain knowledge must necessarily be represented as large sets of context-independent rules is proving to be only an early engineering decision, and a very limiting one at that. The knowledge must be captured, but the question remains as to the best means of acquiring and representing it in a computationally effective manner.

Human experts are incredibly poor at producing general deductive rules that account for their behavior. When forced to do so by insistent knowledge engineers, they try hard and produce faulty rules. When later faced with a problem in which the rule fails, the typical response is "Well, I didn't think of that situation, but perhaps I can fix the rule . . . or add a new one . . ." This ad hoc iterative process, slow and frustratingly inefficient as it may be, usually converges on an acceptable knowledge base. However, a much more efficient and humane approach is to let the experts do what they do best: solve problems in their domain of expertise. The only added burden is a reporting requirement. Each problem-solving step, including references to static domain knowledge (qv) or to heuristics of the domain, must be reported explicitly, along with the reason why such knowledge was used. This process provides external derivational traces that a derivational-analogy inference engine can use to solve similar future problems in an effective manner. Although the derivational method was originally conceived as a means to reason and learn from the system's own past experience, it works equally well as a means to reason and learn from the experience of a more knowledgeable external source, such as a human expert or a worked-out problem example in a textbook.
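The reporting scheme just described can be sketched as a derivational trace whose steps carry explicit justifications; a step is replayed for a new problem only if its justification still holds there. The step names and feature predicates below are invented for illustration.

```python
# Derivational replay: each recorded step stores the reason it was taken;
# replay a step only if that justification still holds in the new
# problem, otherwise fall back to ordinary problem solving from there.
# The trace and predicates are invented for illustration.

trace = [
    {'step': 'apply-integration-by-parts',
     'justification': lambda features: 'product-of-terms' in features},
    {'step': 'factor-out-constant',
     'justification': lambda features: 'constant-factor' in features},
]

def replay(trace, new_features):
    replayed = []
    for record in trace:
        if record['justification'](new_features):
            replayed.append(record['step'])   # justification transfers
        else:
            break                             # diverge: stop replaying here
    return replayed
```

For a new problem exhibiting both recorded features the whole derivation transfers; if the first justification fails, replay stops immediately and the problem solver proceeds on its own, which is what distinguishes this method from blind solution copying.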
Case-based reasoning is particularly prevalent in law, at least in the British and American systems of jurisprudence, and in medical diagnosis and treatment. The idea of case-based reasoning in expert systems is not new. Schank (47), for instance, advocates this method as superior and closer to human reasoning than present expert systems. Doyle (48) proposes the notion of emulating the human master-apprentice process as a means whereby the latter (human or computer) can acquire expertise by replicating the reasoning processes of the former. The derivational-analogy process is an effective computational mechanism for providing expert systems with the ability to reason from cases, whether the cases be past experience or externally acquired knowledge. Thus, as case knowledge expands, so does the ability to solve more and more problems in the chosen domain of expertise. However, human experts can solve problems progressively more quickly and effectively with repeated experience. Whereas case-based reasoning may accurately reflect a crucial intermediate stage in the learning process and may account for problem-solving behavior in infrequently recurring situations, some knowledge is gradually compiled into more general processes abstracted from the concrete cases. That is to say, for the most routine, recurring problems, the derivational-analogy process should produce general plans that can be instantiated directly. The following section explores learning techniques combining analogical transfer and the learning-from-examples method.

Acquiring Generalized Plans. Integrating example-based learning with analogy provides the ability to acquire general
plans (see Planning) from instance solutions. This process requires that solutions derived from a common analogical parent form a set of positive exemplars, and unrelated or failed solutions form a set of negative exemplars. These sets are given to a general inductive engine (17), or preferably to an incremental one such as Mitchell's version-space method (26,49), which abstracts a generalized plan from the recurring common aspects of these solutions. Later, the generalized plan can be instantiated directly, or refined further if more instance solutions are derived. Figure 16 summarizes this process, which is discussed at greater length in Ref. 40. For instance, if the initial problem was to plan an automobile trip from Boston to Los Angeles and analogical variants include auto travel between other remote cities in North America, the generalized plan retains all the common characteristics (such as requiring maps, money, time, a working car, etc.) and abstracts away the varying ones (such as the source and destination cities, the compass direction of travel, the phase of the moon, etc.).

The cluster of derivational (or transformational) children provides the positive exemplars, but where do negative exemplars come from? One source of negative exemplars is members of other clusters, i.e., problems that were effectively solved by different means. A more useful source of near-miss negative exemplars is failed analogies, i.e., analogies proposed but not carried through to a solution. For instance, in the automobile-travel example, suppose the problem of traveling from Boston to London arose. This matches fairly closely previous members of the cluster, but on attempting to find a route between the cities, one quickly discovers that the Atlantic Ocean is an insurmountable barrier. Thus, the general plan is constrained to apply only to cities in the same land mass, in much the same way that Winston exploited near-miss examples to infer crucial discriminant
[Figure 16 sketches the process: a cluster of solutions with a common derivational ancestor is passed to a generalization step. Members of a cluster serve as positive instances, and members of other clusters (or failed analogies) serve as negative instances to an induction engine.]

Figure 16. Generalizing plans from analogically related solutions.
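The generalization step summarized in Figure 16 can be illustrated with a small sketch: each instance solution is treated as a set of feature-value pairs, and the generalized plan keeps only the features shared by every positive exemplar in the cluster, abstracting away those that vary. The feature names and values below are invented for illustration; this is not the cited systems' actual representation.

```python
# Hypothetical sketch: abstract a generalized plan from the features
# common to a cluster of instance solutions (positive exemplars).

def generalize_plans(instances):
    """Return the feature-value pairs shared by all instance solutions."""
    common = dict(instances[0])
    for plan in instances[1:]:
        # Keep a feature only if this instance agrees on its value.
        common = {k: v for k, v in common.items() if plan.get(k) == v}
    return common

trips = [
    {"vehicle": "car", "needs": "maps, money, time", "source": "Boston",
     "destination": "Los Angeles", "same_landmass": True},
    {"vehicle": "car", "needs": "maps, money, time", "source": "Toronto",
     "destination": "Seattle", "same_landmass": True},
]

general_plan = generalize_plans(trips)
# The varying source and destination cities drop out; the shared
# requirements (vehicle, resources, same land mass) remain.
```

A failed analogy such as the Boston-to-London trip would then serve as a near-miss negative exemplar, forcing the retained `same_landmass` condition into the generalized plan's applicability test.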
properties in inducing the physical description of an arch (15,50).

Language Acquisition
A fourth major area of machine-learning research has dealt with the acquisition of language. The overall task of language acquisition (qv) is very complex and involves many levels, including learning to recognize and generate words, learning the meaning of words, learning grammatical knowledge, and learning pragmatic knowledge. Each of these subproblems is interesting in its own right, but since the majority of AI work on language acquisition has dealt with grammar learning, we focus on that issue here. Other reviews of computational approaches to language learning can be found in McMaster, Sampson, and King (51), Anderson (52), Pinker (53), Langley (54), and Hill (see Language acquisition).

Learning Grammars from Sample Sentences. Some of the earliest work in machine learning addressed the problem of grammar acquisition, and this is still an active area of research in the field. The basic task is simply stated: Given an initial set of grammatical sentences from some language, find a procedure for recognizing all other grammatical sentences in that language. The induced grammar may take many forms, including rewrite rules, an augmented-transition network (qv), or a production system (see Natural-language processing for an extensive discussion of grammar formalisms and parsing strategies). Note that one is given only legal sentences from the language to be learned, and that no "negative instances" are provided. Solomonoff (55), Knowlton (56), Garvin (57), and Horning (58) carried out early research on this problem. Wolff and Berwick have described more recent work in this tradition, and we consider their results below.

Wolff (59) described SNPR, a program that acquires grammatical knowledge in a very data-driven manner.
The system begins with a sequence of letters and generates a phrase-structure grammar (qv) (stated as rewrite rules) that summarizes the observed sequence. SNPR is not provided with any punctuation or with any pauses between words or sentences; it must determine these boundaries on its own. The SNPR system carries out a hill-climbing search through the space of possible grammars using two operators: one for forming disjunctive classes such as noun, and another for defining chunks or conjunctive structures, such as dog. General grammatical categories such as verb and noun are defined extensionally as sets of simpler categories, whereas conjunctive categories such as dog or run are defined by the union of features that discriminate them from other categories. SNPR also included operators for generalization (by discarding some data) and recursion, but these are not described here. The system employs a numeric evaluation function to determine which of its operators should be applied in a given situation. This function measures two features of the grammar that would result: the compression capacity, or the degree to which a given grammar compresses the original data, and the size of the grammar. At each point in its learning process SNPR selects the step that gives the greatest improvement in compression capacity per unit increase in the size of the grammar. Thus, this evaluation function directs the system's search through the space of possible phrase-structure grammars.

One of the interesting aspects of Wolff's system is the manner in which its two operators interact. The system begins by
forming chunks for pairs of symbols such as th and ch. Whenever a chunk is created, the component symbols are replaced by the symbol for that chunk. The process can then be applied recursively to generate hierarchical chunks; this leads to chunks for words such as the, cat, and chased. However, chunks can also be used to form disjunctive classes such as noun and verb. When this occurs, SNPR substitutes the symbol for this new class for all occurrences of its members; thus, dog and cat would be replaced with the noun symbol. At this point something quite interesting can occur: the system can form chunks in terms of these disjunctive classes, generating terms such as prepositional-phrase (PP) and noun-phrase (NP). Thus, the system begins with a representation involving individual letters and gradually bootstraps itself into a grammar-based representation.

Berwick (60,61) has described LPARSIFAL, a grammar-acquisition system that differs substantially from Wolff's SNPR. LPARSIFAL represents its grammatical knowledge as a set of rules, but ones quite different from SNPR's rewrite rules. The program inputs a sequence of legal English sentences, but these sentences differ from Wolff's in that each one consists of separate words, and the sentences themselves are separated from each other. No meanings are associated with either words or sentences. LPARSIFAL is based on Marcus's (62) wait-and-see approach to syntactic parsing. In this framework grammatical expertise is stored as condition-action rules that match against two data structures: an input buffer and a stack of partially constructed parse trees. The system contains a number of operators, such as creating a node and pushing it onto the stack, moving a node from the stack to the buffer, and attaching an item in the buffer onto the stack. These operators are applied to an input sentence in sequence until that sentence has been completely parsed.
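SNPR's chunking operator and its evaluation function can be sketched loosely as follows. This is not Wolff's actual code: the scoring here is a crude stand-in for his compression-per-size measure, in which each candidate chunk is scored by the data compression it achieves per unit increase in grammar size, and the best chunk's occurrences are replaced by a single new symbol.

```python
# Loose sketch of SNPR-style chunking (invented details): score candidate
# pairs by compression gained per unit of grammar growth, then rewrite.

from collections import Counter

def best_chunk(seq):
    """Pick the adjacent pair whose chunking most improves compression."""
    pairs = Counter(zip(seq, seq[1:]))
    def score(pair):
        compression = pairs[pair]   # each occurrence shrinks the data by one symbol
        size_increase = 1           # one new rewrite rule is added to the grammar
        return compression / size_increase
    return max(pairs, key=score)

def apply_chunk(seq, pair, symbol):
    """Replace every occurrence of the pair with the new chunk symbol."""
    out, i = [], 0
    while i < len(seq):
        if tuple(seq[i:i + 2]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

letters = list("thedogthecatthedog")
pair = best_chunk(letters)                     # e.g., ('t', 'h')
compressed = apply_chunk(letters, pair, "th")
```

Applied recursively, such rewriting yields hierarchical chunks (th, the, the dog), mirroring the bootstrapping behavior described above.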
Berwick's system begins with a knowledge of Chomsky's X-bar theory (63) and an interpreter for applying grammar rules to parse sentences. When given a new sentence, LPARSIFAL attempts to parse it using its existing rules. If it reaches an impasse, the system attempts to create a new rule that will handle the problem-causing situation. The conditions of the new rule are based on the state of the parse when the problem was encountered, including the top of the stack and the contents of the input buffer. Upon adding the new rule to memory, the system checks to see if any existing rules have identical actions. If a rule with the same action is found (an oversimplification; in fact, the rules must also have the same X-bar context, but the details of this context are not discussed here), LPARSIFAL compares the two condition sides to determine what they hold in common. The resulting mapping is used to construct a more general rule with the same action that replaces the two previous rules. Differing conditions are dropped from the resulting general rule or, in some cases, lead to the creation of syntactic classes like nouns and verbs. Thus, LPARSIFAL's method for combining rules is similar to the specific-to-general method considered in the context of learning from examples. However, since Berwick employs a simple attribute-value representation, he does not have to worry about search through the space of rules. As a result of this simplifying assumption and of ignoring semantic distinctions, LPARSIFAL needs no negative instances to eliminate competing hypotheses.

Upon reflection, LPARSIFAL's approach to grammar ac-
LEARNING, MACHINE
quisition is reminiscent of another class of learning problems: the task of learning heuristics. One can view Berwick's system as beginning with a set of operators for parsing sentences, along with legal conditions stated in terms of X-bar theory. However, in order to parse sample sentences, the system must search. When the goal state (an empty input buffer) is achieved, LPARSIFAL assigns credit to each move along the solution path and creates specific heuristics for each situation. Berwick has transformed the grammar-learning task into the task of learning search heuristics, a counterintuitive (but apparently useful) approach (he reported the first version of LPARSIFAL in 1979, when very few results had been achieved in heuristics learning). In fact, Ohlsson (31) has described a very similar method for heuristics learning, which he has applied to puzzle solving.
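LPARSIFAL's specific-to-general rule merging can be sketched as follows. The rule representation here is invented for illustration (the real system also checks X-bar context, as noted above): when two rules share the same action, their condition sides are compared and differing conditions are dropped, yielding one more general rule.

```python
# Hypothetical sketch of merging two condition-action rules with the same
# action by keeping only the conditions they hold in common.

def merge_rules(rule_a, rule_b):
    """Merge two rules with identical actions into one more general rule."""
    assert rule_a["action"] == rule_b["action"]
    shared = {k: v for k, v in rule_a["conditions"].items()
              if rule_b["conditions"].get(k) == v}
    return {"conditions": shared, "action": rule_a["action"]}

r1 = {"conditions": {"top_of_stack": "NP", "buffer_head": "dog"},
      "action": "attach"}
r2 = {"conditions": {"top_of_stack": "NP", "buffer_head": "cat"},
      "action": "attach"}

general = merge_rules(r1, r2)
# Only the shared condition survives: {"top_of_stack": "NP"}.
```

In the full system, a dropped condition such as the differing buffer heads (dog vs. cat) could instead seed a syntactic class like noun, as described above.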
[Figure 17 shows a parse tree for the sentence "The big dog chased the red ball."]

Figure 17. A sample parse tree.
Learning Grammars from Sentence-Meaning Pairs. Although the grammar-learning task described above has many interesting aspects, it differs from human language acquisition in an important respect. Rather than simply learning grammars for parsing sentences, the human learner acquires grammars for mapping sentences onto their meanings. Moreover, the child language data suggest that the human learner does not hear sentences in isolation; the sentences usually describe some event or object in the immediate environment. This observation leads to a different formulation of the grammar-learning task: Given a set of grammatical sentences from some language, along with the meaning for each sentence, find some procedure for mapping sentences onto their meanings or vice versa.

This view of grammar acquisition differs significantly from the first one examined. Grammatical knowledge must contain more than information about sentence structure; it must also relate this structure to meaning. This alternative view of grammar learning leads to quite different models of the learning process. Kelley (64), Siklossy (65), and Klein and Kuppin (66) carried out the earliest work in this "semantic" tradition. More recent systems have been described by Hedrick (67), Reeker (68), Anderson (52), Carbonell (69), Selfridge (70), Sembugamoorthy (71), and Langley (54).

Anderson developed LAS (52), a program that learns to understand and generate sentences in both English and French. LAS represents grammatical knowledge as an augmented-transition network (ATN), with both semantic and syntactic information stored on each link. The system accepts legal sentences and their associated meanings as input, with meaning represented in terms of a semantic network. In addition, LAS is provided with the main topic of each sentence as well as the words associated with various concepts. Finally, the program makes two assumptions about the nature of grammar: that some concepts (like shapes) play the role of nouns, and the graph-deformation condition, which roughly states that if two words occur near each other in a sentence, the concepts associated with those words must occur near each other in the meaning of that sentence.

These sources of information are sufficient to enable LAS to determine a unique parse tree for any given sentence-meaning pair. For instance, suppose the system is given the sentence "The big dog chased the red ball" and its associated meaning. One can represent this meaning in terms of a semantic network (qv), and one can transform this network into a parse tree like that shown in Figure 17. This can also be represented as the list structure ((The (big) dog) chased (the (red) ball)), in which parentheses indicate the level of the tree. However, any given semantic network can be translated into a number of such trees, and LAS used its knowledge of the sentence's main topic, the graph-deformation condition, and concept-word links to determine a preferentially unique tree.

Given the parse tree for a sentence, it is a simple matter to generate a fragment of an ATN that will parse that sentence. For instance, given the parse tree in Figure 17, LAS would transform this structure directly into the (initial) ATN shown in Figure 18. Since the parse tree has three branches at the top level, LAS would generate a top-level ATN with three links: one for the first structure (The (big) dog), one for the second structure chased, and one for the third, (the (red) ball). Since the first and third components themselves contain internal structure, LAS would build a sub-ATN for both of these, each with three links, and so forth until the terminal nodes are reached.

[Figure 18 shows the initial ATN built from this single sentence, with word classes N1 = {dog}, N2 = {ball}, V = {chased}, ART1 = {the}, ART2 = {the}, A1 = {big}, A2 = {red}.]

Figure 18. An initial ATN based on a single sentence.

After it has constructed an initial ATN, LAS attempts to incorporate new parse trees with as little modification as possible. For instance, given the new sentence "The small cat chased the long string," the system would note that its ATN would parse this quite well if only certain classes were expanded. In this case the class ADJ1 = {big} must be extended to ADJ1 = {big, small}, the class NOUN1 = {dog} must be extended to NOUN1 = {dog, cat}, and so forth. In addition to expanding word classes, LAS employs two other learning mechanisms. First, when the system finds two word classes that share a significant number of elements, it combines them into a single class. Second, if LAS finds two sub-ATNs to be
sufficiently similar, it combines them into a single subnetwork. A special case of this process actually leads to recursive networks for parsing noun phrases. These steps occasionally lead the system to learn overly general ATNs, and LAS has no way to recover from these errors.

Consider Langley's AMBER (54), a cognitive simulation of the early stages of child grammar acquisition. Like LAS, this system accepts sentence-meaning pairs as input, using a semantic network to represent meaning. AMBER also shares LAS's requirement that the meanings of content words (such as ball and bounce) be known and that the main topic of each sentence be available. However, AMBER differs from Anderson's system by representing grammatical knowledge as production rules that generate sentences from meaning structures. Although Langley's system does not assume the graph-deformation condition, an analogous constraint arises from the system's strategy for generating sentences. AMBER uses its knowledge of the main sentential topic to transform its semantic network into a tree structure and then proceeds to generate an utterance to describe the structure. In doing so, it employs the notion of goals and subgoals. The top-level goal is to describe the entire tree, but to achieve this goal, the system creates subgoals to describe nodes lower in the tree. At the outset AMBER can handle only one subgoal at a time, leading the system to generate one-word "sentences." Much of the system's learning consists of acquiring rules that let it deal with multiple subgoals and then identifying the relative order in which those subgoals should be achieved. However, AMBER never returns to a goal once it has been deactivated; the system may thus omit words (as do children), but this means it will never generate sentences that violate the graph-deformation condition.

The AMBER system begins with the ability to say one content word at a time.
Based on differences between these utterances and the sample sentences it is given, the system generates new rules that let it produce combinations of content words in the correct order. AMBER also uses a discrimination process to determine the conditions under which it should produce function words like is and the suffix -ing; this process is closely related to the general-to-specific method for learning from examples that we described in an earlier section. In both cases the acquired rules must be relearned a number of times before gaining enough "strength" to take over control from the default rules. Taken together, these mechanisms replicate a number of child-language phenomena, including word omissions, the gradual disappearance of such omissions, and the order in which function words are mastered.
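The notion of rule "strength" can be illustrated with a toy sketch. The increment and threshold values below are invented for illustration; the point is only that a newly acquired rule must be relearned several times before it outcompetes the default rule that omits function words.

```python
# Illustrative sketch (details invented) of strength-based competition:
# a new rule must be relearned repeatedly before it takes control.

STRENGTH_GAIN = 0.25      # assumed gain per relearning episode
DEFAULT_STRENGTH = 0.9    # assumed strength of the default (omission) rule

def relearn(rule):
    """One relearning episode strengthens the rule."""
    rule["strength"] += STRENGTH_GAIN
    return rule

the_rule = {"produce": "the", "strength": 0.0}
for _ in range(4):
    relearn(the_rule)

# After enough relearning episodes the acquired rule wins the competition,
# and the function word is produced rather than omitted.
says_the = the_rule["strength"] > DEFAULT_STRENGTH
```

This gradual strengthening is what produces the child-like pattern of early omissions that slowly disappear.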
Negative Instances in Grammar Learning. Consider the role of negative instances in grammar learning. Many learning methods rely on negative instances to direct their search through the space of hypotheses. For instance, specific-to-general condition-finding methods employ such instances to eliminate overly general hypotheses. Similarly, general-to-specific condition-finding methods use negative instances to determine how overly general descriptions should be made more specific. Negative instances are heavily used in learning from examples, where they are provided by a tutor, and in problem solving, where they arise from failed attempts to apply an operator or to transfer a solution by analogy.

Of the grammar-learning systems discussed, only Langley's AMBER actually employs negative instances, but Reeker's PST (68) and Anderson's LAS (52) have also used this type of information. This may seem odd, since these models are only given legal sentences as examples. However, AMBER and its relatives do not learn rules directly at the sentence level but focus instead on the parts of sentences. Moreover, these systems are learning to map sentences onto their meanings (and vice versa), and this lets them make predictions that may prove incorrect. Berwick's LPARSIFAL (60,61) could also have generated negative instances by noting which actions failed to allow a successful parse. However, the system's search was already constrained enough that it did not use the additional information. To clarify the point, consider an example in which AMBER predicts that ing should occur after the word bounce. If this does not occur in the adult utterance, the system can label this situation as a negative instance and use it to direct the learning process and thereby acquire appropriately constrained suffix-attachment rules. Positive instances can be generated in an analogous fashion based on successful predictions. All this does not mean that children receive negative instances from a tutor; rather, such instances arise from discrepancies between predicted and actual sentences paired with their meanings, and one can use them to constrain the process of grammar acquisition.

Learning by Discovery

Most of the methods we have examined involve some form of an external tutor or internal problem-solving traces that provide the information necessary for learning. However, humans encounter many situations in which they must discover regularities in their environment through observation and experimentation. This is the task confronting the scientist trying to discover new facts and formulate new theories; this analogy is used in the discussion of machine discovery. Of course, scientific discovery is a complex process, involving activities ranging from the design of experiments and the design, construction, and use of measuring instruments to the generation and testing of explanatory theories. Here, we limit our treatment to two of the discovery problems that have received recent attention within the machine-learning community: the formation of classificatory taxonomies and the discovery of empirical laws describing regularities in observed data.

Taxonomy Formation and Conceptual Clustering. Before the scientist can discover empirical laws and formulate theories, he must first decide on some classification scheme for the objects under study (see Clustering). For example, chemists made little progress until they could distinguish between different elements, such as gold and lead. Later progress occurred after substances were partitioned into classes such as metals, inert gases, and acids. Similarly, theories of evolution rested on taxonomies formulated by early biologists such as Linnaeus.

Figure 19 illustrates the task of taxonomy formation using the two-bodied cells described earlier. Given the 13 cells shown in the figure, one must generate some taxonomic hierarchy that groups these cells into classes, subclasses, etc. There are many ways to organize these data, but some may be preferred to others. The figure shows one such partitioning, marking two major classes with solid rectangles and marking four subclasses with dotted rectangles.

The earliest work on automated taxonomy formation was not carried out by AI researchers but rather by statisticians and biologists who developed the methods of cluster analysis
Figure 19. The problem of taxonomy formation.

(see Clustering) and numerical taxonomy. These algorithms input some set of objects and their associated descriptions and generate a hierarchical classification tree that summarizes the data. Typically, these methods use attribute-value representations for objects, viewing these as points in an n-dimensional space. The similarity between two objects or two clusters is measured by their distance in this space, and these methods attempt to find the taxonomic scheme that maximizes intracluster similarity and minimizes intercluster similarity. Table 6 presents one version of the numerical taxonomy approach. However, many variations exist with different measures of distance, and these often lead to different partitions of the same data.

The work on numerical taxonomy is quite interesting from an AI perspective since it clearly takes a heuristic approach, one that depends on the distance metric and on selecting the relevant dimensions along which to classify the input instances. However, Michalski and Stepp (73) have argued that numerical taxonomy methods suffer from two limitations. First, these methods generate only extensional definitions of categories, and one would like much more concise intensional definitions with predictive power. Second, the methods use only the objects themselves in evaluating alternative clusters, and one would like to use the intensional definitions of objects clustered thus far as part of the evaluation criterion (e.g., preferring simpler to more complex descriptions and preferring greater predictive power to overly specific concept descriptions).

In response, Michalski and Stepp (73) have formulated the related task of conceptual clustering. In this task one is still presented with a set of objects and their associated descriptions, and one must still generate a hierarchy containing clusters of objects. However, one must also generate intensional descriptions for those clusters, and competing clusters must be evaluated according to the quality of their associated descriptions. They have argued that the resulting clusters should be more conceptually coherent than those generated by the simpler traditional methods.

Consider the taxonomic hierarchy presented in Figure 20, which summarizes the clusters given in Figure 19. However, the new taxonomy goes beyond the simple extensional definitions we had before. In addition, it provides an intensional description for each concept in terms of the defining features for that class. (This structure does not represent a search tree through the space of concept descriptions; it represents the output of a conceptual-clustering system.) Thus, one major category covers cells with one nucleus in each body, and the other covers cells with two nuclei in one body and one nucleus in the other. The subclasses of the first category include additional conditions about the body colors, and the subclasses of the second category refer to the number of tails. Note that, using these concept definitions, predictions can be made about other cells that have different groupings of the same features and that have not yet been observed.

Although the conceptual-clustering task bears some similarity to the problem of learning from examples, there are three important differences.
First, in conceptual clustering there is no tutor to place objects into classes; the learner must solve this clustering problem on his own. Second, the resulting taxonomy involves disjunctive classes; a 'conjunctive' clustering task would be one in which only a single object was observed and would not be very interesting. Third, conceptual-clustering systems must form concepts at multiple levels of
Table 6. A Numerical Taxonomy Method

1. Find the two closest objects and create a cluster that contains them as members.
2. Replace the clustered objects with the new cluster, treating it as a new object whose coordinates are the weighted arithmetic average of its members' coordinates.
3. If all objects are covered by a single cluster, then halt; otherwise, go to step 1.
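The method in Table 6 can be sketched directly in code. The point data are invented for illustration, and the merged cluster is placed at the unweighted average of its two parents' centroids, a simplification of the weighted average in step 2.

```python
# Sketch of the agglomerative method in Table 6: repeatedly merge the two
# closest clusters, replacing them by one cluster at their average position.

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def numerical_taxonomy(points):
    """Agglomerate points bottom-up; returns a nested-tuple cluster tree."""
    clusters = [((x, y), (x, y)) for x, y in points]   # (tree, centroid) pairs
    while len(clusters) > 1:
        # Step 1: find the two closest clusters by centroid distance.
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: dist(clusters[ab[0]][1], clusters[ab[1]][1]))
        (tree_i, c_i), (tree_j, c_j) = clusters[i], clusters[j]
        # Step 2: replace the pair with one cluster at the average position.
        merged = ((tree_i, tree_j),
                  ((c_i[0] + c_j[0]) / 2, (c_i[1] + c_j[1]) / 2))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]   # Step 3: one cluster covers all objects.

tree = numerical_taxonomy([(0, 0), (0, 1), (10, 10)])
# The two nearby points are merged before either joins the distant one.
```

As the text notes, different distance measures (or weighted centroids) can yield different partitions of the same data.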
[Figure 20 residue: each node of the hierarchy is annotated with counts of observed and predicted features, e.g., observed 4, predicted 0; observed 3, predicted 1.]

Figure 20. A taxonomic hierarchy.
description; in addition to describing each concept, the learner must impose some hierarchical organization on these concepts.

Methods for Conceptual Clustering. There are a variety of approaches to the conceptual-clustering (qv) problem, though only one of them is reviewed in detail here. Fisher's RUMMAGE system (74) is selected since its basic method is easy to communicate. Table 7 provides an English paraphrase of the clustering mechanism. RUMMAGE assumes that objects are described in terms of attribute-value pairs, and it uses this knowledge to form potential clusters. The system constructs its taxonomy in a top-down fashion, at each point selecting one attribute to divide objects into clusters and using the general-to-specific learning-from-examples method to generate an intensional description for each cluster. Each candidate clustering is evaluated on two measures, maximal simplicity and minimal overlap, roughly analogous to the intracluster and intercluster measures from numerical taxonomy. The attribute producing the simplest intensional descriptions with minimal overlap among clusters is selected, and its values are used to create the initial branches in the taxonomy. Objects are sorted down these branches depending on their values, and the process is applied recursively to generate lower level clusters. This process continues until the quality of the cluster descriptions falls below a discrimination threshold.

In many ways the RUMMAGE algorithm is similar to that used by ID3 to construct decision trees from examples. The main difference lies in the evaluation used by the two systems. ID3's evaluation function requires instances to be grouped into positive and negative classes, whereas RUMMAGE generates descriptions of each class and evaluates these instead. However, both systems construct trees in a top-down fashion, and both avoid significant search by selecting the "best" attribute at each level.
Neither is guaranteed to find the optimal tree, but both are efficient compared to other, more comprehensive learning systems.

RUMMAGE differs significantly from Michalski and Stepp's CLUSTER/2 (73), one of the earliest conceptual-clustering systems. Fisher's program uses its knowledge of attributes and their values to generate potential clusterings. This model-driven approach is efficient but limits RUMMAGE to forming monothetic classification schemes in which only one attribute is used to index each category. Thus, the system could not produce the taxonomy in Figure 20, since two features are introduced at each level. In contrast, CLUSTER/2 uses an iterative method (similar to hill climbing) to search the space of possible clusterings, using concept descriptions (generated by a learning-from-examples technique) to direct
Table 7. RUMMAGE's Approach to Conceptual Clustering

1. Start with the full set of objects at the current (initially root) node.
2. For each attribute, sort objects according to the values of that attribute.
3. For each value of an attribute, generate a concept description for objects with that value.
4. Select the attribute with the "best" descriptions (the simplest and least similar).
5. Create branches from the current node for each value of this attribute, and sort objects down these branches to the new nodes.
6. Apply the method to the resulting subsets of objects, recursively selecting attributes and constructing subtrees until their quality falls below threshold.
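The divisive step in Table 7 can be sketched in a much-simplified form. The scoring below is a stand-in for Fisher's simplicity and overlap measures, not his actual code: each attribute partitions the objects by its values, each group is described by the features its members share, and the attribute whose groups have the most coherent (largest shared) descriptions is selected. The cell data are invented.

```python
# Much-simplified sketch of RUMMAGE-style attribute selection (Table 7,
# steps 2-4), with an invented coherence score in place of Fisher's measures.

def describe(group):
    """Intensional description: features shared by every object in the group."""
    shared = set(group[0].items())
    for obj in group[1:]:
        shared &= set(obj.items())
    return shared

def split_score(objects, attr):
    """Total number of features shared within each value-group (higher = simpler)."""
    groups = {}
    for obj in objects:
        groups.setdefault(obj[attr], []).append(obj)
    return sum(len(describe(g)) for g in groups.values())

def best_attribute(objects, attrs):
    return max(attrs, key=lambda a: split_score(objects, a))

cells = [
    {"nuclei": 1, "color": "dark", "tails": 1},
    {"nuclei": 1, "color": "dark", "tails": 2},
    {"nuclei": 2, "color": "light", "tails": 1},
    {"nuclei": 2, "color": "light", "tails": 2},
]
attr = best_attribute(cells, ["nuclei", "tails", "color"])
# Splitting on nuclei (or color) yields coherent groups; splitting on
# tails does not, so tails is not selected.
```

The selected attribute's values would then create the branches of step 5, with the method applied recursively to each subset.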
the search process. This approach is much more expensive, but Michalski and Stepp's system can generate polythetic hierarchies, in which conjunctions of features define each class. Thus, CLUSTER/2 could generate the taxonomy shown in Figure 20.

As with other learning tasks, methods for conceptual clustering can vary along a number of dimensions. For instance, although both RUMMAGE and CLUSTER/2 construct taxonomies in a top-down fashion, one can imagine systems that create conceptual hierarchies from the bottom up. In fact, the numerical taxonomy method examined earlier operates in exactly this fashion. Similarly, methods can vary in their approach to forming clusters; we have already seen that RUMMAGE uses a model-driven method based on knowledge of attributes, whereas CLUSTER/2 uses an iterative, successive-approximation approach. In contrast, the numerical taxonomy method uses a best-first search (qv) scheme with intercluster distances as its evaluation function. The space of clustering methods is a large one; see Fisher and Langley (74) for a fuller treatment of the possibilities.

Although most work on conceptual clustering has assumed that all data are present at the outset, one can also imagine systems that operate in an incremental fashion. In fact, Lebowitz (75) has described UNIMEM, an incremental system for generating conceptual hierarchies that constructs trees in a top-down fashion but in addition retains the ability to reorganize its taxonomy as it observes new objects. Fisher (74) has described COBWEB, another incremental system, which uses a probabilistic representation for concepts. Most learning systems assume that concepts must be described by a set of necessary and sufficient conditions, but COBWEB instead stores the probability that a given feature will occur for an instance of a concept. The system uses these probabilities to direct its search through the space of conceptual hierarchies.
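The probabilistic representation just described can be sketched with a toy concept that, instead of necessary-and-sufficient conditions, stores for each feature the probability that an instance exhibits it, updated incrementally as objects are observed. The class and feature names below are invented for illustration; this is not COBWEB itself.

```python
# Illustrative sketch of a probabilistic concept: store P(feature | concept)
# and update it incrementally as new instances are incorporated.

class ProbabilisticConcept:
    def __init__(self):
        self.count = 0
        self.feature_counts = {}

    def incorporate(self, features):
        """Incrementally fold one observed instance into the concept."""
        self.count += 1
        for f in features:
            self.feature_counts[f] = self.feature_counts.get(f, 0) + 1

    def probability(self, feature):
        """Estimated probability that an instance of the concept has the feature."""
        return self.feature_counts.get(feature, 0) / self.count

bird = ProbabilisticConcept()
bird.incorporate({"flies", "feathers"})
bird.incorporate({"flies", "feathers"})
bird.incorporate({"feathers"})          # e.g., a flightless bird

# P(feathers | bird) = 1.0, while P(flies | bird) drops below 1.
```

Such graded descriptions tolerate exceptions that would break a necessary-and-sufficient definition, which is one attraction of the probabilistic approach noted below.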
Incremental, probabilistic methods for concept learning probably have advantages over traditional approaches, and they will draw more attention in the future. In addition, most existing methods are limited to attribute-value representations, and the field needs to be explored for extensions that will handle more complex relational descriptions. Stepp (76) has described an extension to CLUSTER/2 that addresses this issue.

A new direction for research lies in using functional knowledge to direct the search for taxonomies. Nelson (77) has argued that children's very early concepts are often functional in nature. For example, a ball is something that one can bounce, and a chair is something that one can sit on. Only later, Nelson claims, are structural features added to these concepts. This suggests that a child's goals play an important role in the way he or she organizes his or her view of the world. This suggests a potential connection to explanation-based methods for learning from examples, which transform functional concept definitions into structural ones. One might be able to apply these methods to the conceptual-clustering task, yielding systems with a quite different flavor than those explored to date.

Discovering Qualitative Laws. Scientific discovery involves the formulation of qualitative laws, often followed by more precise quantitative laws. For instance, early chemists found that certain classes of substances (such as acids and alkalis) reacted with each other, whereas other substances did not. These qualitative relations preceded the discovery of quantitative regularities, forming the framework within which the lat-
ter were stated. Relatively little work has been done on qualitative discovery within the field of machine learning, but two systems are considered here: Lenat's AM (25,78) and Langley, Zytkow, and Simon's GLAUBER (79).

AM operates in the domain of number theory, starting with about a hundred initial concepts such as set membership, cardinality, set union, etc. The system is also provided with several hundred heuristics for proposing new concepts and conjectures, for gathering data, and for deciding which concepts are "interesting." For example, one heuristic marks concepts with only a few examples (but more than a singleton set) as interesting. If examples of a concept are too hard to find, AM proposes more general versions of that concept; if they are too easy to find, it proposes more specific versions. Similarly, equivalent concepts that are discovered by different paths are marked as interesting and thus are given preference as the building blocks for yet newer concepts. Lenat's system carries out an agenda-driven best-first search through the space of mathematical concepts and conjectures, directing this search with its measure of interestingness.

When AM was provided with the basic objects and operations of number theory, it rediscovered a number of familiar concepts, including integers, addition, multiplication, factors, and prime numbers. In addition, it conjectured that any integer can be expressed as a unique product of primes (the unique factorization theorem) and that any even integer can be represented as a sum of two primes (Goldbach's conjecture). These conjectures are qualitative laws that relate concepts generated by the system.

Unlike Lenat's AM, the GLAUBER system (80) begins with very little knowledge of its domain. This program inputs a set of facts, such as the tastes of chemical substances and the reactions in which they take part.
From these data the system generates classes of substances (such as acids, alkalis, and salts) and qualitative laws that relate these classes to each other. These laws may contain universal or existential quantifiers and may be combined to express more complex qualitative laws. GLAUBER carries out a best-first search through the space of classes based on commonly recurring relations. For instance, if the substance HCl reacts both with NaOH and with KOH, the system would consider defining a new class of substances (say alkalis) with NaOH and KOH as members. Upon doing so, it would also formulate one or more qualitative laws based on facts that contain those substances (such as "HCl reacts with alkalis"). This process is applied recursively to form other classes (say acids) and more abstract laws (such as "acids react with alkalis"). The resulting laws make predictions, and if enough of these predictions are observed and never contradicted, GLAUBER includes a universal quantifier on the classes (e.g., "for all acids and for all alkalis, acids react with alkalis"). Although they differ along many dimensions, both AM and GLAUBER carry out search through a space of concepts and qualitative laws, and both use heuristics to direct that search. In some sense both systems are also forming taxonomies, since they cluster objects and generate laws that "define" those classes. However, they differ from conceptual clustering systems in that these "definitions" describe relations between classes, rather than describing each class in isolation. In this sense they move beyond simple clustering into the realm of qualitative discovery.
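The class-formation step just described can be illustrated with a small sketch. This is a simplification of my own, not the actual GLAUBER program: it groups substances that share identical sets of reaction partners, whereas GLAUBER performs a best-first search over commonly recurring relations. The function name propose_classes and the second acid HNO3 are illustrative assumptions, not taken from the article.

```python
# Illustrative sketch of GLAUBER-style class formation (a simplification,
# not the actual system). Facts are (substance, relation, substance)
# triples; substances with identical sets of reaction partners are
# grouped into a candidate class.
from collections import defaultdict

def propose_classes(facts):
    """Group substances that share the same set of reaction partners."""
    partners = defaultdict(set)
    for a, rel, b in facts:
        if rel == "reacts-with":      # treat the reaction relation as symmetric
            partners[a].add(b)
            partners[b].add(a)
    groups = defaultdict(set)
    for substance, its_partners in partners.items():
        groups[frozenset(its_partners)].add(substance)
    # a candidate class is interesting only if it has more than one member
    return [members for members in groups.values() if len(members) > 1]

facts = [("HCl",  "reacts-with", "NaOH"),
         ("HCl",  "reacts-with", "KOH"),
         ("HNO3", "reacts-with", "NaOH"),   # HNO3 is an assumed second acid
         ("HNO3", "reacts-with", "KOH")]
# propose_classes(facts) groups {"HCl", "HNO3"} (the "acids") and
# {"NaOH", "KOH"} (the "alkalis"); laws such as "acids react with
# alkalis" would then be stated over these classes.
```

Under this grouping, each class is defined by its relations to other classes rather than by features of its members in isolation, which is exactly the sense in which GLAUBER moves beyond simple clustering.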
Discovering Quantitative Laws. Another important aspect of discovery involves the postulation of quantitative laws that summarize numeric data. Again, relatively little work has been done in automating quantitative discovery processes in AI. BACON.4 (81) is perhaps the best-known example. Given the values of symbolic and numeric variables (e.g., the pressure, volume, and temperature of a gas), the system formulates empirical laws that relate the numeric variables (e.g., PV/T = 8.32). BACON.4 has rediscovered numerous laws from the history of physics and chemistry, including the ideal gas law, Coulomb's law, Snell's law of refraction, Black's heat law, the law of constant proportions, and conservation of momentum. In discovering these laws, the system also postulated a number of intrinsic properties, such as mass, index of refraction, specific heat, and atomic weight. BACON's discovery method consists of a number of interacting techniques. The system begins by gathering data in a systematic fashion, varying one independent term at a time and examining the values of dependent variables. After gathering a set of values, BACON looks for monotonic relations between terms, uses these to define new terms, and recurses until it finds terms with constant values. After finding laws that hold in a given context, the system varies another independent term, using the constants found at the previous level as dependent terms at this higher level of description. This process continues until all terms have been incorporated into some law. In cases where BACON encounters nominal (symbolic) independent terms, it postulates intrinsic properties based on the values of some dependent term and looks for a law involving the new property. The first law found in this manner is tautological, but the same intrinsic values are carried over to other situations, leading to empirically meaningful relations. Each intrinsic property has an associated set of conditions under which its values are retrieved.
In cases where generalizing these retrieval conditions is not justified by the data, the system may still note common divisors among the inferred intrinsic values. This method proved quite useful in chemistry, where common divisors historically suggested a number of concepts, including atomic weight. BACON.4's method for finding constant terms is sufficiently simple that it can be described here by three straightforward heuristics:

1. If term X has near-constant values, formulate a law involving X.
2. Else, if X increases as Y increases, consider the ratio X/Y and go to step 1.
3. Else, if X increases as Y decreases, consider the product XY and go to step 1.

Table 8 presents a simple example of BACON's application of this method in discovering Kepler's third law of planetary motion. This law can be stated as D³ = kP², where D is the distance of a body from its primary and P is the period of that body. The table presents Borelli's original data for Jupiter's satellites, which contain a substantial amount of variation. BACON.4 begins by noting that D and P increase together, leading it to consider the ratio D/P. This term is not constant, but its values decrease as those of D increase; this leads BACON to define the product D²/P. Again, the values of this term are not constant, but they increase as those of D/P decrease. As a result, the program considers the term D³/P². The values of this term are constant (within the acceptable range of 7.1%), so BACON formulates a law to this effect. The same method can be used to discover a variety of numeric laws.

Table 8. Discovering Kepler's Third Law of Planetary Motion

Moon    Distance D    Period P     D/P      D²/P      D³/P²
A         5.67         1.769      3.203    18.153     58.15
B         8.67         3.571      2.427    21.036     51.06
C        14.00         7.155      1.957    27.395     53.61
D        24.67        16.689      1.478    36.459     53.89

More recent empirical discovery systems move beyond BACON's abilities by formulating conditional laws that hold in different situations. In particular, Falkenhainer and Michalski's ABACUS (82) combines BACON-like methods for numerical discovery with condition-finding methods like those used in Michalski's AQ11 system (83). The ABACUS system first finds laws that cover some subset of the data and then searches for a symbolic description of the conditions under which each law holds. The program repeats this process until it has covered as much of the data as possible. Its authors have also explored new methods for effectively searching the space of numeric laws. Unlike BACON, the ABACUS system can find complex laws solely on the basis of observational data.
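The three heuristics and the Kepler trace above can be sketched in a few lines of code. This is my own reconstruction from the description, not Langley's program: the candidate term is tracked as x**p · y**q, so the discovered law is just an exponent pair, and the 8% constancy tolerance is an assumed setting (the article reports BACON accepting about 7% variation on these data).

```python
# Sketch of BACON.4's constant-finding loop on two numeric variables
# (a reconstruction from the three heuristics, not the original system).

def nearly_constant(vals, tol=0.08):
    """Heuristic 1's test: all values within tol of their mean."""
    mean = sum(vals) / len(vals)
    return all(abs(v - mean) <= tol * abs(mean) for v in vals)

def trend(xs, ys):
    """+1 if ys increase as xs increase, -1 if ys decrease, else 0."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    if all(a < b for a, b in zip(ordered, ordered[1:])):
        return 1
    if all(a > b for a, b in zip(ordered, ordered[1:])):
        return -1
    return 0

def bacon_pair(x, y, tol=0.08, max_steps=5):
    """Return exponents (p, q) such that x**p * y**q is near-constant."""
    term = lambda p, q: [xi ** p * yi ** q for xi, yi in zip(x, y)]
    a, b = (1, 0), (0, 1)          # start by comparing x against y
    for _ in range(max_steps):
        va = term(*a)
        if nearly_constant(va, tol):
            return a               # heuristic 1: formulate a law
        sign = trend(term(*b), va)
        if sign == 1:              # heuristic 2: consider the ratio a/b
            c = (a[0] - b[0], a[1] - b[1])
        elif sign == -1:           # heuristic 3: consider the product a*b
            c = (a[0] + b[0], a[1] + b[1])
        else:
            return None
        a, b = c, a                # recurse on the newly defined term
    return None

# Borelli's data for Jupiter's moons (Table 8)
D = [5.67, 8.67, 14.00, 24.67]
P = [1.769, 3.571, 7.155, 16.689]
# bacon_pair(D, P) follows the D/P, D**2/P, D**3/P**2 chain and
# returns (3, -2), i.e., Kepler's third law D**3/P**2 = constant.
```

Run on the table's data, the loop defines D/P, then D²/P, then D³/P², whose values (58.25, 51.11, 53.60, 53.91) all lie within 8% of their mean, so the law is reported.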
Description and Explanation. Discovery is a complex phenomenon, and it certainly includes more than the mechanisms of taxonomy formation and empirical discovery discussed above. In particular, it includes the process of constructing explanations (qv) that account for empirical laws. These include both structural explanations, such as the atomic theory, and process explanations, such as the kinetic theory of gases. Because of the complexity of the problem, few machine-learning researchers have addressed these issues, though some work has been done on structural models (84,85). Of course, there are different levels of explanation, and one can argue that even numeric laws of the type found by BACON have some explanatory aspects. However, it seems that something more is involved.

Recent research suggests some directions in which to look for a theory of explanatory discovery. One of these is the area of explanation-based learning, which is described in an earlier section. This work provides a clean definition of what is meant by "explanation," and this definition may prove useful in modeling explanatory discovery. However, explanation-based learning involves the transformation of a functional definition into a structural one. In science one must perform the inverse mapping, i.e., infer the explanation from its observed external manifestations. Thus, one must decide that gases are similar to billiard balls and that the heat of a gas alters the velocity of those balls. Before one can construct systems that will infer such process models, one must be able to represent the process models themselves. Fortunately, recent work on qualitative physical models (86,87) provides a framework for such an effort. Moreover, research on reasoning by analogy (40,41,88) might be extended into the development of methods for mapping macroscopic physical models (bouncing billiard balls) onto microscopic explanations (the kinetic theory of gases). Designing and implementing AI systems for explanatory discovery will not be easy, but many of the building blocks are present, and this seems a promising direction for research.

Concluding Remarks

This entry is an examination of a range of techniques studied by researchers in machine learning (learning from examples, learning search strategies, language acquisition, and machine discovery) that lay the foundation for symbolic approaches to machine learning. A number of common themes emerge from this examination. Much of learning can be viewed as search through a space of concept descriptions, and we considered various methods for directing search through this space. Learning from examples can be viewed as a simpler version of the more complex tasks of learning search heuristics and conceptual clustering, in that credit assignment is simplified and direct feedback is present. For each method that we examined, open issues remain to be explored, including the ubiquitous need for employing functional or causal information to direct the learning process. Despite its recent emergence, machine learning has developed a variety of well-defined problems that promise to keep researchers occupied for years to come. One major goal for research involves developing integrated architectures for problem solving and learning that can address many different learning tasks. Anderson's ACT system (89) falls into this class of architectures, and the SOAR theory of Laird, Rosenbloom, and Newell (4) is another recent example. However, machine-learning researchers need to explore the space of such architectures, just as earlier researchers have explored the space of methods for learning from examples and heuristic learning. In this way, those in the field may ultimately come to understand the nature of learning and the role it plays in intelligent behavior.

BIBLIOGRAPHY

1. J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
2. J. H. Holland, Escaping Brittleness: The Possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Vol. II, Morgan Kaufmann, Los Altos, CA, pp. 593-624, 1986.
3. G. E. Hinton, T. J. Sejnowski, and D. H. Ackley, Boltzmann Machines: Constraint Satisfaction Networks that Learn, Technical Report CMU-CS-84-119, Carnegie-Mellon University, Computer Science Department, 1984.
4. J. E. Laird, P. S. Rosenbloom, and A. Newell, "Chunking in SOAR: The anatomy of a general learning mechanism," Mach. Learn. 1, 11-46 (1986).
5. R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artif. Intell. 2, 189-208 (1971).
6. D. J. Mostow, Transforming Declarative Advice into Effective Procedures: A Heuristic Search Example, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 367-404, 1983.
7. D. H. Sleeman, Inferring Student Models for Intelligent Computer-Aided Instruction, in R. S. Michalski, J. G. Carbonell, and
T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 483-510, 1983.
8. G. S. Kahn, Knowledge Acquisition: Investigations and General Principles, in T. M. Mitchell, J. G. Carbonell, and R. S. Michalski (eds.), Machine Learning: A Guide to Current Research, Kluwer Academic, Hingham, MA, pp. 119-122, 1986.
9. T. M. Mitchell, "Generalization as search," Artif. Intell. 18, 203-226 (1982).
10. T. G. Dietterich and R. S. Michalski, A Comparative Review of Selected Methods for Learning Structural Descriptions, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga Press, Palo Alto, CA, pp. 41-82, 1983.
11. J. G. Carbonell, R. S. Michalski, and T. M. Mitchell, "Machine learning: A historical and methodological analysis," AI Mag., 69-79 (Fall 1983).
12. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983.
13. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Vol. II, Kaufmann, Los Altos, CA, 1986.
14. R. Forsyth and R. Rada, Machine Learning, Applications in Expert Systems and Information Retrieval, Wiley, Halsted Press, New York, 1986.
15. P. H. Winston, Learning Structural Descriptions from Examples, Technical Report AI-TR-231, MIT, Cambridge, MA, 1970.
16. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
17. R. S. Michalski, A Theory and Methodology of Learning from Examples, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 83-134, 1983.
18. J. R. Anderson and P. J. Kline, A Learning System and Its Psychological Implications, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 16-21, 1979.
19. J. R.
Quinlan, "Induction of decision trees," Mach. Learn. 1, 81-106 (1986).
20. E. B. Hunt, J. Marin, and P. J. Stone, Experiments in Induction, Academic Press, New York, 1966.
21. R. Quinlan, Learning Efficient Classification Procedures and Their Application to Chess End Games, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 463-482, 1983.
22. T. M. Mitchell, R. M. Keller, and S. T. Kedar-Cabelli, "Explanation-based generalization: A unifying view," Mach. Learn. 1, 47-80 (1986).
23. S. Minton, J. G. Carbonell, C. Knoblock, D. Kuokka, and H. Nordin, Improving the Effectiveness of Explanation-Based Learning, Proceedings of AAAI-86, Philadelphia, PA.
24. P. E. Utgoff, "Adjusting Bias in Concept Learning," Proceedings of the International Machine Learning Workshop, Allerton Park, IL, pp. 105-109, 1983.
25. D. B. Lenat, The Role of Heuristics in Learning by Discovery: Three Case Studies, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983.
26. T. M. Mitchell, P. E. Utgoff, and R. B. Banerji, Learning by Experimentation: Acquiring and Refining Problem-Solving Heuristics, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 163-190, 1983.
27. B. Porter and D. Kibler, Experimental Goal Regression: A Method
for Learning Problem Solving Heuristics, Machine Learning, Vol. 1.
28. A. L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 71-105, 1963.
29. Y. Anzai, Learning Strategies by Computer, Proceedings of the Canadian Society for Computational Studies of Intelligence, Toronto, Ontario, pp. 181-190, 1978.
30. P. Langley, "Learning to search: From weak methods to domain-specific heuristics," Cog. Sci. 9, 217-260 (1985).
31. S. Ohlsson, A Constrained Mechanism for Procedural Learning, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 426-428, 1983.
32. P. Langley, Learning Effective Search Heuristics, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 419-421, 1983.
33. P. Brazdil, Experimental Learning Model, Proceedings of the Third AISB/GI Conference, Hamburg, FRG, pp. 46-50, 1978.
34. D. M. Neves, A Computer Program that Learns Algebraic Procedures by Examining Examples and Working Problems in a Textbook, Proceedings of the Second National Conference of the Canadian Society for Computational Studies of Intelligence, pp. 191-195, 1978.
35. D. Kibler and B. Porter, "Perturbation: A means for guiding generalization," Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 415-418, 1983.
36. S. N. Minton, "Constraint-Based Generalization," Proceedings of AAAI-84, Austin, TX, pp. 251-254, 1984.
37. R. E. Fikes, P. E. Hart, and N. J. Nilsson, "Learning and executing generalized robot plans," Artif. Intell. 3, 251-288 (1972).
38. J. G. Carbonell, Derivational Analogy: A Theory of Reconstructive Problem Solving and Expertise Acquisition, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Vol. II, Kaufmann, pp. 371-392, 1986.
39. J.
Clements, Analogical Reasoning Patterns in Expert Problem Solving, Proceedings of the Fourth Annual Conference of the Cognitive Science Society, Ann Arbor, MI, 1982.
40. J. G. Carbonell, Learning by Analogy: Formulating and Generalizing Plans from Past Experience, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 137-162, 1983.
41. J. Doyle, "A truth maintenance system," Artif. Intell. 12, 231-272 (1979).
42. E. A. Feigenbaum, B. G. Buchanan, and J. Lederberg, On Generality and Problem Solving: A Case Study Using the DENDRAL Program, in D. Michie (ed.), Machine Intelligence, Vol. 6, Edinburgh University Press, 1971, pp. 165-190.
43. J. McDermott, XSEL: A Computer Salesperson's Assistant, in J. Hayes, D. Michie, and Y-H. Pao (eds.), Machine Intelligence, Vol. 10, Ellis Horwood, Chichester, UK, pp. 325-337, 1982.
44. J. McDermott, R1: A Rule-Based Configurer of Computer Systems, Technical Report, Carnegie-Mellon University, Computer Science Department, 1980.
45. E. Shortliffe, Computer Based Medical Consultations: MYCIN, Elsevier, New York, 1976.
46. D. Waterman, F. Hayes-Roth, and D. Lenat (eds.), Building Expert Systems, Addison-Wesley, Reading, MA, 1983.
47. R. C. Schank, "The current state of AI: One man's opinion," AI Mag. 4(1), 1-8 (1983).
48. J. Doyle, "Expert systems without computers," AI Mag. 5(2), 59-63 (1984).
49. T. M. Mitchell, Version Spaces: An Approach to Concept Learning, Ph.D. Dissertation, Stanford University, December 1978.
50. P. Winston, Learning Structural Descriptions from Examples, in P. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, pp. 157-209, 1975.
51. I. McMaster, J. R. Sampson, and J. E. King, "Computer acquisition of natural language: A review and prospectus," Int. J. Man-Mach. Stud. 8, 367-396 (1976).
52. J. R. Anderson, "Induction of augmented transition networks," Cog. Sci. 1, 125-157 (1977).
53. S. Pinker, "Formal models of language learning," Cognition 7, 217-283 (1979).
54. P. Langley, "Language acquisition through error recovery," Cog. Brain Theor. 5, 211-255 (1982).
55. R. Solomonoff, A New Method for Discovering the Grammars of Phrase Structure Languages, Proceedings of the International Conference on Information Processing, UNESCO, Paris, June 1959, pp. 285-290.
56. K. Knowlton, Sentence Parsing with a Self-Organizing Heuristic Program, Ph.D. Dissertation, MIT, Cambridge, MA, 1962.
57. P. I. Garvin, "The automation of discovery procedure in linguistics," Language 43, 172-178 (1967).
58. J. J. Horning, A Study of Grammatical Inference, Technical Report No. CS 139, Computer Science Department, Stanford University, 1969.
59. J. G. Wolff, "Language acquisition and the discovery of phrase structure," Lang. Speech 23, 255-269 (1980).
60. R. Berwick, Learning Structural Descriptions of Grammar Rules from Examples, Proceedings of the Sixth International Conference on Artificial Intelligence, Tokyo, Japan, pp. 56-58, 1979.
61. R. Berwick, Computational Analogues of Constraints on Grammars: A Model of Syntactic Acquisition, Proceedings of the 18th Annual Conference of the Association for Computational Linguistics, Philadelphia, PA, pp. 49-53, 1980.
62. M. P. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
63. N. Chomsky, Rules and Representations, Columbia University Press, New York, 1980.
64. K. L. Kelley, Early Syntactic Acquisition, Technical Report P-3719, The Rand Corporation, Santa Monica, CA, 1967.
65. L.
Siklossy, Natural Language Learning by Computer, in H. A. Simon and L. Siklossy (eds.), Representation and Meaning: Experiments with Information Processing Systems, Prentice-Hall, Englewood Cliffs, NJ, pp. 288-328, 1972.
66. S. Klein and M. A. Kuppin, An Interactive, Heuristic Program for Learning Transformational Grammars, Technical Report No. 97, Computer Sciences Department, University of Wisconsin, 1970.
67. C. Hedrick, "Learning production systems from examples," Artif. Intell. 7, 21-49 (1976).
68. L. H. Reeker, The Computational Study of Language Acquisition, in M. Yovits and M. Rubinoff (eds.), Advances in Computers, Academic Press, New York, 1976.
69. J. G. Carbonell, "Towards a self-extending parser," Proc. 17th Annu. Meet. Assoc. Computat. Ling., 3-7 (1979).
70. M. Selfridge, A Computer Model of Child Language Acquisition, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., pp. 92-96, 1981.
71. V. Sembugamoorthy, A Paradigmatic Language Acquisition System: An Overview, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 788-790, 1979.
72. J. R. Anderson, A Theory of Language Acquisition Based on General Learning Principles, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., pp. 165-170, 1981.
73. R. S. Michalski and R. E. Stepp, Learning from Observation: Conceptual Clustering, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 331-364, 1983.
74. D. Fisher and P. Langley, Approaches to Conceptual Clustering, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, CA, pp. 691-697, 1985.
75. M. Lebowitz, Concept Learning in a Rich Input Domain, Proceedings of the International Machine Learning Workshop, Monticello, IL, pp. 177-182, 1983.
76. R. E. Stepp and R. S. Michalski, Conceptual Clustering: Inventing Goal-Oriented Classifications of Structured Objects, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Vol. II, Kaufmann, Los Altos, CA, pp. 471-498, 1986.
77. K. Nelson, "Some evidence for the cognitive primacy of categorization and its functional basis," Merrill-Palmer Quart. Behav. Devel. 19, 21-39 (1973).
78. D. B. Lenat, Automated Theory Formation in Mathematics, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 833-842, 1977.
79. P. Langley, G. Bradshaw, J. Zytkow, and H. A. Simon, Three Facets of Scientific Discovery, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 465-468, 1983.
80. P. Langley, J. Zytkow, H. A. Simon, and G. L. Bradshaw, The Search for Regularity: Four Aspects of Scientific Discovery, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, Vol. 2, Tioga, Palo Alto, CA, pp. 425-470, 1984.
81. P. W. Langley, H. A. Simon, and G. L. Bradshaw, Rediscovering Chemistry with the BACON System, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 307-330, 1983.
82. B.
Falkenhainer, Proportionality Graphs, Units Analysis, and Domain Constraints: Improving the Power and Efficiency of the Scientific Discovery Process, Proceedings of the Ninth IJCAI, Los Angeles, CA, pp. 552-556, 1985.
83. R. S. Michalski, "Pattern recognition as rule-guided inductive inference," IEEE Trans. Patt. Anal. Mach. Intell. 2, 349-361 (1980).
84. E. A. Feigenbaum, B. G. Buchanan, and J. Lederberg, On Generality and Problem Solving: A Case Study Using the DENDRAL Program, in Machine Intelligence, Vol. 6, Edinburgh University Press, Edinburgh, pp. 165-190, 1971.
85. P. Langley, J. M. Zytkow, H. A. Simon, and G. L. Bradshaw, The Search for Regularity: Four Aspects of Scientific Discovery, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach, Vol. II, Kaufmann, Los Altos, CA, pp. 425-469, 1986.
86. K. Forbus, Qualitative Reasoning about Space and Time, in D. Gentner and A. Stevens (eds.), Mental Models, Erlbaum, Hillsdale, NJ, pp. 53-73, 1983.
87. J. DeKleer and J. S. Brown, Assumptions and Ambiguities in Mechanistic Mental Models, in D. Gentner and A. Stevens (eds.), Mental Models, Erlbaum, Hillsdale, NJ, pp. 155-190, 1983.
88. J. Larkin, F. Reif, and J. G. Carbonell, "FERMI: A flexible expert reasoner with multi-domain inference," Cog. Sci. 11 (1987).
89. J. R. Anderson, The Architecture of Cognition, Harvard University Press, Cambridge, MA, 1983.
90. R. S. Michalski, "Synthesis of optimal and quasi-optimal variable-valued logic formulas," Proceedings of the 1975 International Symposium on Multiple-Valued Logic, Bloomington, Indiana, 1975, pp. 76-87.
91. R. S. Michalski and J. B. Larson, "Selection of most representative training examples and incremental generation of VL1 hypotheses: The underlying methodology and description of programs ESEL and AQ11," Technical Report 867, Computer Science Department, University of Illinois, Urbana-Champaign, 1978.
92. R. E. Reinke and R. S. Michalski, "Incremental learning of decision rules: A method and experimental results," in J. E. Hayes, D. Michie, and J. Richards (eds.), Machine Intelligence 11, Oxford University Press, Oxford, UK, 1986.

J. Carbonell and P. Langley
Carnegie-Mellon University
LIFER

A top-down, left-to-right-driven, natural-language parser (see Parsing) controlled by a transition-tree grammar, LIFER was designed by Hendrix in 1977 to interface a natural-language front end (see Natural-language interfaces) with LADDER, a distributed database system developed at SRI [see G. Hendrix, "LIFER: A natural language interface facility," SIGART Newslett. 61, 25-26 (1977), and G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3, 105-147 (1978)].

A. Hanyong Yuhan
SUNY at Buffalo
LIMITS OF ARTIFICIAL INTELLIGENCE

The question of what intrinsic limits constrain the AI enterprise (which can be defined as the attempt to construct electronic systems exhibiting human or superhuman levels of capability in areas traditionally regarded as mental) has been debated within very wide limits. On one side one finds a substantial community of researchers who believe firmly that such systems will prove possible. Their common, but not universal, assumption is that the organic brain is in effect a complex electrochemical system operating in some, doubtless highly parallel, but essentially computerlike fashion, and hence gives direct proof of the realizability of intelligence by mechanism; vide the flat-footed maxim, current in much of the AI research community, that the brain is a meat machine. Opposing this view one finds the assertion that mental processes are essentially indecomposable, lie outside the narrow reach of scientific reductionism, and that their indecomposability sets fundamental limits to any attempt to duplicate intelligence by mechanism. From this point of view, the history of AI research to date (1), consisting always of very limited success in particular areas, followed immediately by failure to reach the broader goals at which these initial successes seem at first to hint, gives empirical proof of the presence of irreducible wholes fundamentally incapable of being comprehended, much less duplicated, by the narrowly technical procedures of AI researchers.

This philosophical debate concerns the existence of fundamental limits to the AI enterprise, which however is only one of several kinds of potentially significant limits that need to be considered. Even if no such fundamental limits existed, that is, even if a hypothetical infinitely fast computing engine possessed of infinite amounts of memory could in principle duplicate all aspects of human mental capability, it would still remain necessary to ask just how many essentially different capabilities are required for intelligence (qv) to result, and how much computation and data storage its duplication would require. Suppose, for example, that it could be shown that the minimum computational resource required to duplicate some human mental function is implausibly large, relative either to the extreme limits of physically realizable computation or to the largest computers likely to be constructed over the next decades or centuries. In this case construction of significant artificial intelligences would be blocked by inescapable practical limits, even if fundamental limits did not exist. Finally, even if no such computational factors proved to limit the possibility of AI, one would still want to assess the existing state of the field and project the rate of progress likely to result from application of its present intellectual tools to the profound problems with which it must wrestle. The next five sections develop points relevant to the three kinds of limits defined in the preceding paragraph. A final section discusses certain other concerns, implicit in the debate between the enthusiasts of AI and their opponents, which may explain some of the vehemence that has crept into this debate.

The Question of Fundamental Limits to the Constructability of Artificial Intelligences

A Very Brief Comment on the Philosophical Issue. In his deservedly famous 1950 article, Alan Turing proposed to replace amorphous philosophical debate about whether machines could "really" think by the more pragmatic question of whether they could imitate the behavior of thinking beings well enough to make the assumption that they are "thinking" the most comfortable basis for continuing interaction with them (see Turing test). The practical force of Turing's argument seems overwhelming.
If at some future time people find themselves surrounded by artificially produced beings capable of performing the same variety of daily tasks, physical and intellectual, that one would expect of a person, and in particular capable of conversing on an unrestricted variety of topics in an entirely easy, flexible manner, AI will have been attained. This is not to deny the possibility that humans in this situation may choose to regard themselves as a kind of nobility, distinguished in view of their long and imperfectly understood biological pedigree from more fully understood and easily repairable/replaceable creatures. Such an attitude can even find objective justification in the reflection that as long as any significant aspects of human function remain incompletely understood, humanity incorporates a pool of capabilities, tested by long evolution, which deserves protection and cautious nurture proportional to its long history and mysterious potential; these points also apply to whales and snail darters. Nevertheless, in the real presence of robots exhibiting human levels of flexibility and capability, the question as to whether these beings "really" thought or merely "appeared to" think and feel would lose pragmatic force, though of course its ideological importance might grow, perhaps even greatly. It makes less sense for this entry to pursue this debate than to assess the probability that such a situation will really arise.

The Brain as a Biochemical Computer. Part of the confidence with which AI researchers view the prospects of their field stems from the materialist assumption that "mind" is simply a name for the information-processing activity of the brain and that the brain is a physical entity that acts according to the
laws of biochemistry in a manner uninfluenced by any irreducible "soul" or other unitary, purely mental entity incapable of analysis into a causal sequence of elementary biochemical events. Compelling evidence for the equation of mental function with the physical activity of the brain is easily drawn from many branches of science, and in particular from experimental neurobiology. For example, discrete lesions at the rear of the cerebral cortex produce discrete blind spots (scotomas) in the visual field, which turns out to communicate in 1-1 continuous fashion with the family of sensory neurons comprising the retina of the eye. Similarly, stimulation of points on the upper central portions of the cortex (temporal motor area) will produce elementary twitching motions of particular muscles. Physical manipulation of nervous tissue can also generate and/or remove sensations having profound motivational significance; for example, direct application of an excess of potassium to the cutaneous nerves causes sharp pain. Conversely, application of Novocaine to an appropriate branch of the facial nerve blocks dental pain in particular areas, thus permitting dental manipulations that would be unbearably aversive were the nerves communicating this sensation of pain not "turned off." These elementary remarks, plus thousands of far more precise observations obtained by direct recording of the electrical activity of individual neurons, show that neuronal activity reflects external stimuli and behavior (even intended behavior before its overt expression) in detailed and quantitative fashion, at least for those sensory and motor systems for which such correlations can be expected a priori to be understood most easily.
As might also be expected, detailed understanding of the manner in which neuronal activity reflects and governs a living creature's interactions with its environment is most complete for the simplest animals, particularly those whose nervous systems consist of relatively few neurons that, being particularly large, are relatively easy to identify and examine individually. A typical but particularly well-studied example of this is the marine snail Aplysia californica, whose nervous system consists of roughly 20,000 neurons divided into nine separate ganglia within which hundreds of individual cells have been specifically identified. Fairly detailed understanding of the patterns of neuronal activity and interconnection governing many of the most typical and vital reactions of this simple creature has been attained. For example, much is known about the manner in which its nervous system controls heartbeat, respiration, gill withdrawal reflex, release of ink in response to a sensed danger, feeding, reproduction, and so on. Moreover, Aplysia is capable of certain rudimentary types of learning (including sensitization, which progressively increases reactions to certain stimuli, and habituation, which progressively reduces other reactions), and the biochemical bases for these forms of neuronal plasticity have been at least partly elucidated. Finally, the nervous activities controlling sensation and behavior in Aplysia have been shown to be inherent properties of the nervous system, which persist even when this system is dissected out of the body of Aplysia and maintained artificially in a suitable nutrient bath, provided that the afferent signals expected along certain sensory nerves are supplied electrically after the sensory organs that would normally give rise to them have been removed. In this last case the analogy with a robot's computer brain detached from its body and running in an artificial environment of input and proprioceptive signals is overwhelming.
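The two forms of plasticity just mentioned can be caricatured in a few lines of code. This is a sketch only: the multiplicative update rules and all constants below are invented for illustration, not a model drawn from the Aplysia literature.

```python
# Toy caricature of two forms of neuronal plasticity described in
# the text: habituation (repeated stimulation progressively weakens
# a response) and sensitization (a strong stimulus boosts it).
# The update rules and constants are illustrative assumptions only.

def habituate(strength, decay=0.7):
    """Repeated mild stimulation shrinks the response strength."""
    return strength * decay

def sensitize(strength, boost=1.5, cap=2.0):
    """A strong (e.g., noxious) stimulus amplifies the response."""
    return min(strength * boost, cap)

response = 1.0
for _ in range(5):              # five repeated light touches
    response = habituate(response)
print(round(response, 3))       # habituated reflex: 0.168

response = sensitize(response)  # one noxious stimulus partly restores it
print(round(response, 3))       # 0.252
```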
Some may object to facile extrapolation from the reflexive
behaviors of a simple 20,000-neuron creature to the vastly more sophisticated activity of the roughly hundred billion neurons of the human brain. Nevertheless, the (admittedly highly incomplete) biological evidence available thus far seems to favor just such an extrapolation: Living creatures of whatever complexity seem to share a common neuronal biochemistry in much the same way that they share a common genetic code. Thus, what neurobiological evidence there is hints strongly that no difference of fundamental principle separates the brain from any other form of computer, or can be expected to limit the range of possibilities that AI research can legitimately explore.

Quantitative Estimates Concerning the Brain's Computing Power. Even the resolutely mechanistic conclusion drawn in the preceding subsection leaves open two possibilities, either of which could still rule out the possibility of attaining humanlike levels of mental capability by artificial means. In the first place, the mass of computational activity performed each second by the living brain and/or the mass of information available to the brain for use during these computations might be so large as to make electronic duplication of the brain's activity implausible. Moreover, even were this not the case, the algorithms that regulate the computational activity of the brain might be so marvelously subtle as to frustrate their rediscovery by AI researchers for a very long time. The next task is to examine these possibilities.

The human brain consists of approximately 10^11 neurons, though this estimate is uncertain to within a factor of 10. Neurons typically (though not invariably) communicate by transmitting discrete electrical spikes (action potentials) to a population of follower neurons. As far as is known, the precise amplitude and shape of such a spike and the precise time of its arrival within an interval of 2 ms or so are physical details the nervous system is not able to exploit.
This allows one to model each spike as a single information-carrying "bit" that can be present or absent in a neuron's output stream. One can therefore regard a neuron as producing output information at a rate of approximately 100 bits/s. This leads to an estimate of 10^13 bits/s, give or take a factor of 100, for the internal "bandwidth" of the brain.

The computational activity of individual neurons involves a considerable variety of mechanisms still very imperfectly understood. Nevertheless, a considerable mass of experimental evidence supports the following general picture. Information is transmitted by a neuron to its follower neurons at interneuron junctions called synapses. A single neuron can have 10,000 synaptic inputs, though in some cases many fewer, and in other cases as many as 100,000 inputs are known to converge on single neurons. Thus, the total number of synapses in the brain can be estimated as 10^15, though this estimate is uncertain by a factor of roughly 100. Input signals transmitted to a neuron (generally chemically) across a synapse trigger a wide variety of reactions. A common effect, and one that seems certain to be of particular importance for the fastest computations performed by the brain, is modulation of the ionic conductivity of the affected neuron's membrane, which either raises the voltage of a portion of its interior (excitation) or lowers this voltage (inhibition). The affected neuron then combines the voltage changes generated by such synaptic effects (after attenuation in space and time in a manner determined by the detailed chemistry and geometry of the neuron and its synapses) and, if the resulting combined (e.g., summed) voltage exceeds a reaction threshold, the neuron generates an output spike, which is then transmitted to all its output synapses. Other forms of synaptic input are known to have slower but longer-lasting biochemical effects than the ionic effects that probably support the bulk of the brain's information-transmuting activity. Stimulation of certain synapses can, for example, trigger enzymatic activities within a neuron that modify its biosynthetic activities in significant ways, for example, by increasing or decreasing its susceptibility to subsequent fast excitatory or inhibitory stimuli acting ionically. Depending on the chemical effects involved, such synaptic modification of faster synaptic responses can exert an effect for relatively short periods (e.g., 50 ms) or for periods of several seconds, minutes, or days, perhaps even permanently. Other synaptically triggered enzymatic reactions can initiate sequenced biochemical changes that, for example, enhance a neuron's subsequent electrical response for several tens of milliseconds but then inhibit its responses for a longer period, leading to complex, patterned alternations of behavior. The varied single-neuron behaviors that can be engendered by the wide spectrum of enzymatic actions that have been demonstrated experimentally have been systematically explored in simple animals such as Aplysia, some of whose neurons are known to have highly individualized patterns of continuing, periodic, or burst activity.

Though it is not easy to summarize such a wide range of synaptic response patterns by a few numbers representing the information-processing power and storage capacity of a single neuron, the following estimates do not seem wildly unfair. One byte may well suffice to represent the long-term strength of each of a neuron's synapses with sufficient accuracy. Four additional bytes can then be taken to give a sufficiently complete representation of the short-term biochemical state of both sides of a synapse and of the state of the corresponding synaptic gap, as determined by its stimulation history up to a given moment. Such very rough quantitative guesses lead one to estimate the long-term memory available to the brain as (very roughly) 10^16 bytes and the amount of shorter term data needed to characterize the state of each of its synapses as 4 x 10^16 bytes. The logical activity of each neuron can then be regarded very roughly as a process that combines 10,000 input bytes with roughly 40,000 synapse-status bytes 100 times each second; one can guess the amount of (analog) arithmetic required for this to be (again very roughly) 10^7 elementary arithmetic operations per neuron per second, suggesting that the computing rate needed to emulate the entire brain on a neuron-by-neuron basis may be as high as 10^18 arithmetic operations per second. (Of course, much lower computation rates might suffice to represent the logical content of the brain's activity, if this could be discovered.)

Even though it is not inconceivable that the estimates offered in the preceding paragraph might have to be increased by factors of 10^3 or even 10^4, it seems much more likely that they overstate the usable arithmetic and memory storage capacity of the brain by large factors. Indeed, anatomical inspection and direct recording of neuronal activity make it appear both that the degree of precision in the wiring of the brain is low and that (perhaps in consequence) the brain typically employs hundreds of neurons to perform closely similar calculations whose results are then only used in some coarsely averaged manner as mental activity proceeds. Nevertheless, the estimates, 10^18 arithmetic operations per second and 10^16 or 10^17 bytes of memory available, still allow for stupendous amounts of calculation, and might therefore represent a very significant obstacle to the easy advance of AI. The largest general-purpose supercomputers are not likely to attain performance levels of more than 10^12 arithmetic operations per second during the next decade. (However, specially designed systolic arrays might attain higher speeds for particular operations.) Though rotating memories capable of storing 10^12 bytes do not appear entirely infeasible, electronic memories seem entirely unlikely to exceed 10^12 or even 10^11 bytes within a decade. The estimated internal communication bandwidth of the brain, roughly 10^12 bytes/s, seems somewhat easier to match artificially, for example, by a switching network of 10^4 ports each capable of handling 100 Mbyte/s. The very largest supercomputer systems likely to be developed over the next decade or two may still fall far short of the raw information-processing capabilities of the brain, perhaps by a factor of 10^6 or more. However, differences in the algorithmic effectiveness with which this computing power is employed can outweigh even so large a factor.

The Kind of "Program" the Brain is Likely to Employ

Were the computational activity of the living brain regulated by algorithms that are both exceptionally effective and also subtle enough to defy rediscovery, the difficulty of duplicating these algorithms might represent another significant limit to the progress of AI. However, it seems unlikely that such algorithms play a role in the functioning of the brain, so that algorithmic considerations seem likely to favor artificial systems over natural. This advantage could help artificial systems overcome the substantial advantage in raw computing power ascribed to the brain in the preceding paragraphs. The neurological argument that seems to justify such a conclusion is as follows. If one sets all effects of postnatally learned information aside, neuroembryological evidence hints at the following picture of the innate (genetically determined) capabilities of the brain (including its learning capabilities).

Within a developing nervous tissue, particular subpopulations of cells take on specialized morphological and biochemical characteristics. (Such cell specialization is of course the basic mechanism of embryological development in general.) Almost nothing is known yet concerning the total number of specialized neuronal subpopulations that develop; however, what morphological and physiological evidence there is seems consistent with the assumption that these number several thousands or tens of thousands. Each of these cell populations grows to a genetically determined extent, thereby generating a large or small portion of the nervous system. Cell migration over large or small distances and genetically determined temporal sequencing of growth phases among various neuronal subpopulations also play a role in determining final tissue morphology and neuronal connection patterns. As neurons specialize, they grow thin projections (axons and dendrites) that can extend as little as a few micrometers or as much as a meter in length. The paths along which these neuronal projections grow seem to be determined by such biochemical factors as the ability of the "growth cones" present at the tip of a growing axon to react to chemicals present on the surfaces of the cells they touch. These reactions seem to result in selective chemical affinities and adhesions and to be supplemented by chemical gradients present in developing tissue. The phased, more diffuse growth of small spots of tissue to which particular sorts of axonal growth cones have positive or negative affinities can cause these growth cones to move sharply in particular directions, allowing intricately interwoven neuronal morphologies to develop. Once the developing projections from a given neuronal subpopulation have reached their target tissue, similar chemical mechanisms may be used to recognize various subpopulations present in that tissue and to guide the formation of connections having specific strengths among the members of particular "immigrant" and "native" neuron subpopulations. Moreover, if the source tissue sending projections to a target tissue is large enough for its geometric extent to have informational significance, cells within both these areas can be marked, perhaps by chemical gradients, in a manner representing their location, and such markings can then strengthen or weaken the affinities axons with given origins have for cells at corresponding locations in an extended target tissue. Such geometrically conditioned affinities would allow separate neuronal areas to connect to each other in geometrically regular and informationally significant spatial patterns, which various learninglike postnatal growth processes can then refine. Basic developmental mechanisms of roughly the kind just sketched seem to define the innate structures present in the brain just after birth. Combinatorially complex forms of information processing seem unlikely to be used, in part because there is little evidence in neurocortical morphology of the detailed synchronization and very precise connection patterns needed to sustain such algorithms, and also because they seem unlikely to have arisen in the course of organic evolution, which typically proceeds by progressive adaptation and enlargement of existing structures rather than by sudden leaps.
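The order-of-magnitude figures used in the estimates above (roughly 10^11 neurons, about 10^4 synapses per neuron, on the order of 100 spikes per second, and a guessed 10^7 analog operations per neuron per second) can be collected in a short script. Every constant is the article's rough estimate, uncertain by factors of 10-100, not a measured value.

```python
# Back-of-envelope estimates of the brain's computing demands,
# using the rough figures quoted in the text. All constants are
# order-of-magnitude guesses, uncertain by factors of 10-100.

NEURONS = 10**11                # neurons in the human brain
BITS_PER_NEURON_PER_SEC = 100   # one "bit" per spike, ~100 spikes/s
SYNAPSES_PER_NEURON = 10**4     # typical synaptic inputs per neuron
OPS_PER_NEURON_PER_SEC = 10**7  # guessed analog-arithmetic cost

bandwidth_bits = NEURONS * BITS_PER_NEURON_PER_SEC  # internal bandwidth
synapse_count = NEURONS * SYNAPSES_PER_NEURON       # total synapses
emulation_ops = NEURONS * OPS_PER_NEURON_PER_SEC    # ops/s to emulate

print(f"internal bandwidth: {bandwidth_bits:.0e} bits/s")  # ~10^13
print(f"total synapses:     {synapse_count:.0e}")          # ~10^15
print(f"emulation rate:     {emulation_ops:.0e} ops/s")    # ~10^18
```

Multiplying the uncertainties through is what produces the text's "give or take a factor of 100" hedges on each derived figure.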
In contrast, artificial data analysis systems can often make enormously effective use of delicately balanced patterns of data processing and motion, which often speeds up the generation of needed intermediate or final results by many orders of magnitude. Algorithmic considerations seem therefore to favor artificial systems over natural.

Limits Set by Quantitative Theory of Computational Complexity

Currently available detailed mathematical models of nervous system activity serve more to generate speculation concerning the operation of sensory functions such as vision, tactile sensation, and hearing than to represent the brain's ability to deal with more discrete or symbolic material, that is, to reason. The most remarkable, and perhaps fundamental, part of this is the brain's ability to organize information presented in relatively disordered form into internally organized structures on which sophisticated, coherent courses of symbolic and real-world action can be based. It is the present lack of this ability that makes it necessary to program computers rather than simply to teach them; teaching would be vastly more convenient and would bring the era of AI very close if it became possible. To clarify this basic distinction, note that the ability of computers to accept, retain, and utilize fully structured material is already enormously superhuman. For example, a computer can acquire and proceed to use the very complex set of rules for compiling a programming language in just a few seconds; nothing in the biological world other than the transmission of a full set of genes during conception matches this enormous rate of information transfer. On the other hand, although a computer can easily acquire and retain the whole text of the Encyclopaedia Britannica (even by reading its pages successively), computers are at present incapable of making any active use of the information these volumes contain, since even the most sophisticated currently available text analysis algorithms fall far short of what is needed to extract procedurally useful information from this encyclopedia, which does not have anything like the degree of rigorous order and standardization that characterizes computer programs (see Natural-language understanding). If the basic obstacle posed by the need to program in detail could be overcome, computers could ingest the information contained in all the world's libraries and use this information with superhuman effectiveness. Accordingly, a basic goal of AI research has been the discovery of principles of self-organization robust enough to apply to a wide variety of information sources. Any such organizing principle would have to allow coherent structures capable of directly guiding some form of computer action to be generated automatically from relatively disorganized, fragmented input.

The present state of AI research is most fundamentally characterized by the fact that no such robust principle of self-organization is as yet known, even though many possibilities have been tried. Indeed, high hopes for the success of one or another apparently promising general principle of this type have characterized successive periods of research in the history of the subject. A typical attempt of this kind, particularly intriguing because of the great generality and potential power of the mathematical tools that it proposes to employ, has been the attempt to use formalisms drawn from symbolic logic (qv) as the basis for a self-organization capability. Mathematical axioms and theorems are mutually consistent fragments of information that can be accumulated separately and indefinitely; mathematical proofs based on these axioms and theorems are highly structured wholes that arise from these fragments according to the simple, well-understood principles of formal logic.
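As a toy illustration of this last point (and only that: no actual theorem prover from the literature is implied), a few lines suffice to show separately accumulated fragments being combined into derived conclusions by mechanical application of a single inference rule. The rules and predicate names below are invented for the example.

```python
# Toy forward-chaining derivation over Horn-style rules: facts and
# rules accumulate independently, and structured conclusions emerge
# by repeated mechanical application of modus ponens. A minimal
# sketch; the rule and predicate names are invented illustrations.

rules = [
    ({"Real(x)", "Real(y)"}, "Real(x+y)"),    # hypothetical rules
    ({"Real(x+y)"}, "Comparable(x+y)"),
]
facts = {"Real(x)", "Real(y)"}                # independently given axioms

changed = True
while changed:                                # iterate to a fixed point
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))
# ['Comparable(x+y)', 'Real(x)', 'Real(x+y)', 'Real(y)']
```

The computational-cost results discussed next explain why this mechanically simple process cannot be scaled into a general proof-discovery mechanism.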
If they could be generated automatically, these proofs, or various prooflike structures easily derivable from them, could be used almost immediately to produce many other symbolic structures, including computer programs. Here a door to the most ambitious goals of AI seems to swing open. Unfortunately, this prospect, like all others that have been explored to date, has proved to be blocked by fundamental considerations of computational efficiency. The modern quantitative theory of computational infeasibility deriving from the work of Gödel and Church (see Ref. 2) allows one to prove rigorously that enormous computational costs will always make it impossible for programmed systems to answer certain general classes of questions in all cases. The original Church-Gödel result is qualitative rather than quantitative and can be summed up in a short unsolvability statement: There can exist no computer program P capable of examining every other program Q and determining correctly, in finite time, whether Q will run forever or halt eventually. Since many other combinatorial problems can easily be proved equivalent in difficulty to this basic unsolvable problem, they are just as unsolvable. Recently, more quantitative work along these same lines has shown that there exist significant classes of mathematical problems that, although algorithmically solvable in the sense that one can write programs capable of solving each of the problems in such a class, are nevertheless intractable, since most of the problems in these classes carry computational costs that rise with enormous rapidity as the program classes are progressively generalized in directions that eventually carry them over into the Church-Gödel zone of complete unsolvability. As this happens, seemingly small loosenings of the constraints that define a particular class of
492
LIMITSOF ARTIFICIATINTETLIGENCE
problems always increase enormously the cost of dealing with the generalizedclass. Problems in computational logic, whose efficient solution would provide very general and powerful tools for the development of AI, illustrate these general remarks. Any mathematical statement can be written in a convenient yet perfectly rigorous way using the simple notations of predicatelogic. For example, the predicate statement (FOR ALL x, !, z, u, u, w) lReal(r) & Real(y) & Real(z) & Real(u) & Real(u)& Real(ra)l (/ ) implies [ ( r + u ) 2+ u + u ) 2+ ( z + w ) z l t r z = (x2 * y2 + zz)ttz+ (u2 + u2 + wz)ttzl
Ferrante and Rackoff (4) proved in L975 shows that the running time even of the fastest possible algorithm capable of deciding the truth or falsity of every statement s of Tarski form must rise exponentially with the length of s for some (though not for all) such statements s. Thus, in unfavorable casesthe minimum running time of such algorithms could be billions (10e) of years, making their existencea matter of theoretical interest rather than of practical significance.Theorems of this same sort apply to many other classesof mathematical statements having decision problems of roughly the same degree of inherent difficulty as the Tarski class and imply even higher degrees of computational difficulty for more general statement classes.For example, although the full class of statements of Tarski form becomesundecidable if applied to integers rather than real numbers, the subclassof statements involving only arithmetic addition, subtraction, and comparison operations (but no multiplications or divisions) remains decidableeven if applied to integers. However,here again one is closeenough to the zoneof absolute unsolvability for computational costs to rise prohibitively. More specifically,a theorem of Fisher and Rabin (5) showsthat these costsmust bejust as large as the Tarski case costs describedabove. These general statements of computational infeasibility play the same role in computer sciencegenerally and AI particularly that the first and second laws of thermodynamics play in physics and engineering, that is, they set limits to what it is reasonable to attempt. 
Although they do not at all rule out the possibility of AI, they do suggestthat it cannot be attained by programming any unitary mechanism of complete generality from which all that is neededwill follow by simple specialization.Instead, it may be necessaryto developa relatively large number of artificial systemsthat mimic particular types of reasoning and mental functions in casesspecialized enough to have particularly efficient treatment, and by systems whose"coverage:'although broad enoughto be very useful, is less comprehensive than is assumed by naive mathematical statements of the problems they address. The individual functions thereby produced would then have to be integrated into a software structure capable of a very advanced level of function, which hopefully would also assist substantially in its own further development. Painfully detailed manual development of very many separate subcomponents of a highly complex total system capableof exhibiting a high level of intelligent function will only be avoided if some relatively uniform principle allowing computers to learn in human like fashion is somehowdeveloped.At present one has no real inkling of how this might be done, though the preceding model of neural function suggeststhat it ought somehow to be possible. It is equally unknown whether this present incapacity is a consequenceof grossly insufficient computing power, as some of the estimates made above seemto suggest, or simply reflects the fact that those simple yet efficient mechanical learning techniques have not yet been found whose discovery will enable much more rapid advance.
captures the geometric fact that a broken line in three-dimensional spaceis always at least as long as a straight line connecting the same end points. [In the preceding formula, clausesof the form Real(r) expressthe fact that the variable x designates a real number.l Becauseof their great generality, predicate formalisms like that seen in the preceding formula provide very interesting testing grounds for AI research.Any method that allowed the truth or falsity of large classesof formalized statements of this kind to be decidedautomatically and efficiently would also allow one to perform many other operations, including the automatic composition of many kinds of computer prograffis, the planning of grasping positions and motions for robot arms, and many other geometric and spatial analyses.However, a considerablebody of rigorous theoretical analysis now rules out this possibility. Specifically, it has been shown that algorithms for deciding the truth of entirely general predicate statements cannot exist, nor can there exist algorithms capableof performing any entirely general processof formal reasonirg, construction, or problem solving equivalent in difficulty to the task of classifying entirely general predicate statements as true or false. Indeed, the existence of such algorithms is directly ruled out by the basic Church-Godel theorem referenced above. On the other hand, algorithms capable of deciding narrower but still quite interesting subclassesof predicate statements do exist. For example, a famous theorem of Tarski (3) assertsthe existenceof an algorithm capable of deciding any statement concerning real numbers that can be written using only the four elementary arithmetic operations (addition, subtraction, multiplication, and division), comparisonsbetweenreal numbers (e.9.,clauses of the form tc > y), the elementary Boolean connectives(and, or, implies, not), and the standard predicate quantifiers (for all r, for somer). 
However, the task this algorithm accomplishes lies close enough to the Church-Gddel zone of unsolvability that even apparently slight generalizations of this problem prove to be algorithmically unsolvable.For example,the same decisionproblem for the classof statements having exactly the same structure but in which variables designate whole numbers (integers) rather than arbitrary real numbers (which for Limitationsof the PresentStateof Knowledgein Al technical reasonsare somewhateasier to deal with) is unsolvSince principles of self-organization allowing generation of able. (3) useful symbolic structures from more disorganized broadly real for problem Moreover, since the Tarski decision input would be crucial to the progressof AI, fragmentary and of decapable algorithm &try unsolvable, nearly is arithmetic discovery of such principles has been much the at aiming ciding the truth/falsity of any statement of the form described work of progress in this direction have always Signs emphasized. prohibitive, computamust require enormous, and indeed particular excitement. Unfortunately, all such efgenerated of theorem a Specifically, case. tional resources in the worst
uMrrs oF ARTtFtclAttNTEtLtcENcE
493
features (such as corners, straight edges, circles) that can be detected directly or by statistical [e.9., Hough transform (qv)J methods. Another promising object recognition technique is computation of invariants of local shape (rotational invariants) for the edges of two-dimensional figures and for the "ridges" (curves along which at least one ofa surface's extrinsic curvatures is large) ofthree-dimensional objects.Any sharp color or reflectivity boundaries present on the surfaces of painted or otherwise marked three-dimensional objects can also be used. To the extent that it is possible to defi.ne invariants stable against the disturbing effects of observational noise, changes in illumination level, viewing angle, specularity, and so on, this technique can support recognition even of heavily obscured objects, and allows the use ofhashing techniques that greatly reduce the cost of identifying objects selected from large vocabularies of potential candidates. Beyond this, sophisticated use ofcolor (seeColor vision) and texture (seeTexture analysis) cues available on object surfacesmay prove possible. Here, however, one comesto a point at which the human (or mammalian) visual system displays a sophistication that researchers seem far from being able to match, even after several decadesof determined effort. In some remarkable way the eye is able to integrate the evidential weight of fragmentary clues and to make use not only ofdotted and dashed lines but also ofcomputationally elusive texture boundaries, vague differences ofshading, and curves that are very badly broken up by obscuring objects (e.g., foliage) and complex shadow patterns. All this can be done in a manner resistant to the confusing effects of very large changes in illumination pattern, intense specularities, image blurring, and the myriad other effects all too painfully familiar to the vision researcher. 
forts to date have run aground on the computational cost difficulties outlined in the preceding section. This fundamental fact constrains the immediate perspectives of the field severely. Of course, the many intriguing techniques developed during 20 years of AI research do not lack application; indeed, their applications can be expected to grow steadily in scope and number. However, in the absence of any unifying principle of self-organization, these applications must be seen as adaptations of diverse ideas rather than as systematic accomplishments of a still mythical AI technology. The success of such applications still depends far more on clever special algorithms and on code reflecting particular application content than on use of the still impoverished general-purpose tools of AI. Moreover, since specialization is still generally vital to success, it is hard to characterize the extent to which success in any one application should be read as representing advance of the AI field as a whole; to the degree that an application comes to depend on special techniques, data layouts, and algorithmic approaches, one can no longer rightly regard it as evidence for the viability of a general approach distinguishable from artful programming in general.

Nevertheless, some of the more specialized research efforts inspired by the more general aspirations and notions of AI research have succeeded modestly in mimicking limited but interesting aspects of mental capabilities such as vision (qv) and natural-language understanding (qv). To clarify this assessment, the present status of work along various significant lines is summarized below. It is useful to arrange this work under three main headings: sensory functions, motor control, and reasoning. More detailed entries on the various areas reviewed should also be consulted.

Sensory Functions. These include analysis of images (computer vision), analysis of natural language made available in written form, and analysis of continuous speech.

Analysis of Images. In spite of a great deal of work on the first steps of image processing (e.g., deblurring, edge detection) (see Image understanding; Edge detection; Vision, early), one is still far from being able to duplicate the human visual system's remarkable ability to detect objects in the presence of large amounts of visual disguise. Nevertheless the ability to identify objects within scenes is steadily improving, particularly for scenes containing only objects whose geometry and coloration are known in advance. Even if large parts of the objects present are obscured, such scenes can be handled more easily than entirely general images (e.g., images of outdoor scenes containing shrubbery). This reflects the fact that the problem of identifying known bodies and determining their orientation (the "model-based" vision problem) can be formulated in entirely objective terms. In contrast, the problem of imposing useful perceptual groupings on entirely general scenes is at least partly psychological; that is, to solve it one needs to match the functions of the human visual system well enough for introspection to serve as an accurate guide to the way in which a robot vision system will react to a scene.

Among the many methods becoming available for handling the easier model-based vision problem are direct matching of curves having fixed geometric position on known object surfaces; use of projective invariants of object silhouettes; probing techniques applicable for objects known to be presented in one of a finite number of allowed positions (e.g., objects lying on a table top or conveyor belt) or on which one or more characteristic features can be reliably located; and geometric reasoning using … Finally, all this is possible for scenes containing large numbers of objects, some unfamiliar, seen in a great variety of apparent sizes, from severely distorting angles, and in the absence of binocular information.

At the present time researchers have little understanding of how all of this is accomplished and at what computational cost. However, it is clear that image processing tends to be very expensive computationally (e.g., initial analysis of an image often requires examination of between 250,000 and 1,000,000 separate image pixels), so that substantially faster processors than are now available may assist the development of this very challenging subject. These processors may include special-purpose chips able to apply basic image analysis operations at high speed.

Robot systems equipped with tactile sensors (qv) acquire "tactile images" of much lower resolution than visual images, but these can be analyzed using techniques like those applicable to visual images.

Recognition of Continuous Speech. The ability to interpret continuous speech (see Speech recognition; Speech understanding), that is, to hear continuously varying sound-wave patterns generated by speakers of a familiar language and to transform them into roughly equivalent symbolic sequences of phoneme (qv) indicators (or into standard word spellings), is a basic capability of the human auditory and nervous system. The history of efforts to give computers a comparable ability provides a nice illustration of the possibilities and difficulties facing AI research focused on sensory areas.

Processing of speech begins with spectral analysis of an impinging sound signal to extract energy intensities in a
494
LIMITS OF ARTIFICIAL INTELLIGENCE
range of frequency channels. These intensities define a family of physical parameters of the impinging speech signal that vary continuously through time and hence allow the received signal to be regarded as a continuous curve C_0(t) in n_0-dimensional space, where n_0 (typically having a value in the range 5-20) is the number of distinct energy intensities (or other physical parameters of the incoming sound) extracted. These initial parameters can then be supplemented by adding various derivatives, smoothed derivatives, or other locally defined time-invariant functionals as additional parameters to produce a modified continuous curve C_1(t) that has a somewhat larger number n_1 of parameters as an improved description of the incoming signal. This description can in turn be subjected to appropriate nonlinear transformation to normalize it for such speaker-dependent variables as pitch of voice, speech rate, and regional accent. This yields a parameterized multidimensional curve C(t) suitable as input to the next, more symbolic steps of processing.

The necessary transition to a symbolic stage of processing can be accomplished in a variety of ways. A typical technique is to divide the n-dimensional space E^n, through which the curve C(t) runs, into a collection of overlapping regions R_1, ..., R_m, each of which corresponds to one of the basic phonemes p_j recognized as belonging to the language to which a given utterance belongs. Passage of the curve C through a region R_j is then regarded as indicating that the corresponding phoneme p_j has been pronounced. Since the regions R_j can overlap, the specific phoneme being pronounced (or, more properly, "heard") at any given moment is somewhat ambiguous.
[Instead of phonemes, the basic symbols into which C(t) is (ambiguously) converted can be larger speech units, e.g., full syllables, or syllable fragments consisting of a consonant preceding or following a vowel.] Numerical probabilities for the presence of any given phoneme (or other primitive symbolic element) must therefore be computed, for example, by using a smoothly varying function f_j positive only within R_j and then forming f_j(C(t)) instead of the simpler Boolean quantity C(t) ∈ R_j. This converts the incoming acoustic signal to a sequence of the form
{s_1^{(1)}, ..., s_{m_1}^{(1)}}, {s_1^{(2)}, ..., s_{m_2}^{(2)}}, ..., {s_1^{(k)}, ..., s_{m_k}^{(k)}}    (2)
whose successive elements are sets of symbols s_i^{(t)} designating phonemes (or syllables, or syllable fragments). The members of each such set represent all the phonemes that designate sounds close enough to the incoming signal during a specific instant of time that they might have been pronounced during that instant. What remains is to disambiguate the sequence into a final perceived phoneme string
s^{(1)}, s^{(2)}, ..., s^{(k)}    (3)

each of whose successive elements belongs to the
corresponding set in the sequence (Eq. 2). As cleverly pointed out by Cocke (6), this is like the problem of decoding an English-language message that has been ambiguously spelled out by dialing it on a standard telephone dial and by transmitting the resulting digits only (note that each digit transmitted then refers ambiguously to one of the three possible associated letters). Such disambiguation must of course rest on other knowledge concerning the phoneme (or syllable, or syllable fragment) sequences that can legitimately occur in the language to which the utterance belongs. Several approaches to this goal are possible, among them:

1. One can use some form of (possibly multilevel) grammar to define the set of all allowed word sequences, and from this the set of all syllable, demisyllable, and phoneme sequences, which are legal (or probable) in the language of the utterance being analyzed. The computational problem then becomes that of finding the grammatically valid phoneme sequences (Eq. 3) consistent with the ambiguous input sequence descriptor (Eq. 2).

2. One can proceed (at least for some of the phonemic or syllabic levels that would otherwise have to be described by formal grammars) in purely statistical fashion. This can be done by regarding utterances in the language to be analyzed as outputs from a Markov source, whose characteristics can be ascertained by collecting data on the frequency with which a given phoneme follows a preceding sequence of one, two, or more known phonemes. Then the most acceptable interpretation (Eq. 3) of the ambiguous input sequence (Eq. 2) can be defined as the most probable sequence consistent with (Eq. 2) and can be calculated by some dynamic programming procedure [e.g., the Viterbi algorithm (qv)]. A probabilistic approach of this kind can make good use of numerical measures of likelihood associated with the various alternatives appearing in each of the sequences constituting (Eq. 2).

Substantial research efforts mounted during the last few years have significantly increased the speed and robustness of interpretation techniques of the kind just described. VLSI chips able to accomplish the initial (analog) steps of processing (spectral decomposition, signal filtering and differentiation, signal normalization) and perhaps even generation of the first-level phoneme stream (Eq. 2) should soon be available.
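The statistical approach just described can be illustrated concretely. The sketch below uses invented phoneme labels, acoustic scores, and bigram transition probabilities (none of them drawn from real speech data) and applies the Viterbi dynamic-programming recursion to pick the most probable phoneme string consistent with an ambiguous sequence of candidate sets of the kind shown in Eq. 2:

```python
# Sketch of statistical disambiguation: a Viterbi search over ambiguous
# per-instant phoneme candidate sets (Eq. 2). All symbols, acoustic
# scores, and transition probabilities below are invented illustrations.
import math

def viterbi(candidates, trans, start):
    """candidates: list of {phoneme: acoustic_prob} dicts, one per instant.
    trans[(a, b)]: probability that phoneme b follows a; start[p]: P(first=p).
    Returns the most probable phoneme string consistent with the input."""
    # best[p] = (log-probability, path) of the best sequence ending in p
    best = {p: (math.log(start.get(p, 1e-9)) + math.log(ap), [p])
            for p, ap in candidates[0].items()}
    for frame in candidates[1:]:
        nxt = {}
        for p, ap in frame.items():
            # choose the best predecessor state for phoneme p
            prev, (lp, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + math.log(trans.get((kv[0], p), 1e-9)))
            nxt[p] = (lp + math.log(trans.get((prev, p), 1e-9)) + math.log(ap),
                      path + [p])
        best = nxt
    return max(best.values())[1]

# Ambiguous input: each instant offers several overlapping-region candidates.
cands = [{"k": 0.6, "g": 0.4}, {"ae": 0.7, "eh": 0.3}, {"t": 0.5, "d": 0.5}]
trans = {("k", "ae"): 0.5, ("k", "eh"): 0.1, ("g", "ae"): 0.2,
         ("g", "eh"): 0.2, ("ae", "t"): 0.6, ("ae", "d"): 0.2,
         ("eh", "t"): 0.3, ("eh", "d"): 0.3}
start = {"k": 0.5, "g": 0.5}
print(viterbi(cands, trans, start))   # → ['k', 'ae', 't']
```

Working in log probabilities, as here, avoids numerical underflow over long utterances; real systems score thousands of frames rather than three.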
The interpretation problem is eased substantially for words spoken in isolation, since in this case direct matching techniques able to span a word's whole extent are computationally feasible; devices capable of recognizing vocabularies of several hundred isolated words are already available commercially. For continuous speech the still very onerous computational cost of disambiguating Eq. 2 into Eq. 3 has forced researchers to concentrate much of their attention on various heuristic (qv) schemes for reducing this cost. However, neither powerful algorithms for accomplishing this efficiently nor any real analysis of the inherent computational difficulty of the problem is yet available. Techniques applied have ranged from systematic use of probabilistic techniques or relaxation-labeling ideas to entirely ad hoc schemes for combining clues detected by multiple-interpretation processes acting at numerous phonemic, syntactic, and even semantic levels. Nevertheless, one may be entitled to feel a certain optimism concerning continued progress of work in this area, since the inherent time sequence of the signals being analyzed is powerfully instrumental, as are the greatly increased levels of computational power made available by VLSI technology.

Analysis of Natural Language. Simple formal grammars (e.g., context-free grammars) of the kind used to define the structure of programming languages serve remarkably well to define the basic structure of natural-language syntax. However, natural language admits a far greater variety of syntactic constructs, special usages and idioms, fragmentary and semigrammatical usages, and specialized sublanguages and
jargons such as doctor's English, criminal argot, and jive talk. Compared with artificial language, natural language appears as an overgrown jungle whose effective description, even at the purely syntactic level, requires grammars whose symbols carry many kinds of attributes (e.g., "count noun," "animate noun"), treat many words in special ways, and require elaborate, sometimes explicitly procedural handling (7). In spite of decades of work on computational linguistics (qv), one is still far from possessing any computerized natural-language analysis system that can deal robustly with the very wide range of phenomena (especially errors, nonstandard forms, and sentence fragments) appearing in informal English, or handle transitions to specialized sublanguages flexibly, or resolve ambiguities at all well.

Although the boundary between syntax (see Parsing) and semantics (qv) is elusive, since some of what is normally considered the semantic content of a language can be captured by refining the treatment of its syntax, attempts to treat the semantics of natural language automatically confront AI research with problems far deeper and seemingly less tractable than those of syntactic analysis. The semantics of a language imbeds its set of grammatical sentences in a framework supporting some useful degree of formal or informal deduction going beyond the purely syntactic, thus making it possible to use the overt text of a discourse to deduce facts not explicit in the text (see Discourse understanding; Inference; Reasoning). Moreover, some combinations of grammatical sentences will be semantically disallowed, allowing certain otherwise ambiguous sentences to be disambiguated on semantic grounds. For example, without semantics the sentence "I noticed a man on the road wearing a dark hat" might admit an interpretation in which the road, rather than the man, was wearing the hat, as in "I noticed a man on the road leading to the North end of town."
Semantic relationships allow resolution of many other ambiguities that natural-language syntax allows, for example, ambiguities of quantifier ordering ("A woman gives birth in the United States every 5 minutes") and anaphora ("John bought his groceries in several adjoining small shops. They cost 20 dollars."). Finally, semantic relationships serve to tie the successive sentences of a continuing discourse together.

Any fully satisfactory formalization of the semantics of natural language must provide some way of resolving all the following very challenging problems, plus others:

1. It must provide a framework accommodating a wide variety of informal deductions that go beyond the kinds of rigorous deduction allowed in mathematics. Among other things, one needs to allow controlled relaxation of normal semantic restrictions in order to accommodate unusual sentences like "The long road and the slender tree sat around the wizard's table talking. The road was wearing a dark brown hat" in texts recognized as fairy stories, even though roads wearing hats are ordinarily disallowed semantically. Note that it must be possible to handle such sentences even when they occur unexpectedly within ordinary text.
2. It must encompass the enormous range of concepts appearing in ordinary discourse, including all the commonsense facts of naive physics concerning such categories as "above" and "below," "inside" and "outside," "big" and "little," and so on. Automated means for reasoning about such elusive matters as plans, knowledge, beliefs, and motives must also be provided, to say nothing of social phenomena such as embarrassment (see Belief systems; Planning).

3. Inference within a semantic framework must generally be quite efficient, e.g., so that fast inferences can be used to disambiguate syntactically ambiguous sentences and/or to resolve anaphoric references in lengthy text streams.

At present one has little idea of how to treat most of these issues, which collectively reach to the heart of the AI enterprise. For example, no "probabilistic" or "fuzzy" formalism beyond the well-defined but rigid semantic area mapped out by propositional and predicate logic (qv) has as yet demonstrated advantages sufficient to win it general acceptance. Moreover, the basic problem of what primitives a semantic formalism should use is surrounded by deep and ill-fathomed questions. One possibility is to somehow simplify the capture of information concerning the very many concepts appearing in natural-language discourse by reexpressing them in terms of some much smaller family of simpler primitives whose properties can then be expressed by a significantly smaller set of rules. (This simplification would in effect require finding some way of extending the analytic reductionism characteristic of theoretical science to the entire range of phenomena that natural discourse addresses.) Any expectation that this can succeed easily is discouraged by consideration of the slow pace with which science has previously advanced into entirely new fields and of the enormous computations sometimes required to apply general scientific laws to particular concrete cases. The opposite approach is to somehow build a semantic formalism that can handle the very many terms appearing in natural language as unanalyzed primitives it relates to each other by comprehensive sets of axiomlike formulas.
Belief that this approach can succeed easily or rapidly is discouraged by the formidable difficulties of steering proofs in predicate calculus systems that try to deal with more than a dozen or so carefully crafted axioms.

Measured against these deeply rooted problems, existing techniques for dealing with natural-language semantics appear sketchy indeed. Semantic network systems attempt to organize the enormous variety of objects and predicates appearing in ordinary discourse by representing them as nodes in graphs whose edges represent various logical relationships felt by their proponents to be particularly fundamental to common elementary inferences. For example, such edges may connect nouns A and B whenever A is a "kind of" B (e.g., when A is "man" and B is "mammal") or when A is a "part of" B (e.g., when A is "arm" and B is "man"). A second aim of schemes of this sort is to accelerate simple semantic deductions by making the information they require directly available through short chains of pointers and by grouping related information needed for the most common types of deduction under appropriate headings. The feasibility of attempts of this kind could only be demonstrated by exhibiting at least one readily extensible system able to cover some extensive domain of practical knowledge robustly, something no one has yet done successfully.

Though many other mechanisms for systematizing the semantic content of natural language have been proposed, it may perhaps suffice to subject just two of these, namely Schank's 1977 "conceptual dependency" (qv) (8,9) scheme and the "frames" proposal of Marvin Minsky (see Frame theory), to
critical review. Schank's scheme represents an attempt to reduce the myriad elements appearing in ordinary discourse to a much smaller set of semantic subcategories. It is not inconceivable that such an attempt should yield some useful degree of systematization, even though a pessimist might view it as a futile effort to enlarge the applicability of scientific modeling by casual invention of a classification scheme. The categories proposed by Schank include "acts" (essentially verbs, which it is proposed to further subdivide as variants of purported primitive acts such as "propel," "ingest," "expel," "speak," etc.), "picture producers" (essentially nouns), "times," "locations," and so on. A related aim here is to classify all the inferences that attach to entities of these proposed semantic categories.

Minsky's "frames" and the associated "scripts" (qv) proposed by Schank define a more general (but accordingly more empty) framework for organizing commonsense knowledge in a stereotyped form. Minsky proposes to classify all the logical entities (e.g., nouns) that can appear in a semantic network system into (a possibly large number of) fixed categories. With each such category a Minsky frame associates a fixed-format record layout listing all the attributes an item of the given category might have, together with all the values or categories of values each particular attribute can assume. For example, the frame for entities of category "restaurant" might have a "type" field with possible values "cafeteria," "full-service," "full-service-with-hostess," and so on; a "food-style" field with possible values including "fish-and-chips," "Mexican," "Chinese," "Thai," "seafood," and so forth. Categories can be defined to be specializations of more encompassing categories, whose attributes they inherit; certain of the attributes of a category can be optional.
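A minimal sketch can convey the flavor of the frame idea, reusing the restaurant illustration above. The class name, slot names, and values below are inventions for this example, not Minsky's notation: each frame lists attributes with their allowed values, and a specialized category inherits the attributes of the more encompassing category it refines.

```python
# Minimal sketch of frame-style records with specialization/inheritance.
# Slot names, categories, and values are invented for illustration.

class Frame:
    def __init__(self, name, parent=None, slots=None):
        self.name, self.parent = name, parent
        self.slots = slots or {}          # slot -> set of allowed values

    def allowed(self, slot):
        """Look up a slot's allowed values, inheriting from the more
        encompassing category when the slot is absent here."""
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:
            return self.parent.allowed(slot)
        raise KeyError(slot)

restaurant = Frame("restaurant", slots={
    "type": {"cafeteria", "full-service", "full-service-with-hostess"},
    "food-style": {"fish-and-chips", "Mexican", "Chinese", "Thai", "seafood"},
})
# A specialization inherits "type" and "food-style" and narrows one slot.
taqueria = Frame("taqueria", parent=restaurant,
                 slots={"food-style": {"Mexican"}})

print(taqueria.allowed("food-style"))   # narrowed locally: {'Mexican'}
print(taqueria.allowed("type"))         # inherited from "restaurant"
```

The chain of `parent` pointers is exactly the "kind of" edge of a semantic network, and the lookup illustrates how inheritance shortens common deductions.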
Schank proposes to include records of another fundamental kind called "scripts" in semantic systems. These are to be used to describe categories of activity (rather than of objects, as with frames). Basically they list sequences of subactivities, which can in principle be conditional on specified conditions. Frames and scripts are tied together by the fact that a script can specify the kinds of objects expected to appear in the activities it describes (by including pointers to the corresponding frames), whereas the frames describing an entity type can reference scripts describing the activities typically associated with these entities.

Taken per se, this mechanism is little more than a way of organizing some aspects of the data with which full-fledged semantic inference systems will have to deal, and does not answer the question of how such an inference system is to be created, any more than the inclusion of vaguely similar record types in programming languages such as Pascal and PL/I answers the question of how to write complex compilers or symbolic manipulation systems using these languages. However, it can also be read as suggesting a semantic interpretation scheme having something of a "higher level syntax" flavor. Specifically, Schank's scripts can be viewed as higher level grammars defining a language of semantically plausible sentence sequences (whose rudimentary elements are clauses or other sentence fragments, already preparsed in some more standard syntactic sense). This "grammar" of scripts would allow much "nulling" of script elements, but then by using such a grammar to "parse" a text and immediately "unparsing" the result, with element nulling forbidden, one can hope to make explicit certain simple but very useful classes of normally implicit inferred elements. (Since grammars that allow large amounts of nulling tend to interpret given texts in
highly ambiguous fashion, application of a scheme of the sort described may depend on a rule that prefers the "shortest" or "simplest" semantic script-parse of a text to all others. Such a rule would amount to requiring that only those implicit elements necessary to a text's semantic interpretation could rightfully be inferred. Alternatively, the scripts driving the semantic interpretation process could associate probabilities with each elementary interpretation step, and some rule defining "most probable" interpretations could be used.) A "grammar of scripts" used in this way will necessarily be context dependent, since semantic connections would have to be maintained between elements (e.g., explicit or implicit nouns or pronouns) recognized at one point of a text and matching occurrences elsewhere. Hence parsing according to such a grammar might come to resemble the very inefficient processes of computational logic much more than the relatively efficient processes of ordinary syntactic analysis.

It would be easier to take such rationalizing suggestions seriously if straightforward formalisms had been proposed for use in this area, if the available formalisms had proved applicable outside of very limited contexts, and if some initial analysis of their computational cost were available. Though the literature contains many heuristic suggestions and computational schemes, none of them seems as yet to have gained any general degree of acceptance.

This brief review of the difficulties that confront attempts to automate natural-language understanding (qv) underscores the wisdom of Turing's 1950 suggestion that ability to conduct natural-seeming conversations should be regarded as a touchstone of progress in AI (see Turing test). Existing semantic analysis systems are fragile laboratory constructions that can deal only with narrowly restricted subject domains. The mechanisms thus far suggested as bases for more comprehensive semantic systems are all quite primitive.
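The "shortest script-parse" rule for recovering nulled elements can be illustrated in a few lines. In this toy version (the script names and event labels are invented), a text's clause sequence is matched against each candidate script, steps not mentioned in the text are recovered as implicit "nulled" elements, and the parse inferring the fewest implicit elements is preferred:

```python
# Sketch of "parsing" a text against a script, with nulled (unstated)
# steps recovered as implicit inferences. Names are invented examples.

def parse_against_script(script, observed):
    """Return the script steps that must be inferred if `observed` is an
    in-order subsequence of `script`, or None if it does not fit."""
    inferred, i = [], 0
    for step in script:
        if i < len(observed) and observed[i] == step:
            i += 1                      # step explicitly present in the text
        else:
            inferred.append(step)       # step "nulled": implicit, inferred
    return inferred if i == len(observed) else None

restaurant_script = ["enter", "be-seated", "read-menu", "order",
                     "eat", "pay", "leave"]
fast_food_script = ["enter", "order", "pay", "eat", "leave"]

text_events = ["enter", "order", "eat", "leave"]   # clauses found in the text

# Prefer the script-parse inferring the fewest implicit elements.
parses = {name: parse_against_script(s, text_events)
          for name, s in [("restaurant", restaurant_script),
                          ("fast-food", fast_food_script)]}
best = min((p for p in parses.items() if p[1] is not None),
           key=lambda kv: len(kv[1]))
print(best)   # → ('fast-food', ['pay'])
```

Here the "fast-food" script wins because it accounts for the text while inferring only that the diner paid; a real system would also have to keep the inferred elements semantically bound to the nouns and pronouns of the surrounding discourse, which is where the context dependence noted above enters.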
Since the problems with which they must deal seem to encompass almost the whole subject matter of AI, only slow progress can be predicted.

Motor Control, Modeling of Spatial Environments, Motion Planning. Review of these topics illustrates the point that areas of AI to which classical scientific and algorithmic techniques apply can be expected to progress more rapidly than areas that deal with deeper problems for which only less focused approaches are available. Many of the capabilities reviewed in this section are being explored in connection with industrial robotics (qv). Since many of the problems encountered are technical rather than fundamental, it is reasonable to expect steady progress, at a rate largely determined by the resources brought to bear. However, it should be noted that work in this area creates very challenging problems of software systems integration, involves a complex mix of technologies, and is quite expensive. Studies in other areas of AI, such as computer vision (qv), may raise similar practical problems as they advance toward maturity.

Research in motor control aims to devise robots capable of exerting sophisticated hybrid force and positional control over grasped objects and to construct robots that can walk, run, leap, and climb (see Autonomous vehicles; Manipulators; Robots, mobile). Typical problems of manipulation are to tie a knot in rope, to thread a nut of imprecisely known shape and pitch onto a bolt, and to pick up a jumbled sheet of cloth and fold it neatly. Techniques adapted from concepts presently belonging to nonlinear control theory (which should be consider-
ably enriched by contact with robotics) should make sophisticated manipulation of rigid objects possible during the next few years. To do this, much work on such classical topics as the frictional and elastic reactions of bodies in contact will be required. Dynamic robot control, such as is involved in walking or running, should also progress steadily over the next few years. However, this will require close study of the complex physical situations created as motor-actuated mechanisms having various geometries and dynamic behaviors enter into repetitive contact with supporting surfaces. The problems of dealing with nonrigid objects (e.g., cloth) are much less understood, and one lacks even a vocabulary for describing some of the basic operations involved. How, for example, is a robot to find the edges of a hanging sheet of cloth preparatory to folding it? Roboticists have not yet begun to grapple seriously with such problems, and it is not now understood whether these will permit of uniform attacks or require development of special analyses and approaches in a large number of different cases.

With a few experimental exceptions, today's robots do not maintain any systematic internal model of their environment; the environment is typically known to them only as a source of tactile or visual interrupts, all sense of external-object identity being lost as soon as a grasped object is set down or passes out of sight. To develop any deeper understanding of the environment, robots will require far more sophisticated environment-modeling software than is now available. Although the basic principles required for this are largely available from classical physics and geometry, it remains a considerable challenge to devise algorithms capable of performing the required computations with acceptable efficiency.
For example, even though the fields of computational geometry and geometric modeling have developed vigorously, there is still a lack of algorithms able to perform such basic operations as detecting intersections between curved surfaces rapidly. More sophisticated modeling operations are needed (e.g., simulation of the paths along which one model object will roll or slide along a given surface, and of the frictional or other forces involved in such motions). These raise yet another range of problems, directly significant for AI, which are bound to tax the best efforts of numerical analysts, geometers, and students of mechanics. Doubtless much can be done here, but there is little reason why these problems will advance more rapidly when viewed as problems of AI than they would when viewed as problems in geometry and mechanics. In particular, although some AI researchers have hoped to construct a semisymbolic "naive physics" (qv) that could calculate the qualitative outcome of common interactions between physical bodies more cheaply than is possible by detailed physical/geometric modeling, this idea is still in altogether too rudimentary a state for fast success to be likely.

Considerable attention has focused recently on the problem of motion planning for robot-controlled bodies moving in obstacle-filled environments. The problem here is to determine whether one or more objects of known shape moving in an environment containing obstacles of other known shapes can pass from one specified position to another without colliding either with the obstacles or with each other. In variants of this problem the obstacles may be moving and the controlled objects constrained to move at bounded rates or with bounded accelerations; or the geometry of the obstacles may be known only in part (but then sensors able to detect object proximity must be available); or it may be required to calculate shortest,
or fastest, or most energy-efficient paths. Recent work along geometric lines has begun to elucidate this circle of problems, but doing so has required development of steadily more subtle algorithms drawing heavily on the computational geometer's bag of tricks. This is clearly an area of AI research that has advanced by moving closer to other more traditional areas of science. Such work suggests that at least for the present it may also be easier for other branches of AI research to progress in this relatively conservative fashion than by relying on the seemingly more general, but often more vacuous, symbolic search (qv) methods traditionally associated with the AI field.

Reasoning, Planning, Knowledge Representation, Expert Systems. Workers in AI have explored many formal schemes that promised to produce useful structures automatically from less structured input. These have included graph search, the predicate logic (qv) mechanisms reviewed earlier, rule-based systems (qv), and the sequencing schemes used as inference engines in expert systems (qv). The most common methods of this sort are reviewed in the following paragraphs. Attempts to apply any of these schemes wholesale have invariably been defeated by the same combinatorial explosion that makes universal application of predicate logic techniques infeasible.

Graph Search. Many problems can be reformulated as the problem of finding a path between two known points in a graph. Planning and manipulation problems, both physical and symbolic, illustrate this. Such problems are described by defining an initial condition with which manipulation must begin, some target state or states that one aims to reach, and a family of transformations that determines how one can step from state to state.
The problem of chemical synthesis (see Chemistry, AI in) is an example: The target is a compound to be synthesized, the initial state is that in which easily available starting substances are at hand, and the allowed manipulations are the synthetic reactions known to the chemist. The problem of symbolic integration is a second example: Some initially given formula F containing an integral sign defines the starting state, any formula mathematically equivalent to F but not containing an integral sign is an acceptable target, and the transformations are those that calculus allows.

In all such problems the collection of available transformations is a heap of relatively independent items that can be expanded freely. Hence, the construction of a path through the graph defined by a collection of transformations does represent a situation in which structured entities, namely paths, arise via simple and uniform rules from something unstructured, namely collections of transformations. Early in the history of AI it was hoped that this construction could serve as a universal principle of self-organization. However, subsequent experience has repeatedly shown that the size of the graphs needed to represent significant problems in this way can be astronomical, making brute-force search infeasible. To do better, some form of "guided" or "pruned" search must be used. Guided search might involve use of some auxiliary heuristic scoring mechanism (see Heuristics) able to predict the distance to a desired target fairly accurately without the precise path to the target being known. Another possibility is to generate some not fully accurate "roughed-out" preliminary path or plan, and then to try to produce a fully valid graph path by using this rough plan for guidance.

No method for making either of these techniques work at
all robustly has yet been developed. A perfectly accurate means of calculating the distance between an arbitrary graph node g and a desired target node t is mathematically equivalent to an algorithm for finding the shortest path to t from any such g. Hence, one can hardly expect such functions to be available except for problems specialized enough to be subject to complete mathematical analysis a priori. Experience seems to show that human attack on substantial problems, especially in problem domains that are at all familiar, involves reaction to so extensive a range of problem and context features as to bar capture by any straightforward scoring heuristic. Guidance by use of rough preliminary plans is frustrated by the present inability of computers to use any adequate notion of similarity in combinatorial domains. In addition, the number of transformations potentially available, and hence the probability of having to search an exploding number of formal possibilities, tends to rise rapidly once partial solutions and means for amending them are allowed into a problem context.

Pruned search involves either the use of problem symmetries to prevent wasteful exploration of graph paths that have already been searched in some equivalent form, or the use of auxiliary rules able to predict that a given graph edge need never be traversed because no path involving this edge can reach the desired target node. Although such ideas have proved useful, intractable combinatorial searches generally remain even after such notions are applied, except in particularly fortunate cases where treatment amounts more to the use of special high-efficiency algorithms than to application of any very general "artificial intelligence" approach. Moreover, because of the featureless generality of graph-theoretic notions, the formulation of such problems in graph-theoretic terms tends to conceal rather than to reveal opportunities for search pruning.
For all of these reasons, belief in the efficacy of entirely general graph search approaches has largely disappeared among AI researchers, even though graph-based techniques continue to be valued for their generality.

Computer-managed planning (qv) in AI contexts is generally accomplished by reduction to some type of explicit or implicit graph search. The computer maintains internal models of the various situations ("states") that would arise as the result of its tentatively planned actions. These states are treated as the nodes of a graph whose edges are the actions that could lead from state to state. Since a path through such a graph then has an obvious interpretation as a planned sequence of actions, plans can be generated by specifying an initial and a final state (or by specifying attributes that define an acceptable final state) and by finding a path connecting these two states. As in all graph-theoretic situations, this method works well if the graph that needs to be searched is relatively small (e.g., consists of no more than a few thousand nodes). For example, all sorts of simple "monkey-and-bananas" puzzles can easily be solved by this method. On the other hand, application of this method to more serious planning problems is often infeasible because the graphs involved (explicitly or implicitly) are enormous. As an example of this, consider the simple "nine puzzle," which consists of eight square pieces in a 3 x 3 frame to be moved between specified configurations. Here the graph of states consists of 9!, or 362,880, nodes, so even for so simple a problem brute-force graph search begins to become taxing. For the corresponding 4 x 4 puzzle, whose state space involves 16!, or over 10^13, nodes, it is completely infeasible.
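The state-graph formulation above is easy to make concrete. The following brute-force breadth-first search over nine-puzzle configurations (a sketch; the flat tuple encoding of the board is an arbitrary choice) finds a shortest move sequence, and the identical code applied to the 4 x 4 version would founder on its 16! state space:

```python
# Brute-force graph search on the 3 x 3 sliding-tile puzzle: states are
# board configurations (9! = 362,880 in all), edges are single tile moves.
from collections import deque

def neighbors(state):
    """States reachable by sliding one adjacent tile into the blank (0)."""
    b = state.index(0)
    r, c = divmod(b, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            n = nr * 3 + nc
            s = list(state)
            s[b], s[n] = s[n], s[b]       # slide the tile into the blank
            yield tuple(s)

def bfs(start, goal):
    """Length of a shortest move sequence from start to goal, or None."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None

goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
start = (1, 2, 3, 4, 5, 6, 0, 7, 8)   # two slides away from the goal
print(bfs(start, goal))               # → 2
```

Even this toy search keeps every visited configuration in memory, which is exactly the cost that the heuristic and pruned variants discussed above try to avoid.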
Predicate Systems. Attempts to generate proofs from collections of mathematical axioms and lemmas by systematic transformation of sets of formalized statements can be regarded as a specialized form of graph search. This is a domain in which heuristic guidance techniques (e.g., rules favoring short formulas over long, or formulas differing little from a target formula F over formulas very different from F), problem symmetries, and search-pruning methods have been very extensively explored. Among these are

1. the basic resolution (qv) technique, which efficiently handles instantiation of the variables in a set of predicate clauses by making only those substitutions that arise from some clash between elementary clauses involving two identical predicates where only one is negated;

2. still more highly pruned variants of predicate resolution, applicable to sets of statements of particularly favorable form (e.g., to collections of Horn disjunctions, i.e., those in which at most one predicate term occurs with a positive sign in each disjunction of the collection, all other disjoined predicate terms occurring negated);

3. resolution variants (e.g., paramodulation) that treat certain important operations (e.g., the equality operator) in special, particularly efficient ways; and

4. more specialized resolution-related schemes, for example, algebraic identity manipulation systems like that introduced by Knuth and Bendix (10), which exploit the special properties of statement sets consisting exclusively of equations.
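Item 1 can be illustrated in miniature. The sketch below implements only ground (propositional) resolution — a simplification of the predicate technique, with no variable instantiation — where clauses are sets of signed literals and the single permitted step cancels a clashing pair P, not-P; the clause names are invented for illustration.

```python
def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of (name, sign) literals).
    Resolution cancels a clashing pair: P in one clause, not-P in the other."""
    resolvents = []
    for (name, sign) in c1:
        if (name, not sign) in c2:
            resolvents.append((c1 - {(name, sign)}) | (c2 - {(name, not sign)}))
    return resolvents

def refutes(clauses):
    """Saturate the clause set; report whether the empty clause appears,
    i.e., whether the set is contradictory."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for r in resolve(a, b):
                    if not r:          # empty clause derived
                        return True
                    new.add(r)
        if new <= clauses:             # no progress: saturated
            return False
        clauses |= new

kb = [frozenset({("P", True)}),                 # P
      frozenset({("P", False), ("Q", True)}),   # P -> Q (a Horn clause)
      frozenset({("Q", False)})]                # negated goal: not Q
print(refutes(kb))  # True: Q follows from P and P -> Q
```

Even at this toy scale, the saturation loop's all-pairs structure hints at the combinatorial growth the article goes on to estimate.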
Beyond these relatively general techniques, researchers have devised a growing assortment of decision algorithms for various branches of mathematics, for example, the Tarski decision procedures discussed above, decision algorithms for purely additive integer arithmetic (Presburger), decision procedures for the purely Boolean theory of sets (Behmann), the elementary unquantified theory of sets allowing the membership relator and the powerset operator, and various elementary parts of analysis, topology, and geometry. However, the general theorems described in Limits Set by Quantitative Theory of Computational Complexity (above) restrict the utility of all these techniques by asserting that their computational cost must always rise prohibitively with modest enlargements in the general classes of statements with which they deal. For example, rule-of-thumb estimates concerning typical applications of the popular and very general resolution (qv) technique often indicate that even after pruning and even if one starts with just 10 or so initial statements, something like a three-way branching in the possible pattern of operations can be expected to occur at each elementary inference step. It follows that discovery of a proof involving 14 successive elementary steps may involve search of as many as 3^14 (approximately 5,000,000) nodes of a tree of possibilities, a computation lying at the outer bounds of feasibility. Moreover, the branching ratio 3 appearing in this illustration can be expected to rise either if the proof to be developed starts with a somewhat larger set of initial statements (i.e., of "axioms" or "hypotheses") or if, structurally speaking, this set of statements is exceptionally powerful (in the sense of allowing highly varied inference patterns, as, e.g., in the case of the axioms of set theory). It therefore seems likely that fundamentally new ideas will have to be discovered before even the best known methods of
LIMITS OF ARTIFICIAL INTELLIGENCE
this type become capable of producing proofs of as many as 20 elementary steps. All this is to say that even the best formal logic manipulation techniques presently known still lack the human mathematician's uncanny ability to produce long and complex proofs by expanding a simple heuristic notion into a relatively undetailed and probably not entirely accurate proof sketch, which is then further expanded and amended into a full and accurate final proof. Without such an ability, it may remain impossible to integrate the growing collection of known logic manipulation techniques into a general tool capable of routine application to a broad variety of symbolic analysis or synthesis problems. Moreover, this basic limitation must also be read as a limitation on the power of all other known symbolic manipulation techniques that are general enough to be relevant to the very fundamental problem of constructing formal mathematical proofs.

Expert Systems. Many of the most active current attempts to commercialize ideas drawn from AI research have focused on so-called expert systems (qv). Since systems of this kind are very much less general than deeper symbolic manipulation systems that aim at more significant levels of self-organization, such as predicate logic or graph search systems, there is a much better chance of bringing them to acceptable efficiency levels. Expert systems typically concern themselves with small fixed sets of assertions relevant to a limited subject domain within which they aim to make simple but useful deductions.
For example, the goal of a medical expert system might be to arrive at one of a finite number of possible conclusions drawn from a list such as "Penicillin should be prescribed," "Streptomycin should be prescribed," and so on, possibly supplemented by one or more explanatory diagnostic conclusions drawn from a list of possibilities such as "The bacterial agent of the disease is pseudomonas," "The bacterial agent of the disease is salmonella," and so on (see Medical advice systems). The internal core of such a system, its so-called inference engine, ordinarily deals only with elementary statements of fixed form drawn from a finite list of possibilities. In the medical example, these might include "inflammation is present," "fever is present," "the symptom site is lower abdomen," "the white-cell count is elevated," and so forth. Typically, expert systems regard such assertions as unanalyzed logical atoms subject only to elementary propositional manipulation, or perhaps some elementary form of probabilistic manipulation, rather than to any more penetrating predicate reasoning. Hence the "expertise" the system embodies is actually expressible by a collection of straightforward propositional or probabilistic rules in which the elementary assertions recognized by the system appear as indivisible units, for example, "If inflammation is present and the white-cell count is elevated and the bacterial agent of the disease is salmonella, then streptomycin should be prescribed." In more sophisticated expert systems, which supplement inference rules of this bald propositional form by allowing probabilistic rules, the inference engine will associate some "probability" or other numerical score, rather than a simple Boolean truth value, with each of the elementary statements it recognizes and with each of its inferences. The assertions manipulated by such systems typically divide themselves into three subclasses:

1. final conclusions, of interest to the end user of the system, which are to be confirmed or rejected;
2. elementary items of evidence, concerning which the system queries the user interactively; and

3. intermediate assertions, which play an internal role in the inference engine's logical manipulations but which can be externalized when the system is called upon to explain its remarks or deductions.

The system queries its users progressively concerning all relevant elementary evidence items (2) and employs the answers supplied to draw elementary Boolean (or somewhat more sophisticated probabilistic) conclusions concerning intermediate propositions (3) and final propositions (1). Type 1 propositions are what the user wants as system output and are presented to him in appropriate form and sequence. The most rudimentary systems of this kind need not differ much from those questionnaires, familiar from popular magazines, which ask their readers to answer yes or no to a list of fairly obvious questions, each of which contributes a score of so-and-so-many points, plus or minus, to the outcome of some such query as "Rate yourself as a parent." However, a substantial level of function can be hung on these rudimentary frameworks: Expert systems can include attractive natural-language and/or graphic interfaces (see Natural-language interfaces). Instructions for carrying out any diagnostic procedures or tests required to answer queries of type 2 can be stored in such systems and made available when the system user is asked the corresponding questions. Specialized editors, databases, visual aids, and modeling systems relevant to a system's application domain can also be provided. Questions can be cleverly sequenced rather than simply being asked in fixed order. If evidence already supplied allows a question to be answered either definitively or with high probability, or if it makes a question irrelevant to the type 1 final conclusions at which an expert system aims, the question can be suppressed.
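The purely propositional inference engine described above can be sketched in a few lines. The rules and atom names below are invented for illustration, not taken from any actual medical system; each assertion is treated, as the text says, as an unanalyzed logical atom.

```python
# Each rule pairs a set of antecedent atoms with one concluded atom,
# mirroring the bald propositional rule form quoted in the text.
RULES = [
    ({"inflammation present", "white-cell count elevated",
      "agent is salmonella"},
     "prescribe streptomycin"),
    ({"fever present", "inflammation present"},
     "infection suspected"),
]

def forward_chain(evidence):
    """Repeatedly fire every rule whose antecedents all hold, until no
    new atom can be added (naive forward chaining over Boolean atoms)."""
    facts = set(evidence)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in RULES:
            if antecedents <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

facts = forward_chain({"inflammation present", "white-cell count elevated",
                       "agent is salmonella"})
print("prescribe streptomycin" in facts)  # True
```

Probabilistic variants replace the set-membership test with numerical score propagation, but the control structure is the same simple loop.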
A system's user can be allowed to ask how particular final or intermediate conclusions were arrived at, in response to which the system can display its internal Boolean or probabilistic deduction steps, along with the built-in rules justifying these steps, in forms calculated to aid user comprehension. In some application areas special deduction rules or other symbolic manipulations going beyond the merely propositional will be possible. For example, an expert system oriented toward chemical syntheses or analyses may be able to manipulate structural descriptions of molecules; an expert system dealing with electrocardiograms may be able to ingest raw cardiographic data and apply sophisticated spectral analysis or other pattern-matching (qv) procedures to it. The power of expert systems that include special techniques of this sort may rise substantially above the level attainable by primitive Boolean inference. Overall, one can say that expert systems enhance their pragmatic applicability by narrowing the traditional goals of AI research substantially and by blurring the distinction between clever specialized programming and use of unifying principles of self-organization applicable across a wide variety of domains. This makes their significance for future development of deeper AI technologies entirely debatable in spite of their hoped-for pragmatic utility.

Knowledge Representation. The phrase "knowledge-based system" has become popular among scientists seeking to apply AI research, and the associated dictum that "finding appropriate representations of knowledge is one of the most basic problems of the AI field" (see Representation, knowledge) has often been propounded. Unfortunately, it is hard to identify any data structures created by the AI research community that are other than superficial. Aside from clever internal implementations of such languages as LISP (which no one would consider knowledge representation in any specific sense), no structures more advanced than simple pointer networks seem to have been proposed. Of course, such networks are quite familiar from many other applications as "graphs" or simply "mappings." They involve nodes that are little different from the "records" of standard data processing. This contrasts strongly with other branches of computer science, in which many quite ingenious data structures have been developed. In these fields numerous successful examples have given the phrase "data structure design" a mature technological meaning; any way of storing one or more abstract data entities in a manner that significantly accelerates the speed with which some well-defined battery of operations can be applied to these entities defines a significant data structure. Examples include B-trees, AVL trees, Fibonacci heaps, compressed balanced trees, and many others. The underlying aim of AI researchers in regard to knowledge representation is of course the same as that of other computer scientists, namely to find data representations that can be used to accelerate the symbolic calculations that they would like to perform. However, progress toward this goal has stalled since no acceptable formulation of the abstract structures to be implemented or of the operations to be performed upon them has yet become available.
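What such a "simple pointer network" amounts to can be shown directly. In the sketch below (all node and edge names are invented for illustration), the nodes are ordinary record-like dictionaries, the edges are labeled pointers, and the only operation is pointer chasing — nothing beyond standard data-processing practice.

```python
# A pointer network: nodes are records, edges are labeled pointers.
# Retrieval by key is just an ordinary hashed lookup.
net = {
    "Clyde":    {"isa": "elephant", "color": "gray"},
    "elephant": {"isa": "mammal", "has": "trunk"},
    "mammal":   {"isa": "animal"},
}

def inherited(node, attribute):
    """Follow 'isa' pointers upward until the attribute is found
    (assumes the 'isa' chain is acyclic)."""
    while node in net:
        if attribute in net[node]:
            return net[node][attribute]
        node = net[node].get("isa")
    return None

print(inherited("Clyde", "has"))  # trunk (inherited from 'elephant')
```

Nothing here accelerates any well-defined battery of operations in the way a B-tree or Fibonacci heap does, which is the contrast the passage draws.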
The one possible exception is the use of "semantic nets" (see Semantic networks) for fast retrieval of items associated with other data items used as keys, a standard programming technique that AI research has actually used in a manner no more sophisticated than is now common in database practice.

Learning. As stressed above, one of the most profound goals of AI is to make computers capable of learning (qv), that is, capable of using disorganized information fragments to construct organized structures on which they can take action. Broad success with this one point would be almost equivalent to full realization of the subject's aspirations. Unfortunately, almost nothing has yet been accomplished toward this bold goal. The disappointments encountered are typified by the variety of schemes that have been tried for allowing a computer to acquire the grammar of simple formal languages by exposure to sets of grammatical strings belonging to such languages. Various faintly encouraging theorems have been proved concerning the asymptotic convergence of learning algorithms to a desired grammar given sufficiently large numbers of positive- and negative-sentence examples; however, the enormous number of candidate grammars that present themselves has frustrated all practical use of this scheme. Related experiments include attempts to discover the simplest possible Boolean expression for a subset S of the set of all computer words of fixed length (whose bits can be thought of as representing true-false attributes of some class of objects or scenes). The inputs to such experiments are sets of positive and negative examples, or information concerning "near misses," which can be given by stating the distance (measured in bits
wrong) of each sample word from the nearest member of S. However, beyond various fragmentary heuristics (qv), neither a practical approach to this problem nor any understanding of its inherent computational cost is available. Other more trivial data acquisition capabilities have been demonstrated and can be regarded as learning of a sort. For example, it is possible for a computer equipped with an image digitizer to acquire pictures of objects successively presented to it, to calculate and store shape parameters for the boundaries of these objects, and subsequently to recognize the same objects when seen in other positions (at least, this is possible for favorable classes of objects). Perhaps this can be regarded as a rudimentary form of learning. Other techniques sometimes described as automatic learning involve the use of data-derived statistics to adjust numerical parameters internal to a program. An even simpler possibility is to supply internal program constants progressively and interactively rather than all at program definition time. An example of this limited and artificial type of "learning" would be a string analysis program designed to be aware of the distinction between vowels and nonvowels among the single characters it extracts internally from character data fed to it, but not told initially which characters are which. Such a program can trivially emit an inquiry about each newly encountered character, following which the character can be inserted into one of two internally maintained sets, making subsequent inquiry unnecessary. The reader may or may not wish to regard this as true learning, since in much the same sense one could view any menu-driven program that elicits and stores information concerning its user's preferences as a program that learns.

A Comment on the Methodology of AI. As might be expected of a young scientific discipline concerned with new, profound, and enormously attractive problems, the methodological level of research in AI is often low.
This contrasts with the situation in those other branches of computer science in which it has proved possible to define reasonably specific and feasible computational goals in a manner independent of the techniques known at any given moment for trying to reach these goals. Where this has been possible, clear challenges have come before algorithm designers (who then often have found sophisticated and sometimes quite unexpected ways of computing important quantities) and computational complexity theorists (who seek to clarify the options open to the algorithm designer by providing theorems concerning the minimum computational cost of particular operations). The systematic work flowing from this clarification of goals has substantially increased the maturity of other branches of computer science. Disappointingly, more primitive approaches have persisted in AI research. Too many publications in this field simply describe the structure of some program believed by its authors to embody some function mimicking an aspect of intelligence, but aside from this having no definition other than the particular procedures of which it consists. It is often impossible to determine just what such a program really computes, whether it does so with acceptable or catastrophic efficiency, or whether some other much more efficient technique might not have computed essentially the same thing. Still more primitive but nevertheless common publications consist of lightly or heavily edited traces of some program's internal activity, accompanied by its author's comments on felt similarities between this activity and the author's personal theory of mental function, a form of report that often leaves its reader without much understanding of what the program described is really doing, or how it is doing it, or with what limitations. The unsatisfactory nature of all this is frequently compounded by the rudimentary syntax of the LISP notations in which such programs are commonly expressed, which readily confounds trivialities with profundities. Until these signs of immaturity disappear, it will be hard to regard the field as embodying much mature technology.

AI and the Development of Programming Languages

As emphasized above, the most fundamental goal of AI research is discovery of principles facilitating the integration of initially fragmented material into useful organized structures. This is also a fundamental aim of the programming language designer, who seeks languages that make it easy to use small independent code fragments to define complex processes. Such languages eliminate troublesome sources of programming error and can increase programming speed very considerably. For this reason, and because AI researchers have regularly grappled with unusually complex programming problems, their work has been a particularly fruitful source of advanced programming concepts.

A few of the most significant ideas of this kind are worth noting. The LISP language developed early in the history of AI research introduced powerful means for defining entirely general and flexible data structures and, since these also could be used to represent the particularly simple externals of the language, provided an environment in which other still more advanced programming languages could easily be implemented for experimental use.

Rule-based programming aims to eliminate programmer concern with operation sequencing by allowing operations to be executed whenever corresponding enabling conditions are met, for which purpose statements having approximately the form

WHENEVER condition DO operation END

are provided (see Rule-based systems).

Backtracking (qv) simplifies the execution of complex explorations by allowing exploration to be routed along multiple parallel branches. The simplest way of providing this semantic facility is through a choice operation having some such syntactic form as

ONE-OF s

where s is a set. When executed by a process p, this operation can create as many independent copies of p as the set s has elements, and in each of these new processes a different element of set s should be assigned as the value of the choice. Finally, if the set s is empty when the ONE-OF operation is executed, the process p should be terminated, leaving sibling processes created by prior ONE-OF operations to continue execution.

Various AI languages more advanced than LISP have emphasized use of these three semantic operations, plus others, in various combinations. For example, in a language that provides both recursion and backtracking, iterations (DO statements) and explicit conditional statements (IF statements) are both superfluous features. Recursions can be used to reexpress iterations, and conditionals can be expressed in terms of backtracking by creating a separate process to execute each branch of the conditional and then immediately terminating those processes that correspond to failed conditions. After elimination of all iterations and conditionals, every program reduces to a sequence of definitions of recursive procedures, and each such procedure reduces to a linear sequence of simple assignments. It then becomes possible to regard any elementary assignment (e.g., x := y + z) as an operation that tests some corresponding elementary relationship (i.e., x = y + z) and, if necessary, assigns a value satisfying this relationship to any variable or variables appearing in the relationship and not possessing any previously specified value. This has the advantage that a relationship like x = y + z can trigger either the assignment x := y + z (if x has no prior value) or y := x - z (if x does but y does not). If the call on a procedure P (or, more properly, successful return from a call on P) is viewed as a kind of logical "conclusion" and the linear sequence of statements A1, . . . , An constituting the body of P is regarded as the set of "hypotheses" of this conclusion, the definition of the procedure P can be written as

(4) A1 & A2 & . . . & An → P0

This gives programs consisting of such procedures the flavor (though not the full reality) of sets of statements in predicate logic. The fact that multiple redefinition of a procedure or procedures is harmless in a language providing backtracking reinforces the resulting resemblance to predicate logic, since it allows "implications" of the form of Eq. 4 to be inserted into a program freely and in arbitrary number. Invocation of a backtracking procedure can simply create multiple parallel processes, in each of which just one of the potentially relevant procedure definitions is invoked. These semantic reflections underlie the definition of the PROLOG language, which some AI researchers have recently come to view as a significant addition to the older and better established LISP language.

Though programming in one of the advanced programming languages reviewed in the preceding paragraphs is sometimes described as "application of AI technology," it should be realized that these languages only facilitate manual expression of complex procedural and declarative structures, but do not embody any real principle of self-organization in and of themselves. Moreover, they all pay a price in efficiency for their generality: If used carelessly, all the most advanced of these languages, including the rule-based and PROLOG-like systems, make it very easy to describe catastrophically inefficient computational processes. For this reason, the clean logical basis of these languages is often disrupted by inclusion of irregular efficiency-enhancing mechanisms of very different flavor, often making their effective use as full of pitfalls as ordinary programming languages of lower aspiration. Since the fundamental goals of AI research are far deeper than those of programming language design, extensive elucidation of its problems simply by design of some appropriate programming language is not to be expected.

Automatic Programming. The term "automatic programming" (qv) refers both to the fully computerized generation of programs from initial problem specifications expressed in entirely abstract, logiclike terms and to automated improvement of program efficiency. (Efficiency improvement can be realized by automatic transformation of less efficient into more efficient algorithms or by automatic generation of detailed program versions in efficiency-oriented programming languages such as Pascal or Ada, starting from considerably more concise
"specifications" written in a programming language such as PROLOG or SETL having much higher semantic level.) Proofs in some ("intuitionistic") logical formalisms can be compiled automatically into (highly inefficient) programs. Ordinarily, however, the problem of generating programs from problem statements written in a formalism close to that of logic is very similar to the problem of generating proofs in logic automatically and hence is subjectto the pessimisticassessmentoffered at the end of the preceding subsection. Automatic improvement of program efficiency is a related problem that has attracted considerable attention, much of which has concentrated on the possibility of exploiting Iibraries of optimization tricks of the kinds most commonlyused by human programmers. One typical device of this kind is the use of formal differentiation,In this techniqueone keepsup-todate values of expressions,used within program iterations, that would otherwise have to be recalculated repeatedly at substantial computational cost;the expressionvalues required are then kept current by updating them, hopefully at substantially lower expense, whenever any one of their arguments changes. This is one of the most promising techniques for automatic program optimization at a very abstract level and can readily be seen to account for important aspects of the approach to manual development of efficient programs actually employed by programmers in many cases.However, systematic work on this method fty Paige and others (11)) during the last years has shown that effective application even of this particularly favorable approachraises too many deepproblems for its automatic application by any known method to be feasible. The difficulty is that even for programs that visibly fit the "formal differentiation" stereotype, efficiency improvement generally depends on knowledge of secondary logical constraints concerning possible program states at specified program points. 
These constraints are typically deep enough to defy automatic verification and complex enough for their full statement to discourage programmer involvement. Here again is a situation in which the computer's inability to deal efficiently even with intuitively simple sets of logical statements raises a significant obstacle to progress. Similar objections apply to other proposed techniques for automatic program improvement, many of which raise much the same problems of exploding combinatorial search of symbolic structures as are involved in automatic discovery of mathematical proofs; both the program texts that must be processed and the vocabulary of transformations applicable to such texts are considerably larger than the small examples ordinarily considered in the research literature on automatic discovery of proofs. A consequence of all this is that only very limited classes of transformations have found profitable application to automatic improvement of program efficiency. Normally such automatic optimization only pays for itself when a small number of relatively superficial techniques can be applied inexpensively to extensive computer texts, so as to eliminate wholesale inefficiencies introduced by prior steps of automatic processing, for example, by straightforward compilation or macroexpansion of source text. Program optimization of this practical form has more the flavor of large-scale symbolic data processing than of AI research (though partial affinity with some of the deeper goals of AI research can be discerned). Even the intermediate-level problem of automatically introducing data structures into program texts written in very high level
languages in order to raise program efficiency to levels that human programmers can routinely reach lies somewhat beyond one's present grasp.

Moral Limits

Successful construction of artificial intelligences would affect the human environment profoundly. If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man's near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history. Part of the opposition that certain humanist thinkers (12-14) have made to the entire notion of AI stems from this fact. They express the amorphous unease of a much broader public. The fear is that the whole fabric of human society, which at times seems terrifyingly fragile, may be torn apart by enormously rapid technological changes set in motion by AI research as it begins to yield its major fruits. For example, it is possible to imagine that would-be dictators, small centrally placed oligarchies, or predatory nations could exploit this technology to establish a power over society resting on robot armies and police forces independent of extensive human participation and entirely indifferent to all traditional human or humane considerations. Even setting this nightmare aside, one can fear a variety of more subtle deleterious impacts, for example, rapid collapse of human society into a self-destructive pure hedonism once all pressures, and perhaps even reasons or opportunities, for work and striving are undermined by the presence of unchallengeably omnicompetent mechanisms.
Certainly man's remaining sense of his own uniqueness may be further impaired, and he may come to seem in his own eyes little more than a primitive animal, capable only of fleeting enjoyments. Successful response to such developments, when and if they begin to accelerate, will require humanity to reaffirm its spiritual solidarity and to close ranks across class, ethnic, and national boundaries. Conservative prudence needs to be combined with graceful and constructive adaptation to deep and rapid change. Once man is generally seen as an intelligent mechanism, and mechanisms as intelligent as man regularly flow forth from factories, what limits must be set to the manipulation either of man or of his created mechanisms? What regulations and social assumptions will prove appropriate to a world in which work, except as hobby, has come to an end? These questions, which even science fiction has as yet explored only occasionally, are likely to rush upon statesmen, philosophers, and theologians within just a few centuries (see also Social issues of AI).
BIBLIOGRAPHY

1. H. L. Dreyfus, What Computers Can't Do, Harper & Row, New York, 1972.
2. M. Davis and E. Weyuker, Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science, Academic Press, Orlando, FL, 1983.
3. A. Tarski, A Decision Method for Elementary Algebra and Geometry, 2nd ed., University of California Press, Berkeley, CA, 1951.
4. J. Ferrante and C. Rackoff, The Computational Complexity of Logical Theories, Springer, New York, 1979.
5. M. J. Fischer and M. O. Rabin, "Super-exponential Complexity of Presburger Arithmetic," SIAM-AMS Proc. 7, 27-41 (1974).
6. J. Cocke, IBM, Yorktown Heights, NY, private communication, 1980.
7. N. Sager, Natural Language Information Processing: A Computer Grammar of English and Its Applications, Addison-Wesley, Reading, MA, 1981.
8. A. Barr and E. Feigenbaum (eds.), The Handbook of Artificial Intelligence (3 vols.), HeurisTech, Stanford, and William Kaufmann, Los Altos, CA, 1982.
9. R. Schank and C. Riesbeck (eds.), Inside Computer Understanding, L. Erlbaum, Hillsdale, NJ, 1981.
10. D. Knuth and P. Bendix, "Simple Word Problems in Abstract Algebra," in Computational Problems in Abstract Algebra (Proceedings of an Oxford Conference), Pergamon, Oxford, U.K., pp. 263-297, 1970.
11. R. Paige, "Transformational Programming: Application to Algorithms and Systems," Proceedings of the Tenth ACM Symposium on Principles of Programming Languages, January 1983, pp. 73-87.
12. M. Boden, Artificial Intelligence and Natural Man, Basic Books, New York, 1977.
13. M. Brady, J. Hollerbach, T. Johnson, T. Lozano-Perez, and M. Mason (eds.), Robot Motion: Planning and Control, MIT Press, Cambridge, MA, 1982.
14. J. Weizenbaum, Computer Power and Human Reason: From Judgement to Calculation, W. H. Freeman, San Francisco, CA, 1976.

General References

D. Ballard and C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
M. Brady et al. (eds.), Robot Motion: Planning and Control, MIT Press, Cambridge, MA, 1982.
C. L. Chang and R. C. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York, 1973.
E. Feigenbaum and J. Feldman (eds.), Computers and Thought, Krieger, Malabar, FL, 1981.
J. Feldman, Memory and Change in Connection Networks, Rochester University Computer Science Technical Report 96, Rochester, NY, December 1981.
F. Hayes-Roth, D. Waterman, and D. Lenat, Building Expert Systems, Addison-Wesley, Reading, MA, 1983.
D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
D. H. Hubel and T. Wiesel, "Brain mechanisms of vision," Scientific American 241, 150-162 (September 1979).
M. Jacobsen, Developmental Neurobiology, Plenum, New York, 1979.
E. Kandel, The Cellular Basis of Behavior, W. H. Freeman, San Francisco, CA, 1976.
S. Kuffler, J. Nicholls, and A. Martin, From Neuron to Brain, Sinauer, Sunderland, MA, 1984.
J. Siekmann and G. Wrightson, Automation of Reasoning, Springer, New York, 1983.
C. Y. Suen and R. De Mori, Computer Analysis and Perception, Vol. 3, Auditory Signals, CRC Press, Boca Raton, FL, 1982.
J. Schwartz
New York University
LINGUISTICS, COMPETENCE AND PERFORMANCE

The Competence-Performance Dichotomy

Language and Speech. There is no conception more central to linguistics than the dichotomy between "language" and "speech," which was bequeathed to the field by the great turn-of-the-century Swiss linguist de Saussure. The French words that Saussure used for language and speech, langue and parole, respectively, are still encountered today: langue represents the abstract system of structural relationships inherent in language, relationships that are held in common by all members of the speech community; parole, the individual act of speaking, which is never performed the same way twice. Saussure compared language to a symphony: langue represents the unvarying score, parole the actual performance, no two of which are alike (1).

The langue-parole dichotomy was given a modern rebirth by Chomsky under the names competence and performance, respectively (2). The competence-performance dichotomy lies at the center of transformational generative grammar (see Grammar, transformational), the linguistic theory introduced by Chomsky in the late 1950s, and today virtually all approaches to grammatical theory descended from Chomsky's original work take the dichotomy as their starting point (see, e.g., Refs. 3-5). In brief, competence represents the system of abstract structural relationships that characterize language, and performance the faculties involved in putting that knowledge to use. Chomsky chose to coin the new terms, competence and performance, rather than stick with langue and parole, because he wished to underscore two important differences between competence and langue: competence for Chomsky encompasses syntactic relationships, despite Saussure's consignment of much of syntax to parole; and competence is characterized by a set of generative rules, unlike Saussure's langue, which is no more than a taxonomic inventory of grammatical elements.
Linguistic Competence. Linguistic competence is thus the name for the nonreducible core of language that forms the autonomous system characterizable by a formal grammar. Competence is given an explicit characterization by a set of rules that generate the sentences of the language and their associated structures. Such rules include those of the syntax (see Parsing), phonology (sound patterning) (see Phonemes), morphology (qv) (word formation), and those relating syntactic structure to semantic interpretation (see Semantics). Different competence models (i.e., different grammatical theories) differ as to the nature of these rules and their properties; however, the idea of a grammar as a model of competence can be illustrated adequately by reference to the so-called extended standard theory (EST) of the 1970s (7). In this model the derivation of a sentence involves, first, the application of the
phrase structure rules, which produce the deep structure of the sentence: the level at which the basic syntactic relations between its elements are represented in their simplest form. Words, formed by the morphological rules of the lexicon, are inserted into the derivation at this point by rules of lexical insertion. The transformational rules then map the deep structure onto the surface structure: the level at which the syntactic elements are arranged in their actual order of occurrence. Rules of semantic interpretation apply to both the deep and surface structures to yield the semantic representation of the sentence. To complete the derivation, phonological rules convert the surface structure to the sentence's phonetic representation. Figure 1 represents schematically the EST conception of linguistic competence.

Many generative grammarians, following Chomsky, regard central aspects of competence to be innately determined and thus true of language by biological necessity. "Universal grammar," Chomsky's name for the innate component of competence, embodies two types of universals: substantive and formal. Substantive universals are those constructs that enter into a linguistic description, as, for example, the syntactic categories ("noun," "verb," etc.) and phonological distinctive features ("consonantal," "rounded," etc.). Formal universals are more abstract. They specify the formal conditions that every model of competence must meet, including the nature of the rules that appear in grammars and the way that they may be interconnected. The "subjacency" constraint (3), which permits the derivation of sentence 1b from 1a but disallows the derivation of 2b from 2a, is an example of a formal universal:

1. a. You believe that John saw who?
   b. Who do you believe that John saw?
2. a. You believe the claim that John saw who?
   b. Who do you believe the claim that John saw?

Competence is often defined informally as a speaker's linguistic knowledge, but that definition must be used cautiously. For example, it is part of a speaker's "knowledge" that saying "I am hungry" can convey a request to be fed (see Speech acts); likewise, one "knows" that it is proper to devoice one's vowels and consonants when speaking in a library. Nevertheless, such knowledge is not regarded as a property of competence, in the technical sense, since it is not statable within a system employing strictly linguistic primitives. The generalizations underlying such knowledge undoubtedly fall within the domains of cooperative communication and proper social behavior.

Linguistic Performance. According to the best-known definition of performance, it represents "the actual use of language in concrete situations" (6). Such a definition is all-encompassing and would seem to include the use of language in interpersonal communication, in thought, in its entire social and cultural context, in literature and rhetoric, and even in dreaming! However, in most technical discussions the term performance is used in a far more restricted way, and it is this restricted sense that will be adopted here. The narrower conception defines a theory of performance as a theory of grammar coupled with a theory characterizing the mechanisms for language processing in the production and comprehension of sentences (e.g., see Ref. 8). A theory of performance thus asks questions such as the following: How does the language processor draw on the internalized mental grammar? Which grammatical levels (i.e., deep structure, surface structure, etc.) are most relevant to processing? What are the independent processing components of the language comprehension and language production systems? How do they relate to general cognitive mechanisms? What resemblances and contrasts are there between the architecture of the two systems, and what is the processing relation between them? What resemblances and contrasts are there in the architecture of the processing systems for auditory/vocal languages, for written language, and for visual/manual (signed) languages, and in what ways is processing structure modality dependent?

Figure 1. A model of linguistic competence (conception of extended standard theory).

Generative grammarians typically explain complex linguistic phenomena in terms of the interaction of the grammar and the processor (i.e., in terms of the interaction between competence and performance). To take an early and well-known example, Miller and Chomsky (9) noted that sentences with multiple center-embedded structures are invariably unacceptable, as in sentence 3:

3. [The rat [the cat [the dog chased] ate] died].

To account for this unacceptability in terms of principles of competence alone would be to claim that sentence 3 is structurally ill-formed. Yet such a claim has two undesirable consequences: first, it would demand that a limit be placed by the grammar on the depth of embedding, which otherwise is unlimited; second, it would render impossible an explanation of the fact that sentences such as 4, which are plausibly derived from the same deep structure as sentence 3, are acceptable:

4. The rat died that was eaten by the cat that the dog chased.

Miller and Chomsky proposed to account for the difficulty of understanding sentence 3 in part by a principle of sentence comprehension that states (essentially) that sentences are processed from "left to right" and that the processing mechanism cannot be interrupted more than once. Since any attempt to understand sentence 3 demands a double interruption of the process that links a subject with its corresponding verb, processing difficulties ensue. In other words, sentence 3 is generated by the grammar; that is, it forms part of one's linguistic competence. Its unacceptability follows from the interaction of the grammatical principle of unlimited center-embedding with the perceptual (i.e., performance) principle stated above. Neither principle alone is sufficient to account for the unacceptability of sentence 3 and the concomitant acceptability of sentence 4.

Evidence for the Competence-Performance Dichotomy

In recent years, evidence from a number of different quarters has supported the idea of an autonomous linguistic competence, that is, the existence of a grammatical system whose primitive terms and principles are not artifacts of a system that encompasses both human language and other human faculties or abilities. The following paragraphs present such evidence from studies of grammatical patterning, language acquisition, neurology, and language processing (for a more thorough exposition of the material in this section, see Ref. 10).

Evidence from Grammatical Patterning. The most direct evidence for the reality of linguistic competence comes from the many-many relation that exists between grammatical form and communicative function. Put simply, there is no possibility of deriving the particular shape that a grammatical construction may take from the function that the construction serves in discourse. Consider, for example, the following three common syntactic devices in English: the inversion of the auxiliary verb and its placement before the subject, the omission of an understood you subject, and the displacement of a wh-word (what, who, how, when, etc.) from its deep-structure position. These devices are illustrated in 5a, 5b, and 5c respectively:

5. a. Are you having a good time?
   b. Go home now.
   c. What are you eating?

Now consider four common discourse functions in human language: making a command, expressing conditionality, asking a question, and making an exclamation. As Figure 2 demonstrates, each of the three syntactic devices mentioned above can serve three or four of these discourse functions (see Ref. 11 for more discussion of this point):

Figure 2. The disparity between form and function in language: sentences 6 (inverted auxiliary), 7 (omitted you subject), and 8 (displaced wh-word) serving the command (a), conditional (b), question (c), and exclamation (d) functions.

6. a. Don't you leave!
   b. Had John left (I would have taken his seat).
   c. Did he leave?
   d. Was he (ever) big!

7. a. Leave now!
   b. Leave (and you'll regret it).
   c. Leaving now?

8. a. Won't you give me a drink!
   c. What is it?
   d. How big he is!

This great disparity between form and function appears to be the general rule rather than the exception, a fact that
strongly suggests that there are principles governing structural regularity in language that cannot be considered by-products of principles external to language. In other words, competence demands a characterization on its own terms.

The reality of syntactic constituent structure has been demonstrated experimentally as well. Several types of evidence bear out the linguists' claim that in a sentence like "John threw the ball," for example, the major constituent break is between John and threw and the secondary break is between threw and the. Experimentally elicited judgments about natural breaks in sentences tend to coincide with these linguistically determined assignments of constituent structure; constituents act as natural aids for perception and memory; and clicks inserted at random intervals into recordings of sentences tend to be perceived at major constituent boundaries.

Evidence from Language Acquisition. The case for an autonomous linguistic competence has received support from the fact that (in extraordinary cases) linguistic abilities may be dissociated developmentally from other cognitive abilities. For example, there are cases on record of children whose syntax is completely fluent, yet who are unable to use language communicatively; conversely, cases are attested in which a child's communicative intent is obvious, yet that intent cannot be phrased according to the grammatical patterns of the language being acquired (12). Children's errors, as well, point to the fact that the acquisition of grammar is not merely a by-product of other development, in particular conceptual development. If it were the case, for example, that the child learned concepts first and then learned to map those concepts onto syntactic categories and structures, one would predict that semantically atypical members of a syntactic category should be used erroneously as if they were members of a category that directly reflected their meaning.
But errors of that sort are rare: children rarely utter such sentences as "She naughtied" or "He is nicing to them," despite the fact that naughty and nice are actionlike adjectives, and one rarely finds such errors as "He is know it" or "Was he love her," though know and love are not action verbs. These facts seem to suggest that the child has the specifically syntactic knowledge predicted by a theory of linguistic competence; when syntactic knowledge and conceptual knowledge conflict, the latter does not automatically override the former (13).

Experimentation has shown that even very young children exhibit subtle grammatical knowledge that could not have been acquired by either induction or instruction. For example, four-year-olds are able to judge that the pronoun him can refer to president in the sentence "Bill met him just after the president arrived" and that he cannot refer to president in the sentence "He met Bill just after the president arrived." Since sentences of this type are not particularly common, even in adult speech, and the structural factors determining coreference in them are highly complex, it has been hypothesized that the child brings into the language acquisition process a highly structured linguistic competence that helps to shape language development (14).
Evidence from Neurology. If the competence-performance dichotomy has any validity, one would expect it to be reflected in the organization of the brain for language. That is, one would predict the existence of neurological structures that serve no cognitive functions other than strictly linguistic ones. This expectation appears to be borne out. In particular, the representation of competence is a left-hemisphere function. Subjects whose right hemispheres are removed at birth develop all the syntactic skills possessed by normal speakers, and those who lose their left hemispheres do not. Either hemisphere can acquire the sense and reference of common words; however, only the left hemisphere can efficiently process those aspects of meaning determined by syntactic configuration (15). The right hemisphere, on the other hand, contributes to the proper use of language. For example, subjects with right-hemisphere damage have been known to lose the ability to use language metaphorically or indirectly, interpreting such sentences as "He was wearing a loud tie" and "Can you pass the salt?" in their strictly literal senses (16).

Along the same lines, there exists evidence from the form of language breakdown known as "aphasia" for an independently functioning linguistic competence. For example, aphasics often show syntactic deficits with only minimal accompanying deficits in cognition and articulation; indeed, such patients typically use alternative strategies to produce sentences with equivalent meanings, which suggests that their loss can be characterized by a breakdown of grammatical competence (17). Conversely, aphasias have been recorded that appear to involve preservation of the patients' syntactic abilities but the loss of the ability to utilize this structural information to recover the meanings of sentences. For example, patients with such a deficit have no trouble understanding sentences like 9a but are unable to use the structural information in 9b and 9c to decide which of the animals is being pursued and which is the pursuer:

9. a. The bone was buried by the dog.
   b. The cat was chased by the dog.
   c. The dog was chased by the cat.

Until recently, this problem was characterized as a general syntactic deficit. Further study has revealed, however, that these patients have the capacity to detect a wide range of grammatical violations, such as those in sentence 10:

10. a. The cat was chased the dog.
    b. How many do you have rooms in the house?

Their limitation appears to lie, then, not in their knowledge of syntactic structure but rather in their ability to apply that structure to accessing the semantic properties of the sentence. This characterization of the sentence comprehension deficit has led to the development of a remediation program focusing on the utilization of structural information, which, on the basis of preliminary studies, appears highly promising (18).

Evidence from Language Processing. A productive controversy has raged for two decades over whether experimental evidence relating to the processing of sentences in production and comprehension supports the idea of a strictly grammatical competence. It seems fair to say that in the 1970s, the majority position of those involved in psycholinguistics experimentation was that language processing proceeds without drawing on a formal grammar. This negative conclusion was arrived at for two types of reasons. First, the competence model assumed at the time by generative grammarians seemed to be inconsistent with the dominant contemporary view of the grammar-processor interface. This view, the "derivational theory of complexity" (DTC), posits an isomorphic relation between the grammatical steps involved in generating a sentence and the real-time steps of the processing mechanism. According to the DTC, if a certain sequence of operations (say, transformations) applies in the grammar in a particular order, the processor's operations will mirror those steps. It was pointed out by a number of investigators that, given current assumptions about the way that the grammar was organized, this isomorphic relationship did not exist. For example, all generative grammarians before the late 1970s assumed the existence of a transformational rule of passive, which functioned (roughly) to map sentences like "John threw the ball" onto those like "The ball was thrown by John." Since the derivation of passives involved the application of one more rule than the derivation of actives, the DTC predicts that passive sentences should take longer to process than actives. However, this was found not to be the case, thereby calling into question the idea that the grammar was utilized by the processor and suggesting, by extension, that there was no need at all for an autonomous competence grammar (19).

Second, experimental evidence seemed to disconfirm the idea that the process of sentence comprehension involves drawing on an autonomously stored grammatical representation. For example, in one experiment Bransford and Franks (20) presented subjects with sentences such as 11:

11. Three turtles rested on a log and the fish swam beneath them.

In a subsequent recognition task subjects believed that they had heard sentence 12a as often as 12b:

12. a. The fish swam beneath the log.
    b. The fish swam beneath the turtles.

Since the deep structure of 12a is not represented in the deep structure of sentence 11, Bransford and Franks concluded that meaning was inferred by the use of extralinguistic knowledge such as real-world spatial relations rather than being based on a stored grammatical representation. Such results led many psycholinguists to reject the notion of an autonomous level of grammatical competence in their construction of parsing algorithms.

In recent years, however, evidence has mounted that a model of mentally represented linguistic competence does play a role in language processing. This change of view came about for several reasons. First, generative grammarians have (for theory-internal reasons) modified the competence model so that many once-popular grammatical analyses that were incompatible with the DTC have been abandoned for those consistent with it. Second, the DTC itself has been called into question, thereby undermining any attempt to refute the existence of linguistic competence on the grounds that it is inconsistent with that theory (this point is discussed below). Finally, the experimental processing evidence that challenged the necessity of a competence grammar has itself been challenged. For example, the Bransford and Franks conclusions have been called into question by the demonstration that the contribution of formal grammar to sentence comprehension is manifest only during on-line tasks, that is, those performed simultaneously with sentence processing; after a certain (short) period nongrammatical factors predominate. Hence, it seems that Bransford and Franks's off-line experiment does not undercut the idea that speakers utilize grammatical representations when processing a sentence (21).

At the same time, other experimental evidence has borne out the idea that processing does draw on competence. One study shows that when subjects are presented with a class of sentence pairs that differ in some minimal way, their response times in determining that the sentences are different show a significant effect of grammaticality but not of plausibility. Thus, there is evidence for distinct syntactic and semantic components in processing. Another finds that sentences with syntactic violations take longer to read than well-formed sentences, even when perceivers do not consciously detect the violation, a finding that would be unexpected on the view that syntax is used in a haphazard manner or only when other sources of information yield no unique analysis of an input sentence. And another shows that readers are temporarily "garden-pathed" (i.e., they initially pursue an incorrect analysis) in syntactically ambiguous structures even when preceding sentences provide disambiguating information that in principle could guide the processor's choice of an appropriate syntactic analysis (the studies referred to above are summarized in Ref. 22).

Utilization of Competence in Performance

There is still widespread disagreement over the means by which grammatical representations are accessed by the sentence processor. One important trend, represented by the theory of "lexical functional grammar" (LFG) (4), maintains that processing evidence shows that transformational rules must be abandoned entirely. In LFG both active and passive forms of verbs are stored directly in the lexicon, and sentences like 5c, with displaced wh-words, are derived by a complex binding operation. Another trend is represented by the extended standard theory (7) and its successor, the government-binding theory (GBT) (3). The GBT continues to posit a transformational derivation of passives and displaced wh-words. However, it has revised earlier assumptions about the computational power available to the processor. In place of the DTC it assumes that the parser is able to perform a small number of operations simultaneously, rather than performing all serially. By allowing nonseriality in computation, the amount of work the processor can accomplish per unit of measured time is also increased. Since, in this view, the surface structure and deep-structure trees are built in parallel by the processor, it follows that the processing time for actives and passives will be essentially identical (for discussion, see Ref. 23).

A great amount of work has been devoted in recent years to the question of parsing strategies, that is, to the procedures by which a syntactic representation of a sentence is constructed by the processor. For purposes of illustration, consider the much-discussed strategy of "late closure" (24):

Late Closure. When possible, attach incoming material into the clause or phrase currently being parsed.

Late closure predicts that the parser will treat the sock as the direct object of the verb mending in both sentences 13a and 13b:

13. a. While Mary was mending the sock it fell off her lap.
    b. While Mary was mending the sock fell off her lap.
In 13b, of course, the parser will have made an incorrect decision: in this sentence mending is an intransitive verb, and the sock is interpreted as subject of fell rather than as object of mending. Hence it follows (correctly) that 13b is more difficult to parse than 13a.

Finally, it should be pointed out that a number of researchers have explored the possibility of grounding the formal universals of linguistic theory in the attempt to maximize parsing efficiency. For example, it is suggested (in Ref. 25) that the subjacency condition mentioned above exists for precisely this reason: without it, it would be extremely difficult to recover the deep structure of the sentence being processed.
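The late-closure preference can be sketched as a toy program. The sketch below is purely illustrative and is not a serious parser: the word positions, the bracketed-attachment notation, and the check against the following word are all invented for this one pair of sentences from the text.

```python
# Toy illustration of the "late closure" strategy: when an incoming
# noun phrase can attach to the phrase currently being built, attach
# it there rather than closing that phrase early. The parser commits
# before seeing what follows, so sentence 13b produces a garden path.

def attach_noun_phrase(words):
    """Decide where 'the sock' attaches in 'While Mary was mending the sock ...'.

    Late closure always attaches the NP into the clause currently being
    parsed (as object of the verb), before later words are seen.
    """
    verb = words[3]                 # "mending"
    np = " ".join(words[4:6])       # "the sock"
    decision = f"[{verb} [{np}]]"   # NP attached as object of the current VP
    # Whether the early commitment was right depends on the next word:
    # "fell" needs "the sock" as its subject, so the attachment must be undone.
    next_word = words[6]
    correct = next_word != "fell"
    return decision, correct

s13a = "While Mary was mending the sock it fell off her lap".split()
s13b = "While Mary was mending the sock fell off her lap".split()

print(attach_noun_phrase(s13a))   # attachment stands in 13a
print(attach_noun_phrase(s13b))   # garden path: attachment must be revised in 13b
```

In both cases the strategy makes the same attachment; only the continuation of the sentence reveals whether reanalysis is needed, which is why 13b is harder to parse.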
BIBLIOGRAPHY

1. F. de Saussure, Course in General Linguistics, McGraw-Hill, New York, 1959.
2. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
3. N. Chomsky, Lectures on Government and Binding, Foris, Dordrecht, The Netherlands, 1981.
4. J. Bresnan (ed.), Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1982.
5. G. Gazdar, E. Klein, G. Pullum, and I. Sag, Generalized Phrase Structure Grammar, Harvard University Press, Cambridge, MA, 1985.
6. Reference 2, p. 4.
7. N. Chomsky, Deep Structure, Surface Structure, and Semantic Interpretation, in D. Steinberg and L. Jakobovits (eds.), Semantics, Cambridge University Press, Cambridge, U.K., pp. 183-216, 1971.
8. M. Kean, Explanation in Neurolinguistics, in N. Hornstein and D. Lightfoot (eds.), Explanation in Linguistics, Longmans, London, p. 175, 1981.
9. G. Miller and N. Chomsky, Finitary Models of Language Users, in R. Luce, R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 2, Wiley, New York, pp. 419-492, 1963.
10. F. Newmeyer, Grammatical Theory: Its Limits and Its Possibilities, University of Chicago Press, Chicago, IL, 1983.
11. E. Williams, "Abstract triggers," J. Ling. Res. 1, 71-82 (1980).
12. S. Curtiss, Genie: A Psycholinguistic Study of a Modern-Day "Wild Child," Academic Press, New York, 1977.
13. M. Maratsos and M. Chalkley, The Internal Language of Children's Syntax, in K. Nelson (ed.), Children's Language, Vol. 2, Gardner, New York, pp. 127-214, 1980.
14. L. Solan, Acquisition of Structural Restrictions on Anaphora, in S. Tavakolian (ed.), Language Acquisition and Linguistic Theory, MIT Press, Cambridge, MA, pp. 127-144, 1981.
15. M. Dennis, "Capacity and strategy for syntactic comprehension after left or right hemidecortication," Brain Lang. 10, 287-317 (1980).
16. H. Gardner and G. Denes, "Connotative judgments by aphasic patients on a pictorial adaptation of the semantic differential," Cortex 9, 183-196 (1973).
17. S. Blumstein, "Neurological disorders: Language-brain relationships," in S. Filskov and T. Boll (eds.), Handbook of Clinical Neuropsychology, Wiley, New York, pp. 187-223, 1981.
18. M. Linebarger, M. Schwartz, and E. Saffran, "Sensitivity to grammatical structure in so-called agrammatic aphasics," Cognition 13, 361-392 (1983).
19. D. Slobin, "Grammatical transformations and sentence comprehension in childhood and adulthood," J. Verb. Learn. Verb. Behav. 5, 219-227 (1966).
20. J. Bransford and J. Franks, "The abstraction of linguistic ideas," Cog. Psychol. 2, 331-350 (1971).
21. G. Carlson and M. Tanenhaus, Some Preliminaries to Psycholinguistics, Papers from the Eighteenth Regional Meeting of the Chicago Linguistic Society, pp. 48-60, 1982.
22. L. Frazier, Grammar and Language Processing, in F. Newmeyer (ed.), Linguistics: The Cambridge Survey, Vol. 2, Cambridge University Press, Cambridge, U.K., in press.
23. R. Berwick and A. Weinberg, The Grammatical Basis of Linguistic Performance, MIT Press, Cambridge, MA, 1984.
24. L. Frazier, On Comprehending Sentences: Syntactic Parsing Strategies, IULC Publication, Bloomington, IN, 1979.
25. M. Marcus, A Computational Account of Some Constraints on Language, in A. Joshi, B. Webber, and I. Sag (eds.), Elements of Discourse Understanding, Cambridge University Press, Cambridge, U.K., pp. 187-200, 1981.

F. J. Newmeyer
University of Washington
LISP

LISP was invented in 1958 by John McCarthy. Since then it has been in constant use as the language of choice for AI programming. Even now it is the premier language for AI, and its acceptance by the larger programming community is growing. Unlike many other programming languages, LISP was designed primarily for symbolic processing. Since LISP's early days it has moved from being an exclusively symbolic processing language toward being a truly general-purpose programming language. For example, Common LISP supports many floating-point data types, several types of vectors and arrays, strings, and abstract data-structuring constructs. This entry discusses three main topics: a description of LISP constructs using Common LISP, the history of LISP, and a comparison of the major dialects in use over the history of LISP.

Simple Example of LISP

The following is an example of a LISP program. This example is intended to give the reader an idea of what LISP is like so that the remaining discussion is not entirely abstract:

;;; This function computes factorial of n
(defun factorial (n)
  (cond ((= n 0) 1)                       ;base case
        (t (* n (factorial (- n 1))))))   ;simple recursion

The mathematical definition of factorial is

n! = n(n - 1)!
0! = 1

The correspondence between the LISP definition and the mathematical definition of factorial is quite striking. To apply the LISP factorial function to 3, one writes

(factorial 3)

Syntactically this differs only slightly from the usual mathematical expression for function invocation, which is

factorial(3)
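For comparison, the same recursion can be transcribed into Python. This is an illustrative transliteration only, not part of the LISP entry itself: the COND with its base case becomes an if statement, and the recursive clause carries over unchanged.

```python
# The LISP factorial transcribed into Python for comparison.
def factorial(n):
    if n == 0:                         # base case: 0! = 1
        return 1
    return n * factorial(n - 1)        # simple recursion: n! = n * (n-1)!

print(factorial(3))  # prints 6
```

The structural parallel between the two definitions shows how directly the mathematical recurrence maps onto recursive code in either language.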
When factorial is called with 3 as the argument, the variable n is bound to the value 3. In this particular case, wherever the identifier n is mentioned in the program text, the variable n is referenced, and the value 3 will be obtained. The first line of the function is the beginning of a conditional statement; COND is a generalization of an if-then-else special form. The line

(cond ((= n 0) 1)    ;base case

is the beginning of the conditional clause; the value of n is fetched and compared with 0. If the value of n is numerically equal to 0, 1 is returned. This corresponds to the case of 0! = 1. Next the line

(t (* n (factorial (- n 1))))))    ;simple recursion

is the else clause in the conditional; T is a symbol whose value is itself, and the symbol T is taken to be the canonical truth value in LISP. If the value of n is not equal to 0, the current value of n will be multiplied by the value of (n - 1)!. The expression

(- n 1)

takes the current value of n, decrements it by 1, and returns that as its value. The expression

(factorial (- n 1))

calls the function factorial recursively on the value n - 1. When this expression returns its value, (n - 1)!, that value is multiplied by the original value of n to produce the final value n! = n(n - 1)!.

LISP Basics
LISP (list processing) is a symbolic manipulation language. Although LISP can manipulate numbers in various formats, its strength lies in being able to manipulate pointers to objects, such as complex data structures. Processing pointers to objects and altering data structures comprising other such pointers is the essence of symbolic processing. Typical data structures in LISP are symbols, lists, trees, vectors, records, arrays, and strings. Out of these data structures can be built representations for formulas, real-world objects, natural-language sentences, visual scenes, stories, medical concepts, geological concepts, and other symbolic data. These objects can be manipulated in ways that correspond to actions in the real world or in ways that correspond to thinking about those objects, or so the AI community hopes.

Among the objects that can be represented easily in LISP are LISP programs themselves. Therefore, programs, such as compilers and program verifiers, that reason about other programs can be easily written in LISP; at least the mechanics of manipulating the programs can be easily written. In fact, some of the first programs written in LISP were LISP interpreters (see below). Interpreters execute programs written in LISP by tracing through their structures and performing appropriate actions.

Most LISP programs are in the form of functions, which return values, and programming in LISP is very much like functional composition: the values of expressions are computed, and those values are passed on to other functions, which use those values to compute further values. In other languages one refers to procedures, subroutines, and programs; in LISP all of these concepts are described using the term function.

When a programmer is working with LISP, there is an extensive LISP programming environment in which that work takes place. A programmer will type interactively to the LISP interpreter in order to run programs. The LISP interpreter reads LISP expressions and evaluates them, printing out their values. This part of LISP is called the LISP toplevel. The programming environment also maintains a dynamic heap or area of storage. User programs can allocate objects in the heap, and a garbage collector is responsible for freeing storage that is no longer in use.

The concept of evaluation is important to understanding the nature of LISP. Given an expression, LISP will evaluate that expression, yielding a value. Simple expressions, representing constants, variables, and literals, are evaluated directly; composite expressions, representing functional application or data structure access, are evaluated by first evaluating all of the subexpressions in the original expression and then applying the function specified in the expression or performing the actions required by the special form designated in the expression. This process is discussed more precisely below, but for now it suffices to appreciate that the process of evaluation is the key to understanding LISP.

LISP Syntax

LISP uses a minimal syntax in which parentheses are significant. The name for a LISP object written in this syntax is S-expression. LISP has a facility, called the reader, that parses the forms that LISP can manipulate. The reader is responsible for building data structures in the LISP environment that correspond to an external representation of these structures. The LISP reader can recognize and construct a relatively small number of objects; among them are symbols, arrays, floating-point numbers, strings, quoted objects, CONSes, lists, vectors, and fixnums. Fixnums are integers that can be represented compactly. Usually a fixnum corresponds to a number such that the bits that represent the number plus the bits that represent the tag for the fixnum fit in a machine word.

Symbols and Atoms. The LISP reader reads characters presented to it by some input stream, usually a keyboard or a disk file, and interprets the sequence of characters as a LISP object. The most basic LISP object is a symbol or an atom; its printed representation is as a sequence of alphabetic, numeric, pseudoalphabetic, and special characters. Different LISP systems define symbols differently, but the essential features are usually very much along these lines.

Here are several examples of the printed representation of symbols:

foo      ;a symbol with 3 letters
foo3     ;a symbol with a digit as a character
foo-bar  ;a symbol can have special characters in it
+$       ;a symbol with only pseudoalphabetic characters
/usr/r   ;a symbol with slashes in it

Symbols and their roles will be described in detail later. Their main use is as a way of describing programs and data for programs. Some LISP dialects use the term atom in place of the term symbol. Here symbol is used uniformly.
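The reader's job of turning characters into LISP objects can be observed directly in Common LISP with READ-FROM-STRING, which runs the reader over a string. This is a small sketch assuming the default read table; READ-FROM-STRING is standard Common LISP.

```lisp
;; READ-FROM-STRING runs the LISP reader over a string and
;; returns (as its first value) the object the characters denote.
(read-from-string "foo-bar")   ;=> FOO-BAR (a symbol)
(read-from-string "(1 2 3)")   ;=> (1 2 3) (a list built of CONS cells)
(read-from-string "3.14")      ;=> 3.14 (a floating-point number)
```

Note that the same reader is used whether the characters come from a keyboard, a disk file, or a string; the data structures it builds are ordinary LISP objects.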
CONS Cells and Lists. The CONS cell is the original basic building block for data structures. With the CONS cell, lists and binary trees can be constructed. A CONS cell is a data structure that holds two pointers. The CONS cell that holds the numbers 1 and 2 is written as

(1 . 2)

This data structure is not the same as the one written

(2 . 1)

The order of the two elements is important. A CONS cell written as above is said to be written in dot notation; sometimes it is referred to as a dotted list or a dotted pair. A list is written as

(e1 e2 . . . en)
This form is a shorthand for the following tree structure:

(e1 . (e2 . . . . (en . NIL) . . .))
The symbol NIL is special in that it represents the empty list. NIL is described below in more detail. Lists are represented as binary trees whose left branches point to elements in the list and whose right branches point to the remainder of the list. The list

(1 2 3 4)

is synonymous with

(1 . (2 . (3 . (4 . NIL))))

(The use of CONS cells to represent lists has been pervasive from the earliest LISP dialects. The LISP printer, a function that produces a character or typed representation of LISP data structures, produces lists from CONS cells wherever possible.)

The LISP reader and system will produce a CONS cell with a 1 in its left branch and a 2 in its right branch when presented with

(quote (1 . 2))

The QUOTE informs the LISP system that the form (1 . 2) is to be interpreted as a LISP object rather than as some other sort of command or statement. A LISP reader normally supplies a number of shorthand notations, and the most prevalent over many dialects of LISP is the single quote, which is used as a shorthand for QUOTE.

(quote (1 . 2))

is equivalent to

'(1 . 2)

LISP programs can construct CONS cells with the function CONS. The code

(cons 1 2)

builds the dotted pair

(1 . 2)

The code

(cons e1 (cons e2 . . . (cons en NIL) . . .))
constructs the list

(e1 e2 . . . en)
To access the first element of a CONS cell, one writes

(car ⟨cell⟩)

And the second element of a CONS cell is accessed by writing

(cdr ⟨cell⟩)

The left branch of a CONS cell is called the CAR of the cell, and the right branch of a CONS cell is called the CDR. To destructively alter the CAR of a CONS cell, one writes

(rplaca ⟨a⟩ ⟨c⟩)

which alters the CAR of the CONS cell ⟨a⟩ to be ⟨c⟩. And to destructively alter the CDR of a CONS cell, one writes

(rplacd ⟨a⟩ ⟨d⟩)

which alters the CDR of the CONS cell ⟨a⟩ to be ⟨d⟩.

Manipulating List Structure. Lists are quite easily manipulated; it is natural to use structural recursion on binary trees. Here is the definition of a function that counts the number of atoms in a tree:

(defun count-atoms (LST)
  (cond ((null LST) 0)
        ((atom LST) 1)
        (t (+ (count-atoms (car LST))
              (count-atoms (cdr LST))))))

NULL tests whether the end of a list has been reached; ATOM tests whether an object is an atom. This function can be applied to the definition of count-atoms itself, in which case it returns 18 (0 and 1 are regarded as atoms).

Programs and Data. That LISP programs and LISP data structures so closely correspond is of major significance to the success of LISP in the research community. Using this feature, researchers have been able to implement other programming languages on top of LISP easily. In addition, LISP structure editors and compilers have proven to be easily written in LISP. Recall the definition of factorial given above. It is a list whose first element is defun, whose second element is factorial, whose third element is a list whose first element is n, and so on.

Algebraic Syntax. Some LISP dialects have chosen to provide an algebraic or algorithmic syntax in addition to the standard LISP syntax. Interlisp (1) supports CLISP (conversational LISP); PSL (2,3) supports RLISP (REDUCE LISP syntax); and Maclisp (4) supports CGOL. Although these syntaxes have their advocates, the bulk of the LISP programming community uses standard LISP syntax.
To demonstrate the differences in the various syntaxes for these dialects, the iterative definitions for the factorial function are shown.

Common LISP

(defun fact (n)
  ;; Iterative factorial
  (do ((i 2 (1+ i))
       (result 1 (* result i)))
      ((< n i) result)))
InterLisp

(DEFINEQ (FACT
  (LAMBDA (N) (* Iterative factorial)
    (bind (RESULT 1)
          for I from 1 to N
          do (SETQ RESULT (TIMES RESULT I))
          finally (RETURN RESULT)))))

RLISP

symbolic procedure fact n; % Iterative factorial
  for i := 1:n product i;

CGOL

define factorial (n); % Iterative factorial
prog result;
  result := 1;
  for i in 1 to n do
    result := result * i;
  return(result)$
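The recursive definition from the beginning of the entry and the iterative Common LISP definition compute the same function; a quick check at the toplevel confirms this. This is a sketch restating both definitions from the text so the comparison is self-contained.

```lisp
;;; The recursive version from the beginning of the entry.
(defun factorial (n)
  (cond ((= n 0) 1)
        (t (* n (factorial (- n 1))))))

;;; The iterative Common LISP version.
(defun fact (n)
  ;; Iterative factorial
  (do ((i 2 (1+ i))
       (result 1 (* result i)))
      ((< n i) result)))

(list (factorial 6) (fact 6))   ;=> (720 720)
```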
Symbols, Identifiers, Locations, and Bindings

In this entry a distinction is made between symbols, identifiers, locations, and bindings. A location is a temporarily allocated piece of storage, as in other programming languages. An identifier is a name for a location or for a symbol. An identifier is usually written as a string of characters and is recognized by a reader (parser) as an identifier. The place in a symbol where the value is stored is called the value cell. The pair of an identifier and a storage location is called a variable. In LISP variables are introduced by lambda expressions, let expressions, function definitions, and a few other basic constructs.

In Common LISP variables are treated as lexical variables unless they are declared special. A lexical variable is a variable whose scope is lexical or textual. The value of a lexical variable can only be accessed or altered by expressions that appear within the same expression that introduces the variable.

A symbol is a LISP object. It has a name associated with it, and it also has a number of aspects or uses. First it has a value, which can be accessed or altered using exactly the same forms that access or alter the value of a lexical variable. In fact, the methods of naming symbols are the same as those used for naming a lexical variable. When a programmer declares that a variable is special, the LISP system will generate a symbol instead of a lexical variable. When an identifier is created at the toplevel of LISP, the LISP system will also generate a symbol.

In addition to a value, a symbol can have a property list, a package, a print name, and possibly a function definition associated with it. A property list is simply a list of indicators and values; a property list can be used to store properties associated with the symbol or perhaps associated with some object that the symbol is defined by the programmer to represent. A print name is usually the string of characters that constitutes the identifier.
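The print name and property list just described can be exercised directly in Common LISP. This is a small sketch; the symbol APPLE and the COLOR indicator are invented for illustration, while SYMBOL-NAME and GET are standard Common LISP.

```lisp
;; Every symbol has a print name, the string of characters
;; that constitutes its identifier.
(symbol-name 'apple)               ;=> "APPLE"

;; A symbol also has a property list of indicators and values.
(setf (get 'apple 'color) 'red)    ;store a COLOR property on APPLE
(get 'apple 'color)                ;=> RED
(symbol-plist 'apple)              ;the raw indicator/value list
```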
A package is a structure that establishes a mapping between an identifier and a symbol. A package is usually a hash table containing symbols. There is always a current package, and when the LISP reader finds an identifier, it examines the current package to determine whether there is a symbol with that name in the current package. If there is such a symbol, the reader returns that symbol; otherwise it creates a symbol in the current package with the identifier string as its print name and returns that new symbol. It is in this manner that the mapping is established.

A function is normally associated with a lexical variable or a symbol. Associating a function definition with a symbol is much more commonly done than associating a function definition with a lexical variable. When a function is applied, the application mentions the name of a symbol or a variable. The associated function is then invoked. The example

(foo x y)

is a function application, and the identifier foo is associated with a function definition. From the context shown one cannot tell whether foo is a symbol or a lexical variable. Later there are examples of both sorts of references to functions.

Binding. A related concept is binding. A binding is an association of an identifier with a location or a symbol. A binding is often made temporarily, and after some context is exited, the binding reverts to the previous one. Above, a variable is defined to be a binding of an identifier with a location. A location is a place to store a value, and thus a binding may associate a name with a value. Many LISP systems implement a location as a memory location, a stack location, or a register, and the compiler is free to choose the most appropriate location or locations to store the value of the variable during the computation involving them.
Likewise, the interpreter will manage bindings and locations, and sometimes the interpreter's management will be identical to the compiled code's management of bindings and locations, though it need not be. A symbol's value cell is a location in which a value can be stored, and bindings of these cells can be made. Such bindings are called special bindings. Here are some examples of these concepts in use.
x

This expression, the identifier x, refers to the location containing the value of the lexical variable named x, or it refers to the symbol named x. In either case, it is said to refer to the value of x; in particular, it refers to the value part of the binding.

(setq x ⟨expression⟩)

This expression alters the value part of the binding of x. If x is a symbol, its value is altered; if it is a lexical variable, the location to which x refers is altered.

(let ((x ⟨expression⟩)) ⟨form⟩)

This is the most common method of establishing a binding. This form causes a new binding of x to be established with an initial value corresponding to the value of the expression. The form, ⟨form⟩, is evaluated in the context of the binding; any references to x, either to access or to alter its value, will
access or alter this binding. This binding of x will exist from the time that the form is entered until it is exited.

Here the distinction between a symbol and a lexical variable is important. The lexical scope of the variable x above is defined to be the textual extent of the LET expression. References to a lexical variable are allowed only within the lexical scope. If x were special, that is, if x refers to a symbol, references to x could be made anywhere during the execution of the form, ⟨form⟩, even outside the lexical scope of the binding of x.

Free and Bound References. Important notions to understanding LISP are free references and bound references to variables. In the LET expression above, any references to x that are lexically or textually apparent in ⟨form⟩ are called bound references to x. Suppose the entire LET expression were

(let ((x (foo)))
  (+ x x))

The references to x are bound: Each refers to the binding of x that is lexically apparent. It can be seen that the value of x will be precisely the value of the expression (foo). Suppose, on the other hand, the entire LET expression were

(let ((x (foo)))
  (+ y y))

The references to y are free references to the value of y. In this case the reference is to the value of the symbol y; the binding of y to which this reference refers is neither lexically nor textually apparent. By examining the program text, one cannot see the intended value for y; the value must be that obtained from an earlier binding for the symbol y.

Declaring Variables Special. The binding of an identifier that refers to a symbol is called a special binding. Whenever a binding is made, unless the identifier is declared special, the binding is of an identifier to a location that is not the value cell of a symbol.

(let ((x (foo)))
  (declare (special y))
  (+ y y))

This states that y is special.
A special variable is a binding whose value part is the value cell of a symbol, and the binding alters the value of the symbol temporarily. Recall that to get the value of a symbol, the value cell of the symbol is examined. When a binding is established by the action of a LET or some other similar construct (e.g., PROG, DEFUN, FLET, MACROLET, LABELS, and MULTIPLE-VALUE-BIND), the establishing process is called let binding or lambda binding. Consider this example:

(let ((x ⟨expression⟩))
  (declare (special x))
  ⟨form⟩)

This code let-binds a symbol. Let-binding the symbol x temporarily changes the value of x for the duration of the LET. (The implementation of a LISP system may not actually perform exactly the operations described here. The operations presented are intended to present an informal operational semantics for LISP. Some LISPs might alter the value of the symbol,
and others might put the value of a special variable on the control stack; in order to find the value of this special variable, the control stack is searched. LISPs that change value cells are called shallow-binding LISPs, and LISPs that put bindings on the stack are called deep-binding LISPs. The implications of these implementation choices are in terms of performance: Certain operations are performed faster in one type of implementation than in another. The implementors make decisions regarding the frequency or importance of certain operations and optimize their implementations according to those decisions.)

In the code above, the symbol x is a global LISP object. Changing its value temporarily alters the value of x throughout the evaluation of the form, ⟨form⟩. For example, suppose that the above piece of code is expanded to

(let ((x 3))
  (declare (special x))
  (foo))

The function foo is being applied to no arguments. The declaration tells LISP that x is to be treated as a symbol. The binding of x is visible to any free occurrences of x in the function foo. Suppose foo were defined as

(defun foo ()
  (+ x x))

The reference to x is a free reference, and the value of x is the last binding of the special variable. In the code fragment above, x would be specially bound to 3; foo would be called and see that binding. Adding 3 + 3 yields 6, which is the value of the LET.

The scope of the special binding of x in the code fragment extends through the duration of time that control is within or underneath the forms in the LET. This is often termed dynamic scope. Different dialects of LISP use different terms to refer to the concept that is here termed special. They are dynamic, fluid, and specvar.
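The dynamic-scope behavior described above can be demonstrated at the toplevel. This is a sketch; the names *V*, PEEK, and DYN-TEST are invented for illustration, and DEFVAR is the standard Common LISP way to proclaim a symbol special.

```lisp
(defvar *v* 1)        ;proclaims *V* special; its global value is 1

(defun peek ()
  *v*)                ;a free reference to the special variable *V*

(defun dyn-test ()
  (let ((*v* 3))      ;a special binding: temporarily changes *V*
    (peek)))          ;PEEK, called inside the LET, sees the new binding

(dyn-test)            ;=> 3
(peek)                ;=> 1, the binding reverted when the LET exited
```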
Symbols. Recall that LISP has an interactive environment in which the user can type expressions; LISP evaluates the expressions and prints out the value of the expressions. This toplevel is often called the read-eval-print loop; the three words read, eval, and print indicate the three actions that the toplevel performs. At the toplevel of LISP, if the user types x, this is taken as a request to evaluate the symbol whose name is x. To evaluate x, LISP will find its value and return that value. If the symbol does not exist, it is created, as discussed earlier. Generally, a freshly created symbol has no value, so the evaluator will report this fact. LISP 370 and LISP/VM return the symbol itself as the value of a symbol that has never been assigned a value.

A value can be assigned to a symbol using SETQ:

(setq x 3)

This assigns the value 3 to the symbol x. Had x not existed earlier, the reader would have created it when it read the identifier x in this expression.

There is a convention concerning what values are regarded as representing true and false. NIL represents falsehood, and
anything else represents truth. The symbol T is conventionally used to mean true if there is no particular reason to use any other value. Recall that NIL is a symbol and also is the empty list; the empty list is also written ( ). In Common LISP the symbol NIL has the property that it is synonymous with the empty list. One of the ongoing controversies in the LISP implementation community is whether NIL ought to be the same as ( ). NIL is a symbol, and ( ) is a list. Some writers discussing LISP programming style maintain that ( ) should be written when the programmer intends the empty list, NIL when the programmer intends falsehood, and 'NIL when the programmer intends the symbol named NIL.

Suppose one types at LISP
(setq x 3)

and one has the compiled function

(defun square-x ()
  (declare (special x))
  (* x x))

If square-x is invoked immediately after the SETQ, the result will be 9. In other words, the interpreter environment is accessible to compiled code.

In the LISP literature the concepts of identifier and symbol are often collapsed into a single concept. Sometimes the concepts of print name and identifier (identifier, as used here) are collapsed. All concepts of variables and bindings can be explained without recourse to the definition of the symbol, but, as with CONS cells, it is often easier to understand special binding using the concrete implementation concept of the symbol rather than a formal definition of the semantics for special binding. As a further motivation for this approach, it is often easier to understand the implementation history of LISP with this treatment in mind than it would be with any other explanatory approach.

Functions

All programs in LISP are functions or collections of functions. A function takes some number of arguments, binds those arguments to some variables, and then evaluates some forms in the context of those bindings. After the evaluation takes place, a value or values are returned.

The argument-passing convention is call-by-value: Every argument to a function is first evaluated, and then pointers to those values are passed to the function. This last point is important. Although LISP is call-by-value, the values that are passed are in fact pointers to values, so that if the value is a complex data structure, that data structure is not copied, but a pointer to it is passed. If the function alters that data structure (suppose it is a vector, and an element of that vector is altered), then the data structure is altered by side effect. An exception to this is the case of immediate objects, like fixnums and characters, which are stored in the pointer itself.

Here is a simple function:

;;; This computes v*(n!)
(defun cfact (n v)
  (if (zerop n)
      v
      (cfact (1- n) (* n v))))

This function computes a variant on the factorial function, v*(n!); but it does it in a manner that allows a compiler to do a transformation from recursion to iteration. That is, this function is inherently iterative, although, syntactically, the body of the function calls itself recursively. The function takes two arguments: n and v. The function is called like this:

(cfact 10 1)

which will compute 10!. The value 10 is bound to the variable n and 1 to v. The body of the function is then evaluated within the context of these bindings. The body is

(if (zerop n)
    v
    (cfact (1- n) (* n v)))

When n is tested, it is not 0, and so the recursive call is made to cfact. Because there is no reason to return to the context in which n is bound to 10 and v to 1, the evaluator can choose not to save this environment: it can, instead, change the bindings of n and v and simply jump to the beginning of the body of cfact. When the evaluator can do this, the function to which it can be done is called tail recursive, and this type of optimization is called tail recursion removal or tail merging. The first version of factorial shown at the beginning is not tail recursive.

One dialect of LISP, Scheme, depends critically on tail recursion removal to do iteration because Scheme does not provide any iteration constructions such as DO or PROG and GO.

Anonymous Functions. Consider the simple function add7:

(defun add7 (x)
  (+ x 7))

This simple function adds 7 to its single argument. Suppose this function is needed as part of a larger expression:

(setq v (* z (add7 q)))

To evaluate this expression requires doing a function call to the function add7, which can, in certain implementations, be a time-consuming action. The following code fragment achieves the same purpose:

(setq v (* z
           ((lambda (x) (+ x 7))
            q)))

The expression

(lambda (x) (+ x 7))

is an anonymous function. That is, it acts just like a function, but it is not associated with any symbol (function name). Moreover, to call this function does not require a function call. A compiler, seeing both the function definition and its use, can open code or inline code the function invocation. Open coding is creating assembly language code to perform a function rather than creating assembly language code that performs a call to that function.
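The accumulator-passing definition of cfact can be checked against its intended meaning, v*(n!), at the toplevel. This is a sketch; the definition is repeated from the text so the example is self-contained.

```lisp
;;; This computes v*(n!) in tail-recursive form.
(defun cfact (n v)
  (if (zerop n)
      v
      (cfact (1- n) (* n v))))

(cfact 10 1)   ;=> 3628800, that is, 10!
(cfact 3 2)    ;=> 12, that is, 2*(3!)
```

Because the recursive call is the last thing the body does, a compiler that performs tail recursion removal can compile this definition into a loop that reuses the bindings of n and v.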
Macros

LISP supports a powerful macro facility. When the evaluator finds what appears syntactically to be a function application
(sometimes called a combination in the literature), it first checks whether it has found a macro instead. If it is a macro call, then the macro code is evaluated, the result of the macro evaluation is substituted for the original form, and the evaluator attempts the evaluation again. Here is an example:

(defmacro add7 (x)
  (list '+ x 7))

(setq z (* y (add7 q)))
When the evaluator is evaluating (add7 q), it notices that (add7 q) is a macro call. It binds the variable x in the macro definition of add7 to the symbol q and then evaluates the body of the macro. The body creates a list whose first element is + and whose second element is the value of the variable x. Here x is bound to the symbol q, so the result of the macro evaluation is

(+ q 7)

This expression is then substituted for (add7 q) in the original expression, and the evaluator goes on. With add7 defined as a macro, the original expression is treated exactly as if it had been

(setq z (* y (+ q 7)))

Backquote

Recall the definition of the macro add7:

(defmacro add7 (x)
  (list '+ x 7))

The use of the function LIST was necessary in order to produce a list that had some constant elements intermixed with elements that were the values of other LISP objects. Because writing macros in this style is pervasive in Common LISP, a special construct, backquote, was invented to make writing such macros simpler. The basic idea is to provide a template for a piece of list structure. The template contains constant as well as variable components. In backquote syntax the list constructor

(list '+ x 7)

is written

`(+ ,x ,7)

The expression is preceded by a backquote (`); wherever a comma (,) appears, the expression to its right is evaluated, and the resulting value is placed at that point in the list structure. Because the value of 7 is 7, the above expression can be written

`(+ ,x 7)

In Common LISP programs the backquote syntax is extensively used.

Data Abstraction

The powerful macro facility in LISP has enabled programmers to develop a sophisticated programming style in which data abstractions play a significant role. All representation commitments can be delayed until the program is developed, and these commitments can be hidden within macro definitions in a separate area of the program. In Common LISP a data structure definition facility is defined that is an extension of the fundamental macro facility just described.

Common LISP Defstruct. The Common LISP defstruct capability is similar to capabilities available in other LISP implementations for defining record structures. The user can define new data types and data structures using defstruct:

(defstruct person
  name
  age)

This expression defines a record structure named person. Consider the example

(setq ralph (make-person :name "Ralph" :age 53))

This expression creates an instance of this data structure. The symbol ralph is given this structure as its value. The person defined has the string "Ralph" as the filler of the name slot and the integer 53 as the filler of the age slot. To access the name slot, one writes

(person-name ralph)

Moreover, the type of the structure returned by make-person is person. This enables the programmer to extend the type system defined by the LISP system. The expression

(typep ralph 'person)

returns T. To change the age of ralph (this should be done once a year), one writes

(setf (person-age ralph)
      (+ 1 (person-age ralph)))

SETF is a generalized value-setting macro: given any LISP location that can be altered, SETF can be used to set the value in that location. For example, to change the CAR of a CONS cell, one can write

(setf (car cell) new-car)

One can define new structures that can be taken as subtypes of existing structures:

(defstruct (astronaut (:include person))
  helmet-size
  (favorite-beverage 'scotch))
This expressiondefinesa new data type, astronaut, which is a subtype of person. If one writes (setq typical-astronaut
'Hl";TJl;;r"t :age45 :helmet-size17.5))
This creates a new instance of an astronaut. In a DEFSTRUCT, slots can specify default values, which are used in the event that not all slot values are supplied during instance creation. In this example the favorite-beverage slot has scotch as its default value. The slots of astronaut include the slots of person, and the instance so created is both of type person and of type astronaut.

In the following examples the notation expression ⇒ value is used to indicate that value is the result of evaluating the LISP expression. The LISP expression is to the left of the ⇒, and the value of that expression is to the right.
(person-name typical-astronaut) ⇒ "Buzz"
(astronaut-name typical-astronaut) ⇒ "Buzz"
(astronaut-favorite-beverage typical-astronaut) ⇒ scotch
(typep typical-astronaut 'person) ⇒ t
(typep typical-astronaut 'astronaut) ⇒ t
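The default-value behavior described above can be seen by omitting a slot at creation time. This is a sketch; the structure definitions are repeated from the text so the example is self-contained, and the instance NOVICE and its slot values are invented for illustration.

```lisp
(defstruct person
  name
  age)

(defstruct (astronaut (:include person))
  helmet-size
  (favorite-beverage 'scotch))

;; FAVORITE-BEVERAGE is not supplied here, so its default is used:
(defvar novice
  (make-astronaut :name "Pat" :age 30 :helmet-size 16.0))

(astronaut-favorite-beverage novice)   ;=> SCOTCH
(person-age novice)                    ;=> 30, the inherited accessor works too
```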
Closures

In some dialects of LISP, Common LISP, for example, anonymous functions are generalized one step further: Functions are treated as first-class objects, just like any other LISP object. Consider the expression
#'(lambda (x) (+ x 7))

This expression, as any other in LISP, should have a value. In LISPs that support expressions like this having values, the value is called a closure. A closure is a function that retains the binding context current when the closure was created, so that that context may be used for references to free variables from within the function's body.

The #' syntax signifies that the expression following it is a function or a closure. Some LISP dialects, such as Scheme, denote closures exactly as above but without the #'.

A more complex example is

(let ((x 7))
  #'(lambda (y) (+ x y)))

A function is created within a context of bindings that, when applied to a single argument, will add 7 to it. The function itself will refer freely to x when the function is called. Creating this closure requires retaining the lexical environment of bindings throughout the lifetime of the closure.

A simple example of the use of closures is

(labels ((factorial (n)
           (if (= n 0)
               1
               (* n (factorial (- n 1))))))
  (+ (factorial x) (factorial y)))

In this piece of code the function factorial is locally defined and then locally applied to the apparently free variables x and y. The construct LABELS is like LET in that it establishes bindings, but it is used to define recursive functions locally. Note that the definition of factorial refers to itself; the LABELS construct is used to establish this self-reference.

With closures it is possible to create several objects that retain a shared context that can be manipulated. Using this mechanism, it is possible to write modules and other sorts of structuring constructs. However, not all LISP dialects support closures.

Recall that CONS, CAR, and CDR are defined above. Here is a way to implement these constructs using closures; the functional correspondents to these functions are called fcons, fcar, and fcdr, respectively.

;;; A functional definition of cons
(defun fcons (a b)
  #'(lambda (message &optional value)
      ;; There are 4 message types:
      ;;   fcar:    returns the car of the cons cell
      ;;   fcdr:    returns the cdr of the cons cell
      ;;   frplaca: takes another argument and changes the
      ;;            car of the cons cell to that argument
      ;;   frplacd: takes another argument and changes the
      ;;            cdr of the cons cell to that argument
      (case message
        (fcar a)                  ;fetch the car
        (fcdr b)                  ;fetch the cdr
        (frplaca (setq a value))  ;change the car
        (frplacd (setq b value))  ;change the cdr
        (t (error "Invalid message to a cons cell: ~A" message)))))

This function takes two arguments and produces a closure that will act as a CONS cell would. The function that is returned by fcons takes one or two arguments. The first argument should be one of four different messages. The message fcar will return the CAR of the CONS cell, the message fcdr will return the CDR of the CONS cell, and the other two messages will change the CAR and the CDR parts of the CONS cell. If the first argument is either fcar or fcdr, the second argument, if supplied, is ignored. The messages frplaca and frplacd correspond to RPLACA and RPLACD. If the first argument is frplaca, the second argument is the new value for the CAR, and if the first argument is frplacd, the second argument is the new value for the CDR. A second argument only makes sense if the first argument is either frplaca or frplacd. There is a way to detect that the optional argument value was supplied erroneously and to signal an error if it was.

Here is an example of the use of these constructs.

(setq cell (fcons 'a 'b))

This expression sets the value of the symbol cell to the closure created by fcons. To obtain the CAR of cell, one writes

(funcall cell 'fcar)

This expression calls the function that is the value of cell with the single argument fcar. To obtain the CDR of cell, one writes

(funcall cell 'fcdr)

To change the CAR of cell to be c, one writes

(funcall cell 'frplaca 'c)

and to change the CDR of cell to be d, one writes

(funcall cell 'frplacd 'd)

To illustrate a simple use of macros, suppose one wishes to write

(fcar cell)

to obtain the CAR of cell, where cell is as it is defined above. Then the macro fcar can be defined as

(defmacro fcar (cell)
  `(funcall ,cell 'fcar))

Lambda Lists. The explanation of closures has introduced the notion of &optional arguments to a function. The closure returned by fcons can take one or two arguments. The notation

(message &optional value)

in the lambda list for that closure has the marker &optional in it. The arguments to the left of this marker are required to be
passed to this function, and the arguments to the right may be passed. In fact, there is a wide variety of special markers that can appear in a lambda list. These markers can enable functions to take keyword arguments, in which the order of arguments to the function may not be known to the programmer, and special keywords are used to cause arguments passed to be matched to the proper variables by matching the keywords. Also, some of the arguments passed to a function can be turned into a LISP list in the event that the function can take a widely varying number of arguments.

The Simple Evaluation Model

Earlier the concept of evaluation was introduced as the key to understanding LISP. LISP can be defined by an informal operational semantics in which the method of evaluating forms and expressions is given. The simple rule of evaluation starts with an expression and specifies a method for determining its value. An expression is either a constant, a variable, a symbol, a combination, or a special form. Special forms look like combinations.

A constant is a number, a string, or a quoted object. As noted above, a quoted object is written

(quote (object))

And the value of the quoted object above is (object). A quoted object can be abbreviated '(object) using a special feature of the reader. The value of a constant is the constant itself. The value of a symbol is the contents of the value cell of the symbol, and the value of a variable is obtained from its associated location. Both T and NIL evaluate to themselves.

A combination is a list whose first element is not quote. Normally a combination is a function invocation, a macro invocation, or a special form. For example, a combination would look like

(cfact 13 1)

This combination is a function invocation that calls the function cfact, which is defined above. The first element of the list is the function to apply, and the remaining elements are the arguments to which the function is to be applied.
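As a sketch of these evaluation rules in action (this example is ours, not from the original text; the values shown follow from the rules just described):

```lisp
'(a b)         ; a quoted object: evaluates to the list (a b)
3.14           ; a constant: evaluates to itself
t              ; T evaluates to itself
(+ (* 2 3) 1)  ; a combination: (* 2 3) is evaluated
               ; recursively to 6, then + is applied => 7
```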
To evaluate a combination, the evaluator evaluates the arguments recursively and then passes those arguments to the function, where the arguments will be bound to the variables specified in the function definition. At this point the body of the function is evaluated within the context of the bindings thus established, and the value of the body is the value of the original combination.

A combination can be a macro invocation, in which case the first element of the list is the name of a macro. To evaluate a macro invocation, the combination is passed as an argument to the macro function, and the body of the macro is evaluated to produce a form that is evaluated in place of the original combination. Defmacro allows the programmer to name the parts of the combination, as in the definition of the macro version of add7. However, this practical facility is built upon the basic LISP macro facility, which is uniform over almost all LISP dialects, in which the entire combination is passed as the single argument to the macro function.

A special form is one in which the simple evaluation
method is special, different from the normal method. For example, the form

(quote foo)
is special because the value of the form is the symbol whose name is foo, not the value of the symbol whose name is foo. If this form were evaluated according to the rules above, the evaluator would first recursively evaluate its subform before passing that subform's value to the function quote. In order to achieve the intent of quote, the basic evaluation rules must be broken. Hence the concept of a special form.

Another special form is if, the conditional expression:

(if (pred) (then) (else))

The expression (pred) is evaluated; if it returns something regarded as true, then the value of the if is the value of the expression (then), and the expression (else) is not evaluated; otherwise, the value of the if is the value of the expression (else), and the expression (then) is not evaluated. In the history section a possible definition of if is presented.

Some other special forms are COND, AND, OR, PROG, and GO. COND is a special form that enables the programmer to write a conditional with more than one predicate.

(if (pred) (then) (else))

is equivalent to

(cond ((pred) (then))
      (t (else)))

And the general form of the conditional is

(cond (pred1 form1,1 . . . form1,n1)
      (pred2 form2,1 . . . form2,n2)
      . . .
      (predm formm,1 . . . formm,nm))
In this expression, the predi are predicates. They are evaluated in order until all have been evaluated or until one produces a non-NIL result. In the latter case the corresponding forms are evaluated, and no more predicates are evaluated. Frequently T is written for the last predicate to mean otherwise. Here T is guaranteed to evaluate to something non-NIL, and so if no predi evaluates to non-NIL, this clause will be executed. To execute a clause, the individual forms formi,j are evaluated, and the value of the last form formi,ni in the clause is returned as the value of the cond.

The form cond is one of three major propositional forms; the other two propositional forms are AND and OR. AND has the form

(and pred1 . . . predn)
The predi are evaluated in left-to-right order; if all of them evaluate to a non-NIL value, then the value of the AND is the value of predn, the last form in the and. If, in evaluating the forms left to right, one of the forms evaluates to NIL, then the value of the and is NIL, and evaluation of the remaining predi is terminated. Sometimes this sort of and is referred to as a conditional AND. In most other programming languages Boolean expressions always evaluate each of their clauses before
the propositional result is computed; in LISP evaluation stops as soon as the result can be determined. The special form OR is defined similarly:

(or pred1 . . . predn)
The predi are evaluated in left-to-right order; if all of them evaluate to NIL, then the value of the or is NIL; if one of them evaluates to a non-NIL value, then the value of the OR is that non-NIL value, and evaluation of the remaining predi is terminated.

PROG and GO are a means for writing simple sequential programs. PROG is often called "the program feature." The form of a PROG is

(prog ((variables)) e1 . . . en)
where (variables) are local variables whose values may be assigned using SETQ, and each ei is either a LISP expression or a symbol. If ei is a LISP expression, it is evaluated; if it is a symbol, it is taken as a tag or a label. The special form
(go (tag))

causes control to transfer to the tag named (tag), if it exists; an error is signaled otherwise. The expression

(return (expression))

returns the value of (expression) from the PROG. If the last expression, en, is not a transfer of control using GO or RETURN, the PROG is exited, and its value is NIL. Here is a simple example of the definition of a factorial function using PROG:

(defun factorial (n)
  (prog (answer)
        (setq answer 1)
   loop (when (zerop n)
          (return answer))
        (setq answer (* n answer)
              n (1- n))
        (go loop)))
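A few illustrative evaluations of the propositional forms just described (a sketch; the values follow from the rules given above):

```lisp
(and 1 2 3)    ; => 3, the value of the last predicate
(and 1 nil 3)  ; => NIL; the 3 is never evaluated
(or nil nil 2) ; => 2, the first non-NIL value
(cond ((= 1 2) 'first)
      (t 'otherwise))  ; => OTHERWISE, via the T clause
```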
Run Time Typing

One of the most interesting aspects of LISP is that it supports run time typing, the ability to determine what type an object is at run time. In many other programming languages variables, not objects, are typed, so that a compiler is able to generate very specific and efficient code for operations on variables. In LISP, though it is possible for the compiler to understand type declarations that refer to variables, it must always be possible at run time to determine the type of an object.

LISP manipulates objects by passing around pointers to those objects rather than passing the objects themselves. LISP is call-by-value, so that all of the arguments to a function are evaluated, and pointers to the values are passed to the function. The pointers are usually considered to contain the type information, but in an actual implementation the type information may be contained in the address or stored with the object.

To be more specific, there are three primary methods for encoding type information. The first is for pointers to contain type information directly. For example, in a computer addresses might take up only part of a machine word. In this case it might be possible to use the remaining extra bits to store the
type. A combination of an address and type information is called a pointer. In fact, the term as used in the LISP literature is usually taken to mean such a combination. When the type information is stored in the pointer itself, that type information is called a tag.

The second mechanism for encoding the type of a pointer is to partition memory into blocks of storage. Objects that reside in a particular block are defined to hold some specific type, and whenever an object of that specific type is created, the storage allocated to contain the object will be allocated within that block. The type can be identified by looking in a table of block descriptors that outline the block of memory and store the type of the objects contained therein. This method also stores the type as part of the pointer, but indirectly through a table-lookup operation. That is, given a pointer, which is simply a machine address, it is possible to determine the type of the object to which the pointer points without examining the object.

The third basic method for storing type information is to store that information with the object itself. In order to determine the type of a pointer, the object itself (the memory indicated by the pointer address) must be examined to find the type information.

In practice, most LISP implementations use a hybrid of these methods. In these hybrids there are three major types of objects: immediate pointers, CONS cells, and objects with headers. The general strategy of implementation of the storage management for these objects in the most common hybrid system of LISP implementation on stock hardware (not LISP machines) will be presented. Some of the differences between this style of implementation and that used on LISP machines are described below.

In the most common hybrid, short objects, such as fixnums, are passed as immediate pointers: The tag for the object indicates that the object is contained in the remainder of the word that represents the pointer.
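The tagged-pointer idea can be sketched with Common LISP's byte-field operators; the field widths and function names below are invented for illustration and do not describe any particular implementation:

```lisp
;; Model a 32-bit word holding a 3-bit tag (bits 29-31) and
;; 29 bits of data (bits 0-28). Widths are illustrative only.
(defun make-tagged (tag data)
  (dpb tag (byte 3 29) data))   ; deposit the tag field

(defun word-tag (word)
  (ldb (byte 3 29) word))       ; extract the tag field

(defun word-data (word)
  (ldb (byte 29 0) word))       ; extract the data field

;; (word-tag (make-tagged 5 1234))  => 5
;; (word-data (make-tagged 5 1234)) => 1234
```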
A short floating-point number might be represented by 29 bits of storage, and the tag might be 3 bits; in this situation the tag and the data could be represented by a single 32-bit word. There is a trade-off regarding whether it is expedient to pack a short floating-point number into a word like this; it might be better to pass a pointer, which contains the tag and an address, where the pointer points to a floating-point number in the normal machine format for that number. The first implementation has the advantage that creating a short floating-point number does not require a memory location to be allocated for the number, and the second technique does not require the bits in the number to be extracted from the pointer when a short floating-point number is numerically manipulated. Normally short floating-point numbers, fixnums, and characters are represented as immediate objects.

CONS cells are stored as two words of storage. The two words represent the CAR and the CDR of the CONS cell, and using CONS cells, binary trees and lists can be represented. A pointer to a CONS cell is created and passed around. The tag of the pointer indicates that the object at the address indicated by the pointer is a CONS cell. Each of the two words of storage beginning at that address are LISP pointers and can point to anything; immediate pointers can be stored in a CONS cell as well as any other sort of pointer.

The third type of object is an object with a header; this includes vectors, arrays, strings, blocks of code, functions, and user-defined record structures. The first part of the header is a
tag that indicates that some specified amount of storage following this header represents the object. The data part of the immediate pointer contains the size of the block of storage and, usually, a subtype or a secondary tag, which states what type of object is stored in this storage: vector, array, string, and so on. Depending on this subtype, the subsequent storage is used in various ways. A vector typically contains sequential units of storage (bits, bytes, words, etc.) that represent the elements of the vector; a user-defined record structure contains one or more words specifying the user-defined type for the object; a string contains a sequence of characters, probably in the format for strings that the underlying hardware provides.

The pointer to such an object contains a tag that states that the object stored at the address indicated by the pointer contains a header with a secondary tag. In order to determine the type of such an object, the header of the object is fetched or otherwise examined.

This hybrid scheme has certain aspects important to the overall implementation of the LISP system. For example, a stop-and-copy garbage collector, which does a linear scan of parts of memory, must be able to recognize enough about the objects it finds to be able to decide whether the garbage collector should trace the parts of the object as pointers. The garbage collector must do different things for vectors containing non-LISP objects from what it must do for vectors containing pointers.

Data Structures. Some of the other LISP data structures not mentioned above are presented here.

Numbers. Modern LISPs have a variety of numeric data types. The following are the numeric data types defined in Common LISP: integers, ratios, floating-point numbers, and complex numbers.

For integers a distinction is made between integers that can be efficiently represented by the underlying computer and integers that cannot be efficiently represented. A fixnum is an integer that roughly corresponds to the machine-representable fixed-point number. These numbers typically fall into some range -2^n through 2^n - 1 for suitable values of n. An integer that is not a fixnum is called a bignum (big number). These integers are sometimes represented in computers as a vector of bits or as a list of fixnums. Bignums are especially useful for doing symbolic algebra and symbolic mathematics. For example, it is often necessary to compute the factorial of a number exactly as an integer rather than approximately as a floating-point number. The number 1000! has 2568 decimal digits; this number can be computed exactly in Common LISP using bignums.

A ratio is a number that represents the mathematical ratio of two integers. These are denoted by the form (numerator)/(denominator).

Common LISP defines four different precisions of floating-point numbers: short floating point, single floating point, double floating point, and long floating point. These four precisions are defined in order to cover the case where a particular implementation of LISP might be able to provide some hardware support for some or all of them.

Complex numbers are pairs of noncomplex numbers. The two parts of a complex number must be of the same numeric type.

Characters and Strings. Common LISP supports both ASCII and EBCDIC character sets. Strings are vectors of characters. A variety of string operations is provided.

Vectors and Arrays. Vectors, both with and without fill pointers, are supported. Multidimensional arrays, with general as well as specialized elements, are also defined.

Interpretation

LISP has been presented by specifying an informal operational semantics, a semantics based on a model of evaluation. Often this model is directly implemented by a LISP interpreter. A LISP interpreter examines a representation of a LISP program and performs the operations expressed therein. For example, when a LISP system reads the following expression,

(+ 1 2)

the LISP interpreter sees that it has a combination that is not a special form; in fact, the combination is a request to call the function + on two arguments. The arguments are evaluated and found to be the integers 1 and 2. These arguments are passed to the function +, which has already been defined.

Functions like + are defined by writing programs in some non-LISP language that manipulates the internal data structures used to implement the LISP data structures. In the early days of serious LISP implementation efforts, the mid-1960s through the early 1970s, the language chosen to implement the LISP interpreter was assembly language, though occasionally FORTRAN or C was used. In this example + would be implemented as an assembly language program that would receive its arguments in some standard locations, such as on a stack or in some registers. More recently, production LISPs are being implemented in LISP, and compilers are used to translate the LISP code into machine language.

When the interpreter recognizes a function call, such as a call to *, the interpreter places the evaluated arguments on the stack and jumps to the function definition. The interpreter is recursive in the sense that to evaluate the arguments to a function, the interpreter is called recursively.

The interpreter causes the special forms to be evaluated by special actions. For example, in one possible implementation of a LISP interpreter, to evaluate an IF expression, the then and the else clauses are placed on the stack. The predicate is evaluated, and if non-NIL, the then clause is pulled off the stack, the else clause is removed, and the then clause is handed to the evaluator. If the predicate returns NIL, then the then clause is discarded, and the else clause evaluated. Similar sorts of actions are taken for the other special forms.

Constants are immediately evaluated, and lexical variables are evaluated by finding their values in the environment. Symbol values are located by looking in the value cell for the symbol.

Compilation

Although LISP has been presented by specifying an informal operational semantics, and though this semantics can be easily implemented with an interpreter, LISP is also a compiled language. Every major dialect of LISP has a compiler. Compiling LISP programs is very easy to do when only a low level of sophistication is required of the compiled code, but such compilation is relatively difficult when a high degree of optimization in the compiled code is required.

A LISP interpreter examines a representation of a LISP program and performs the operations that the code indicates. A LISP compiler examines a representation of a LISP program and produces machine language code that implements the operations that the LISP code expresses.

Each special form is compiled according to its own specific stylized technique. For example, a conditional expression is compiled as blocks of code that express the evaluation of the various predicates and forms, with conditional branches among the blocks at the machine language level corresponding to the logic of the conditional expression. Let-binding is compiled by producing machine language that stores values in the locations determined to hold them. Consider the expression

(let ((x (foo)))
  (body))

The compiler produces code to evaluate (foo); then it decides which location to use to store the value that (foo) produces. A typical LISP compiler will select a location on the control stack to hold the value, though registers and memory locations are possible candidates. Code is produced that stores the value that foo returns in the location selected to hold it. Finally the compiler produces code to evaluate (body). If (body) refers to x, then the compiled code will fetch the value from the selected location.

Function calls are compiled in a stylized manner. In some LISP systems there are a variety of styles of compiled function calls. Typical compilers use the control stack to store function return information and to pass arguments.

A typical LISP compiler is structured as a series of phases, each phase having, perhaps, a number of passes. The phases are partitioned into two parts: The first part builds a computation graph that represents the computation that the LISP program expresses, and the second part uses the computation graph to generate machine language code.

Each node in the computation graph is a block of computations, and arcs represent relationships between the blocks. For example, if the value of one expression is required to evaluate another expression, the arc between the nodes representing the two expressions indicates this relationship. At each node in the computation graph a set of properties is computed, such as the nature of the side effects produced. Usually a separate phase is performed to complete or refine some aspect of the computation graph structure.

During an early phase all identifiers are made distinct from one another, macros are expanded, declarations are noted, and syntax checking is performed. Arcs are placed between the places where variables are referenced and the place where they are bound.

Optimizations are generally expressed as transformations on this computation graph. Often the transformations have a clean representation as source-to-source transformations on the LISP code, although the transformations are not implemented that way. Optimizations include removing unnecessary bindings, constant folding, strength reduction, dead-code elimination, and tail recursion removal. One or more phases are dedicated to performing these optimizations. One or more phases allocate registers and temporary memory. There is generally a stylized manner in which the stack is used, and the stack is typically organized as a stack of frames, in which every frame has the same format. Finally code generation takes place.

An important compiler optimization is to remove as many run time type checks as possible. Common LISP supports a set of type declarations, and the user may supply declarations to help the compiler deduce the types of variables. When the types of variables are known, the compiler can specialize the LISP operations expressed in the code to take advantage of this additional knowledge. For example, suppose that the user writes

(+ x y)

If the compiler has no information about what types of numbers x and y are, it must produce code that will work for all possible types of numbers. But if the compiler can determine that x, y, and the result of the + are all fixnums, then the machine instruction for addition can be used to add the numbers, assuming that the representation of fixnums is preserved by the machine addition instruction. Typically, compiled LISP code runs 10-100 times faster than interpreted LISP code.

Comparative History of Lisp: 1956-1960

Some of the key ideas in LISP were developed by John McCarthy during the Dartmouth Summer Research Project on Artificial Intelligence, which was the first organized study of artificial intelligence (5). This meeting was held during the summer of 1956. McCarthy's motivation was to develop an algebraic list-processing language for AI work on the IBM 704 computer.

During the Dartmouth meeting Newell, Shaw, and Simon (researchers at Carnegie-Mellon University) described IPL 2 (6), a list-processing language for the Rand Corporation JOHNNIAC computer (7) in which they implemented their Logic Theorist program. McCarthy decided against creating a language similar to IPL 2 because its form was based on a JOHNNIAC loader that happened to be available to them and because the FORTRAN idea of writing programs algebraically was attractive. A primary motivation was that arbitrary subexpressions of symbolic expressions could be obtained by composing the functions that extract immediate subexpressions.

During the period from the summer of 1956 through the summer of 1958, McCarthy worked concurrently on the form of LISP and on his research in AI. The AI research centered around ideas that led to his Advice Taker proposal (8). The Advice Taker is a reasoning program that decides what to do in specific situations by making logical inferences. In the Advice Taker information about the world is represented by sentences in a suitable formal language. Representing sentences by list structure seemed appropriate to McCarthy, and a list-processing language also seemed appropriate for programming the operations involved in deduction.

At that time the key ideas in LISP were computing with symbolic expressions rather than numbers, representation of
symbolic expressions and other information by list structure in the memory of a computer, representation of information in external media mostly by multilevel lists and sometimes by S-expressions, a small set of selector and constructor operations expressed as functions, composition of functions as a tool for forming more complex functions, use of conditional expressions for introducing branching into function definitions, the recursive use of conditional expressions as a sufficient tool for building computable functions, the use of lambda expressions for naming functions, the storage of information on the property lists of symbols, the representation of LISP programs as LISP data, the conditional expression interpretation of Boolean connectives, the LISP function EVAL that serves both as a formal definition of the language and as an interpreter, and garbage collection as a means of handling the erasure problem. The erasure problem is that of how to free storage that is no longer in use. Until 1958 there was no such thing as a conditional expression that returned a value. At that time all conditional expressions resulted in a branch to different code depending on the condition.

Some of the above ideas were taken from other languages, but most were new. Toward the end of the initial period, it became clear that this combination of ideas made an elegant mathematical system as well as a practical programming language. Then mathematical neatness became a goal and led to pruning some features from the core of the language. This was partly motivated by aesthetic reasons and partly by the belief that it would be easier to devise techniques for proving programs to be correct if the semantics were compact and without exceptions.
Early Implementation Considerations. The first LISP, LISP 1, was implemented for the IBM 704 computer. This computer has a 36-bit word; this word is broken up into four parts: two 15-bit parts (called the address and decrement), a 3-bit tag, and a 3-bit prefix. There are special instructions for moving the contents of the address and decrement parts of a word to and from the 15-bit index registers. Addresses in the machine were 15 bits, so it was decided that list structure should use 15-bit pointers. A CONS cell, then, was a single word with two 15-bit addresses in it.

Four functions were proposed for dealing with CONS cells, one to extract each of the four parts mentioned above. Additionally, a function was provided to fetch a word from memory; this instruction was called CWR, standing for contents of the word in register number. It was soon noted that to chain down a list, the CWR function was being composed with the instruction to extract the decrement part of the word. A function was defined to achieve this composition, and it was called CDR, standing for contents of the decrement part of register number. Similarly, CAR was defined, and its name stands for contents of the address part of register number. CONS was defined as a subroutine rather than as a function. This work was done with paper and pencil at Dartmouth because there was no IBM 704 computer there.

The first real LISP implementation was FLPL, the FORTRAN list-processing language. This was a set of subroutines that were added to FORTRAN on the IBM 704 computer. This work was undertaken by Herbert Gelernter and Carl Gerberich (9) under the direction of Nathaniel Rochester; FLPL did not have conditional expressions, it did not support recursion (because FORTRAN did not), and erasure was handled explicitly by the user program. One of the key ideas added to LISP by Gelernter and Gerberich was to make CONS a function rather than a subroutine. The value of the function is the word allocated by the cons, and with this mechanism new expressions can be constructed out of subexpressions by composing occurrences of cons.

In the summer of 1958, while McCarthy was at IBM working on symbolic differentiation at the invitation of Nathaniel Rochester, conditional expressions and MAPCAR were added to the definition of LISP. In addition, lambda notation for anonymous functions was developed.

In the fall of 1958 the first real implementation of LISP was started at MIT. The initial plan was to implement a compiler, but this was believed to require man-years of effort, so various LISP functions were hand-compiled into assembly language to experiment with subroutine linkage, stack handling, and erasure. Subroutines were written to create a LISP environment in which one could read and print list structure using the parenthesized notation.

On paper, LISP functions were written in an informal notation called M-expressions (for meta expressions), intended to resemble FORTRAN as much as possible. Besides FORTRAN-like assignment statements and GO-TO, the language allowed conditional expressions and the basic functions of LISP. The M-notation also used brackets instead of parentheses to enclose the arguments of functions in order to reserve parentheses for list structure constants.

The READ and PRINT programs induced a de facto standard external notation for symbolic information. For example, x + y was written as (+ x y). Any other notation necessarily requires special programming because standard mathematical notations treat different operators in syntactically different ways. This notation later came to be called "Cambridge Polish" because it resembled the prefix notation of Lukasiewicz (10) and because the Harvard philosopher Quine (11) had also used a parenthesized prefix notation.

Explicit erasure of list structure, as was done in IPL, was regarded as unaesthetic; implicit erasure using reference counts was the first idea considered, but the 6 bits in the IBM 704 word not used by addresses in a CONS cell were in separated parts of the word. The chosen alternative was garbage collection, in which data still in use was marked by tracing from known roots; the storage no longer in use was returned to a free pool.

A single contiguous stack was used to store local variables, CONS was made a function, and the prefix and tag parts of the CONS cell were abandoned. This resulted in a single type of object, the 15-bit address, and the language therefore required no type declarations.

McCarthy wrote a paper about recursive function theory, based on LISP, called "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I" (12). Part II was never written but was intended to contain applications to computing with algebraic expressions. The recursive function theorists tended to prefer the Turing machine as the paradigmatic computing engine. McCarthy felt that one way to show that LISP was neater than Turing machines was to write a universal LISP function and show that it is briefer and more comprehensible than the description of a universal Turing machine. This was the LISP function EVAL, where (eval e a) computes the value of a LISP
expression e, the second argument a being a list of assignments of values to variables.

Writing EVAL required inventing a notation representing LISP functions as LISP data. Logical completeness required that the notation used to express functions used as functional arguments be extended to provide for recursive functions, and the LABEL notation was invented by Nathaniel Rochester for that purpose (9). Stephen Russell noticed that EVAL could serve as an interpreter for LISP and hand-coded it, producing an interpreter-based programming language (9).

The appearance of an interpreter tended to freeze the form of the language to use S-expressions, even though the early implementors expected to switch to writing programs as M-expressions. The project of defining M-expressions precisely and compiling them or at least translating them into S-expressions was neither finalized nor explicitly abandoned. It just receded into the indefinite future, and a new generation of programmers appeared who preferred internal notation to any FORTRAN-like or ALGOL-like notation that could be devised. Moreover, a machine-readable M-notation would have required redefinition because the pencil-and-paper M-notation used characters unavailable on the IBM 026 keypunch.

LISP 1.5. LISP 1.5 was an extension of LISP 1, which was implemented by McCarthy and Russell (9). The additions to LISP 1 to create LISP 1.5 are summarized below.

Property Lists. This concept is necessary for the implementation of the Advice Taker.

Destructive List Operations. RPLACA, RPLACD, and NCONC enabled the user to alter, destructively, existing list structure. This feature is used to implement efficient list-structure editing functions. Given that LISP source code is represented by lists, this enables structure editors for LISP code to be written. The Interlisp structure editor is the best example of this type of editor.

Numbers. Numbers in LISP 1 were represented as lists of atoms.
LISP 1.5 used a more efficient representation, but even this new representation was still insufficiently efficient to compete with FORTRAN. It was not until 1971, when the MIT LISCOM and NCOMPLR compilers for Maclisp were available, that numeric code written in LISP was fast enough for serious numeric applications.

FUNARGS. The ability to capture a binding environment at the point of a function definition was added, although the functionality of the construct is considerably below what would be ideal. Later LISPs extended this to the notion of closures, discussed earlier.

Special Forms. The existence of EVAL made special forms possible. This enabled experimenters to try out new constructs easily.

Functions of a Variable Number of Arguments. Functions like LIST take a variable number of arguments. This facility enabled programmers to experiment further with the function-calling mechanisms, such as the sophisticated Common LISP lambda lists.

Program Feature. The idea of executing sequential programs, much as FORTRAN does, was not an afterthought: It preceded the concept of functional composition. The syntax of PROGs and GOs, however, was an inelegant afterthought.

Compiler. The first LISP compiler was written by Timothy Hart and Michael Levin (9). It was written in LISP and may
have been the first compiler written in the language to be compiled.

LISP 1.5 is the real takeoff point for LISP implementations and experiments. At the same time as the first implementation, on the IBM 7090, the Lisp 1.5 Programmer's Manual was published (9). This book has been used by generations of LISP programmers to learn LISP. LISP 1.5 was implemented by Russell, Edwards, Hart, and Brayton (9).

The following is an example of the definition of the function member in LISP 1.5. It is reproduced exactly as it appeared in the Lisp 1.5 Programmer's Manual:

DEFINE((
(MEMBER (LAMBDA (A X) (COND ((NULL X) F)
((EQ A (CAR X) ) T) (T (MEMBER A (CDR X))) ))) ))

Note that the S-expressions are not indented in the style used in this entry. The indentation of LISP code to indicate which parts were subordinate to others was in common use by the mid-1960s, but some programmers continued to not indent LISP code. However, the style of indentation has changed over the years; here is an example of MEMBER written in a LISP style in use in 1966:

DEFINE((
  (MEMBER
    (LAMBDA (A X)
      (COND ((NULL X) F)
            ((EQ A (CAR X)) T)
            (T (MEMBER A (CDR X)))))) ))

The following is the definition of member as an M-expression, indented in modern style:

member[a;x] = [null[x] -> F;
               eq[a;car[x]] -> T;
               T -> member[a;cdr[x]]]

Special and Common Variables. In LISP 1.5 a special variable is a binding of an identifier to a location, a pointer to which is placed on the property list of the symbol whose name is the identifier. When a programmer declares x to be special, the compiler and loader combine to create a symbol named x. The property list for x has a property with the indicator SPECIAL, and its value is a cell. That cell is the location to which x refers and where assignments to x place values. When x is let bound, the old value stored in this cell is placed on the control stack along with enough information to restore the old value to this cell when control returns past the let-binding point.
A common variable is one in which the value cell is used to hold the value, and binding a common variable requires the use of an explicit a-list. Evaluating a common variable at run time requires a call to EVAL. Special variables can be shared among different compiled functions but cannot be shared with interpreted functions; common variables can be shared by compiled functions and interpreted functions but are much slower than special variables.

In terms of implementation, special variables are the precursors of what is now known as shallow binding, whereas common variables are the precursors of deep binding.

In shallow binding, the value of a special variable is always kept in the value cell of a symbol. LET binding the symbol causes the old value to be placed on the control stack and the new value to be placed in the value cell. References to the value of the symbol within the dynamic scope of the LET will see this new value. When control exits the LET, the old value is restored.

In deep binding, the identifier-value pair is placed on the control stack when LET binding occurs. Reference to the value of a symbol first looks at the stack for any bindings there, and if none are found, the value cell is referenced. Searching the stack can be done relatively efficiently: by linking together special-binding blocks (the places where identifier-value pairs are stored on the stack), by putting a flag in the symbol stating whether the symbol has been bound (and therefore should be looked up on the stack), or by caching special variables in stack frames in which they will be frequently referenced. These techniques can be used alone or in combinations.
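Shallow and deep binding differ only in where a binding is stored, not in what a program observes. A minimal sketch in modern Common LISP, where DEFVAR proclaims a variable special (the names *V* and LOOKUP are illustrative only):

```lisp
(defvar *v* 'global)        ; *V* is special, i.e., dynamically bound

(defun lookup () *v*)       ; sees whichever binding is innermost at call time

;; Shallow binding: LET saves GLOBAL on the control stack, stores LOCAL in
;; the value cell, and restores GLOBAL on exit.
;; Deep binding: LET pushes an identifier-value pair on the stack, and the
;; reference in LOOKUP searches the stack before consulting the value cell.
(list (lookup)
      (let ((*v* 'local)) (lookup))
      (lookup))
;; => (GLOBAL LOCAL GLOBAL) under either implementation strategy
```

The choice between the two strategies thus affects only the relative cost of binding versus the cost of reference, not the result.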
Comparative History of LISP: 1950-1970
In the early 1960s Timothy Hart and Thomas Evans implemented LISP 1.5 on the Univac M 460, a military version of the Univac 490 (13). It was bootstrapped off of LISP 1.5 on the IBM 7090: A cross-compiler that ran on the IBM 7090 and compiled code for the Univac machine was used to compile the bulk of the LISP code. A small amount of machine language code was written for the lowest levels of the LISP implementation (13).

Robert Saunders and his colleagues at System Development Corporation implemented LISP 1.5 on the IBM-built AN/FSQ-32/V computer, often called the Q-32 (13). The implementation was bootstrapped from the IBM 7090 and PDP-1 computers at Stanford University. This project was a joint effort between System Development Corporation and Information International.

The PDP-1 LISP at Stanford was implemented by McCarthy and Russell (13).

In 1963 L. Peter Deutsch, then a high school student, implemented a LISP similar to LISP 1.5 on the PDP-1 at Bolt, Beranek, & Newman (BBN) (13). This LISP was called Basic PDP-1 LISP. BBN also implemented one of the first time-sharing operating systems for the PDP-1.

By 1964 a version of LISP 1.5 was running at MIT, in the Electrical Engineering Department, on an IBM 7094 computer, under the Compatible Time Sharing System (CTSS) (14). This LISP and Basic PDP-1 LISP were the main influences on the PDP-6 LISP implemented by DEC and some members of the MIT Model Railroad Club in the spring of 1964. This LISP was the first program written on the PDP-6. Also, this LISP was the ancestor of Maclisp, the LISP written to run under the Incompatible Time Sharing System (ITS) (15) at MIT on the PDP-6 and later on the PDP-10.

At BBN a successor to Basic PDP-1 LISP was implemented on the PDP-1, and an upward-compatible version, patterned after LISP 1.5 on the MIT CTSS system, was implemented on the Scientific Data Systems 940 (SDS 940) by Daniel Bobrow and D. L. Murphy (13). A further upward-compatible version was written for the PDP-10 by Alice Hartley and Murphy, and this LISP was called BBN LISP (13). In 1973, around the time that SDS was acquired by Xerox and named Xerox Data Systems, the maintenance of BBN LISP was shared by BBN and Xerox Palo Alto Research Center, and the name of the LISP was changed to Interlisp.

The PDP-6 and PDP-10 computers were, by design, especially suited for LISP because they had 36-bit words and 18-bit addresses. This allowed a CONS cell, a pair of pointers or addresses, to be stored efficiently in a single word. The PDP-6 and PDP-10 had fast, powerful stack instructions, which enabled fast function calling for LISP.

Almost all of these implementations had a small hand-coded core and a compiler; the rest of the LISP was written in LISP and compiled.

In 1965 virtually all of the LISPs in existence were identical to each other or differed only in trivial ways. After 1965, or more precisely, after Maclisp and BBN LISP diverged from each other, there came a plethora of LISP dialects.

Comparative History of LISP: 1970-1980

Early Maclisp. In the early MIT PDP-6 Maclisp (16), Greenblatt (17) decided that having both common and special variables, as in LISP 1.5, was inelegant, and removed common variables from the language but made special variables work using the value cell. This was the first implementation of LISP to use what is called shallow binding.

The toplevel of LISP 1.5 was EVALQUOTE. The toplevel of Maclisp is EVAL and not EVALQUOTE. In LISP 1.5 one could type expressions like this to EVALQUOTE:

cons(a b)

to create the pair of a and b. In Maclisp one could type this expression, instead, to EVAL:

(cons 'a 'b)

The "quote" in EVALQUOTE signifies the implicit quoting of the arguments to the function applied. Interlisp retained EVALQUOTE as a toplevel evaluation form while Maclisp forked off and used EVAL.

In LISP 1.5 and Maclisp special forms are implemented by placing a function on the property list of the symbol whose print name corresponds to the special form's name. For example, in Maclisp, COND has an fsubr property, where the "f" in fsubr signifies a special form, and the "subr" signifies a compiled subroutine. The evaluation process for arguments is then left up to the programmer. Here is how IF could be defined as a special form in Maclisp:

(defun if fexpr (form)
  (let ((predicate (car form))
        (then (cadr form))
        (else (caddr form)))
    (cond ((eval predicate) (eval then))
          (t (eval else)))))

This code is not as efficient as it could be, but it illustrates that special forms can be extended by user-written code.

Maclisp introduced the lexpr, which is a function that can take any number of arguments and puts them on the stack; the single argument to the function is bound to the number of arguments passed. The form of the lambda list for this argument, a symbol and not a list, signals the lexpr case. Here is an example of how to define LIST, a function of a variable number of arguments that returns the list of those arguments:
(defun LIST n
  (do ((i n (1- i))
       (answer () (cons (arg i) answer)))
      ((zerop i) answer)))

The single argument, n, is bound to the number of arguments passed. The expression (arg i) refers to the ith argument passed.

Other major additions to LISP 1.5 were arrays, the modification of simple predicates, such as MEMBER, to be functions that return useful values, PROG2, and the introduction of the pair of functions ERR and ERRSET. ERRSET was useful when one wanted to execute a piece of code that might cause an error. One wrote

(errset (form))

which would evaluate (form) in a context in which errors would not cause a breakpoint to occur. If (form) did not cause an error, ERRSET would return a pair of the value and NIL. If (form) caused an error, no error would be signaled, and the ERRSET would return NIL. If, in evaluating (form), the expression

(err (expression))

is evaluated, the value of (expression) would be returned as the value of the ERRSET. ERRSET was later generalized to CATCH and ERR to THROW.

The simple but powerful macro facility on which DEFMACRO is based was introduced in Maclisp in the mid-1960s.

Later Maclisp. The most significant development for Maclisp occurred in the early 1970s when the techniques in the "fast arithmetic compiler," LISCOM, were incorporated into the Maclisp compiler. This new compiler, NCOMPLR, would become a standard against which all other LISP compilers were measured in terms of the speed of running code. Inspired by the needs of the MIT Artificial Intelligence Laboratory (AI Lab), whose needs covered the numeric computations done in vision and robotics, several new ways of representing and compiling numeric code resulted in numeric performance of compiled Maclisp on a near par with FORTRAN compilers. LISCOM was largely the work of Jeff Golden and John L. White at MIT (18).
The relative performances of compiled numeric LISP code and FORTRAN numeric code on the PDP-10 stirred the Digital compiler writers to improve the DEC FORTRAN compiler to the point where it was difficult for the LISP compilers to compete. The relative performance of subsequent LISP compilers on numeric code has never reached the high-water mark achieved in 1972, but some LISP compilers have done very well.

During the mid-1970s, in conjunction with work on LISP machines by Richard Greenblatt, David Moon, and others, Maclisp began to expand toward a much fuller language. The sophisticated lambda lists seen in Common LISP are the results of early experimentation with programming styles by the LISP machine group; these styles found their way into Maclisp.

Maclisp was implemented on operating systems other than ITS for PDP-10s. By 1978 Maclisp ran on ITS, TOPS-10, TOPS-20, TENEX, and WAITS (19). The last operating system is the Stanford Artificial Intelligence Laboratory PDP-10 operating system. Maclisp and Interlisp were the dominant LISP dialects from 1970 through approximately 1978.
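Before leaving Maclisp: the ERRSET/ERR protocol described above survives in its generalized form, CATCH and THROW, in Common LISP. A minimal sketch of the control-transfer behavior (the tag name is illustrative only):

```lisp
;; THROW unwinds to the nearest CATCH with a matching tag and makes its
;; second argument the value of that CATCH, much as (ERR expr) made the
;; value of expr the value of the enclosing (ERRSET form).
(catch 'errset-tag
  (+ 1 (throw 'errset-tag 'recovered)))   ; the pending (+ 1 ...) is abandoned
;; => RECOVERED
```

Unlike ERRSET, a CATCH is selected by its tag rather than by the fact of an error, which lets independent exit protocols coexist without interfering.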
Interlisp. Interlisp introduced many radical ideas into LISP programming style and methodology. The most visible of these ideas are embodied in programming tools, like the spelling corrector, DWIM, the file package, CLISP, the structure editor, and MASTERSCOPE.

The spelling corrector is a program that compares a possibly misspelled word, usually a symbol, with a list of known words. The spelling corrector is invoked when a symbol is unknown. The user has options for controlling the behavior of the system with respect to spelling correction. The system can correct automatically, it can pause and ask whether the correction is acceptable, or it can simply signal an error.

The spelling corrector is under the general control of a much larger program, called DWIM, standing for do what I mean. Whenever an error of any sort is detected by the LISP system, DWIM is invoked to determine the appropriate action. Among other things, the spelling corrector might be invoked. DWIM is able to correct some forms of parenthesis errors; these, along with misspelled identifiers, comprise the most common user typographical errors.

DWIM would not be especially useful unless correcting errors were done permanently. Interlisp does not maintain a file of function definitions in the same way that, say, Maclisp does. In Maclisp, files of LISP code are simply text files, which are read into Maclisp or compiled for Maclisp. In Interlisp a file is used as an external storage medium for user source code, but the fact that it is a file is unimportant. What is important is that the file is a permanent repository for user source code, and all modifications to the user code are done within the Interlisp programming environment. What is unimportant is the representation used in those files, because the user never uses a text editor to edit the sources but a resident LISP structure editor.
The ideal situation that is approximated by using files is that the user is interacting with a LISP system that never terminates, and the programs as they are in use are important, not their representation as text in a file. When errors are corrected by DWIM, the source in the user's system and, as an incidental side effect, in the file in which the source is located is altered to reflect the correction. Therefore, once a bug is corrected, it is permanently corrected.

CLISP, standing for conversational LISP, is an ALGOL-like syntax used along with a normal LISP syntax. For example, here is a valid definition of FACTORIAL written in Interlisp CLISP syntax:

(DEFINEQ (FACTORIAL (LAMBDA (N)
  (if N=0 then 1 else N*(FACTORIAL N-1)))))

A number of infix operators are defined in CLISP, list construction syntax is defined, and a useful set of iteration constructs is defined. Here is a simple program to print all of the prime numbers in the range m < p < n:

(for X from M to N do (PRINT X) while (PRIMEP X))

CLISP, DWIM, and the spelling corrector can work together to recognize the following as a valid definition of FACTORIAL:

(DEFINEQ (FACTORIAL (LAMBDA (N)
  (iff N=0 thenn 1 esle N*8FACTORIALNN-1))))
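The CLISP iterative statement anticipates what later became the Common LISP LOOP macro. Below is a rough analogue of the iteration above, made self-contained with a simple trial-division PRIMEP (the original assumes a user-supplied predicate); like the CLISP version, it prints each X in turn and stops after the first non-prime:

```lisp
;; Supplied only to make the sketch self-contained.
(defun primep (x)
  (and (> x 1)
       (loop for d from 2 to (isqrt x)
             never (zerop (mod x d)))))

;; Rough Common LISP analogue of the CLISP statement
;;   (for X from M to N do (PRINT X) while (PRIMEP X))
(defun print-while-prime (m n)
  (loop for x from m to n
        do (print x)
        while (primep x)))
```

The correspondence between the two forms (for/from/to/do/while) is nearly word for word, which suggests how directly the CLISP experiment fed later iteration designs.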
The editor used in Interlisp is a structure editor. With this editor LISP code is displayed and altered within the LISP system. Operations in the editor apply to the current S-expression, as selected by the user, or to a small surrounding context of the current expression. This editor was developed when teletypes and other slow printing terminals were standard; the Interlisp structure editor enables a programmer to edit LISP source code very efficiently on such terminals. This style of editing is natural to people who are almost exclusively programming in LISP, though some people prefer to edit text.

Other programming tools, such as MASTERSCOPE, help the programmer to develop large systems. MASTERSCOPE is a facility for finding out information about the functions in a large system. The user asks questions about the system, and MASTERSCOPE analyzes the code, building up a database, to answer the question. Questions include which functions call which others (directly or indirectly), which variables are bound where, which functions destructively alter structures, and several others.

Interlisp does not implement a macro facility that is as easily used as Maclisp's. Interlisp introduced the concept of block compilation, in which multiple functions are compiled as a single block; this results in faster function calling than would otherwise be possible in Interlisp.

Interlisp runs on PDP-10s, Vaxes, and a variety of special-purpose LISP machines developed by Xerox and BBN. The most commonly available Interlisp machines are the Dolphin, the Dorado, and the Dandelion. The Dorado is the fastest of the three, and the Dandelion is the most commonly used. Interlisp-10, the PDP-10 version of Interlisp, is a shallow-bound LISP, and Interlisp-D, the special-purpose LISP machine version, is a deep-bound LISP.

Like Maclisp, Interlisp extended the function-calling mechanisms in LISP 1.5 with respect to how arguments can be passed to a function.
Interlisp function definitions specify arguments as the cross-product of two attributes: lambda versus nlambda, and spread versus nospread. Lambda functions evaluate each of their arguments; nlambda functions evaluate none of their arguments. Spread functions require a fixed number of arguments; nospread functions accept a variable number. Nospread functions treat their arguments very much as Maclisp does in the lexpr case.

One of the most innovative of the language extensions introduced by Interlisp was the spaghetti stack (20). The problem of retention of the dynamic function-definition environment in the presence of special variables was never completely solved until spaghetti stacks were invented. The idea behind spaghetti stacks is to generalize the structure of stacks to be more like a tree, with various branches of the tree subject to retention whenever a pointer to that branch is retained. That is, parts of the stack are subject to the same garbage collection policies as are other LISP objects. Unlike closures, the retained environment captures both the control environment and the binding environment. Spaghetti stacks per se are an efficient implementation of tree-structured stacks using a linear stack; in the situation where parts of the stack are not retained, there is almost no performance penalty incurred.

Interlisp retained the LISP 1.5 flavor to a greater extent than did Maclisp. Maclisp changed the order of some arguments to some functions (such as MAPCAR and the other map functions) while Interlisp retained the original order. Interlisp programming style is heavily influenced by the programming environment, CLISP, and DWIM; therefore, the LISP portion of Interlisp remains very similar to early LISPs, the effort in improving programming style focusing on CLISP and DWIM. In dialects like Maclisp, in which a "Lispy" style was retained, the LISP part of the language itself was advanced.

One of the minor, but interesting, modifications to LISP 1.5 that Interlisp made was the introduction of the superparenthesis. If a right square bracket (]) is encountered during a read operation, it balances all outstanding open left parentheses, or it matches the last outstanding left square bracket ([). Here is a simple example of this syntax:

(DEFINEQ (FACTORIAL (LAMBDA (N)
  (COND [(ZEROP N) 1]
        (T (TIMES N (FACTORIAL (SUB1 N]

Other LISP Dialects. There are several other major LISP dialects from this era. Most are more similar to Maclisp than to Interlisp. The two primary dialects are Standard LISP (21) and Portable Standard LISP (22). Standard LISP was first defined in 1969 by Anthony Hearn and Martin Griss, along with their students and colleagues (21). The motivation was to define a subset of LISP 1.5 and the other prevailing LISP dialects that could serve as a common transportation mechanism for programs. The intended use of Standard LISP was to transport REDUCE, a symbolic-algebra system that was written by Hearn and his colleagues and which was of interest in the scientific and engineering community. Standard LISP was designed so that if an existing LISP implementation (called the target LISP) could implement the Standard LISP constructs, then Standard LISP and REDUCE could run on top of that existing dialect.

Later Hearn and his colleagues discovered that they would need more control over the environment and the compiler, and Portable Standard LISP (PSL) was born (22). This dialect shared the simplicity of Standard LISP, but it was a full implementation with a portable compiler. At the end of the 1970s PSL ran on about a dozen different computers.
At Stanford an early version of Maclisp was adapted to the PDP-6; this LISP was called LISP 1.6. The early adaptation was rewritten by John Allen and Lynn Quam, and later compiler improvements were made by Whit Diffie (22). UCI LISP (23) is an extended version of LISP 1.6 in which an Interlisp-style editor and other programming environment improvements were made.

The Demise of the PDP-10. By the middle of the 1970s it became apparent that the 18-bit address space of the PDP-10 would not provide enough working space for what were becoming average-sized AI programs. The PDP-10 line of computers (KL-10s and DEC-20s) were being altered to use an extended addressing scheme in which multiple 18-bit address spaces could be addressed using a base register. However, this addition was not a smooth expansion to the existing addressing modes as far as the LISP implementor was concerned; moreover, the PDP-10 line was abandoned by DEC.

One response to the address space problem was the LISP machine, a special-purpose computer that specializes in running LISP programs. This is the topic described below. The other response was to use computers with larger address
spaces than 18 bits. Digital Equipment Corporation introduced such computers, the Vax line of computers.

Vaxes presented both an opportunity and a problem for LISP implementors. The Vax instruction set provided some good opportunities for implementing the low-level LISP primitives efficiently; the exception to this was LISP function calling, which could not be accurately modeled with the Vax function-calling instructions. The problem with the Vax was twofold. First, several major LISP implementations at the time had a large assembly language base, foremost among them Maclisp. This style enabled the implementor to provide a fast, compact LISP but not one that was easily portable. Second, the Vax, although it had a large address space, was intended for use by many small programs, not several large ones. Thus, paging overhead for large LISP programs would remain a problem not fully solved on the Vaxes.

The primary Vax LISP dialects developed in the 1970s were Franz LISP, NIL, and PSL. NIL (new implementation of LISP) (24) was the successor to Maclisp and was designed by John L. White, Guy L. Steele Jr., and others at MIT, under the influence of LISP Machine LISP, also developed at MIT. NIL was a large LISP, and soon its implementation was centered around a large assembly language base.

Franz LISP was written to enable research on symbolic algebra to continue at the University of California at Berkeley, under the supervision of Richard J. Fateman, who was one of the principal implementors of Macsyma at MIT (25). Fateman and his students started with a PDP-11 version of LISP written at Harvard and extended it into a Maclisp-like LISP that can run on virtually all Unix-based computers. This is a result of the fact that Franz is almost entirely written in C. Because Franz was intended to be a vehicle for research in symbolic algebra, it never became a solid LISP in the same sense that Maclisp did.
On the other hand, it was a widely available LISP dialect on one of the most widely available computers in the AI community.

In 1978 a LISP project was begun at Lawrence Livermore National Laboratories by Richard P. Gabriel and Guy L. Steele Jr. to implement NIL on the S-1 Mark IIA supercomputer (26,27). This LISP, never completely functional, would be the testbed for adapting advanced compiler techniques to LISP implementation. With the development of the S-1 LISP compiler, it once again became feasible to implement LISP in LISP and to expect similar performance to the best hand-tuned, assembly-language-based LISP systems.
Scheme. One of the most important developments in LISP occurred during the second half of the 1970s: Scheme. Scheme is a simple dialect of LISP that brought together some of the programming language semantics ideas developed in the 1960s with the LISP culture that had largely developed by seat-of-the-pants efforts. The major contributions of Scheme were lexical scoping, first-class functions (closures), and a pure programming style.

IBM. Although the first LISPs were implemented on IBM computers, IBM faded from the LISP scene during the latter half of the 1960s. The original LISP 1.5 continued in the form of LISP 360, which was improved and developed until the early 1970s, when the LISP 370 implementation began. LISP 370 is now called LISP/VM. The LISP 370 project was under the direction of Fred Blair (25) at the Thomas J. Watson Research Center in Yorktown Heights, New York. LISP 370 is an elegant variant of LISP that highlighted both special and lexical binding, closures over both special and lexical variables (using a technique like spaghetti stacks), and a programming environment strongly influenced by Interlisp.

Comparative History of LISP: LISP Machines, 1970-1985

The discussion of LISP implementation thus far has concentrated on implementations of LISP on stock hardware. A stock hardware computer is a computer that is not designed specifically to support LISP. For example, a Digital Equipment Corporation Vax 11/780 is stock hardware, and a Symbolics 3600 is an example of a computer designed specifically to support LISP.

Several of the operations frequently performed while executing LISP programs can be better performed or significantly sped up by special hardware. The operations most frequently assisted by special hardware on LISP machines are tagging, function calling, and garbage collection.

The words of storage in a LISP machine can be designed to have enough bits to support pointers directly; the address and the tag bits can both fit in a word of storage without losing any addressing bits. Instructions to check the tag of a pointer can be made fast with special data paths in the computer. Operations, such as addition, can check the types of their operands in parallel with the operation itself, and if the types are not suitable, an exceptional condition can be flagged, and a more general course of action can then be taken.

Function calling is one of the most frequent operations in LISP programs. The construction of stack frames can be performed by hardware. Caching parts of the stack can make a major performance difference to a LISP implementation.

Garbage collection is often performed by need: when the dynamic heap runs out of free space, a garbage collection can be initiated to gain some free space. Garbage collection can take a relatively long time; performing it concurrently with other LISP operations can eliminate the long pause associated with garbage collection. Most of the known techniques for incremental garbage collection are best implemented with special hardware.

Many LISP machines are essentially general-purpose computers that have been specially microcoded for LISP.

Commercial LISP Machines. In the early 1970s, at MIT, the LISP Machine project began. The machine constructed, called the CADR, was a specially microcoded, 32-bit processor, with some data paths of particular usefulness to LISP. LISP Machine, Incorporated (LMI) was the first to commercialize this machine in the late 1970s.

Also in the late 1970s a second LISP machine company emerged, Symbolics, which also sold a CADR copy. In the early 1980s Symbolics introduced a new LISP machine, the 3600, which would become the industry leader in LISP machine performance for the next 5 years. The dialect of LISP on the LISP machines is called LISP Machine LISP or ZetaLisp (28).

In the late 1970s BBN built an Interlisp LISP machine, called the Jericho. It remained a BBN internal machine. About the same time, Xerox microcoded the Dorado, also known as the Xerox 1132, to execute an Interlisp instruction set. This work was based on the earlier attempt to microcode
the Xerox Alto computer to implement the Interlisp virtual machine specification (29). A second machine, the Dolphin, was similarly microcoded, resulting in a second Xerox LISP machine. The Dorado, a high-speed computer, was originally a Mesa machine, Mesa being an ALGOL-like programming language.
LISP Machine Software. An important contribution of the LISP machines was to programming languages and the programming environment. All of the LISP machine companies added graphics, windowing capabilities, and mouse interaction capabilities to their programming environments. LISP, particularly on the MIT-style LISP machines, grew in complexity and completeness. SETF, DEFSTRUCT, multiple values, and a better style for arrays were important additions to LISP based on the MIT LISP machine work.

At MIT, Flavors, an object-oriented programming system, was developed and integrated into the LISP machines. The window system, in particular, is written in Flavors (28). At Xerox the experience with Smalltalk (30) led to the development of LOOPS (31), an object-oriented programming system.

History of LISP: 1980-Present

In 1980 the LISP situation was that Symbolics and LMI were developing ZetaLisp, stock-hardware implementation groups were developing NIL, Franz LISP, and PSL, Xerox was developing Interlisp, and the Scientific Personal Integrated Computing Environment (SPICE) project at CMU was developing a Maclisp-like dialect of LISP called Spice LISP.

Several of these groups got together and began to define Common LISP. Of the above LISPs, most are Maclisp derivatives. Symbolics, the SPICE project, the NIL project at MIT, and the S-1 LISP project joined together in this effort, which was led by Scott Fahlman, Daniel Weinreb, David Moon, Guy L. Steele, Jr., and Richard P. Gabriel (32). The definition of Common LISP was as a description of a family of languages; if certain specifications were met, a particular LISP dialect could be a member of the Common LISP family. The main goals, stated in the book Common Lisp: The Language (32), were to be portable, common, stable, powerful, expressive, and efficient.

The bulk of the design work was carried on in the form of network mail over the ARPANET. In addition, there were two face-to-face meetings before the Common LISP book was published. The primary influences on Common LISP were ZetaLisp, Maclisp, NIL, S-1 LISP, Spice LISP, and Scheme.

Since then, Common LISP has become a de facto standard, with virtually every commercial LISP vendor and many hardware vendors offering a Common LISP. However, Common LISP is not an ideal LISP (as was pointed out in Ref. 33), being a compromise among many similar and large dialects of LISP.

Extending LISP

A primary strength of LISP is the ability to extend the language; with such a facility researchers are free to define their own languages and then to program in them. As long as the language designer is willing either to live with LISP syntax or to live with a syntax that can be easily translated into LISP, there is no better language design language than LISP.

In earlier examples of LISP code it could be seen that the syntax of the special forms was the same as the syntax of user-defined functions. Moreover, macro invocations are also in the same syntax as the rest of LISP. If a LISP programmer adds a new function to LISP, loading that new function into LISP and possibly informing the compiler of its existence in one of several ways is sufficient to add that function to LISP exactly as if it had been designed into the LISP originally. Consider the example

(defun cube (x) (* x x x))

When this function is loaded into the LISP system, the programmer can write

(+ (cube x) (* x y))

and the reference to the function cube is syntactically indistinguishable from the reference to x. If the user writes

(defmacro cube (x)
  (let ((var (gensym)))
    `(let ((,var ,x))
       (* ,var ,var ,var))))

then the compiler will also be able to compile code involving CUBE correctly, and the compiler will also be able to use the optimization techniques it uses for x to optimize uses of CUBE, if possible. GENSYM returns a symbol that has never appeared in the currently running LISP system. In this macro definition it is used to create an identifier that is guaranteed not to conflict with any other identifier that has been declared special. The expression that is to be cubed should be bound to some variable because that expression might perform side effects that cannot be done three times, which is what would happen had the macro been written as

(defmacro cube (x)
  `(* ,x ,x ,x))

Program Development Strengths of LISP

LISP, being uniform in its own syntax and also in its syntax as extended by the user, is a friendly system in which to develop large systems. There are three other important features of LISP for developing systems: separate compilation, typelessness, and the debugging environment.

Separate Compilation. Separate compilation is the ability of the compiler to produce code that calls functions even though the compiler has never seen the functions being called. That is, if a LISP program comprises a number of functions, each function can be compiled in complete isolation from the others, and loading all of the functions into a single LISP system will produce a working program. Moreover, the LISP program can contain a mixture of compiled and interpreted code, which can help the programmer both develop and debug large systems. A typical use of this feature is to write and debug a portion of a large program before the entire program has been written. Several people can divide the work among themselves and
only have to agree on the interfaces among the parts and on the macro files to use. It might be the case that better code can be produced when the compiler can analyze all of the functions together, but for development and debugging the modularity is useful. With facilities such as Interlisp block compilation, the advantages of modularity at development time and speed at delivery time can be realized in the same system.

Typelessness. A LISP program need not have the types of all, or of any, variables declared in order to produce a working program. This speeds up the debugging cycle in some cases by enabling the programmer to write a simple version of a larger function quickly, to test it out, and to refine the algorithm and data structures without having to make a commitment to the types of all variables before the program can be tested. Of course, in many situations the need to declare the types of variables can lead the programmer to consider the program, the algorithm, and the data structures more carefully before too much time is wasted "hacking" the program together. But in many other situations, particularly those where there is no good understanding of the nature of the algorithm, time can be saved if a prototype program can be tested early in the programming cycle. LISP enables the programmer to pursue either one of these program development methodologies.

Debugging Environment. LISP provides an extensive debugging environment. Because LISP can treat interpreted LISP programs exactly like data, debugging tools can be written to examine and modify programs while they are being debugged. Examples of LISP debugging tools are single steppers, tracers, inspectors, and breakpoints.

A single stepper is an addition to the interpreter that enables the programmer to execute "one line" of LISP code at a time, stopping after each expression, allowing the programmer to examine the LISP environment.
This examination is performed by invoking the interpreter in a reentrant manner. Variations on the single stepper include the ability to save state during the stepping process so that the code can be, in effect, stepped through backward.

A tracer is a function that is invoked on function entry and/or exit. Statements can be printed at these points, conditions can be tested, additions to a database can be made for later analysis, and any LISP action can be taken.

An inspector is a program that helps a programmer examine an instance of a data structure. For example, the programmer might have defined a complex graph structure using data abstractions. An inspector will help the programmer browse through the instance of the data structure, printing its parts as appropriate for the data structure definitions.

Finally, breakpoints can be inserted anywhere in a LISP function so that execution can be stopped and the LISP environment examined. Code can be a mixture of compiled and interpreted code.
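As a minimal sketch of the tracer idea, the TRACE and UNTRACE facilities later standardized in Common LISP behave much as described here; the function FACT is an illustrative example, and the exact trace output format varies by implementation.

```lisp
;; Tracer sketch using the standard Common LISP TRACE
;; facility; FACT is an illustrative function, and the
;; printed trace format differs between implementations.
(defun fact (n)
  (if (zerop n)
      1
      (* n (fact (1- n)))))

(trace fact)     ; report every entry to and exit from FACT
(fact 3)         ; each recursive call and its result are printed
(untrace fact)   ; restore the untraced behavior
```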
Real LISP Implementations

Modern LISP implementations are almost entirely written in LISP. Bootstrapping the LISP involves writing a cross-compiler that runs in some other dialect of LISP, perhaps on some other machine. All of the code that makes up the interpreter and the function definitions of LISP can be written in LISP and compiled. Such implementations might be slower than implementations that depend on a hand-coded assembly language interpreter, but LISP-in-LISP implementations are easier to understand, to write, to debug, and to port to other machines than implementations based on other techniques.

Conclusions

LISP is a powerful programming language and environment for developing large programs. Artificial intelligence programming requires the flexibility, the extensibility, the modularity, and the underlying data structures and data abstraction facilities that LISP provides. Although LISP is one of the older programming languages in use, it has remained the most widely used language in AI programming.

BIBLIOGRAPHY

1. W. Teitelman et al., InterLisp Reference Manual, Xerox Palo Alto Research Center, Palo Alto, CA, 1978.
2. The Utah Symbolic Computation Group, The Portable Standard Lisp Users' Manual, Department of Computer Science, University of Utah, Salt Lake City, TR-10, 1982.
3. M. L. Griss and E. Benson, PSL: A Portable Lisp System, Proceedings of the 1982 ACM Symposium on Lisp and Functional Programming, August 1982, pp. 88-97.
4. D. Moon, MacLisp Reference Manual, Revision 0, MIT Project MAC, Cambridge, MA, April 1974.
5. J. McCarthy, Stanford University, private communication, 1985.
6. A. Newell, IPL-V Programmers' Reference Manual, RM-3739-RC, Rand Corporation Technical Report, 1963.
7. Rand Corporation, The History of the Johnniac, Rand Corporation Technical Report AD-679-152, Santa Monica, CA, 1968.
8. J. McCarthy, Programs with Common Sense, Proceedings of the Symposium on the Mechanization of Thought Processes, National Physical Laboratory, I, pp. 77-84 (1958) [reprinted in M. L. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968].
9. J. McCarthy, P. W. Abrahams, D. J. Edwards, P. A. Fox, T. P. Hart, and M. J. Levin, Lisp 1.5 Programmer's Manual, MIT Press, Cambridge, MA, 1962.
10. J. Lukasiewicz, "Philosophische Bemerkungen zu mehrwertigen Systemen des Aussagenkalkuls," Comptes Rendus des Seances de la Societe des Sciences et des Lettres de Varsovie 23 (Classe III), 51-77 (1930); see also, by the same author, "Zur Geschichte der Aussagenlogik," Erkenntnis 5, 111-131 (1934), Berlin (original in Polish).
11. W. V. Quine, Mathematical Logic, Harvard University Press, Cambridge, MA, 1961.
12. J. McCarthy, "Recursive functions of symbolic expressions and their computation by machine, part I," CACM 3(4), 184-195 (1960).
13. E. C. Berkeley and D. G. Bobrow, The Programming Language Lisp: Its Operation and Applications, MIT Press, Cambridge, MA, 1964.
14. F. J. Corbato et al., An Experimental Time-Sharing System, AFIPS Conference Proceedings, Spring Joint Computer Conference, Vol. 21, pp. 335-344, 1962.
15. D. Eastlake, R. Greenblatt, J. Holloway, T. Knight, and S. Nelson, ITS 1.5 Reference Manual, MIT Artificial Intelligence Memo No. 1614, 1969.
16. J. L. White, An Interim Lisp User's Guide, MIT Artificial Intelligence Memo No. 190, 1970.
17. R. Greenblatt, The Lisp Machine, Working Paper 79, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1974.
18. J. P. Golden and J. L. White, A User's Guide to the A.I. Group LISCOM Compiler: Interim Report, MIT Artificial Intelligence Memo No. 210, December 1970.
19. B. Harvey, Monitor Command Manual, Stanford University, SAILON 54.7, Stanford, CA, 1982.
20. D. Bobrow and B. Wegbreit, "A model and stack implementation of multiple environments," CACM 16(10), 591-603 (October 1973).
21. J. B. Marti, A. C. Hearn, M. L. Griss, and C. Griss, Standard Lisp Report, University of Utah, Salt Lake City, UUCS-79-101, 1979.
22. L. Quam and W. Diffie, Stanford Lisp 1.6 Manual, Stanford University, SAILON 28.7, Stanford, CA, 1972.
23. R. J. Bobrow, R. R. Burton, and D. Lewis, UCI Lisp Manual (An Extended Stanford Lisp 1.6 System), University of California at Irvine, Information and Computer Science Technical Report No. 21, 1972.
24. G. S. Burke, G. J. Carrette, and C. R. Eliot, NIL Reference Manual, Massachusetts Institute of Technology, MIT/LCS/TR-311, 1983.
25. F. U. Blair, The Definition of Lisp 1.8+0.3i, Unpublished Memo, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 1975.
26. R. A. Brooks, R. P. Gabriel, and G. L. Steele, An Optimizing Compiler for Lexically Scoped Lisp, Proceedings of the 1982 ACM Compiler Construction Conference, June 1982, pp. 261-275.
27. R. A. Brooks, R. P. Gabriel, and G. L. Steele, S-1 Common Lisp Implementation, Proceedings of the 1982 ACM Symposium on Lisp and Functional Programming, August 1982, pp. 108-113.
28. D. Weinreb and D. Moon, Lisp Machine Manual, 4th ed., Massachusetts Institute of Technology Artificial Intelligence Laboratory, Cambridge, MA, July 1981.
29. S. J. Moore, The InterLisp Virtual Machine Specification, Xerox PARC CSL-76-5, 1976.
30. A. Goldberg and D. Robson, Smalltalk-80: The Language and Its Implementation, Addison-Wesley, Reading, MA, 1983.
31. D. G. Bobrow and M. J. Stefik, The Loops Manual, Intelligent Systems Laboratory, Xerox Corporation, 1983.
32. G. L. Steele, Jr. et al., Common Lisp: The Language, Digital Press, Burlington, MA, 1984.
33. R. A. Brooks and R. P. Gabriel, A Critique of Common Lisp, Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming, August 1984, pp. 1-8.
General References

H. Abelson and G. Sussman, Structure and Interpretation of Computer Programs, MIT Press, Cambridge, MA, 1984.
J. Allen, Anatomy of Lisp, McGraw-Hill, New York, 1978.
H. B. Baker, "List processing in real time on a serial computer," CACM 21(4), 280-294 (April 1978).
H. B. Baker, "Shallow binding in Lisp 1.5," CACM 21(7), 565-569 (July 1978).
R. Bates, D. Dyer, and H. Koomen, Implementation of Interlisp on a Vax, Proceedings of the 1982 ACM Symposium on Lisp and Functional Programming, August 1982.
D. Bobrow and D. Clark, "Compact encodings of list structure," ACM Trans. Prog. Lang. Sys. 1(2), 266 (October 1979).
D. Bobrow and B. Raphael, "New programming languages for artificial intelligence," ACM Comput. Surv. 6(3), 153-174 (1974).
R. R. Burton et al., Interlisp-D Overview, Papers on InterLisp-D, Xerox Palo Alto Research Center, CIS-5 (SSL-80-4), 1981.
E. Charniak, C. Riesbeck, and D. McDermott, Artificial Intelligence Programming, Lawrence Erlbaum, Hillsdale, NJ, 1980.
J. Cohen, "Garbage collection of linked data structures," ACM Comput. Surv. 13(3), 341-367 (September 1981).
L. P. Deutsch and D. Bobrow, "An efficient, incremental, automatic garbage collector," CACM 19(9), 522-526 (September 1976).
J. K. Foderaro and K. L. Sklower, The FRANZ Lisp Manual, University of California, Berkeley, CA, April 1982.
R. P. Gabriel, Performance and Evaluation of Lisp Systems, MIT Press, Cambridge, MA, 1985.
J. Marti, A. C. Hearn, and M. L. Griss, "Standard Lisp report," SIGPLAN Notices 14(10), October 1979.
L. Masinter, InterLisp-VAX: A Report, Department of Computer Science, Stanford University, STAN-CS-81-879, August 1981.
L. M. Masinter and L. P. Deutsch, Local Optimization for a Compiler for Stack-Based Lisp Machines, Papers on InterLisp-D, Xerox Palo Alto Research Center, CIS-5 (SSL-80-4), 1981.
J. McCarthy, History of Lisp, in D. Wexelblat (ed.), History of Programming Languages, Academic Press, New York, 1978.
J. Moses, "The function of FUNCTION in Lisp," ACM SIGSAM Bull. 13, 27 (1970).
G. L. Steele, Jr. and G. Sussman, LAMBDA: The Ultimate Imperative, Memo No. 353, Artificial Intelligence Laboratory, MIT, Cambridge, MA, 1976.
G. L. Steele, Jr., Data Representations in PDP-10 Maclisp, Proceedings of the 1977 MACSYMA Users' Conference, NASA Scientific and Technical Information Office, Washington, DC, July 1977.
G. L. Steele, Jr., Fast Arithmetic in Maclisp, Proceedings of the 1977 MACSYMA Users' Conference, NASA Scientific and Technical Information Office, Washington, DC, July 1977.
G. L. Steele, Jr. and G. Sussman, The Revised Report on Scheme: A Dialect of Lisp, Memo No. 452, Artificial Intelligence Laboratory, MIT, Cambridge, MA, 1978.
G. L. Steele, Jr. et al., An Overview of Common Lisp, Proceedings of the 1982 ACM Symposium on Lisp and Functional Programming, August 1982, pp. 98-107.
D. Touretzky, Lisp: A Gentle Introduction to Symbolic Computation, Harper & Row, New York, 1984.
J. L. White, NIL: A Perspective, Proceedings of the 1979 MACSYMA Users' Conference, July 1979, pp. 190-199.
P. H. Winston and B. K. P. Horn, Lisp, 2nd ed., Addison-Wesley, Reading, MA, 1984.

R. GABRIEL
Lucid, Inc.
LISP MACHINES

The development of commercially available LISP (qv) processors has made a dramatic impact on the use of AI and LISP-based technology. Once oriented almost exclusively toward research, LISP machines have catalyzed widespread interest in the commercial application of this technology.

The genesis of today's commercial LISP machines began in 1973 at the MIT AI Laboratory. The key to achieving high throughput for LISP programs was to integrate the run-time typed nature of the language with a tagged hardware architecture. A thorough review of past evolution was done, and a
LISP-oriented macrocode was defined. In addition, two new LISP extensions resulted: the AREA feature for storage management and the PACKAGE system allowing multiple LISP programs to coexist.

In 1974, after completing this initial software, work began on hardware. Called the CONS machine (named after the CONStructor operator of LISP), it had a tagged architecture, a bit-mapped display (inspired by work done at both Xerox and MIT), and a writable control store. The CONS was completed in 1976.

In late 1976 a second generation, the CADR, was started (CADR is the LISP function for selecting the second element of a list). Operational in 1977, the CADR exhibits many of the features that are seen on today's LISP machines. These are described below.

Large Virtual-Address Space. The CADR had a 16-megaword (32-bit-word) virtual-address space. In comparison, the DEC PDP-10, the standard research tool of the day, had 0.25 megaword (36-bit word).

Graphical Display Console. The display on the CADR was 800 x 1024 monochrome, which provided a high-resolution display to the user. Along with the keyboard (providing a superset of the ASCII character set), the CADR also provided a mouse pointing device promoting the utilization of menus and other point-sensitive graphical objects in lieu of keyboard commands.

Garbage Collection. The tagged architecture of the CADR combined with some hardware assist allowed for the first real-time or incremental garbage collector for LISP systems. Earlier LISP garbage collectors were characterized by occasional pauses in LISP computation while the garbage collector executed. The CADR approach allowed interleaved execution.

LISP Software Environment. The MIT ZETALISP system (originally called LISP Machine LISP) was, and is, a single, fully integrated, run-time system consisting of only LISP and microcode, providing a uniform program development and execution environment.
ZETALISP served as a testbed for object-oriented programming languages (qv), receiving inspiration from the Smalltalk system developed at Xerox. This effort culminated in the FLAVORS system, a second-generation extension of Smalltalk, with control fully integrated into LISP.

One of the utilities developed for the CADR was the ZMACS editor, a superset of the EMACS editor originally developed on the PDP-10. ZMACS provides the user with over 400 commands, many of which are LISP-specific, and also allows users to write their own commands. The CADR software also includes the Window System, the basic user interface managing the high-resolution display (written in FLAVORS), and the INSPECTOR, a data-structure-examination facility that makes heavy use of the graphic capabilities of the display.

MIT built 35 CADRs to be used internally. In 1980 both LISP Machine (LMI) and Symbolics were granted licenses from MIT for utilization of LISP machine technology. LMI started producing CADR systems in May 1981. Symbolics repackaged the CADR to be more production compatible and produced the LM-2 in September 1981. The LMI CADR and the Symbolics LM-2 were functionally identical.

Both companies realized that new-generation machines were required. Symbolics developed their second-generation
LISP machine, the 3600, and delivered the first unit in December 1982. LMI introduced the Lambda in August 1983. Although the 3600 and the Lambda both execute variants of the ZETALISP software environment, and as a result are quite similar, the hardware architectures of the machines are extremely different.

The 3600, which takes a totally different approach than the Lambda to the CADR architecture, is a unit processor with an instruction-fetch unit to increase macrocode performance. The 3600 has 8,000 words of control memory, a 36-bit word, and a variable-length tag field. The system provides a 68000 I/O front end and has a 256-megaword virtual-address space.

The Lambda provides the user with a bus-centered architecture based on the NuBus, a processor-independent bus originally developed at MIT's Laboratory for Computer Science. There is also an integral MULTIBUS for access to third-party peripherals. The LISP processor maintains the architectural philosophy of the CADR with extensive modifications and improvements. It provides users with a 32-megaword virtual-address space and supports a 64,000-word user-programmable control memory. An optional microcompiler is also supported, providing direct translation of LISP source to system microcode.

Since their initial introduction, both Symbolics and LMI have introduced additional production options, all based on the same processor architecture. With technology licensed from LMI, Texas Instruments (TI) has also introduced a LISP machine, in the spring of 1985, called the Explorer. Distributed by both TI and LMI, the Explorer is a compact unit-processor machine utilizing an architectural philosophy similar to the Lambda and the CADR. The Explorer is based on the NuBus and provides the user with a 12-megaword virtual-address space. Its processor resides on one printed circuit (PC) board.

Near-term commercial offerings of LISP machines will obey certain guidelines.
First, the cost per user must become more competitive with non-LISP-machine workstation technology. Second, LISP machines must be able to be more readily interfaced to generic mainframe computing environments through networks, system-level bus links, low-speed peripheral buses, and optional processors. Finally, performance will continue to increase as processor architectures are refined.

In the next 5-10 years the goal of most AI systems is to search a large state space as efficiently as possible. With current LISP machine technology, that search procedure is carried out by a single processor. Research is currently underway to allow a large number of processors to participate in the search simultaneously, thus decreasing the time required to complete the search by many orders of magnitude. These machines, when commercially available, will catalyze a major leap in the applicability of AI. This advance, although similar to the effect that commercial LISP machines had on AI, will, by nature of the problems that it will allow to be solved, be much larger.

General References
R. Brooks and R. Gabriel, A Critique of Common Lisp, Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming, pp. 1-8.
R. Gabriel, Performance and Evaluation of Lisp Systems, MIT Press, Cambridge, MA, 1985.
R. Greenblatt, T. Knight, J. Holloway, D. Moon, and D. Weinreb, The Lisp Machine, in D. Barstow, H. Shrobe, and E. Sandewall (eds.),
Interactive Programming Environments, McGraw-Hill, New York, 1984, pp. 326-352.
K. Kahn and M. Carlsson, How to Implement Prolog on a Lisp Machine, in J. A. Campbell (ed.), Implementations of Prolog, Ellis Horwood, 1984.
E. Sandewall, Programming in an Interactive Environment: The Lisp Experience, ACM Computing Surveys, March 1978; reprinted in Interactive Programming Environments, McGraw-Hill, New York, 1984, pp. 31-80.
B. Sheil, Power Tools for Programmers, in Interactive Programming Environments, McGraw-Hill, New York, 1984, pp. 19-30.
D. Weinreb and D. Moon, The Lisp Machine Manual, MIT Technical Report.
S. Wholey and S. Fahlman, The Design of an Instruction Set for Common Lisp, Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming, pp. 150-158.

R. GREENBLATT, G. CUNER, AND M. KREEGER
LISP Machines

LIST PROCESSING. See LISP; PROLOG.

AI LITERATURE

Artificial intelligence has undergone considerable changes in its history, and each change has been accompanied by a change in the nature of the AI literature. Four historic periods may be identified: Pre-AI, Beginnings of AI, Pre-Expert-Systems AI, and Commercial AI. Only in the last period has there begun to emerge an awareness of AI on the part of the general public, and with it a recognition by the scientific and reference-publishing community of a need to package the literature of AI for easier access. Being a field with multidisciplinary origins, AI has until now been very difficult to track through the labyrinth of indexing and bibliographic tools designed to serve more traditional fields such as psychology, mathematics, engineering, and computing. To this date there is no one of these disciplines within which AI can be said to wholly reside, and hence no traditional tools to help librarians completely cover the AI literature.

AI has also been among the first fields to exploit additional forms of information not associated with traditional library science. These have included extensive use of technical reports, on-line machine-readable text files, and electronic-mail digests.

Pre-AI Literature: 1940s

Cybernetics. Norbert Wiener and Julian Bigelow coined the term cybernetics (qv) to describe what they saw as "the essential unity of the problems centering about communication, control, and statistical mechanics, whether in the machine or in living tissue." In 1943 two papers were published which Seymour Papert describes as the birth of cybernetics (1): "Behavior, Purpose, and Teleology," by Rosenblueth, Wiener, and Bigelow (2) and "A Logical Calculus of the Ideas Immanent in Nervous Activity," by McCulloch and Pitts (3). The more famous work, by Wiener, was of course Cybernetics (1st ed. 1948, 2nd ed. 1961) (4). However, although cybernetics persists to this day, it was not to be the true origin of AI. The distinction seems to have been between basing the new field on modeling human cognition vs. extending earlier concepts of regulating machinery. In some sense, AI was true heresy, whereas cybernetics was a logical extension of the capabilities of existing mechanisms.

The true inspirational origins of AI seem clearly related to the work of computing pioneer Alan Turing. Turing's early papers, such as "Intelligent Machinery" [written 1947, published 1969 (5)] and "Computing Machinery and Intelligence" [written in 1950, reprinted in 1963 (6)], are seen even today as remarkably endowed with the philosophical emphasis of much more modern AI research.

Beginnings of AI: The 1950s

The AI literature of the 1950s consists largely of scattered journal articles published in the literatures of other disciplines in which its early pioneers began: computing, psychology, engineering, and mathematics.

Game-Playing Programs. Of particular interest at this time is early work on game-playing (qv) programs, which has been an enduring subdivision of AI since its beginnings. Game-playing behavior bridges the boundary between automata and human behavior in a nearly unique manner. Thus, the earliest work on chess is by Shannon, as in "A Chess-Playing Machine" (February 1950) (7) and "Programming a Computer for Playing Chess" (1950) (8). Work with a more cleanly AI character followed later, as in "Chess Playing Programs and the Problem of Complexity," by Newell, Shaw, and Simon [1958, reprinted 1963 (9)] (see also Computer chess methods).

Automata Theory. A field parallel to AI at the time was automata theory. Automata Studies (1956), by Claude Shannon and John McCarthy (10), summarized the work in automata theory, but this work also represented an uneasy bond between these pioneers, and one that led to the dramatic naming of AI as a separate field shortly thereafter.

Dartmouth Conference. Artificial intelligence as a term was coined by John McCarthy in proposing a now historic conference held at Dartmouth College in the summer of 1956. The name was chosen deliberately to set the field apart from the existing domains of cybernetics and automata.

Technical Reports. In the aftermath of the Dartmouth Conference of 1956, the established centers of AI were formed. MIT, BBN, CMU, Stanford University, and the then Stanford Research Institute (now SRI International) became pioneering centers of AI research. Their impact on the early literature, however, came largely through technical reports and published articles. The book as a factor in AI had yet to become commonplace. Fortunately for the contemporary follower of AI, the recognition of the importance of these original technical reports has led to their being reprinted on microfiche for general dissemination. Scientific Datalink, a division of Comtex Scientific Corporation, has brought to the market packaged microfiche sets containing the 525 CMU AI Research Reports since 1956, the 451 MIT AI Laboratory Reports since 1958, the 279 BBN Artificial Intelligence Reports since 1960, the 350 Stanford AI Memoranda since 1963, and the 318 SRI AI Technical Reports since 1968. Other AI research centers and universities continue to be added (11).
Journals. The use of technical reports may have been forced on the fledgling field as early works in AI did not fall within the traditional characterizations of existing periodicals. Early AI was published in journals belonging to the fields in which its practitioners had obtained their degrees, but only in limited quantities. AI articles were published within Communications of the Association for Computing Machinery (CACM) (12) and Journal of the Association for Computing Machinery (JACM) (13) throughout the 1960s. Such classics as Simmons's question-answering system surveys (14,15), Weizenbaum's ELIZA (16), Quillian's Teachable Language Comprehender (17), Woods's Transition Networks (18), and Simmons and Slocum's natural-language-generation paper (19) all appeared in CACM, while JACM published the early heuristic search and theorem-proving papers of Slagle and Dixon (20,21). With such a shortage of journals in which to publish AI work, it became accepted practice at major universities at which AI developed to publish theses and dissertations as technical reports. Only considerably later did these works begin to appear in edited collections as books.

One exception to this immersion of AI in other disciplines' literatures is the literature of computational linguistics (qv). Perhaps because it could easily be seen as relevant to one major discipline, linguistics, computational linguistics has a solid and early (ancient) journal-level heritage. The Finite String began publication in 1964 (22) and was edited by Hood Roberts until 1973. The much earlier MT: Mechanical Translation, founded by Yngve in 1954 (23), was adopted by the Association for Computational Linguistics (ACL) as Mechanical Translation and Computational Linguistics in 1965 but would only survive until 1968. In 1974 the ACL began publishing the American Journal of Computational Linguistics, shortening its name in 1984 to Computational Linguistics. The Finite String continues to be published as a part of Computational Linguistics (24). Supplementing the computational-linguistics literature, Computers and the Humanities (25) appeared in 1966.

Conference Literature. Just as in the journal literature, AI first emerged at conferences within the context of AI sessions at other established meetings. The Spring and Fall Joint Computer Conferences of the late 1960s contained major early papers in AI before the ongoing IJCAI conference series started in 1969 (26). As in journal literature, computational linguistics appeared earlier than AI itself. The ACL first met in conjunction with the 1963 ACM National Conference. The first International Conference on Computational Linguistics (now termed COLING) occurred in 1965 (122).

Books. Much of the earliest AI work appeared in a classic volume edited by Feigenbaum and Feldman entitled Computers and Thought (1963) (27). This work, featured in early courses in AI throughout the United States, included the first comprehensive bibliography of the earliest AI literature. It organized the literature with an elaborate system of cross-references by subject areas, clearly declaring the highly interdisciplinary nature of AI research. The remarkable thing about this book is that it remained for so long virtually the only AI book at the top of everyone's list of recommended reading. Its royalties were used to establish the "Computers and Thought" lectures, which are given biennially at the IJCAI conferences.

The British AI effort began publishing the classic Machine Intelligence series (edited by Donald Michie et al.) in 1967 (28). However, it was nearly 5 years after Computers and Thought until Minsky brought together a set of papers and dissertations from MIT to produce another classic anthology, Semantic Information Processing (1968) (29). This year also marked the beginning of an essentially steady stream of new AI books: Minsky and Papert's Perceptrons (qv) (1968) (30), Simon's Sciences of the Artificial (1969) (31), and Ernst and Newell's GPS: A Case Study in Generality and Problem Solving (qv) (1969) (32).

AI: The 1970s

Journals. The 1970s began auspiciously with AI's first journal, appropriately named Artificial Intelligence (1970-) (33). The International Journal of Man-Machine Studies (34), which also publishes AI articles, began the year before, in 1969. The Association for Literary and Linguistic Computing began publishing the ALLC Bulletin (1973-) and later the ALLC Journal (1980-), which again supplemented the computational-linguistics literature.

Books. The year 1972 marks the date of the appearance of a significant AI dissertation as a separate book, Winograd's Understanding Natural Language (1972) (35). Winograd also represents the boundary between what might be termed the first-generation AI authors and the second generation. Winograd was a student of Minsky. There would be new pioneers after Winograd, to be sure, but the groundwork of AI had been established. Winograd's work was not only a landmark in this way but was also a significant advance for computational linguistics. It was soon followed by further significant computational-linguistics books such as Schank and Colby's (eds.) Computer Models of Thought and Language (1973) (36); Rustin's (ed.) Natural Language Processing (1973) (37); Schank's (ed.) Conceptual Information Processing (1975) (38); Charniak and Wilks' (eds.) Computational Semantics: An Introduction to Artificial Intelligence and Natural Language Comprehension (1976) (39); Schank and Abelson's Scripts, Plans, Goals and Understanding (1977) (40); Walker's (ed.) Understanding Spoken Language (1978) (41); Fahlman's dissertation, NETL: A System for Representing and Using Real-World Knowledge (1979) (42); and Findler's (ed.) semantic networks survey collection, Associative Networks: Representation and Use of Knowledge by Computer (1979) (43).

Cognitive Science. Also during the 1970s several works were published that clearly marked the creation of "cognitive science" (qv) and its eventual semiseparation from AI as that portion of AI research concerned primarily with emulating human performance, whether correct or abnormal [as in Colby's Artificial Paranoia: A Computer Simulation of Paranoid Processes (1974) (44)] rather than merely performing tasks at a human level of proficiency. Anderson and Bower's Human Associative Memory (1973; rev. ed., 1980) (45,46) clearly represented the cognitive psychologist's perspective. Bobrow and Collins's (eds.) Representation and Understanding (1975) (47)
Al: The 1960s
532
AI LITERATURE,
may have been the first book to introduce the term "cognitive 'Winston's (ed.) The Psychology of Computer Vision science." (1975) (48) lcontrasting with the totally engineering view of Duda and Hart's Pattern Classification and Scene Analysis (1973)@9)only 2 years priorl, Norman and Rumelhart's (eds.) Explorations in Cognition (1975) (50), and Lindsay and Norman's Human Information Processing (1977, 2nd ed.) (51) added to the conceptof AI as a new paradigm for psychology. lPsychologists such as Miller and Johnson-Laird also began contributing literature to this parallel channel of development with their Language and Perception (1976) (52). In L977 the first issue of the journal Cognitiue Science (53) appeared, firmly establishing the field as desiring its own publication apart from whatever avenue existed for AI articles. Overviews.Another new trend in AI books at this stage is the generation of a new set of general overviews of AI. Jackson, a student at MIT, produceda comprehensivesurvey of AI in his Introduction to Artifi,cial Intelligence (L974) (54), which startled many in that it was a book about AI written not by one of its primary practitioners but by someonefrom the outside looking in. The professionalshad been too busy "doing" AI to describe it in overview. This soon changed with Winston's Artifi,cial Intelligence(L977)(55).Prior to this the term AI had been used more in the senseof "This book is about my view of something called AI" [such as in Fogel'sArtificial Intelligence Through Simulated Euolution (1967) (56), Banerji's Theory of Problem Soluing: An Approach to Artificial Intelligence(1969) (57), Nilsson's Problem Soluing Methods in Artifi,cial Intelligence Q971) (58), Slagle'sArtifi.cial Intelligence:The Heuristic Programming Approach (1971) (59), and Arbib's The Metaphorical Brain: An Introduction to Cybernetics as Artificial Intetligence and Brain Theory (L972) (60)1. The field had reached its first self-consciousstage. 
the 1970s Anti-At Literature.Speakingof self-consciousness, chords disharmonious first the of beginnings the also marked to the technological symphony of AI authors. Dreyfus's first book, What Computers Can't Do: A Critique of Artifi,cial Reason (1972) (61), and Weizenbaum'sComputerPower and Hurllan Reason: From Judgement to Calculation (1976) (62) sounded two notes of concern. AI was not going to be able to claim successeswithout being challenged that its results were misleading. A recent study of the philosophical aspectsof AI and the limits of AI is Torrance's The Mind and The Machine (1e84)(63). of Al. The AI progTammittg USP,theProgrammingLanguage language LISP (qt) saw a number of new books appear. Whereas McCarthy's "Blue Book" LISP 1.5 Programming's Manual ft96il (64) and Weissman'sLISP 1.5 Primer (1967) (G5)were virtualty the only LISP manuals available for the 1960s; Friedman's The Little LISPeT (L974) (66), Siklossy's Let's Tatk LISP (1976) (67), Allen's A natomy of LISP (1978) (68), and Meehan's New UCI Lisp Manual (L979) (69) appeared in the 1970s as well as the Interlisp manual from XEROX which appearedin several versions. AI History Literature.In L979 Pamela McCorduck's Machines Who Think: A Personal Inquiry into the History and Prospectsof Artificial Intettigence(1979)(1) becamethe field's first true history book. McCorduck effectively captured the spirit and goals of AI up to that year. However, the field was evolving rapidly. The next few years would seea virtual explosion of work in AI as the new hardware and software displayed its potential for an ever larger audience of observers.
AI: The 1980s

Books. The 1980s saw a number of substantial books on AI being released every year. During 1981-1982 Barr, Feigenbaum, and Cohen edited together the first truly comprehensive survey of AI, The Handbook of Artificial Intelligence, volumes 1-3 (70). This book, although now beginning to age slightly, is an unparalleled introductory exposition of the field and the single best starting source for someone trying to understand AI. AI also began to expand its horizons to include a growing applied and lay audience. AI Magazine (71) published its first issue in 1980 as if to symbolize the beginning of this new view of AI. Other significant books issued in the 1980s include a number in computational linguistics, such as Marcus's A Theory of Syntactic Recognition for Natural Language (1980) (72); de Beaugrande's Text, Discourse, and Process: Toward a Multidisciplinary Science of Texts (1980) (73); Joshi, Webber, and Sag's (eds.) Elements of Discourse Understanding (1981) (74); Sager's Natural Language Information Processing: A Computer Grammar of English and Its Applications (1981) (75); Schank and Riesbeck's (eds.) Inside Computer Understanding: Five Programs Plus Miniatures (1981) (76); Harris's A Grammar of English on Mathematical Principles (1982) (77); King's (ed.) Parsing Natural Language (1983) (78); Winograd's Language as a Cognitive Process, volume 1, Syntax (1983) (79); Simmons's Computations from the English (1984) (80); Sowa's Conceptual Structures: Information Processing in Mind and Machine (1984) (81); and Sparck Jones and Wilks's (eds.) Automatic Natural Language Parsing (82). Additionally, general AI continued to see new books appearing every year, such as Banerji's Artificial Intelligence: A Theoretical Approach (1980) (83), Nilsson's Principles of Artificial Intelligence (1980) (84), Rich's Artificial Intelligence (1983), the second edition of Winston's Artificial Intelligence (1984) (55), and Charniak and McDermott's Introduction to Artificial Intelligence (1985) (85).
LISP. The 1980s saw the introduction of many new LISP books, featuring more integrated approaches to using LISP for AI applications, as in Charniak's Artificial Intelligence Programming (1980) (86) and Winston and Horn's LISP [1981; 1984 (2nd ed.)] (87), rather than just describing the syntax of a particular dialect or tutoring beginners on how to write syntactically valid LISP code. LISP also became the subject of its own series of conferences, with the first being held at Stanford in 1980, Conference Record of the 1980 LISP Conference (88). LISP underwent a dramatic increase in dialects as new versions for minicomputers [Franz LISP, as described in Wilensky's LISPcraft (1984) (89)] and workstations [e.g., Apollo's Portable Standard Lisp (PSL)] and even personal computers (e.g., IQ LISP for the IBM PC) began to appear. DARPA exercised its influence to attempt to bring together the several dialects of LISP that existed, and "Common Lisp" was coined as a term for this approved dialect. Steele's Common LISP: The Language (1984) (90) describes this version, and the implementation called "Golden Common LISP" provides the PC counterpart.

Logic Programming and PROLOG. PROLOG (qv) first appeared as a book in Clocksin and Mellish's Programming in Prolog (1981) (91) (see Logic programming). Unlike LISP, which appeared at a time when AI was not rapidly churning out books, PROLOG has appeared at a time when AI's major
form of publication is the book. Hence several new PROLOG books now appear every year, for example, Clark's Logic Programming (1982) (92), Campbell's Implementations of Prolog (1984) (93), and Li's A PROLOG Database System (1984) (94).

The Fifth Generation. In 1983 Feigenbaum and McCorduck published The Fifth Generation: Artificial Intelligence and Japan's Computer Challenge to the World (95), and the field of AI has not been the same since (see Computer systems). This book, intended for a mass audience, sounded an alarm not unlike that which the launching of Sputnik by the USSR inspired in the 1950s. It was a call to U.S. industry and government to accept the fact that the Japanese had mastered computer technology to such a degree that, without major commitments to research and development by the United States, they would become the dominant technological nation of the world. The very fact that you are reading the Encyclopedia of AI at this point may be due to the response occasioned by this realization. In 1984 a bibliography by Bramer and Bramer, The Fifth Generation: An Annotated Bibliography (96), and in 1985 an introduction by Shirai and Tsujii, Artificial Intelligence: Concepts, Techniques, and Applications (97), drawn from the Japanese fifth-generation computer program, appeared.

Expert Systems. Professionally, AI changed gears. An entire new audience tens or hundreds of times larger than the previous group of researchers focused attention on the "new" field of AI. Some shifts in the semantics of AI terminology occurred, with "expert systems," heretofore but one small methodology for AI research, becoming a more generic label apt to be applied to nearly any AI approach. A vast new crop of books designed to tutor newcomers to AI appeared. The book, in fact, became the standard means of publishing AI, with dozens of dissertations appearing as books rather than just as technical reports.
Among the books recently to have appeared dealing with expert systems are Hayes-Roth's Building Expert Systems (1983) (98), Bigger and Coupland's Expert Systems: A Bibliography (1984) (99), Forsyth's Expert Systems: Principles and Case Studies (1984) (100), Weiss and Kulikowski's A Practical Guide to Designing Expert Systems (1984) (101), and Naylor's Build Your Own Expert System (1985) (102).

AI in Business. Perhaps one of the most startling developments to have occurred in the AI field is the effort by businessmen to understand the new technology. AI became a cover story in publications such as Fortune and TIME, which even featured a "computer" as the Man of the Year. Today nearly every discipline's journals may contain articles on AI in an effort to explain how it will impact fields from aviation through zoology. The NYU Symposium on Artificial Intelligence Applications for Business (1984) (103), Johnson's The Commercial Application of Expert Systems Technology (1984) (104), Winston's The AI Business: The Commercial Uses of Artificial Intelligence (1984) (105), and Harmon's Expert Systems-Artificial Intelligence in Business (1985) (106) all evidence this commercial trend.

Journals. A number of new journals started in 1983-1985. They include Computational Intelligence/Intelligence Informatique (107), Expert Systems (108), Data and Knowledge Engineering (109), Future Generations Computer Systems (FGCS) (110), Journal of Automated Reasoning (111), The International Journal of Intelligent Systems (112), The Journal of Logic Programming (113), and New Generation Computing (114).

Newsletters. Several business-oriented newsletters, the sure earmark of an era of economic concern, appeared, including Applied Artificial Intelligence Reporter (115), the Spang Robinson Report (116), and Knowledge Engineering (117).

Conferences. In 1980 the AAAI created its own conference series (118). Today, several new conference series have been started by local institutions, and national AI conferences such as the Canadian AI conferences (119) and European AI conferences have begun to proliferate. The Proceedings of the First Conference of the European Chapter of the Association for Computational Linguistics appeared in 1983 (120). Morgan Kaufmann Publishers have provided a single source for all of the IJCAI and AAAI proceedings, greatly simplifying access to these originally separately issued AI books. However, AI has moved so rapidly into applications that new subsets of AI literature are separating from the mainstream and forming their own conferences and literatures. Robotics is certainly one such discipline. Medical AI is another. Computer vision, factory automation, military and space applications of AI, speech processing, educational uses of AI, and of course the businessman's interest in expert systems also seem to be evolving into separate AI spin-off literatures (see Computer-integrated manufacturing; Education applications; Expert systems; Medical-advice systems; Military applications; Robotics; Speech understanding; Vision, early).

Electronic Text. The development of the ARPANET as a mechanism for communication within the AI community and early access to computer text-formatting facilities at many ARPANET sites led to the creation of on-line technical reports at a few major institutions such as MIT, Stanford, and CMU.
The evolution of electronic-mail messages into electronic-mail digests has led to the creation of an AI electronic news digest, AILIST, by Ken Laws at SRI-AI.ARPA and a separate mailing list for AI in education, AI-ED, by Mark Richer at Stanford's SUMEX-AIM.ARPA. Other mailing lists, such as IRLIST, for information-retrieval news, and the PHILOSOPHY-OF-SCIENCE mailing list, additionally emphasize AI-related themes in their texts. In addition, there are numerous mailing lists for AI-related hardware and its accompanying software, such as the Symbolics LISP Users Group mailing list, SLUG, and other lists for other advanced workstations. A full list of the ARPANET mailing lists is available on the ARPANET at the Network Information Center.

The Future

What of the future of AI literature? Clearly the publication of standard reference works such as handbooks, encyclopedias, and even dictionaries seems likely. The periodical literature still lacks a coherent AI literature survey, but much progress has been made in volumes such as the Report Store's Bibliography of the AI Literature (121).

Will AI develop new media any further than other disciplines have? The creation of a full-text AI literature database would seem reasonable, but this goal has eluded many other disciplines before AI. Microfiche seems an uncomfortable solution to the newer literature of AI, which is all created on electronic text-formatting machinery yet only sees publication as printed paper disseminated via traditional methods.

The number of articles on AI published in magazines and journals seems to be growing rapidly, with a count of AI articles in The Business Periodicals Index showing 15 articles in 1981/82, only 6 in 1982/83, but then jumping to 49 in 1983/84 and 88 in 1984/85.

It remains to be seen whether the AI literature will be as enduring as the literatures of the sciences or rather fall into a pattern more like that of engineering, with rapid technological obsolescence. Most AIers would describe the AI literature of little more than a decade ago as largely only of historic interest today, yet more and more AI literature from this period is being made available every year as it is retroactively reissued by new publishers. Clearly the AI literature will continue to evolve as the field grows larger.
BIBLIOGRAPHY

1. P. McCorduck, Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence, W. H. Freeman, San Francisco, CA, 1979.
2. A. Rosenblueth, N. Wiener, and J. Bigelow, "Behavior, purpose, and teleology," Philos. Sci. 10(1), 18-24 (1943).
3. W. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bull. Math. Biophys. (1943).
4. N. Wiener, Cybernetics: Or Control and Communication in the Animal and the Machine, 2nd ed., MIT Press, Cambridge, MA, 1961.
5. A. Turing, Intelligent Machinery, in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 5, American Elsevier, New York, 1970.
6. A. Turing, Computing Machinery and Intelligence, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963.
7. C. Shannon, "A chess-playing machine," Sci. Am. 182, 48-51 (February 1950).
8. C. Shannon, "Programming a computer for playing chess," Philos. Mag. 41 (Series 7), 265-275 (1950).
9. A. Newell, J. C. Shaw, and H. A. Simon, Chess Playing Programs and the Problem of Complexity, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963. Also in IBM J. Res. Devel. (1958).
10. C. Shannon and J. McCarthy, Annals of Mathematical Studies, Vol. 34, Automata Studies, Princeton University Press, Princeton, NJ, 1956.
11. (Microfiche collections), SDL AI microfiche collections for: The AI Laboratory, Massachusetts Institute of Technology; Stanford University; University of Illinois; Edinburgh University; University of Rochester; Yale University; Carnegie-Mellon University; SRI International; Purdue University; University of Maryland; Rutgers University; Bolt Beranek & Newman; ISI; and Stanford KSL; all published by Scientific DataLink, New York.
12. (Journal), Communications of the Association for Computing Machinery (CACM), published by the Association for Computing Machinery, New York.
13. (Journal), Journal of the Association for Computing Machinery, published by the Association for Computing Machinery, New York.
14. R. F. Simmons, "Answering English questions by computer: A survey," CACM 8, 53-70 (January 1965).
15. R. F. Simmons, "Natural language question answering systems: 1969," CACM 13, 15-30 (January 1970).
16. J. Weizenbaum, "ELIZA-A computer program for the study of natural language communication between man and machine," CACM 9, 36-45 (1966).
17. M. R. Quillian, "The teachable language comprehender: A simulation program and the theory of language," CACM 12, 459-476 (1969).
18. W. A. Woods, "Transition network grammars for natural language analysis," CACM 13, 591-606 (1970).
19. R. F. Simmons and J. Slocum, "Generating English discourse from semantic networks," CACM 15, 891-905 (1972).
20. J. R. Slagle and J. K. Dixon, "Experiments with some programs that search game trees," J. Assoc. Comput. Mach. 16, 189-207 (1969).
21. J. R. Slagle, "A heuristic program that solves symbolic integration problems in freshman calculus," J. Assoc. Comput. Mach. 10, 507-520 (1963). (Also in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963.)
22. (Journal), Finite String, 1964-1973 as a separate newsletter; at present it is included in Computational Linguistics, the journal of the Association for Computational Linguistics.
23. (Journal), Mechanical Translation and Computational Linguistics, 1954-1964 as MT: Mechanical Translation; 1965-1968 as Mechanical Translation and Computational Linguistics; ceased publication in 1968.
24. (Journal), Computational Linguistics. The Finite String, presently a newsletter included in the journal, began publication in 1964. From 1965 to 1968 published as Mechanical Translation and Computational Linguistics. In 1974 resumed publication as The American Journal of Computational Linguistics, changing its name in 1984 to Computational Linguistics. Published by the Association for Computational Linguistics, Bell Communications Research, Morristown, NJ.
25. (Journal), Computers and the Humanities, published since 1966, North-Holland, Amsterdam, The Netherlands.
26. (Conference proceedings series), International Joint Conferences on Artificial Intelligence. IJCAI; Proceedings of the International Conference on Artificial Intelligence, Palo Alto, CA, 1969-.
27. E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963.
28. (Book series), D. Michie et al. (eds.), Machine Intelligence. [No. 1 by N. L. Collins and D. Michie (eds.), published 1967 by Oliver and Boyd, London; No. 2 by E. Dale and D. Michie (eds.), published 1967 by American Elsevier, New York; No. 3 by D. Michie (ed.), published 1968 by American Elsevier, New York; Nos. 4-6 by B. Meltzer and D. Michie (eds.), published 1969, 1970, and 1971 by American Elsevier, New York; No. 7 by B. Meltzer and D. Michie (eds.), published 1977 by Wiley, New York; No. 8 by E. W. Elcock and D. Michie (eds.), published 1977 by Halsted, Wiley, New York; No. 9 by J. E. Hayes, D. Michie, and L. I. Mikulich (eds.), published 1979 by Halsted, Wiley, New York; No. 10 by J. E. Hayes, D. Michie, and Y. H. Pao (eds.), published 1982 by Halsted, Wiley, New York; and No. 11 by J. E. Hayes, D. Michie, and J. Richards (eds.), published 1986 by Oxford University Press, Oxford, UK].
29. M. L. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968.
30. M. L. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, MA, 1968.
31. H. A. Simon, Sciences of the Artificial, MIT Press, Cambridge, MA, 1969.
32. G. E. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, 1969.
33. (Journal), Artificial Intelligence: An International Journal, published since 1970, North-Holland, Amsterdam, The Netherlands.
AI LITERATURE, g4. (Journal), International Journal of Man-Machine Studies, published monthly since 1969 by Academic Press, New York. 35. T. Winograd, [Jnderstanding Natural Language, Academic Press,New York, t972. 36. R. G. Schank and K. M. Colby (eds.), Computer Models af Thought and Language, W. H. Freeman' San Francisco, 1973' 37. R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, L973. 88. R. C. Schank, ConceptualInformation Processing,Elsevier, New York, 1975. 39. E. Charniak and Y. Witks (eds.),Computational Semantics:An Introduction to Artificial Intelligence and Natural Language Comprehension,North-Holland, New York, 1976. 40. R. C. Schank and R. P. Abelson, Scripts,Plans, Goals,and Un' derstanding, Erlbaum, Hillsdale, NJ, 19774L. D. E. Walker, (ed.), (Jnderstanding Spoken Language, NorthHolland, New York, 1978. 42. S. E. Fahlman, NETL, A System for Representingand Using ReaI-WorId Knowledge, MIT Press, cambridge, MA, 1979. 43. N. V. Findler (ed.), AssociatiueNetworks: Representationand (Jse of Knotttledge by Computers, Academic Press, New York, 1979. 44. K. M. Colby, Artificial Paranoia: A Computer Simulation of Pergamon, New York, 1974. Paranoid Processes, 45. J. Anderson, and G. Bower, Human AssociatiueMemory, Winston, Washington, DC, 1973. 46. J. Anderson and G. Bower, Human AssociatiueMemory: A Brief Edition, rev. ed., Erlbaum, Hillsdale, NJ, 1980. 47. D. G. Bobrow and A. Collins (eds.),Representationand Understanding, Academic Press, New York, 1975. 48. P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, I975. 49. R. O. Duda and P. E. Hart, Pattern Classifi,cationand Scene Analysis, Wiley, New York, 1973. 50. D. A. Norman and D. E. Rumelhart, Explorations in Cognition, W. H. Freeman, San Francisco,1975. 51. P. H. Lindsay and D. A. Norman, Human Information Processing: An Introduction to Psychology, 2nd ed., Academic Press, New York, L977. 52. G. A. Miller and P. N. 
Johnson-Laird, Language and.Perception, Harvard University Press, Belknap, Cambridge, MA, 1976. 53. Cognitive ScienceSociety,Perspectiueson CognitiueScience:Papers Presentedat the First Annual Meeting of the Cognitiue ScienceSociety,La Jolla, CA, 1981. 54. P. C. Jackson, Introduction to Artificial Intelligence, Masor./ Charter , L974. 55. P. H. Winston, Artificial Intelligence, Addison-Wesley,Reading, MA, L977 (2nd ed., 1984). 56. L. J. Fogel,Artificial IntelligenceThrough Simulated Euolution. Wiley, New York, 1967. 57. R. B. Banerji, Theory of Problem Soluing: An Approach to Artificial Intelligence, Elsevier, New York, 1969. 58. N. J. Nilsson, Problem Soluing Methods in Artifi,cial Intelligence, McGraw-Hill, New York, I97L. 59. J. R. Slagle, Artificial Intelligence: The Heuristic Programming Approach, McGraw-Hill, New York, 1971. 60. M. A. Arbib, The Metaphorical Brain: An Introduction to Cybernetics as Artificial Intelligence and Brain Theory, Wiley, New York, L972.
61. H. L. Dreyfus, What Computers Can't Do: A Critique of Artificial Reason, Harper & Row, New York, 1972.
62. J. Weizenbaum, Computer Power and Human Reason: From Judgement to Calculation, W. H. Freeman, San Francisco, 1976.
63. S. Torrance, The Mind and The Machine, Wiley, New York, 1984.
64. J. McCarthy, LISP 1.5 Programmer's Manual, MIT Press, Cambridge, MA, 1965.
65. C. Weissman, LISP 1.5 Primer, Dickensen, Belmont, CA, 1967.
66. D. Friedman, The Little LISPer, Science Research Associates, Chicago, IL, 1974.
67. L. Siklossy, Let's Talk LISP, Prentice-Hall, Englewood Cliffs, NJ, 1976.
68. J. Allen, Anatomy of LISP, McGraw-Hill, New York, 1978.
69. J. R. Meehan, New UCI LISP Manual, Erlbaum, Hillsdale, NJ, 1979.
70. A. Barr, P. R. Cohen, and E. Feigenbaum, The Handbook of Artificial Intelligence, 3 vols., Kaufmann, Los Altos, CA, 1981, 1982. Volumes 1 and 2 edited by Barr and Feigenbaum; volume 3 edited by Cohen and Feigenbaum.
71. (Magazine), AI Magazine, published by the American Association for Artificial Intelligence, Menlo Park, CA.
72. M. P. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
73. R. de Beaugrande, Text, Discourse, and Process: Toward a Multidisciplinary Science of Texts, Ablex, Norwood, NJ, 1980.
74. A. Joshi, B. Webber, and I. Sag (eds.), Elements of Discourse Understanding, Cambridge University Press, Cambridge, UK, 1981.
75. N. Sager, Natural Language Information Processing: A Computer Grammar of English and Its Applications, Addison-Wesley, Reading, MA, 1981.
76. R. C. Schank and C. K. Riesbeck (eds.), Inside Computer Understanding: Five Programs Plus Miniatures, Erlbaum, Hillsdale, NJ, 1981.
77. Z. S. Harris, A Grammar of English on Mathematical Principles, Wiley, New York, 1982.
78. M. King (ed.), Parsing Natural Language, Academic Press, New York, 1983.
79. T. Winograd, Language as a Cognitive Process, Vol. 1, Syntax, Addison-Wesley, Reading, MA, 1983.
80. R. F. Simmons, Computations from the English, Prentice-Hall, New York, 1984.
81. J. F. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
82. K. Sparck Jones and Y. Wilks (eds.), Automatic Natural Language Parsing, Wiley, New York, 1985.
83. R. B. Banerji, Artificial Intelligence: A Theoretical Approach, North-Holland, New York, 1980.
84. N. J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
85. E. Charniak and D. V. McDermott, Introduction to Artificial Intelligence, Addison-Wesley, Reading, MA, 1985.
86. E. Charniak, Artificial Intelligence Programming, Erlbaum, Hillsdale, NJ, 1980.
87. P. H. Winston and B. K. P. Horn, LISP, Addison-Wesley, Reading, MA, 1984.
88. The LISP Company, Conference Record of the 1980 Stanford LISP Conference, Santa Clara, CA, 1980.
89. R. Wilensky, LISPcraft, Norton, 1984.
90. G. L. Steele Jr., Common LISP: The Language, Digital, Burlington, MA, 1984.
91. W. F. Clocksin and C. S. Mellish, Programming in Prolog, Springer-Verlag, Berlin, 1981.
92. K. L. Clark and S.-A. Tarnlund (eds.), Logic Programming, Academic Press, New York, 1982.
93. J. A. Campbell (ed.), Implementations of Prolog, Ellis Horwood, 1984.
94. D. Li, A PROLOG Database System, Research Studies, Addison-Wesley, Reading, MA, 1984.
95. E. A. Feigenbaum and P. McCorduck, The Fifth Generation: Artificial Intelligence and Japan's Computer Challenge to the World, Addison-Wesley, Reading, MA, 1983.
96. M. A. Bramer and D. Bramer, The Fifth Generation: An Annotated Bibliography, Addison-Wesley, Reading, MA, 1984.
97. Y. Shirai, J. Tsujii, and F. R. D. Apps, Artificial Intelligence: Concepts, Techniques, and Applications, Wiley, New York, 1985.
98. F. Hayes-Roth, D. A. Waterman, and D. B. Lenat (eds.), Building Expert Systems, Addison-Wesley, Reading, MA, 1983.
99. C. J. Bigger and J. W. Coupland, Expert Systems: A Bibliography, Institution of Electrical Engineers, 1984.
100. R. Forsyth (ed.), Expert Systems: Principles and Case Studies, Chapman & Hall, London, 1984.
101. S. M. Weiss and C. A. Kulikowski, A Practical Guide to Designing Expert Systems, Rowman and Allanheld, Totowa, NJ, 1984.
102. C. Naylor, Build Your Own Expert System, Wiley, New York, 1985.
103. W. Reitman (ed.), Artificial Intelligence Applications for Business: Proceedings of the NYU Symposium, May, 1983, Ablex, Norwood, NJ, 1984.
104. T. Johnson, The Commercial Application of Expert Systems Technology, Ovum, London, 1984.
105. P. Winston and K. A. Prendergast (eds.), The AI Business: The Commercial Uses of Artificial Intelligence, MIT Press, Cambridge, MA, 1984.
106. P. Harmon, Expert Systems-Artificial Intelligence in Business, Wiley, New York, 1985.
107. (Journal), Computational Intelligence/Intelligence Informatique, published quarterly since February 1985 by the National Research Council of Canada, Ottawa.
108. (Journal), Expert Systems, published quarterly since July 1984 by Learned Information, Medford, NJ.
109. (Journal), Data and Knowledge Engineering, published quarterly since June 1985 by Elsevier Science, New York.
110. (Journal), Future Generations Computer Systems (FGCS), published bimonthly since July 1984 by Elsevier Science, New York.
111. (Journal), Journal of Automated Reasoning, published quarterly since Spring 1985 by Kluwer Academic, Hingham, MA.
112. (Journal), International Journal of Intelligent Systems, published quarterly since March 1986, R. R. Yager (ed.), Wiley, New York.
113. (Journal), Journal of Logic Programming, published quarterly since January 1984 by Elsevier, New York.
114. (Journal), New Generation Computing, published quarterly since July 1983 by Springer-Verlag, Secaucus, NJ.
115. (Newsletter), Applied Artificial Intelligence Reporter, published monthly by the Intelligent Computer Systems Research Institute of the University of Miami, published by ICS Research Institute, Fort Lee, NJ.
116. (Newsletter), The Spang Robinson Report, published by The Spang Robinson Report, Palo Alto, CA.
117. (Newsletter), Knowledge Engineering, a monthly directory of expert systems published since May 1986 by Richmond Publishing Corporation, New York.
118. (Conference proceedings series), American Association for Artificial Intelligence, AAAI; National Conferences on Artificial Intelligence, Palo Alto, CA, 1980-.
119. (Conference proceedings series), Canadian Society for Computational Studies of Intelligence, Proceedings of the CSCSI/SCEIO National Conference: Computational Studies in Intelligence, 1976-.
120. Association for Computational Linguistics, Proceedings of the First Conference of the European Chapter of the Association for Computational Linguistics, 1983.
121. H. M. Rylko (comp. ed.), Artificial Intelligence: Bibliographic Summaries of the Select Literature, Vols. 1 and 2, The Report Store, Lawrence, KS, 1984/85.
122. (Conference series), COLING; Proceedings of the International Conferences on Computational Linguistics, published by the Association for Computational Linguistics, Morristown, NJ.
R. A. Amsler
Bell Communications Research
LOGIC

The Nature of Logic

A central concern of logic is to take a situation described by a particular set of statements that are assumed, supposed, or otherwise accepted as true and then to determine what other statements must also be true in that situation. These other true statements are implicit in that situation and are, thus, said to be implied by the original ones. Thus, logic can be used to make implicitly true statements explicit. The original statements are called premises, the "new" statements are called conclusions, and the process of making conclusions explicit is called inference.

One natural criterion for such inference is that it be truth preserving. Deductive logic employs inferential methods that achieve this goal: A deductive argument (a set of premises and a conclusion inferred from them) is said to be valid iff any situation in which the premises are (assumed to be) true is (thereby) also a situation in which the conclusion is (assumed to be) true. The rules for determining when a statement is true in a situation are among the concerns of semantics (qv). A valid argument whose premises are in fact true is said to be sound. However, the determination of the actual truth value of a given statement is beyond the scope of both logic and semantics; it is either subject-matter specific or else depends on observation (empirical investigation). It should be noted that "actual truth," or correspondence to "facts" in the actual world, is not required: Statements can merely be assumed to be true (taken as if true) and deduction proceed from there.

The rules for inferring a statement from other statements can be arbitrary relations among statements serving as premises and conclusions. The study of such rules is among the concerns of syntax. It is a not always reachable ideal of logic that syntactic and semantic methods should "overlap":

1. that all statements syntactically inferrable from others (i.e., those conclusions that follow from premises according to rules of inference) also be validly inferrable from them, that is, that the conclusions be true if the premises are;
2. that all statements semantically inferrable from others (including those that are tautologies, true in all situations) be syntactically inferrable from them (or be theorems).
A perfect overlap, in which both 1 and 2 hold, is referred to as completeness (qv) of the logic in question.

Systems of Logic

Traditionally, systems of logic have been classified as either inductive or deductive. Inductive logics employ inferential methods that can fall short of truth preservation. They are used for reasoning in situations where there is incomplete information, such that only statistical or provisional conclusions can be drawn. For example, inductive inference (qv) might only guarantee that a conclusion is highly likely to follow from given premises. Nonmonotonic logics can be considered to fall under this category.

Besides the standard propositional and predicate logics, there are several varieties of deductive logics: Modal logics deal with the concepts of necessity and possibility; epistemic and doxastic logics deal with the concepts of knowledge and belief, respectively; deontic logics deal with moral notions such as obligation and permission; erotetic logics are the logics of questions; and there are also several logics of commands. Relevance logics and logics of counterfactual conditionals deal with more subtle analyses of the if-then connective. (Relevance logic is historically related to the development of modal logic.) Deductive logics need not be limited to the two truth values of truth and falsity: There are many-valued logics and logics with truth-value "gaps" (for dealing with statements whose truth values are not determinable). Nor need deductive logics be limited to what actually exists or whether anything exists: There are logics of nonexistent objects (including fictional objects), logics for dealing with inconsistent situations (1), and free logics (logics that are free of existence presuppositions).

Discussions of many of these logics and references to the literature may be found in the articles on logic in this encyclopedia. An especially good survey is Ref. 2, and issues of the Journal of Philosophical Logic frequently contain articles of relevance to AI.
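The truth-preservation criterion for deductive validity can be checked mechanically in the propositional case by surveying all situations (truth assignments). The sketch below is a hypothetical illustration, not from the original article; the function name `valid` and the encoding of statements as Python functions are assumptions:

```python
from itertools import product

def valid(premises, conclusion, atoms):
    """An argument is valid iff every truth assignment ("situation")
    that makes all premises true also makes the conclusion true."""
    for values in product([True, False], repeat=len(atoms)):
        situation = dict(zip(atoms, values))
        if all(p(situation) for p in premises) and not conclusion(situation):
            return False  # a counterexample situation exists
    return True

# Modus ponens: P, (P -> Q), therefore Q -- a valid argument form.
assert valid([lambda v: v["P"], lambda v: (not v["P"]) or v["Q"]],
             lambda v: v["Q"], ["P", "Q"])

# Affirming the consequent: Q, (P -> Q), therefore P -- invalid.
assert not valid([lambda v: v["Q"], lambda v: (not v["P"]) or v["Q"]],
                 lambda v: v["P"], ["P", "Q"])
```

Note that validity says nothing about whether the premises are actually true; that further question (soundness) is, as stated above, beyond the scope of logic itself.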
Logic and Artificial Intelligence

The relevance of logic to AI should be clear. First, logic is at the heart of reasoning (qv), and reasoning is at the heart of intelligence. Since so much is known about the nature of logical reasoning, and since its algorithmic nature has been well studied, it was one of the earliest and most successful targets of AI researchers [e.g., the Logic Theorist (3) and the method of resolution (qv) (4)]. Second, the wide variety of systems of logic offers an equally wide variety of formats for representing information (together with built-in inference mechanisms). Thus, the expressive power of various logics has become one of the central aspects of the field of knowledge representation (qv).

Because actual human reasoning is often not logical (5) and because some researchers have perceived or misperceived standard logic to be overly formal or limiting, several AI researchers have disdained the use of logic. This has given rise to what has been called the "neat/scruffy debate." In a survey article Kolata (6) characterized these two positions as follows: The so-called neat approach to AI "is to design computer programs to reason according to well worked out languages of mathematical logic, whether or not that is actually the way people think"; John McCarthy and Patrick Hayes are among the leading proponents of this approach. The so-called scruffy approach "is to try to get computers to imitate the way the human mind works, which . . . is almost certainly not with mathematical logic"; Marvin Minsky and Roger Schank are among the leading proponents of this approach. Thus, neatness is associated with formality, mathematics, and logic, and scruffiness is associated with psychological validity.

Scruffy methods are attacked as being not well defined, whereas neat methods are attacked as being overly defined, hence not flexible enough. Neat methods are seen as artificial and unable to handle certain phenomena (such as default reasoning or nonmonotonicity); yet surely any realm that is amenable to algorithmic treatment is thereby formalizable. On the scruffy side, automatic theorem provers (see Theorem proving) and general problem solvers (see Problem solving) are objected to on the grounds that they are not intelligent or that they are too general; as one neat sympathizer paraphrases the scruffy position, "classical theorem-provers know very little about what to do, and are incapable of being told it" (7). On the neat side, logic, because of its semantics, is considered to be "the most successful precise language ever developed to express human thought and inference" (7). Logic "justifies inferences," whereas a processor "performs inferences" (7). The two are independent, and the way in which the processor infers need not be an automated theorem prover.

The neat-scruffy dispute overlaps another dispute about the goals of AI: so-called weak AI tries to "simulate" human intelligent behavior without attempting to do it in precisely the way humans do, without attempting to be psychologically accurate; so-called strong AI tries to "emulate" human intelligent behavior, to be psychologically accurate (8). Thus, perhaps, the real issue in the neat-scruffy debate is a dispute over the level at which logic or psychology enters into the analysis and solution of problems in AI. (But see Ref. 9 for a recent argument concerning computational limitations on neatness.)

Guide to Logic Articles in This Encyclopedia

The following articles provide more references and more detailed discussions of logic, reasoning, and inference, and their relations to AI:

Actor formalisms
Agenda-based systems
Backtracking
Backtracking, dependency-directed
Bayesian decision methods
Belief revision
Belief systems
Blackboard systems
Circumscription
Completeness
Constraint propagation
Control structures
Coroutines
Demons
Distributed problem solving
Expert systems
Frame theory
Fuzzy logic
Heuristics
Inference
Inference, grammatical
Inference, inductive
Inheritance hierarchy
Logic, predicate
Logic programming
Logic, propositional
Mathematical induction
Metaknowledge, metarules, and metareasoning
Modal logic
Physics, naive
Planning
Problem solving
Processing, bottom-up and top-down
Production systems
Qualitative process theory
Reasoning
Reasoning, causal
Reasoning, commonsense
Reasoning, default
Reasoning, nonmonotonic
Reasoning, plausible
Reasoning, resource-limited
Reasoning, spatial
Representation, knowledge
Representation, procedural
Resolution
Scripts
Semantic networks
Temporal logic
Theorem proving
BIBLIOGRAPHY

1. N. Rescher and R. Brandom, The Logic of Inconsistency: A Study in Non-Standard Possible-World Semantics and Ontology, Rowman and Littlefield, Totowa, NJ, 1979.
2. D. Gabbay and F. Guenthner, Handbook of Philosophical Logic, 4 vols., D. Reidel, Dordrecht, 1983.
3. A. Newell, J. C. Shaw, and H. A. Simon, "Empirical Explorations with the Logic Theory Machine: A Case Study in Heuristics," in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963.
4. J. A. Robinson, Logic: Form and Function; The Mechanization of Deductive Reasoning, Elsevier North Holland, New York, 1979.
5. D. Kahneman, P. Slovic, and A. Tversky, Judgment under Uncertainty: Heuristics and Biases, Cambridge University Press, Cambridge, U.K., 1982.
6. G. Kolata, "How Can Computers Get Common Sense?" Science 217, 1237-1238 (1982).
7. P. J. Hayes, "In defence of logic," Proc. of the Fifth IJCAI, Cambridge, MA, 559-565, 1977.
8. J. R. Searle, "Minds, brains, and programs," Behav. Brain Sci. 3, 417-457 (1980).
9. C. Cherniak, "Computational complexity and the universal acceptance of logic," J. Philos. 81, 739-758 (1984).

General References

J. McCarthy, "Epistemological problems of artificial intelligence," Proc. of the Fifth IJCAI, Cambridge, MA, 1038-1044 (1977).
R. C. Moore, "Problems in logical form," Proc. ACL 19, 117-124 (1981).
P. Wallich, "AI specialists debate logic at conference," IEEE The Institute 7, 2 (November 1983).

W. J. Rapaport
SUNY at Buffalo

LOGIC, DEFAULT. See Reasoning, nonmonotonic.

LOGIC, MODAL. See Modal logic.

LOGIC, NONMONOTONIC. See Backtracking, dependency-directed; Belief revision; Circumscription; Reasoning, nonmonotonic.

LOGIC, PREDICATE

Predicate logic, also known as predicate calculus or first-order (predicate) logic, is the study of inferences that can be made on the basis of an analysis of atomic sentences into "terms" (essentially, noun phrases) and "predicates" (essentially, verb phrases). It is an extension of propositional (or sentential) logic and is the modern descendant of Aristotle's logic of subjects and predicates (see Logic, propositional). For discussions of traditional Aristotelian syllogistic logic see Refs. 1-4. For a general discussion of logic and references to other articles on logic in this encyclopedia, see Logic. Secondarily, it is also the study of the representation of information (see Representation, knowledge) by predicates and their terms. Because of the relationships of predicates and terms to noun phrases and verb phrases, predicate logic has often served as a foundation for natural-language syntax and semantics (see the Natural-language entries; Parsing; Semantics). In this entry, the syntactic items that are used in the representation of information are called sentences, and the items in the "world" that sentences mean or express are called propositions.

A "predicate" is, as suggested above, usually taken to be a verb phrase or the name of a property, relation, or class of objects. Thus, in the sentence

Roses are red

"(is) red" is the predicate; it can be taken to name the property or attribute of being red or of redness, or the class {x: x is red} or {x: x has redness}. In addition to this "subatomic" analysis of the atomic sentences treated by propositional logic, predicate logic employs a machinery of variables and quantifiers that allows it to express how many objects fall under a given predicate. The adjective "first-order" indicates that the quantifiers only range over individuals, not properties, relations, or classes (i.e., they range over the things represented by terms, not the things represented by predicates). Second-order logic (see below) quantifies over predicates; by extrapolation, propositional logic may be thought of as being of "zero order."

Although predicate logic is usually taken to be a way of analyzing propositions or declarative sentences, there are also predicate logics for other types of sentences (e.g., quantified modal logic and quantified epistemic logic). In fact, the logic of some sentences, such as interrogatives (erotetic logic), only becomes interesting in the quantified case. (For discussions of epistemic and other modal logics, see Logic, modal, and Belief systems and Refs. 5-9; for erotetic logic, see Refs. 10-12.)

As is the case with propositional logic, the representational system of predicate logic is its underlying language, consisting essentially of terms, predicates, quantifiers, and truth-functional connectives, with a grammatical syntax and a semantics in terms of individuals and properties (or classes). The syntax is often extended to include functions (or term-producing operators), the identity predicate, and definite and indefinite description operators. The deductive system of predicate logic extends that of propositional logic to include axioms and rules for manipulating quantifiers.
The Language of Predicate Logic

Informally, an atomic proposition is analyzed into a single verb phrase (the predicate) and a sequence of noun phrases (grammatically, its subjects and objects) called the arguments of the predicate. For example,

Socrates is Greek

consists of the predicate ". . . is Greek" together with its argument "Socrates"; and

Fredonia is between Erie and Buffalo

consists of the predicate ". . . is between . . . and . . ." together with its arguments "Fredonia," "Erie," and "Buffalo" (or the argument sequence (Fredonia, Erie, Buffalo)). In the first case the predicate names the property: being Greek, or the class {x: x is Greek}; in the second case the predicate names the relation: being between . . . and . . ., or the class of ordered triples: {(x, y, z): x is between y and z}. Discussed below are important theoretical differences between the full first-order logic of relations and monadic first-order logic, which only has one-place predicates.

To be able to express propositions such as

All humans are mortal.
Some philosophers are computer scientists.
There are no unicorns.

quantifiers and variables are used. Thus, the first of these examples might be expressed using the universal quantifier ("for all"):

For all x, if x is human, then x is mortal.

and the second might be expressed using the existential quantifier ("for some" or "there exists"):

For some x, x is a philosopher and x is a computer scientist.
There exists an x such that x is a philosopher and x is a computer scientist.

Syntax. A formal syntax for a language L of predicate logic can be presented by giving an alphabet, a recursive definition of term, and a recursive definition of well-formed formula (wff) (given in Tables 1-3).

Table 1. Alphabet of L

n-place predicate symbols (n an integer): A, . . . , Z; Ai, Bi, . . . (i an integer); any sequence of words separated by hyphens
n-place function symbols (n an integer): f, g, h; fi, gi, hi (i an integer); any sequence of words separated by hyphens
Individual variables: u, . . . , z; ui, . . . , zi (i an integer)
Individual constants: a, . . . , e; ai, . . . , ei (i an integer); any noun phrase (the words separated by hyphens)
Connectives: ¬, ∨
Punctuation: (, ), [, ]
Quantifiers: universal ∀; existential ∃

Table 2. Terms of L

(T1) All individual variables are terms.
(T2) All individual constants are terms.
(T3) If t1, . . . , tn are terms and f is an n-place function symbol, then f(t1, . . . , tn) is a term.
(T4) Nothing else is a term.

For example, each of the following is a term:

x
x1
a
John
Mother-of(Bill)
Son-of(Harriet, Frank)

Table 3. Well-formed formulas of L

(wff.1) If t1, . . . , tn are terms and P is an n-place predicate symbol, then P(t1, . . . , tn) is a(n atomic) well-formed formula.
(wff.2) If φ and ψ are well-formed formulas, v is an individual variable, and φ(v*) is a well-formed formula containing zero or more occurrences of v, then
¬φ
(φ ∨ ψ)
∀v[φ(v*)]
∃v[φ(v*)]
are well-formed formulas.
(wff.3) Nothing else is a well-formed formula.

Parentheses and brackets will sometimes be omitted when no ambiguity results. For example, each of the following is a wff:

A(x, y)
In(Eiffel-tower, France)
¬Republican(John-F.-Kennedy)
(Capital(Albany, New-York) ∨ B)
∀xFx
∀x[¬Human(x) ∨ Mortal(x)]
¬∃x Unicorn(x)

(Note that a zero-place "predicate," like B, is an atomic wff.)

In order to define the notion of a sentence and to give the inference rules, the following definitions are necessary:

(D1) Let φ be a wff prefixed by a quantifier phrase (i.e., either ∀v or ∃v). Then φ is the scope of the quantifier phrase.

For example, the scope of ∀x in ∀x[φ(x)] is φ(x), but the scope of ∃y in (∃y[φ(y)] ∨ ψ) is φ(y).

(D2) Let the variable in a quantifier phrase be called its variable of quantification. Then:
(a) An occurrence of an individual variable in a wff φ is bound means: the variable occurs in the scope of a quantifier phrase in φ that has that variable as its variable of quantification.
(b) An occurrence of an individual variable in a wff φ is free means: the occurrence of that variable is not bound.
(c) A variable is bound means: there is an occurrence of that variable that is bound.
(d) A variable is free means: there is an occurrence of that variable that is free.

For example, in

(Fx ∨ ∀xGx)
the first occurrence of x is free and the second is bound; the variable x is both free and bound in this wff. Finally,

(D3) A sentence is a wff with no free variables.

(For further discussion of the grammatical syntax of a first-order language and translations of natural-language sentences into it, see Refs. 13-15.)
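Definitions (D2) and (D3) are directly recursive and can be sketched in code. The following is a hypothetical illustration, not part of the original entry; the tuple encoding of wffs and the helper name `free_vars` are assumptions:

```python
# Wffs are encoded as nested tuples; a quantifier node ("forall", v, body)
# or ("exists", v, body) binds its variable of quantification v.
def free_vars(wff):
    op = wff[0]
    if op == "pred":                      # ("pred", P, t1, ..., tn)
        # Per Table 1, individual variables are u, ..., z (with subscripts).
        return {t for t in wff[2:] if t[0] in "uvwxyz"}
    if op == "not":
        return free_vars(wff[1])
    if op == "or":
        return free_vars(wff[1]) | free_vars(wff[2])
    if op in ("forall", "exists"):        # the quantifier binds its variable
        return free_vars(wff[2]) - {wff[1]}
    raise ValueError("unknown operator: %r" % op)

# (Fx v AxGx): the first occurrence of x is free, the second is bound,
# so x is among the free variables of the whole wff.
wff = ("or", ("pred", "F", "x"), ("forall", "x", ("pred", "G", "x")))
assert free_vars(wff) == {"x"}

# A sentence (D3) is a wff with no free variables:
sentence = ("forall", "x",
            ("or", ("not", ("pred", "Human", "x")), ("pred", "Mortal", "x")))
assert free_vars(sentence) == set()
```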
Semantics. Providing a semantics for such a first-order language is somewhat more problematic than it is in the propositional case. The main reason for this is that a decision must be made about the domain (or universe) of discourse. It was noted above that a predicate can name a property (or relation) or a class. But classes are extensional ("two" classes are identical if they have the same members), whereas properties are intensional (i.e., nonextensional). Moreover, there are important questions about what counts as an individual:

1. Can properties or classes themselves be individuals? This is surely plausible; consider such propositions as:

Red is a color.
Colors are properties.
{x: x is a rational number} is countable.

However, care must be taken to avoid paradox, as in Russell's (16) well-known example:

{x: x ∉ x} ∈ {x: x ∉ x} if and only if {x: x ∉ x} ∉ {x: x ∉ x}

2. Must the individual actually exist? If variables and terms may only range over existents, how does one express such sentences as the following?

There are no round squares.
Santa Claus does not exist.
All unicorns are white.

Thus, a semantics for a first-order language cannot be completely specified independently of an ontology, a precise specification of the domain. Nevertheless, the general form of such a semantics (often called formal semantics, see Ref. 8) does not vary. Metatheoretical results are given here in terms of set-theoretic semantics (i.e., in terms of an ontology of sets and their members), which is the way they are given in most of the literature.

Let M be the structure (D, R, F), where D is a nonempty set, R is a set of n-place relations on the elements of D, and F is a set of n-place functions on the elements of D. An interpretation, I, on M for L is a function from the symbols of L to D ∪ R ∪ F such that:

If t is an individual constant or individual variable, then I(t) ∈ D.
If f is a function symbol, then I(f) ∈ F.
If f is an n-place function symbol and t1, . . . , tn are terms, then I(f(t1, . . . , tn)) = I(f)(I(t1), . . . , I(tn)) ∈ D.
If P is an n-place predicate symbol, then I(P) ∈ R.

The notion of "truth on an interpretation" (symbolized as: ⊨I) can be defined recursively as follows:

1. If P is an n-place predicate symbol and t1, . . . , tn are terms, then ⊨I P(t1, . . . , tn) if and only if (I(t1), . . . , I(tn)) ∈ I(P).
2. If φ and ψ are wffs and v is an individual variable, then
(a) ⊨I ¬φ if and only if not ⊨I φ;
(b) ⊨I (φ ∨ ψ) if and only if ⊨I φ or ⊨I ψ;
(c) ⊨I ∀vφ if and only if ⊨I′ φ for every interpretation I′ that differs from I at most on what it assigns to v;
(d) ⊨I ∃vφ if and only if ⊨I′ φ for some interpretation I′ that differs from I at most on what it assigns to v.

Finally,

A wff φ is valid in M (written: M ⊨ φ) if and only if ⊨I φ for every interpretation I on M. A structure M is a model for a set H of wffs if and only if M ⊨ Hi for every wff Hi ∈ H.

Expressibility. As is the case with propositional logic, one can choose to employ either a small number of connectives and quantifiers (for elegance and metatheoretical simplicity) or a wide variety (for expressive power). Thus, on the one hand, the formal system presented above may be extended in a natural way to include the other truth-functional connectives or, on the other hand, restricted to using (say) only ¬, ∨, and ∀. The latter can be accomplished as in propositional logic, together with the following definition:

∃vφ =df ¬∀v¬φ

Another variation is to employ restricted quantifiers. Instead of translating

All As are Bs.
Some As are Bs.

as, respectively,

∀x[Ax → Bx] and ∃x[Ax ∧ Bx]

with the noticeable change in syntactic structure, a family of restricted quantifiers can be introduced:

(∀x: φ(x)) and (∃x: φ(x))

Using this notation, the translations become the more uniform-looking

(∀x: Ax)Bx and (∃x: Ax)Bx

This notation has the additional advantage of being extendible to generalized quantifiers for handling such sentences as

Most As are Bs.
Many As are Bs.

as well as numerical quantifiers:

Exactly 4 As are Bs.
Greater than 5 As are Bs.
Between 5 and 10 As are Bs.

Generalized and numerical quantifiers are, however, beyond the scope of first-order logic (for discussions of these issues, see Refs. 17-21).

Other alternatives to first-order languages and logics have been motivated by ontological concerns. As is seen below, when deduction is discussed, ∀xφ(x) implies ∃xφ(x) in a nonempty domain. But what about the empty domain? Why
should logic imply that something exists? Shouldn't logic be independent of ontology? Attempts to broaden the scope of first-order logic have included free logics (i.e., logics that are free of existence presuppositions) and Meinongian logics that allow for representing and reasoning about nonexistents. Both of these kinds of logics often choose to represent existence by a special predicate, E!, rather than by trying to define existence in purely first-order terms (as, e.g., "∃x[x = a]" for "a exists") (for discussions of free logics, see Refs. 10 and 22-27, and for discussions of Meinongian logics, see Refs. 28-36).

Deductive Systems of Predicate Logic

Syntax. As with propositional logic, a deductive system for predicate logic can be presented axiomatically or as a natural deduction system.

Axiomatic Predicate Logic. In this section a set of axioms and rules of inference for predicate logic are presented using the terminology introduced in the entry Logic, propositional. As is done there, the wffs are restricted to those whose only connectives are ¬ and →, and the only quantifier is the universal quantifier. All wffs of the following forms will be axiom schemata:

(A1) (φ → (ψ → φ)).
(A2) ((φ → (ψ → χ)) → ((φ → ψ) → (φ → χ))).
(A3) ((¬φ → ¬ψ) → (ψ → φ)).
(A4) (∀v[φ → ψ] → (φ → ∀vψ)), where v is not free in φ.
(A5) (∀vφ(v*) → φ(t/v)), where φ(t/v) is the result of replacing all free occurrences of v in φ by any term t and where all variables in t are free at all locations in φ where v was free.

There are two rules of inference:

Modus ponens: From φ and (φ → ψ), infer ψ.
Universal generalization: From φ, infer ∀vφ.

A Natural-Deduction System for Predicate Logic. The natural-deduction system for propositional logic introduced in Logic, propositional may be extended to predicate logic by providing introduction and elimination rules for the quantifiers. Because these rules involve the substitution of variables by constants, and vice versa, care must be taken not to accidentally bind a previously free variable or free a previously bound one. Consequently, the quantifier rules are not as "natural" as the rules for the connectives.

∀ Elimination: From ∀vφ(v*), infer φ(c/v), where φ(c/v) is the wff that results from φ(v*) by replacing all free occurrences of the variable v by the constant c.
∀ Introduction: From φ(c*), infer ∀vφ(v/c), where φ(v/c) is the wff that results from φ(c*) by replacing all occurrences of c by v, provided: c does not occur in a premise; if φ(c*) occurs in a subproof, then no individual constant in φ(c*) occurs in an assumption that is global to the subproof; and all new occurrences of v must be free after the replacement.
∃ Introduction: From φ(c*), infer ∃vφ(v/c*), where φ(v/c*) is the formula that results from φ(c*) by replacing zero or more occurrences of c by v.
∃ Elimination: From ∃vφ(v*) and a subproof that begins with the assumption φ(c/v) and that ends with a proposition ψ not containing c, infer ψ, where c is an individual constant that has not been used before and φ(c/v) is as described above.

(These rules are adapted from Ref. 37. For further discussion and other sets of rules, see other standard introductory texts, such as Refs. 15 and 38-43.)

As is the case with propositional logic, there is a form of the inference rule Resolution (qv) that has proved to be of importance in AI contexts (see Refs. 44-50 and Theorem proving). As an example of the use of the introduction and elimination rules, Figure 1 shows a translation and natural-deduction proof of the argument:

Horses are animals.
∴ Every head of a horse is a head of an animal.

The rules of → Elimination and → Introduction used on lines 7 and 11 can be derived from the rules for the connectives ¬ and ∧ and the logical equivalence "material conditional"; the former rule is, essentially, modus ponens (see Logic, propositional and Ref. 37 for details of these rules and the derivations).

Extensions of Predicate Logic

First-order languages are often extended by the addition of two important symbols: the two-place predicate symbol for identity, =, and the definite-description operator ι (in many AI and natural-language contexts, words such as equal and the are used instead). These additions to the representational power of the language also entail greater deductive power.

Identity. Syntactically, the identity predicate can be defined by adding the following to the definition of wff:

(wff.=) If t1 and t2 are terms, then (t1 = t2) is a(n atomic) well-formed proposition.

Often, (t1 ≠ t2) is defined as an abbreviation for ¬(t1 = t2). Semantically, ⊨I (t1 = t2) if and only if I(t1) = I(t2). The axiomatic formulation of predicate logic can then be extended by the following two axiom schemata:

(A6) ∀v[v = v].
(A7) ∀v1∀v2[(v1 = v2) → (φ(v1*) ↔ φ(v2/v1*))], where φ(v2/v1*) is the result of replacing v2 for v1 at zero or more of the free occurrences of v1 in φ where v2 would not be bound.

Descriptions

Definite descriptions. Noun phrases such as

the first human on the Moon
the present King of France
the woman who wrote "The Story of an Hour"

can be treated as having the form

the x such that φ(x).

Thus, the expressive capabilities of the first-order language (and hence the deductive capabilities of first-order logic) intro-
Translation:
∀x[Horse(x) → Animal(x)] ⊢ ∀y[∃x[Horse(x) ∧ Head-of(y, x)] → ∃z[Animal(z) ∧ Head-of(y, z)]]

Proof:
1. ∀x[Horse(x) → Animal(x)]  ; premise of argument
; BEGIN subproof using → Introduction to prove (∃x[Horse(x) ∧ Head-of(a, x)] → ∃z[Animal(z) ∧ Head-of(a, z)])
2. ∃x[Horse(x) ∧ Head-of(a, x)]  ; assumption for → Introduction
; BEGIN sub-subproof using ∃ Elimination to prove ∃z[Animal(z) ∧ Head-of(a, z)]
3. (Horse(b) ∧ Head-of(a, b))  ; from line 2 (assumption for ∃ Elimination)
4. ∀x[Horse(x) → Animal(x)]  ; sent in from line 1
5. (Horse(b) → Animal(b))  ; from line 4, by ∀ Elimination
6. Horse(b)  ; from line 3, by ∧ Elimination
7. Animal(b)  ; from lines 5 and 6, by → Elimination
8. Head-of(a, b)  ; from line 3, by ∧ Elimination
9. (Animal(b) ∧ Head-of(a, b))  ; from lines 7 and 8, by ∧ Introduction
10. ∃z[Animal(z) ∧ Head-of(a, z)]  ; from line 9, by ∃ Introduction
; END of sub-subproof that used ∃ Elimination to prove ∃z[Animal(z) ∧ Head-of(a, z)]
11. ∃z[Animal(z) ∧ Head-of(a, z)]  ; returned to outer subproof from line 10 of innermost sub-subproof
12. (∃x[Horse(x) ∧ Head-of(a, x)] → ∃z[Animal(z) ∧ Head-of(a, z)])  ; from lines 2 and 11, by → Introduction
; END of subproof that used → Introduction to prove (∃x[Horse(x) ∧ Head-of(a, x)] → ∃z[Animal(z) ∧ Head-of(a, z)])
13. (∃x[Horse(x) ∧ Head-of(a, x)] → ∃z[Animal(z) ∧ Head-of(a, z)])  ; returned to main proof from line 12 of outer subproof
14. ∀y[∃x[Horse(x) ∧ Head-of(y, x)] → ∃z[Animal(z) ∧ Head-of(y, z)]]  ; from line 13, by ∀ Introduction

Figure 1. An example of Introduction and Elimination rules to prove the argument that if horses are animals, then every head of a horse is a head of an animal.
duced here can be extended by introducing a new variable-binding operator in addition to the quantifiers. Unlike the quantifiers, which are wff-producing operators, the definite-description operator ι is a term-producing operator. The definition of term can be augmented as follows:

(T5) If φ is a wff and v is an individual variable, then ιv[φ] is a term.

There has been a great deal of controversy over the semantics of such terms. The approach due to B. Russell (61) has become the standard logical one. According to Russell's analysis, sentences of the form ψ(ιx φ(x)) should not be treated as subject-predicate sentences; that is, they should not be parsed as consisting of a noun phrase, ιx φ(x), and a verb phrase, ψ. Rather, they are to be analyzed as

∃x[φ(x) ∧ ∀y[φ(y) → y = x] ∧ ψ(x)]

For instance, to use Russell's famous example,

The present King of France is bald

is to be represented as

∃x[Present-King-of-France(x) ∧ ∀y[Present-King-of-France(y) → y = x] ∧ Bald(x)]

that is,

One and only one thing is a present King of France and he is bald.

It is a consequence of this analysis that the sentence comes out false, since there is no present King of France. Similarly,

The book that Knuth wrote is interesting

is false, since Knuth has written more than one book; and

The winged horse captured by Bellerophon is named "Pegasus"

is false, since the winged horse captured by Bellerophon does not exist. The addition to the axiomatic formulation of predicate logic is straightforward: Simply add the axiom schema

(A8) ψ(ιv1[φ(v1)]) ↔ ∃v1[φ(v1) ∧ ∀v2[φ(v2) → v2 = v1] ∧ ψ(v1)]

An alternative analysis, due to Strawson (51), takes ψ(ιx φ(x)) to be of subject-predicate form but to be neither true nor false if ιx φ(x) is undefined. That is, if ιx φ(x) does not denote a member of D (i.e., if nothing satisfies the predicate φ), then ψ(ιx φ(x)) is truth-valueless (for further discussion on truth-value gaps, see Refs. 10 and 25).
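Russell's analysis lends itself to a small computational check: over a finite domain, "The F is G" holds just in case exactly one individual satisfies F and that individual satisfies G. The example below is hypothetical and not from the article; the function name `the_F_is_G` and the toy domain are assumptions:

```python
# Russell's analysis of "The F is G": Ex[F(x) & Ay[F(y) -> y = x] & G(x)].
def the_F_is_G(domain, F, G):
    return any(F(x) and all((not F(y)) or y == x for y in domain) and G(x)
               for x in domain)

# With no present King of France in the domain, the sentence comes out false:
domain = {"DeGaulle", "Mitterrand"}          # hypothetical toy domain
assert the_F_is_G(domain, lambda x: False, lambda x: True) == False

# With more than one F (as with "the book that Knuth wrote"), also false:
assert the_F_is_G({1, 2}, lambda x: True, lambda x: True) == False

# With exactly one F, and that individual a G, the sentence is true:
assert the_F_is_G({1, 2}, lambda x: x == 1, lambda x: x == 1) == True
```

Note that on Strawson's alternative analysis the first two cases would instead be truth-valueless rather than false; the Boolean return value here encodes only the Russellian treatment.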
A third approach, stemming from work done by Meinong (52), takes ψ(ιx φ(x)) to be of subject-predicate form but chooses a universe of discourse that allows ψ to be total by providing an object for each definite description. This strategy can be made plausible if the universe of discourse is taken to consist of the objects of thought and, hence, is the most appropriate one for AI applications (for details, see Refs. 34, 35, 29–33, and 53).

Indefinite Descriptions. A noun phrase such as

a person I met today

can be treated as having the form

an x such that φ(x)

The indefinite-description operator ε, which is also variable binding and term producing, can be added to predicate logic in a manner similar to the addition of ι (for details, see Refs. 54 and 55).

Metatheoretic Results

A few major metatheoretic results are worth mentioning briefly. As is the case for propositional logic, predicate logic is sound (all theorems are valid, i.e., true on all interpretations; in symbols, if ⊢φ then ⊨φ) and consistent (there is no wff φ such that both ⊢φ and ⊢¬φ). And Gödel showed that it is complete (all valid wffs are theorems: if ⊨φ then ⊢φ) (see Completeness). Löwenheim (and, later, Skolem) (56) showed that monadic first-order logic (i.e., first-order logic without relations) is decidable and that, for any wff φ, if there is a nonempty universe of discourse D and an interpretation with domain D on which φ is true, then there is an interpretation whose domain is the set of all positive integers on which φ is true. However, Church showed that the full first-order predicate calculus is undecidable (for details, see Refs. 43, 57, and 58).

Second-Order Logic

If quantifiers are allowed to range over predicate variables, the resulting language allows the expression of such propositions as

There is a relation that holds between Bill and Hector,

which would seem to be a logical consequence of

Bill is a student of Hector.

In symbols,

Student-of(Hector, Bill)

implies

∃P P(Hector, Bill)

as well as

∃x ∃y ∃P Pxy

In such a language, identity can be defined by

∀x ∀y [x = y ↔ ∀φ[φ(x) ↔ φ(y)]]

And, if predicates can be quantified over, then they can be the arguments of other, "higher-order" predicates. Thus, for example, that a relation is reflexive can be expressed as

∀R[Reflexive(R) ↔ ∀x Rxx]

with R appearing in both subject and predicate position. Such a logic is termed second- or higher-order logic or the extended predicate calculus.

Although second-order logic clearly has greater expressive power than first-order logic, it also has some metatheoretic disadvantages. For one thing, a form of Russell's paradox can be developed:

∀φ[Self-referential(φ) ↔ φ(φ)]

implies, by ∀ Elimination,

Self-referential(¬Self-referential) ↔ ¬Self-referential(¬Self-referential)

For another, second-order logic is incomplete: There are true second-order wffs that are not theorems (for discussions of second-order logic, see Refs. 41, 43, 57, 59, and 60).
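The second-order definition of identity can be tested exhaustively on a finite domain, where quantifying over all properties amounts to quantifying over all subsets of the domain. A small Python sketch (our own illustration):

```python
# On a finite domain a "property" is just a subset, so the
# second-order definition x = y <-> forall phi (phi(x) <-> phi(y))
# can be checked by enumerating every subset of the domain.

from itertools import chain, combinations

def subsets(domain):
    s = list(domain)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def leibniz_equal(x, y, domain):
    """True iff x and y satisfy exactly the same properties (subsets)."""
    return all((x in p) == (y in p) for p in subsets(domain))

domain = {1, 2, 3}
agrees = all(leibniz_equal(x, y, domain) == (x == y)
             for x in domain for y in domain)
# agrees: indistinguishability by all properties coincides with
# identity on this domain.
```

On infinite domains this enumeration is impossible, which is one symptom of the metatheoretic gap between second-order quantification and any effective proof procedure.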
BIBLIOGRAPHY

1. W. Kneale and M. Kneale, The Development of Logic, Oxford University Press, Oxford, 1962.
2. G. T. Kneebone, Mathematical Logic and the Foundations of Mathematics: An Introductory Survey, Van Nostrand, London, 1963.
3. A. N. Prior, "Logic, History of," in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 4, Macmillan and Free Press, New York, pp. 513–571, 1967.
4. A. N. Prior, "Logic, Traditional," in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 5, Macmillan and Free Press, New York, pp. 34–45, 1967.
5. J. Hintikka, Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, Ithaca, NY, 1962.
6. A. N. Prior, "Logic, Modal," in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 5, Macmillan and Free Press, New York, pp. 5–12, 1967.
7. G. E. Hughes and M. J. Cresswell, An Introduction to Modal Logic, Methuen, London, 1968.
8. D. Nute, Essential Formal Semantics, Rowman and Littlefield, Totowa, NJ, 1981.
9. D. Gabbay and F. Guenthner (eds.), Handbook of Philosophical Logic, Vol. 2: Extensions of Classical Logic, D. Reidel, Dordrecht, 1984.
10. K. Lambert (ed.), The Logical Way of Doing Things, Yale University Press, New Haven, CT, 1969.
11. N. D. Belnap, Jr., and T. B. Steel, Jr., The Logic of Questions and Answers, Yale University Press, New Haven, CT, 1976.
12. D. Harrah, in Ref. 9, pp. 715–764.
13. H. R. Otto, The Linguistic Basis of Logic Translation, University Press of America, Washington, DC, 1978.
14. M. L. Schagrin, The Language of Logic: A Self-Instruction Text, 2nd ed., Random House, New York, 1979.
15. D. Kalish, R. Montague, and G. Mar, Logic: Techniques of Formal Reasoning, 2nd ed., Harcourt Brace Jovanovich, New York, 1980.
16. A. N. Whitehead and B. Russell, Principia Mathematica, 2nd ed., Cambridge University Press, Cambridge, U.K., 1927.
17. R. Montague, The Proper Treatment of Quantification in Ordinary English, in K. J. J. Hintikka, J. M. E. Moravcsik, and P. Suppes (eds.), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, D. Reidel, Dordrecht, pp. 221–242, 1973.
18. S. C. Shapiro, "Numerical quantifiers and their use in reasoning with negative information," Proc. of the Sixth Int. Joint Conf. Artif. Intell., Tokyo, Japan, pp. 791–796, 1979.
19. J. Barwise and R. Cooper, "Generalized quantifiers and natural language," Ling. Philos. 4, 159–219 (1981).
20. J. McCawley, Everything that Linguists Have Always Wanted to Know about Logic but Were Ashamed to Ask, University of Chicago Press, Chicago, 1981.
21. M. Brown, "Generalized quantifiers and the square of opposition," Notre Dame J. Form. Logic 25, 303–322 (1984).
46. C.-L. Chang and R. C.-T. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic, New York, 1973.
47. Z. Manna, Mathematical Theory of Computation, McGraw-Hill, New York, 1974.
48. B. Raphael, The Thinking Computer: Mind Inside Matter, W. H. Freeman, San Francisco, 1976.
49. E. Rich, Artificial Intelligence, McGraw-Hill, New York, 1983.
22. J. Hintikka, "Studies in the logic of existence and necessity," Monist 50, 55–76 (1966).
23. D. Scott, Existence and Description in Formal Logic, in R. Schoenman (ed.), Bertrand Russell: Philosopher of the Century, Allen and Unwin, London, pp. 181–200, 1967.
24. H. Leblanc and R. H. Thomason, "Completeness theorems for some presupposition-free logics," Fund. Math. 62, 125–164 (1968).
25. K. Lambert, Philosophical Problems in Logic: Some Recent Developments, D. Reidel, Dordrecht, 1970.
26. K. Lambert, "On the philosophical foundations of free logic," Inquiry 24, 147–203 (1981).
27. K. Lambert, Meinong and the Principle of Independence, Cambridge University Press, Cambridge, U.K., 1984.
28. H.-N. Castañeda, "Thinking and the structure of the world," Philosophia 4, 3–40 (1974); reprinted in Critica 6, 43–86 (1972).
29. W. J. Rapaport, "Meinongian theories and a Russellian paradox," Noûs 12, 153–180 (1978); errata, Noûs 13, 125 (1979).
30. W. J. Rapaport, "How to make the world fit our language: An essay in Meinongian semantics," Graz. Philos. Stud. 14, 1–21 (1981).
31. W. J. Rapaport, "Critical notice of Routley [Ref. 34]," Philos. Phenomenol. Res. 44, 539–552 (1984).
32. W. J. Rapaport, "Meinongian semantics for propositional semantic networks," Proc. Assoc. Computat. Ling. 23, 43–48 (1985).
33. W. J. Rapaport, "To be and not to be: Critical study of Parsons [Ref. 35]," Noûs 19, 255–271 (1985).
34. R. Routley, Exploring Meinong's Jungle and Beyond, Australian National University, Research School of Social Sciences, Department of Philosophy, Canberra, 1979.
35. T. Parsons, Nonexistent Objects, Yale University Press, New Haven, 1980.
36. E. Zalta, Abstract Objects, D. Reidel, Dordrecht, 1983.
37. M. L. Schagrin, W. J. Rapaport, and R. R. Dipert, Logic: A Computer Approach, McGraw-Hill, New York, 1985.
38. W. V. O. Quine, Mathematical Logic, rev. ed., Harper & Row, New York, 1951.
39. W. V. O. Quine, Elementary Logic, rev. ed., Harvard University Press, Cambridge, 1980.
40. W. V. O. Quine, Methods of Logic, 4th ed., Harvard University Press, Cambridge, 1982.
41. I. M. Copi, Symbolic Logic, 5th ed., Macmillan, New York, 1979.
42. E. Mendelson, Introduction to Mathematical Logic, 2nd ed., Van Nostrand, New York, 1979.
43. R. Jeffrey, Formal Logic: Its Scope and Limits, 2nd ed., McGraw-Hill, New York, 1981.
44. N. J. Nilsson, Problem-Solving Methods in Artificial Intelligence, McGraw-Hill, New York, 1971.
45. N. J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
50. P. H. Winston, Artificial Intelligence, 2nd ed., Addison-Wesley, Reading, MA, 1984.
51. P. F. Strawson, On Referring, in A. P. Martinich (ed.), The Philosophy of Language, Oxford University Press, New York, pp. 220–235, 1985.
52. A. Meinong, Über Gegenstandstheorie, in R. Haller (ed.), Alexius Meinong Gesamtausgabe, Vol. 2, Akademische Druck- u. Verlagsanstalt, Graz, Austria, pp. 481–535, 1971; English translation (The Theory of Objects) by I. Levi et al., in R. M. Chisholm (ed.), Realism and the Background of Phenomenology, Free Press, New York, pp. 76–117, 1960.
53. H.-N. Castañeda, "Perception, belief, and the structure of physical objects and consciousness," Synthèse 35, 285–351 (1977).
54. A. C. Leisenring, Mathematical Logic and Hilbert's ε-Symbol, Gordon and Breach, New York, 1969.
55. D. Kaplan, What is Russell's Theory of Descriptions?, in D. F. Pears (ed.), Bertrand Russell: A Collection of Critical Essays, Doubleday, Garden City, NY, pp. 227–244, 1972.
56. Reference 60, p. 394.
57. A. Church, Introduction to Mathematical Logic, Princeton University Press, Princeton, NJ, 1956.
58. A. E. Blumberg, Logic, Modern, in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 5, Macmillan and Free Press, New York, pp. 12–34, 1967.
59. Reference 2, pp. 110–118.
60. S. C. Kleene, Introduction to Metamathematics, Van Nostrand, Princeton, 1952.
61. B. Russell, On Denoting, in R. C. Marsh (ed.), Logic and Knowledge, G. P. Putnam's Sons, New York, pp. 39–56, 1971.

W. J. Rapaport
SUNY at Buffalo

LOGIC PROGRAMMING

Significance to Logic and Computing. Logic programming can be defined broadly as the use of symbolic logic for the explicit representation of problems and their associated knowledge bases, together with the use of controlled logical inference (qv) for the effective solution of those problems. At present, logic programming is generally understood in more specific terms: The problem-representation language is a particular subset (Horn-clause form) of classical first-order predicate logic (qv), and the problem-solving mechanism is a particular form (resolution, qv) of classical first-order inference.

Logic programming results from developments in the mechanical application of logical inference to the solution of problems represented in symbolic form. In this respect it represents the contribution of logic to the practical problem-solving technology needed by a variety of disciplines, of which computer-assisted mathematics and AI are notable examples. It also draws upon developments within computer science concerned with placing computer systems, programming languages, and programming methodology upon a logical and cohesive foundation. Logic programming has aimed to bring together two distinct aims in logic and computing: making logic more practical and making computing more logical.

Theorem-Proving Paradigm. Logic programming historically owes its existence to the theorem-proving paradigm (see Inference; Theorem proving). Moreover, up to the present day this paradigm has formed the basis of the theory and implementation of logic programming. There are some settings, notably in its use as the programming language PROLOG, in which logic programming can be presented less formally, solely in terms of computing instances of relations. Whichever view is taken, however, the common basic assumption is that conclusions are inferred from logical sentences. In most manifestations of logic programming these sentences are written in clausal form.

It is well known that all knowledge expressible in first-order logic can be reexpressed in clausal form. A clause is just a disjunction of literals, each one of which is either a predicate (positive literal) or a negated predicate (negative literal). A predicate has the form p(t) and is usually read as a proposition that the tuple named t belongs to the relation named p. A tuple is just a list of terms, each one of which is either a constant or a variable or a functor applied to other terms. Clauses can therefore be viewed as sentences declaring the logical properties of named relations. An example is

(∀x) (likes(x John) ∨ ¬likes(father(John) x))

Variables (such as x) and constants (such as John) are distinguished in this paper by the convention that variables' names begin with lowercase letters.

Typical logic programming systems permit a softer syntax. Universal quantifiers, such as ∀x above, can be omitted; predicates can be rewritten in infix form; and the connective pattern ∨¬ can be replaced by if.
These revisions yield the more intelligible format

x likes John if father(John) likes x

"John is liked by everyone whom his father likes"

Sentences of this sort, which express the conditions upon which certain relationships hold, serve in logic programs to describe the problem domain. It is also useful to rewrite any clause whose literals are all negative, such as

(∀x) (¬likes(x John) ∨ ¬likes(x Peter))

in the equivalent form of a negated conjunction

(∀x) ¬(likes(x John) ∧ likes(x Peter))

Negated conjunctions serve in logic programs to express queries as potentially refutable conjectures. In this role the negated conjunction above can be rewritten in the softer syntax

? x likes John & x likes Peter

and read as the query "Who likes both John and Peter?" Both knowledge bases and queries can thus be expressed in clausal form. Processing a query such as "Who likes John?" can
be regarded as trying to prove, as logical consequences of the knowledge base, clausal-form conclusions, such as

Peter likes John

which answer the query. In this view, query processing is equivalent to theorem proving, treating the knowledge base as a set of axioms (assumptions) and the answer-containing clauses as theorems (conclusions).

Forward versus Backward Reasoning. There are two contrasting modes of reasoning for deriving conclusions from assumptions: the forward mode and the backward mode. From two assumptions

P if Q
Q

the forward mode infers the conclusion P. The chief drawback of forward reasoning is its undirectedness; its repeated application generally produces an explosive growth of conclusions irrelevant to the one specifically sought. By contrast, from some desired conclusion P and the same assumptions, backward reasoning infers a new subgoal, the conclusion Q. This mode is highly directed; each inference step is relevant to the initially desired conclusion (see Processing, bottom-up and top-down).

Resolution. Both modes are special cases of a more general inference method called resolution (qv) applicable to clausal-form logic. In general, each resolution step begins with some pair of clauses (the parents) such that some literal in one of them can be made complementary to some literal in the other. This requires that the literals' predicates be unifiable, meaning that they can be made identical by the application of some substitution θ of terms for variables. The two literals chosen are said to be "resolved upon." The rule then infers a new clause (the resolvent) as follows: The parents' literals, except those resolved upon, are disjoined, and then θ is applied to the result. For example:

First parent: ¬(Peter likes y).
Second parent: x likes John if father(John) likes x.
Literals resolved: ¬(Peter likes y) and x likes John.
Unifier: θ = {x := Peter, y := John}.
Resolvent: ¬(father(John) likes Peter).

Since negated conjunctions can be interpreted as queries, this resolution step can be viewed as using the knowledge expressed by the second parent clause to reduce the (first-parent) query "Does Peter like anyone?" to the (resolvent) query "Does John's father like Peter?"
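The unifier θ in a resolution step can be computed by the standard syntactic unification algorithm. A minimal Python sketch, reproducing the example above (the tuple encoding of terms is our own illustration, and the occur check is omitted here):

```python
# A minimal sketch of syntactic unification, the operation each
# resolution step relies on. Terms are tuples: ("var", name),
# ("const", name), or ("fn", name, arg, ...). This encoding is a
# hypothetical illustration, not the article's notation.

def substitute(term, theta):
    """Apply substitution theta to a term."""
    kind = term[0]
    if kind == "var":
        return substitute(theta[term], theta) if term in theta else term
    if kind == "const":
        return term
    return (term[0], term[1]) + tuple(substitute(a, theta) for a in term[2:])

def unify(t1, t2, theta=None):
    """Return a substitution making t1 and t2 identical, or None."""
    if theta is None:
        theta = {}
    t1, t2 = substitute(t1, theta), substitute(t2, theta)
    if t1 == t2:
        return theta
    if t1[0] == "var":
        return {**theta, t1: t2}       # no occur check, for brevity
    if t2[0] == "var":
        return {**theta, t2: t1}
    if t1[0] == "fn" and t2[0] == "fn" and t1[1] == t2[1] and len(t1) == len(t2):
        for a, b in zip(t1[2:], t2[2:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# The article's example: unify "Peter likes y" with "x likes John".
x, y = ("var", "x"), ("var", "y")
Peter, John = ("const", "Peter"), ("const", "John")
likes = lambda a, b: ("fn", "likes", a, b)
theta = unify(likes(Peter, y), likes(x, John))
# theta binds x := Peter and y := John, as in the text.
```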
Resolution is refutation complete in the sense that it can always be used to derive an explicit contradiction from any inconsistent set of clauses. Therefore, the use of resolution to implement forward or backward reasoning requires that the problem of showing that given assumptions A imply a desired conclusion C be reformulated as the problem of showing that A is inconsistent with ¬C. For example, to show P from the assumptions
P if Q
Q

backward reasoning by resolution derives

¬Q                    (from ¬P and P if Q)

and then a contradiction    (from ¬Q and Q)

This example is typical of backward reasoning by resolution as employed for the standard execution of logic programs.

Historical Origins. Clausal form is one of the normal forms of first-order logic. The uniformity of clausal-form sentences facilitates both their representation in a computer and the structuring of algorithms for manipulating them. Early research in automatic theorem proving therefore concentrated on clausal-form refutation procedures.

Two of the most important developments in automatic theorem proving (qv) in the 1960s, Robinson's resolution (1) and Loveland's model elimination (2), were based on clausal form. However, by the end of the 1960s, there was a prevailing view in AI that resolution was too general purpose and model elimination too mathematically obscure for useful, practical applications. Hewitt's AI language PLANNER (3) was put forward as a procedural alternative to the uniform theorem-proving methods based on resolution.

Independently, Kowalski and Kuehner showed how to combine resolution with model elimination's backward reasoning in a system called SL resolution (4). Related discoveries were made at about the same time by Loveland (5) and Reiter (6).

The basic idea of Horn-clause logic programming arose in the early 1970s from collaboration between Kowalski at the University of Edinburgh and Colmerauer and Roussel at the University of Aix-Marseille. During a visit to Marseilles in 1971, Kowalski and Colmerauer discussed different ways in which resolution theorem provers could be used for natural-language parsing (qv). Kowalski made a second visit in the spring of 1972; but before that Roussel had already shown how the combinatorial explosion associated with using equality axioms could be avoided in certain cases by using SL resolution and reformulating the axioms. At the same time Hayes in Edinburgh was arguing in general terms that computation could be regarded as controlled deduction (7).

These discoveries were consolidated during Kowalski's second visit to Marseilles in discussions with Roussel and Colmerauer. By the end of the summer of 1972 Kowalski published his results in abstract form (8). Also, Colmerauer and Roussel had completed the design and implementation of the first PROLOG (Programmation en Logique) interpreter (9) and had already implemented a substantial French-language question-answering system.

Many early versions of PROLOG exploited the discovery by Boyer and Moore (10) of the compact storage technique for resolution known as structure sharing. More modern versions, however, mostly rely on the structure-copying principles developed by Bruynooghe (11).

Horn-Clause Logic Programming

Procedural Interpretation. Logic programming in its narrow sense is based on Horn clauses. These are clauses having no more than one positive literal and are of three kinds:

1. conditionless assertions (or simply "assertions");
2. conditional assertions (or "implications"), e.g., Peter likes x if x likes LOGIC & Peter knows x;
3. denials, e.g., ¬(x likes LOGIC & x likes BASIC).

Horn clauses enjoy a simple procedural interpretation. Clauses of kinds 1 and 2 are interpreted as procedures, and clauses of kind 3 are interpreted as goal statements: In all cases antecedent predicates (negated predicates) are interpreted as procedure calls. Each step of backward reasoning is then a procedure-calling operation, which begins by selecting some call from the current goal; next, some procedure is selected whose heading (the positive literal) unifies by some substitution θ with that call; finally, the procedure's body (its other literals) is substituted for the call and θ is applied to the result. Unification is interpreted as communication of data between calls and procedures. A procedure whose heading unifies with the selected procedure call is said to respond to that call.

And versus Or Nondeterminism. Logic programs do not themselves determine the order in which calls and procedures are selected for execution. In a goal

¬(P & Q)    ["show P and Q"]

the calls can be processed in parallel or in sequence (in any order) to solve the conjunction stipulated by the goal. Also, if several procedures

P if R
P if S

respond to, say, the call P, then any or all of them may be used, in any order, to show P. These freedoms of choice are referred to as AND and OR nondeterminism, respectively.

Ideally, whichever selection rules are imposed in practice, they should influence only execution efficiency and should not affect what the execution achieves logically. Standard PROLOG systems do not realize this ideal completely because they employ a depth-first search (qv) strategy committed to pursuing each computation to its conclusion before exploring untried alternatives. Consequently, if execution should enter an infinite computation, it will fail to discover whatever solutions may be logically afforded by those untried alternatives.

LUSH Coroutining versus the PROLOG Strategy. Standard PROLOG always selects the first call in the current goal. From those procedures that respond to it, the one selected is the first untried one appearing in the program text; the others, if any, will potentially be tried in due course as a consequence of the interpreter backtracking (qv) to the call in its search for alternative solutions. In programming terms this strategy is control flow governed by static text ordering.

The order of call selection does not affect the solutions computed. Unconstrained call selection corresponds to LUSH resolution (12). The completeness of LUSH, proven by Hill (13), ensures the completeness of certain strategies more liberal than PROLOG's: For instance, IC-PROLOG (14) and certain other logic programming systems support a coroutining scheme in which calls are selected according to the data flow through their variables, potentially improving efficiency.

As stated above, PROLOG's depth-first strategy can make
the computability of solutions sensitive to the order of procedure selection. Different search strategies can explore alternative computations in parallel. Breadth-first search in particular is guaranteed always to find a solution if one exists, provided the program contains only a finite number of clauses. But "fair" strategies of this kind require prohibitively large amounts of memory and consequently have received comparatively little attention.
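The procedural interpretation with depth-first, text-ordered search can be captured in a few lines. The following Python sketch is a toy interpreter (our own encoding of clauses, not PROLOG itself); following the article's convention, variable names begin with a lowercase letter and constants with an uppercase letter:

```python
# A depth-first backward-chaining interpreter for Horn clauses,
# following the procedural interpretation described above. The
# parent/grandparent program below is our own illustration.

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(clause, n):
    """Give a clause fresh variables for each invocation."""
    head, body = clause
    f = lambda t: t + "#" + str(n) if is_var(t) else \
        tuple(f(x) for x in t) if isinstance(t, tuple) else t
    return f(head), [f(g) for g in body]

def solve(goals, program, s, depth=0):
    """Yield substitutions solving all goals (AND) using any clause (OR)."""
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for i, clause in enumerate(program):       # OR choice: text order
        head, body = rename(clause, (depth, i))
        s2 = unify(first, head, s)
        if s2 is not None:                     # clause responds to the call
            yield from solve(body + rest, program, s2, depth + 1)

program = [
    (("Parent", "Abe", "Homer"), []),          # assertions
    (("Parent", "Homer", "Bart"), []),
    (("Grandparent", "x", "y"),                # implication
     [("Parent", "x", "z"), ("Parent", "z", "y")]),
]
answers = [walk("g", s) for s in solve([("Grandparent", "Abe", "g")], program, {})]
# answers == ["Bart"]
```

Solutions are enumerated lazily, so, as in standard PROLOG, an infinite branch entered before an untried alternative would prevent that alternative's solutions from ever being reached.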
Recursive Data Structures. The basic data structure in logic programming is the functional term. All such terms are syntactically tree structured. A binary tree with leaves A, B, C, and D, for example, is easily represented by the term t(t(A B) t(C D)). A list (A, B, C) is just a special kind of tree and can be represented by t(A t(B t(C NIL))) or, in sugared infix notation, by A.B.C.NIL.

Trees and lists are examples of recursively definable data structures. One PROLOG definition of a list could be

List(NIL)
List(u.x) if Element(u) & List(x)

The recursive nature of the definition is inherited by procedures designed to access or manipulate the data structure. For example, to compute the length w of the list (A, B, C) we might use the recursive procedure set

Length(NIL 0)
Length(u.x y) if Length(x z) & y = z+1

and the goal statement

? Length(A.B.C.NIL w)

In contexts where PROLOG is required only to compute finite data structures, it is theoretically necessary to include in the interpreter's unification algorithm a particular test called the "occur check," which eliminates the possibility of producing self-referential bindings [e.g., x := f(x)] denoting infinite terms. The occur check significantly impedes the speed of unification. Most PROLOG implementations omit the occur check on the assumption that their users will take care not to write programs capable of generating self-referential bindings. In other contexts, however, omission of the occur check can be combined with special arrangements for processing self-referential bindings in order to yield a programming formalism catering for infinite data structures. An example of such a system is PROLOG II, developed by Colmerauer (15). The formal semantics of logic programs supporting infinite terms are investigated in Lloyd's book (16).

Minimal Model Semantics. The semantics of logic programs can be formulated in the framework of model theory. Every set P of Horn-clause procedures (excluding goal statements) is satisfiable, that is, possesses models. A model associates predicate symbols in P with relations in some domain D of individuals. When D is specifically chosen to be the Herbrand universe H(P) of P (the set of terms constructible using only P's constants and function symbols), the models based on it are called Herbrand models. It is provable that any set of Horn clauses is satisfiable if and only if it has a Herbrand model.

Every Herbrand model for P can be characterized by the set of those variable-free assertions, constructible from P's predicate symbols and the terms of H(P), which are true in the model. The relation of set inclusion orders the set M(P) of all P's Herbrand models so characterized into a complete lattice possessing a minimal member. This minimal model is the intersection of all of the models in M(P). It links the model-theoretic semantics of Horn-clause logic with the least fixpoint semantics, according to which the denotation of any predicate symbol p in P is the relation named by p in the minimal model. Any variable-free assertion p(t) that is true in the minimal model of P is then necessarily logically implied by P and, owing to the completeness of Horn-clause logic, must therefore be provable from P. The connections between the model-theoretic, least fixpoint, and operational (proof-theoretic) semantics were first articulated by van Emden and Kowalski (17). The book by Lloyd (16) describes the theory and applications of the minimal-model semantics and presents a greatest fixpoint theory of perpetual processes generated from nonterminating logic programs.
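For a finite, ground program the minimal Herbrand model can be computed directly by iterating an immediate-consequence operator to its least fixpoint, in the spirit of the van Emden and Kowalski semantics. A small Python sketch (the tiny program below and its string encoding of atoms are our own example):

```python
# Compute the minimal Herbrand model of a (function-free, ground)
# Horn program by iterating the immediate-consequence operator T_P
# until a fixpoint is reached. Example program is illustrative only.

def minimal_model(clauses):
    """clauses: list of (head, body) pairs of ground atoms."""
    model = set()
    while True:
        # T_P(model): heads of clauses whose bodies are already satisfied.
        step = {h for h, body in clauses if all(b in model for b in body)}
        if step <= model:          # fixpoint reached
            return model
        model |= step

program = [
    ("Parent(Abe,Homer)", []),
    ("Parent(Homer,Bart)", []),
    # ground instances of an Ancestor rule:
    ("Ancestor(Abe,Homer)", ["Parent(Abe,Homer)"]),
    ("Ancestor(Homer,Bart)", ["Parent(Homer,Bart)"]),
    ("Ancestor(Abe,Bart)", ["Parent(Abe,Homer)", "Ancestor(Homer,Bart)"]),
]
m = minimal_model(program)
# m contains all five atoms; e.g., "Ancestor(Abe,Bart)" is in the
# minimal model and hence, by completeness, provable from the program.
```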
PROLOG

Sequential Control. PROLOG has been designed to exploit the computational possibilities associated with the sequential execution of programs, though other nonparallel schemes like coroutining have also been implemented. Were it not for the novel features of unification and OR nondeterminism, sequential PROLOG's behavior would resemble that of many block-structured procedural languages; calls are executed in text order, and each such call is represented internally by a procedure entry requiring a fresh memory allocation for the procedure's variables. It is thus usual to employ one or more stacks to record the resulting locus of control together with the bindings made to variables.

The PROLOG programmer's main control over execution behavior is via the text ordering of calls and procedures. For
example, the procedure

x is-grandparent-of y if x is-parent-of z & z is-parent-of y

will in general perform far more efficiently when given x to find y than the logically identical procedure

x is-grandparent-of y if z is-parent-of y & x is-parent-of z
whose first call is highly nondeterministic when neither z nor y is given. Thus, the order of call selection is a strong control over the behavior of PROLOG programs. The order in which procedures are selected in response to a call does not affect the efficiency of exploring the whole search space; however, it does affect the efficiency with which individual solutions are computed because, under the depth-first strategy, it determines the order in which they are discovered.

The Cut Operator. The "cut" (or "slash") operator ! enables the programmer to excise unwanted computations during execution. Inserted into any procedure or goal statement as an extra call, its function when executed is to eliminate all potential alternative computations that have become pending upon or since entering the statement. For example, execution of the program

? P(x)    ["find x such that P(x) holds"]
P(x) if x = 1
P(x) if x = 3
P(x) if P(y) & x = y+1 & !

will yield the solutions x := 1, then x := 3, then x := 2, and then terminate. This is because, when execution invokes the third (recursive) clause about P in order to solve P(x), the ensuing call P(y) is solved (using the first clause) to compute y := 1; the next call x = y+1 yields x := 2, and the only outstanding "call" is the cut, which removes the remaining alternative ways of solving the former call P(y), namely, via the use of the second clause or by reinvoking the recursive one. Without the cut the program would yield an infinite series of solutions (1, 3, 2, 4, 3, 5, ...) by infinite recursion. Thus, the cut's usual effect is to make execution more OR deterministic, though with a potential sacrifice of completeness.

The cut is subject to serious misuse. Suppose, for instance, it is required to devise a program that, given a list x, constructs a list y either by deleting a given element u from x or by inserting u into x, according to whether or not x already contains u. The PROLOG program

Construct(u x y) if x Contains u & ! & Delete(u x y)
Construct(u x y) if Insert(u x y)

is operationally correct, since the cut permits insertion by the second procedure if and only if the call x Contains u fails. Logically, however, that procedure is wrong since it states that insertion is always permissible.

The above is an example of cut's role in partially simulating negation. Its most celebrated use is in representing negation as failure by

Not(z) if z & ! & Fail
Not(z)

intended for processing a quasi-negated call such as Not(P(a)). If the call P(a), passing through the metavariable z, can be solved in the first procedure, the cut blocks use of the second procedure and the call Fail (which unifies with the heading of no procedure) forces failure of the invoking call Not(P(a)); otherwise, execution backtracks to the second procedure and so makes the invoking call succeed.

Addition and Deletion of Clauses. Most PROLOG systems permit special calls that alter the program text during execution. The simplest of these merely add clauses to or delete clauses from the program.

The use of such devices gives rise to a strongly procedural programming style and, generally, is liable to conflict with the declarative reading of programs. It is nevertheless possible to justify their use when efficiency demands it, provided one pays sufficient regard to the logical semantics.

Run-time deletion of a clause not only usefully deallocates the memory assigned to it but also eliminates possibly unwanted computations dependent on it. Provided subsequent execution never requires the clause, the deletion carries no logical penalties; otherwise, the penalty is incompleteness in the solutions computed.

Adding a clause during program execution is often used to assert some already computed result as a lemma, making the result more accessible to future subgoals than it would be if it had to be recomputed from the original clauses. This use of adding clauses respects the logical semantics and improves efficiency. On the other hand, adding an arbitrary clause may extend the set of computable solutions to include one incapable of logical justification in terms of the original program.

Addition of clauses is also used to implement metalevel programming: manipulating program text and then adding the resulting clauses, thereby passing from metalevel to object level as in the use of Weyhrauch's (18) reflection rules.

The combination of addition with deletion promotes a programming style analogous to the use of scratch memory. In particular, one can simulate destructive assignment. For example, we may wish to update a sequence L of length N, in which membership of any element u at position i can be represented by an assertion Element(L i u), by replacing its negative members by zeros, so producing a new sequence named next(L). Use of the following clauses effectively overwrites the sequence L by assertions of the form Element(next(L) i v).

Update(L i n) if i > n
Update(L i n) if i ≤ n & Element(L i u) & DELETE(Element(L i u)) & Substitute(v u) & ADD(Element(next(L) i v)) & Update(L i+1 n)
Substitute(u u) if u ≥ 0
Substitute(0 u) if u < 0
? Update(L 1 N)

Destructive assignment is often used in order to reuse names as well as memory. The example above could be modified to make L serve as the name of both old and new states of the sequence. This use of names is ambiguous and is logically unsound, yet it is permissible in PROLOG, as it is in the use of destructive assignment within conventional languages.

Extensions of Horn-Clause Programming

Negation as Failure. Although Horn-clause form is adequate for computation, extensions of its logic can greatly improve its suitability for practical applications. Negation as failure is typical of such extensions.
LOGIC PROGRAMMING 549

Negation as failure treats failure to prove as proof of negation: a negated procedure call Not(P) is deemed to hold if and only if P cannot be shown to hold in a finite amount of time. For example, given

P(a)
P(b)
? Not(P(c))

the query Not(P(c)) is solved because P(c) fails finitely. Clark (19) has shown that negation as failure is a correct approximation to standard classical negation, provided that the implicit closed-world assumption (CWA), that the program contains a complete description of the relations named in it, is made explicit. In the example above, the CWA is explicitly expressed by

P(x) iff (x = a or x = b)

and a complete specification of =, including inequalities such as a ≠ b, b ≠ a, . . . . This more complete specification of P then classically implies ¬P(c).

The negation-as-failure rule has been proved complete in a restricted context by Jaffar, Lassez, and Lloyd (20). More generally, however, it is incomplete and oversensitive to context of use. In particular, it cannot deliver values for variables in negated calls. Thus, for example, given appropriate clauses for Q, it can deal with the query

? Not(Q(a))

but not with the logically identical

? Not(Q(x))

Negation as failure was first given prominence in Hewitt's language PLANNER (3).

Conditional Subgoals. Horn clauses can be extended to admit subgoals that are themselves conditional in structure, as in

? (∀x) (P(x) if Q(x))

Such an extension bridges much of the gap between Horn-clause logic and full first-order logic. Programs using it can be converted to Horn clauses augmented by negation as failure. For instance, the query above can be rewritten as

? Not(Q(x) & Not(P(x)))

The original query requires that all solutions of Q(x) also solve P(x). PROLOG execution of the rewritten form treats this as the task of showing, by iterating through the solutions of Q(x) (if any), that none of them fails to solve P(x).
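Negation as failure and the rewriting of universally quantified subgoals can be sketched in ordinary code. The following Python fragment is our own illustrative analogue (the names facts, holds, not_, and forall_q_implies_p are hypothetical, not part of any PROLOG system): it tests Not(P(c)) by the finite failure of P(c) against the facts P(a) and P(b).

```python
# Illustrative analogue of negation as failure over a finite fact base.
# Facts: P(a), P(b). All names here are our own, not PROLOG's.
facts = {("P", "a"), ("P", "b")}

def holds(pred, arg):
    """A call succeeds iff it is provable from the finite fact base."""
    return (pred, arg) in facts

def not_(pred, arg):
    """Negation as failure: Not(P(x)) holds iff the proof of P(x) fails finitely."""
    return not holds(pred, arg)

# The query ? Not(P(c)) succeeds because P(c) fails finitely:
print(not_("P", "c"))   # True
print(not_("P", "a"))   # False: P(a) is provable

# The universally quantified subgoal ? (for all x)(P(x) if Q(x)) behaves
# like ? Not(Q(x) & Not(P(x))): no solution of Q may fail to solve P.
def forall_q_implies_p(q_solutions):
    return not any(not_("P", x) for x in q_solutions)

print(forall_q_implies_p(["a", "b"]))  # True: every Q-solution solves P
print(forall_q_implies_p(["a", "c"]))  # False: c solves Q but not P
```

The soundness of this check depends, as the text notes, on the fact base being a complete description of P (the closed-world assumption).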
Sets of Solutions. Another useful extension, essentially a metalevel feature, is the ability to collect all solutions to a call into a set represented by a single term. This facility is commonly called "aggregation." For example, to construct and then count the set y of all persons x liked by John, one could write

? y set-of (x : John likes x) & Length(y n)
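The set-of query above can be mimicked by collecting the distinct solutions of a call into a single collection and then taking its length. A minimal Python sketch, with our own hypothetical names (likes, set_of):

```python
# Minimal analogue of set-of aggregation: collect all distinct
# solutions of the call "John likes x", then count them.
likes = [("John", "Mary"), ("John", "Bob"), ("Sue", "Ann"), ("John", "Mary")]

def set_of(goal):
    """Collect distinct solutions, in order of first appearance."""
    seen, out = set(), []
    for x in goal():
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# ? y set-of (x : John likes x) & Length(y n)
y = set_of(lambda: (who for liker, who in likes if liker == "John"))
n = len(y)
print(y, n)   # ['Mary', 'Bob'] 2
```

As with negation as failure, the result is only sound if the likes facts are a complete description of the relation.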
The set-of call can be implemented by posing the call John likes x and collecting all the distinct solutions for x into y until failing to generate any more. The soundness of this is, like that of the implementations suggested for negation as failure and universally quantified subgoals, dependent on the CWA. Also, like both those extensions, it is susceptible to incompleteness, looping, and context sensitivity through falling short of full classical logic.

Horn-Clause Metalevel Programming. Horn-clause logic and its extensions can be used at the metalevel to enable programs to describe the logical and behavioral properties of themselves and other programs. This use of metalevel logic is exemplified by the expert equation solver PRESS implemented in PROLOG by Bundy and Welham (21). Instead of solving equations by using object-level rules of algebra, PRESS uses metalevel rules describing mathematical problem-solving expertise (see Meta-knowledge, -rules, and -reasoning). Bundy's use of metalogic operates entirely at the metalevel. Systems that amalgamate object-level and metalevel uses of logic have been devised by Weyhrauch (18) and Bowen and Kowalski (22). The basis of these amalgamated systems is the use of a proof predicate Demo(x y) expressing "conclusion y is provable from assumption set x," whose role is analogous to the EVAL function of LISP. The proof predicate can itself be defined in logic and executed either by running its definition directly or by running the object-level logic system recursively.

Including the proof predicate in the logic language makes it possible to formalize and reason with such subtle distinctions as that between "a person is innocent if not guilty" and "a person is innocent if not proven guilty." It also allows one to formulate self-referential sentences, not those like "this sentence is false," which is a paradox and not expressible using Demo, but rather ones like "this sentence is unprovable," which is true but unprovable.
This is a direct analogue of Gödel's proof of the incompleteness of axiomatic arithmetic.

Specification versus Programming

Executing Specifications. Logic has been used traditionally in computing to express declarative specifications serving program analysis and construction. However, the mechanization of logic through computer-based proof procedures has made such specifications executable in their own right, so that, e.g., they may be tested on small-scale data or debugged or treated as prototypes in program development. As a preliminary problem description, a naive, nonprocedural, logical specification is, or at least ought to be, both simpler to reason about and more flexible to modify than a program containing greater commitment to a particular problem-solving method. Such specifications can be used not only as precursors to program development but also as queries to a database. A declarative style for such queries is essential to users unconcerned with the database's storage and access mechanisms.

Horn-clause logic and its extensions can be used both for specifying and for programming. These uses are distinguished only by their intent and by their relative degrees of proceduralness. The sentence

y is-sort-of x if y is-permutation-of x & y is-ordered
is more like a declarative specification of the sorting relation than is the sentence

y is-sort-of x if
    x decomposes-into (x1 x2) &
    y1 is-sort-of x1 &
    y2 is-sort-of x2 &
    y is-merge-of (y1 y2)

which anticipates a specific merge-sort algorithm. Both, however, are directly executable, with varying efficiencies, in PROLOG.

Logic can also be used to encode and animate (possibly incomplete) knowledge formulated prior to formal specifications in the early stages of user-requirements definition and systems analysis. For instance, the knowledge content of conventional data-flow diagrams can often be transcribed directly into executable Horn clauses.

Although mere run-time inefficiency may be tolerable when experimenting with specifications, the deliberate disregard of the problem solver's behavior raises the hazard of nonterminating loops. The naive specification

x is-joined-to y if y is-joined-to x

of connectivity in a network defined by

a is-joined-to b
b is-joined-to c
etc.

when executed by PROLOG to determine the network's connections through a query

? x is-joined-to y

will loop indefinitely, without computing any solutions and without even accessing the data defining the network, if the general rule for is-joined-to textually precedes the data. Fundamentally this is due to the "unfairness" of depth-first search commented on above. However, if the data precede the rule instead, all solutions are generated infinitely often. In either case one is penalized for ignoring the procedural consequences of what one has written. One way of overcoming this while preserving declarative freedom of style is to incorporate loop detection into the problem solver.

Deriving Programs from Specifications. A specification assumed to be a logically correct problem description can be used to derive a more efficient description (program) for the problem. If the specification is itself written in logic, it can serve as an axiom set for deducing computationally useful theorems. For example, the PROLOG clauses

w is-least-of w.NIL
w is-least-of u.v.x if u ≤ v & w is-least-of u.x
w is-least-of u.v.x if u > v & w is-least-of v.x

will answer the query

? w is-least-of 3.2.1.4.NIL

much more efficiently than will the naive specification

w is-least-of x iff Member(w x) & w is-lower-bound-for x

together with specifications of Member and is-lower-bound-for and with the assumption that ≤ is transitive. The PROLOG clauses are each logically implied by the specification and easily derivable from it using first-order inference.

Program derivation can also be applied to given logic programs in order to transform them to equivalent but computationally different ones. For instance, an alternative, nonlooping program for the former connectivity problem, answering the query

? x connects-to y

is

x connects-to y if x is-joined-to y
x connects-to y if y is-joined-to x
a is-joined-to b
b is-joined-to c
etc.

and it is derivable from the previous looping program using the bridging specification

x connects-to y iff (x is-joined-to y or y is-joined-to x)

As a trivial example, from this specification one can infer the conditional sentence

x connects-to y if (x is-joined-to y or y is-joined-to x)

and from this the two principal Horn clauses

x connects-to y if x is-joined-to y
x connects-to y if y is-joined-to x

of the desired program. Studies of logic program verification and derivation can be found in works by Clark (23), Hogger (24), Clark and Darlington (25), and others.

Functional Programming

Functional programming can be regarded as logic programming in the broad sense of computation by deriving consequences from assumptions. Assumptions in functional programs are expressed as equations between individuals constructed by means of variables, constants, and function symbols. For example, the equations

length(NIL) = 0                    (L1)
length(x.y) = length(y) + 1        (L2)
recursively define the function that computes the length of a list in terms of the addition function and the list constructor function ("."). To compute the length of the list D.A.D.NIL it is necessary to derive a conclusion of the form

length(D.A.D.NIL) = t

where t is expressed only in terms of undefined functors (such as "."). The derivation is performed by using the equations as rewrite rules:
length(D.A.D.NIL) = length(A.D.NIL) + 1
                  = (length(D.NIL) + 1) + 1
                  = ((length(NIL) + 1) + 1) + 1
                  = ((0 + 1) + 1) + 1
                  = (1 + 1) + 1
                  = 2 + 1
                  = 3

In logic programming understood in its narrow sense as backward reasoning applied to Horn clauses and their extensions, defined function symbols such as length are represented by relation symbols, and term rewriting is replaced by problem reduction. Thus, the equations L1 and L2 would be expressed as Horn clauses:

Length(NIL 0)
Length(x.y u) if Length(y v) & Plus(v 1 u)

The computation of the length of a list is performed by backward reasoning:

? Length(D.A.D.NIL u)
? Length(A.D.NIL u1) & Plus(u1 1 u)
? Length(D.NIL u2) & Plus(u2 1 u1) & Plus(u1 1 u)
? Length(NIL u3) & Plus(u3 1 u2) & Plus(u2 1 u1) & Plus(u1 1 u)
? Plus(0 1 u2) & Plus(u2 1 u1) & Plus(u1 1 u)
? Plus(1 1 u1) & Plus(u1 1 u)
? Plus(2 1 u)

giving the solution u := 3. The example illustrates the general fact that computation by means of equations used as rewrite rules can be simulated by means of backward reasoning using Horn clauses. Since all computable functions can be represented by rewrite rules, this suggests a particularly simple and transparent proof that all computable functions can be represented by means of Horn clauses and all computation can be performed by backward reasoning. The adequacy of Horn-clause logic for computation was first proved by Aanderaa and Börger (26), who showed that every computable n-ary function can be computed by some Horn-clause program using only terms constructible from the constant 0, the unary functor s, and n + 2 variables.

The representation of n-ary functions by means of (n + 1)-ary relations, illustrated above, is only one of several possible correspondences between functional programming and Horn-clause programming. Another correspondence, useful for expressing Horn-clause programs as functional programs, makes use of n-ary Boolean-valued functions to represent n-ary relations. For example, the Horn clauses

Member(x x.y)
Member(x z.y) if Member(x y)

can be represented by equations

member(x x.y) = TRUE
member(x z.y) = member(x y)

The problem with this correspondence is that with the normal algorithm for rewriting terms the equations can be used only to test for membership, whereas the Horn clauses can be used to generate members equally well.

The relationship between Horn-clause programming and functional programming is a very active research subject. Much of this activity is centered around the development of hybrid languages that combine the two kinds of programming. The language LOGLISP (27), which combines Horn-clause programming and LISP, was the first of these hybrids. In addition, much attention is being given to extending Horn-clause programming with features that have received greater attention within the framework of functional programming. The most important of these features are the treatment of data types, higher order functions, and the development of highly parallel computer architectures.

Some higher order effects can be achieved even in unextended Horn-clause logic. For instance, the solution x := u.v.NIL computed from the query

? Length(x 2)

can be regarded as a binary function: given values for u and v (as input) it yields a list containing those two items (as output). This example demonstrates the special power of the logical variable in being able to communicate partially instantiated data.
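Both directions of the Length relation can be imitated in Python. This is our own sketch, not a PROLOG implementation: logical variables are represented by placeholder strings, so the "backward" mode returns the partially instantiated list skeleton that the query ? Length(x 2) computes.

```python
# Sketch of the two modes of the Length relation (our own encoding;
# a "logical variable" is represented by a placeholder string).

def length_forward(xs):
    """? Length(xs u): problem reduction, as in the derivation above."""
    if not xs:                             # Length(NIL 0)
        return 0
    return length_forward(xs[1:]) + 1      # Length(x.y u) if Length(y v) & Plus(v 1 u)

def length_backward(n):
    """? Length(x n) with n bound: yields the list skeleton
    u1 . u2 . ... . NIL that backward reasoning computes for x."""
    return [f"u{i}" for i in range(1, n + 1)]

print(length_forward(["D", "A", "D"]))   # 3
print(length_backward(2))                # ['u1', 'u2']
```

A genuine logic programming system obtains both behaviors from the one relation; here the two directions must be coded separately, which is precisely the expressive gap the text attributes to the logical variable.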
Logic Databases

Logic Databases = Declarative Logic Programming. The notion of logic database arose out of work on question-answering (qv) systems in AI. The main impetus to this work was Green's demonstration that resolution logic could be used for question answering (28). Logic databases share with logic programs the use of logic to represent knowledge (see Representation, knowledge) and the use of deduction to derive solutions to problems. However, whereas logic programming admits both declarative and procedural modes of use, logic databases concentrate on the declarative.

In the declarative use of logic the user represents knowledge and formulates problems without concern for the problem-solving process. The problem solver, whether human or machine, is conceptually distinct from the user and can use any problem-solving strategy, including backward reasoning, to solve the problem. In the procedural use of logic, the user formulates knowledge and poses problems bearing in mind the problem solver's problem-solving strategy. The user programs the problem solver by assessing the effect of his statements on the problem solver's behavior. The more sophisticated the problem solver, the more effective it is for declarative modes of use but the more difficult it is for the procedural programmer to predict or control its behavior.
Relational Databases. Relational databases have emerged from the world of commercial data processing. Like logic databases they use logic declaratively to express queries to databases; unlike logic databases they treat databases as model-theoretic relational structures rather than as sentences expressed in formal logic. Question answering is a model-theoretic process of unraveling truth definitions to evaluate the query in the relational structure. Queries can be arbitrary formulas of first-order logic augmented with aggregation operators such as set-of. Alternatively, and equivalently, a relational database can also be viewed as a special case of a logic database in which the database consists of variable-free atomic assertions. Queries are equivalent to Horn-clause queries augmented with negation as failure (29). These two contrasting views of the logical nature of relational databases have led to great confusion, including, e.g., conflicting claims that recursion can be represented in first-order logic and that it cannot (30).

Query Optimization. Many relational database systems use query optimizers to analyze the form of queries and to determine appropriate evaluation strategies. The resulting strategies are generally sensitive to the I/O pattern of relation arguments. Thus, e.g., a query of the form

? John supplies x & x costs y

"Find the cost of articles supplied by John" will be evaluated left to right, whereas a query such as

? x supplies y & y costs 100

"Find suppliers of articles which cost 100" will be evaluated right to left. It can be argued therefore not only that relational database systems can be regarded as logic databases and consequently as declarative logic programs but also that for certain classes of "programs" they are more sophisticated than PROLOG. Of course, this argument ignores the fact that PROLOG has to deal with more complicated databases and has to cater to both declarative and procedural modes of use.
Nonetheless, the argument does point out some of the possibilities for improving logic programming languages such as PROLOG. Greater use can be made both of compile-time program transformation and of more sophisticated run-time execution strategies. Program transformation as originally developed for functional programming and extended to logic programming can be viewed as subsuming query optimization. More intelligent execution strategies are discussed below (see Intelligent Execution Strategies).
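A query optimizer's sensitivity to the I/O pattern, as in the two supplier queries above, can be sketched by ordering subgoals so that those with more bound (constant) arguments are evaluated first. The encoding below is our own illustration, including the convention that lowercase argument names denote variables.

```python
# Illustrative sketch of I/O-pattern-sensitive query evaluation:
# run first the subgoal with the most constant (bound) arguments,
# as a relational query optimizer would.

def bound_count(subgoal):
    _, args = subgoal
    # Our convention: an all-lowercase name is a variable; anything
    # else ("John", "100") is a constant, hence a bound argument.
    return sum(1 for a in args if not a.islower())

def order_subgoals(query):
    """Sort subgoals so the most constrained ones are evaluated first."""
    return sorted(query, key=bound_count, reverse=True)

# ? John supplies x & x costs y  -- evaluated left to right
q1 = [("supplies", ("John", "x")), ("costs", ("x", "y"))]
# ? x supplies y & y costs 100   -- evaluated right to left
q2 = [("supplies", ("x", "y")), ("costs", ("y", "100"))]

print([rel for rel, _ in order_subgoals(q1)])  # ['supplies', 'costs']
print([rel for rel, _ in order_subgoals(q2)])  # ['costs', 'supplies']
```

Because sorted is stable, subgoals with equally many bound arguments keep their textual order, which matches PROLOG's default left-to-right behavior as a fallback.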
Databases as Data Structures. Many uses of addition and deletion of clauses during program execution in PROLOG can be interpreted as manipulating databases as data structures. Databases can perform all of the functions of recursive data structures as well as the functions of conventional data structures such as arrays. For example, the array A that consists of the first 100 even numbers can be defined either by clauses that enumerate its elements,

Element(A 1 2)
Element(A 2 4)
. . .
Element(A 100 200)

or by a general rule,

Element(A i x) if 1 ≤ i & i ≤ 100 & Times(2 i x)

The use of databases as data structures has the advantage that access to the data can be obtained by arbitrary queries, e.g.,

? Element(A 50 x)
? Element(A x 100)
? Element(A x y) & Element(B z y)

The implementation is responsible for arranging efficient access.

The problem with databases as data structures arises when they are updated. Consider the operation that interchanges the ith and jth elements of an array represented by the relation Element as above. The goal

? Interchange(A 2 3 x)

for example, can be solved for x by a single clause,

Interchange(x i j inter(x i j))

However, it requires an inordinate amount of computation to determine the elements of the new array inter(A 2 3), which results from the interchange; the following clauses define its elements:

Element(inter(x i j) i u) if Element(x j u)
Element(inter(x i j) j u) if Element(x i u)
Element(inter(x i j) k u) if Element(x k u) & k ≠ i & k ≠ j

The last clause is a frame axiom, and the inefficiency inherent in its use is called the frame problem. With regard to the present example this inefficiency would be manifested wherever the frame axiom had to be repeatedly invoked (e.g., in the execution of some sorting algorithm using interchange operations) in order to compute the elements of the intermediate arrays. Several proposals have been made to deal with the frame problem in logic programming, but there is currently no generally agreed solution.

Relation with Rule-Based Expert Systems

Production Rules. Many expert systems are implemented in the form of production rules (see Rule-based systems). Sometimes these have a logical form

if conditions then conclusion

Sometimes they take the form

if conditions then actions
which is more suggestive of stimulus-response theories of behavior. Both kinds of rule-based system have traditionally been implemented in LISP. Several experiments have been performed to reimplement in PROLOG expert systems originally implemented in conclusion-conditions rule-based form. Hammond in particular has compared PROLOG with EMYCIN (31). EMYCIN rules do not contain explicit variables. The effect of variables has to be obtained by indirect reference to global contexts, which are reminiscent of object-oriented programming (qv). PROLOG (and logic programming in general) requires that such contexts be represented explicitly as relation parameters. Hammond has shown that the lack of explicit variables in EMYCIN means that many rules are needed in situations where PROLOG could use only a single rule and many items of data (conditionless rules). For example, the two EMYCIN rules

if problems-entered = pain and not aspirin-unsuitable
then aspirin-recommended

if problems-entered = diarrhea and not Lomotil-unsuitable
then Lomotil-recommended

could be rewritten in PROLOG as

x recommended-for y if y has-problem z & x suppresses z & not x unsuitable-for y
aspirin suppresses pain
Lomotil suppresses diarrhea

Both PROLOG and EMYCIN reason backward. In contrast, expert systems that use condition-action rules first test the conditions and then perform the actions. Sometimes the actions are simply conclusions, in which case the system implements forward reasoning. More often the actions side effect some global database. The logical status of such side effects is far from obvious.

Query the User: Declarative Input-Output. Input and output facilities are notable among those features of typical PROLOG systems whose side effects compromise the declarative meaning of programs. The increasing role of logic programming in interactive, knowledge-based applications makes it particularly important to employ logically satisfactory ways of characterizing I/O processes. Logical transparency suffers when I/O is expressed merely through extralogical read-write calls, as in the following:

x is-a-citizen if print(was x born in the USA?) & read(yes) & print(I confirm x is a citizen)

This clause's meaning is potentially obscured by the presence of the system-provided read and print calls whose implementation may rely upon side effects. An arguably more transparent formulation is

x is-a-citizen if x born-in-the-USA

In response to a query

? x is-a-citizen

execution will reduce the query to the subgoal

? x born-in-the-USA

which can be regarded as a request for input in the form of atomic assertions,

John born-in-the-USA
Mary born-in-the-USA
etc.

This input data might be made available as an integral part of the source program, by a separate file of assertions, or by interacting with the user. In the latter two cases the total knowledge base used to answer the query comprises not only the clauses of the source program but also the clauses supplied by the file or the user. In this view, input is just a component of the total knowledge base. Analogously, output can be regarded as consisting of those consequences of the knowledge base that are answers to the query. Thus, the output elicited by the query

? x is-a-citizen

is a set of proven logical consequences of the total knowledge base of the form

John is-a-citizen
Mary is-a-citizen
etc.

This treatment makes I/O a purely logical concept, free from system-dependent side effects. It is a dominant feature of the query-the-user system developed by Sergot (32) and incorporated in the augmented-PROLOG expert-system shell APES built by Sergot and Hammond (33). There, the knowledge base is seen as distributed between both machine and user, each of whom may query the other in order to establish the knowledge base and its logical consequences.

Heuristic Programming versus Algorithmic Programming. Many of the advantages claimed for expert systems are arguably advantages of the manner in which they are usually implemented, using AI languages and knowledge-representation schemes that separate knowledge from the inference mechanisms that put that knowledge to use. Such separation of knowledge from use renders knowledge easier to understand and easier to change. It also facilitates the development of systems that can explain and justify their conclusions.

Some commentators on expert systems (qv) seem to identify expert systems with the general methodology of separating knowledge from use. If their analysis were correct, every well-structured PROLOG program would be an expert system. But this fails to recognize the heuristic, as opposed to algorithmic,
nature of expert systems. PROLOG and logic programming more generally are equally suited to the representation of both algorithms and heuristics (qv).

A heuristic is a problem-solving method that is useful for solving some class of problems but is not guaranteed to solve all of them. The use of heuristics to represent domain-specific knowledge and the successive refinement of heuristics by trial and error are characteristic features of expert systems. Such heuristic, trial-and-error programming contrasts with traditional software engineering methodology, which favors the development of correct and complete programs from rigorous specifications.

The separation of knowledge from use, which is characteristic of rule-based languages such as PROLOG, is especially well suited for heuristic, trial-and-error programming. It facilitates the assimilation of additional heuristics and the correction of errors. The traditional software engineering methodology aims instead to eliminate errors by the sound derivation of programs from specifications.

Logic programming is conducive to both styles of programming; indeed, because of its declarative nature based on formal logic, it has greater potential for rigorous program development than have conventional programming formalisms.

Intelligent Execution Strategies

Role of Intelligent Execution. The use of text order for goals and clauses gives the PROLOG programmer a powerful and relatively easy-to-use tool for controlling program execution. More sophisticated tools are comparatively more difficult for the programmer to control. Such tools are likely to be used, therefore, only by relatively sophisticated programmers or autonomously by the system itself.

Dependency-Directed Backtracking. Some of the problems with PROLOG's problem-solving strategy are exemplified by Luis Moniz Pereira's formulation of the map-coloring problem. The problem is to show that all of the regions in a given map can be assigned one of four colors, R, Y, B, or G, without assigning the same color to two adjacent regions. The problem can be specified by the goal

? Next(x y) & Next(x z) & Next(x v) & Next(y z) & Next(y u) & Next(y v) & Next(z u) & Next(z v) & Next(u v)

The relation Next defines all acceptable pairs of colors that can be assigned to adjacent regions. It can be defined by the clauses

Next(R Y)   Next(R B)   Next(R G)
Next(Y R)   Next(Y B)   Next(Y G)
Next(B R)   Next(B Y)   Next(B G)
Next(G R)   Next(G Y)   Next(G B)

given in arbitrary order. Suppose PROLOG is used to find a solution to the problem. The first clause Next(R Y) will be used to solve the first three subgoals, leaving the subgoals

? Next(Y Y) & Next(Y u) & Next(Y Y) & Next(Y u) & Next(Y Y) & Next(u Y)

The first remaining subgoal is now unsolvable, due to the substitutions

y = Y and z = Y

PROLOG therefore backtracks to the previous subgoal

Next(R v)

trying to solve it in the next available way. But no new solution of the subgoal can affect the substitutions for y and z that caused the original failure. The subproblem

Next(Y Y)

repeatedly fails as before. Dependency-directed backtracking (qv) strategies have been devised to identify the cause of failure and to direct backtracking to a previous subgoal whose different solution can remove the cause of failure. In this case such a backtracking strategy would try a different way of solving the second subgoal

Next(R z)

Dependency-directed backtracking can be likened to learning from one's mistakes.

Subgoal Selection. An alternative to dependency-directed backtracking is often to select the right subgoal in the first place. In the example of the map-coloring problem above, having solved the first two subgoals as before, leaving the subgoals

? Next(R v) & Next(Y Y) & Next(Y u) & Next(Y v) & Next(Y u) & Next(Y v) & Next(u v)
an intelligent subgoal-selection strategy would focus on the second remaining subgoal,

Next(Y Y)

would recognize its unsolvability, and would backtrack normally to the previous subgoal.

Several useful heuristics can be used to guide the selection of subgoals. For example, select a subgoal containing fewest variables, or select a subgoal having fewest expected solutions. Both of these heuristics are especially suitable for declarative programming, where the user delegates responsibility for efficiency to the program executor. However, neither of these heuristics is foolproof. Examples can easily be constructed in which they lead to disastrous results.

Other subgoal-selection strategies are better suited to a procedural, user-controlled programming style. The most common of these strategies are probably eager, lazy, and parallel or pseudoparallel execution. Eager execution is controlled by working forward from inputs to outputs, executing subgoals eagerly as soon as sufficient data is available. Eager execution is also associated with data-flow evaluation of functional programs. Lazy execution is controlled by working backward from outputs to inputs, executing subgoals lazily only when their outputs are required as inputs for other subgoals. Lazy execution is also called "call by need" in functional programming languages. Subgoals that share no variables can often usefully be executed in parallel, or at least in pseudoparallel on sequential machines. This can be especially useful when one of the subgoals proves to be unsolvable. IC-PROLOG and some functional programming languages provide facilities for eager, lazy, and pseudoparallel execution.

Loop Detection. Declarative use of PROLOG can often give rise to nonterminating loops. Consider, for example, the program

John likes Mary
Mary likes x if x likes Mary
? Mary likes Mary

The second clause repeatedly reduces the original goal to itself without end. It is relatively easy to design and implement loop-detection strategies that eliminate this type of loop. In general, however, it is impossible to detect and avoid all possible loops because the existence of a general loop-detection algorithm would contradict the unsolvability of the halting problem. Moreover, even relatively straightforward types of loops can be prohibitively expensive to detect.

Looping is a greater problem with declarative uses of logic programming than it is with procedural uses. In both cases, however, some improvement can be obtained by performing "compile-time" program transformations. In general, program transformation can often obtain at compile time the same effect as improved program execution at run time.

Parallelism of Logic Programs

Scope for Parallelism. The lack of commitment of logic programs to particular processing methods makes them receptive to parallel processing. Procedural calls can be executed in parallel rather than in PROLOG's left-to-right sequence; multiple procedures responding to any call can be tried in parallel rather than one at a time; backward reasoning can be combined in parallel with forward reasoning; and at the lowest processing level, argument matching can be achieved using parallel unification algorithms. All such parallel processing schemes can be applied to logic programs while respecting their logical semantics.

Comparison with Parallel Evaluation of Functional Programs. Several parallel machine architectures have been designed. Until comparatively recently, their support of declarative programming was mostly restricted to functional languages. A typical example is the implementation of the HOPE language on the ALICE graph-reduction machine (34). There, an equational function definition like

f(x y) = g(x) × h(x y)

is regarded as a rewrite rule which, when invoked to evaluate an expression like f(a b), rewrites it as a set of expressions g(a), h(a b), and their product, to be evaluated. These functional expressions, linked in a dynamic graph structure, are represented by communicating packets in a pool to which many parallel processors have common access. The Horn-clause analog of the equation above is

F(x y z) if G(x u) & H(x y v) & Times(u v z)

This could be used to process a goal ? F(a b z) by executing the G, H, and Times calls in AND parallel; the fact that the logical outcome of this is independent of the relative speeds of the ensuing parallel evaluations is a direct consequence of the AND nondeterminism of logic. A variety of controls can be superimposed upon this general idea in order to constrain parallel execution in computationally useful ways.

The best known controls are the eager and lazy "producer-consumer" protocols that make the activation and suspension of call evaluations contingent upon the data flow through the calls' variables. Such protocols are useful even for single-processor systems implementing (quasi-parallel) coroutining, as demonstrated by the IC-PROLOG system (14). The eight-queens problem provides a simple demonstration of this. The conventional algorithm for the problem can be regarded as a consumer-producer execution of the logic specification

x solves-the-8-queens-problem if x is-an-8-queen-configuration & x has-no-takeable-queens

in which partially completed configurations in the form of partially instantiated terms generated by the first call are rejected immediately whenever they violate the requirements of the second. By contrast, the normal PROLOG execution of the calls in sequence would pointlessly generate many completed
556
TOGIC PROGRAMMING
configurations from the first call, which could not possibly satisfy the second.

Search Parallelism. The OR nondeterminism of logic programs raises the possibility of executing alternative computations in parallel. OR parallelism can be approximated in single-processor machines through the use of breadth-first search, as exemplified by the LOGLISP system of Robinson and Sibert (27).

The closest analog of OR nondeterminism in functional languages is the nondeterminism of the order of evaluation of different branches of conditional expressions. Such branches can even be evaluated in parallel. However, OR nondeterminism is closer to the search nondeterminism associated with database-query evaluation.

Unrestrained exploitation of OR parallelism, both by itself and, even more, in combination with AND parallelism, can overwhelm the resources of even the most powerful architectures. Consequently, most designs for parallel logic systems constrain, by appropriate language restrictions, the degree of parallelism attainable in both AND and OR modes. Thus, from practical necessity, they exploit less nondeterminism than that inherent in logic itself.

PARLOG and Concurrent PROLOG. Two parallel logic programming languages, PARLOG and Concurrent PROLOG, have recently been developed to exploit AND parallelism. PARLOG was developed by Clark and Gregory (35) and Concurrent PROLOG by Shapiro (36). Both draw upon features of IC-PROLOG and of the parallel relational language developed by Clark and Gregory (37). For efficiency's sake, both languages restrict the degree to which the inherent nondeterminism of logic is exploited for parallel execution.

PARLOG distinguishes between single-solution and all-solutions relations. A definition of the former kind consists of a set of clauses accompanied by a mode declaration governing the I/O pattern of calls to which the definition may respond.
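The committed-choice behavior of such guarded clauses can be caricatured in a few lines of Python. This is a rough sketch under stated assumptions, not PARLOG's actual machinery: guards are modeled as boolean functions raced on threads, and `commit` is a hypothetical helper invented for illustration:

```python
# Sketch of committed-choice (don't-care) nondeterminism: the guards of
# all candidate clauses run in parallel, and the first guard to succeed
# commits execution to its clause body; the other clauses become inoperative.
from concurrent.futures import ThreadPoolExecutor, as_completed

def commit(call, clauses):
    """clauses: list of (guard, body) callables over the call's arguments.
    Returns the body result of the first clause whose guard succeeds."""
    with ThreadPoolExecutor(max_workers=len(clauses)) as pool:
        futures = {pool.submit(guard, *call): body for guard, body in clauses}
        for fut in as_completed(futures):
            if fut.result():              # guard solved: commit to this clause
                return futures[fut](*call)
    return None                           # no guard succeeded: the call fails

# Example: a "max" relation defined by two guarded clauses.
clauses = [
    (lambda x, y: x >= y, lambda x, y: x),
    (lambda x, y: y >= x, lambda x, y: y),
]
```

When the guards overlap (here, x = y), either clause may be chosen, but, as with don't-care nondeterminism generally, any committed answer is acceptable.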
In general, the clauses contain guards that are conjunctions of subgoals executed before the main subgoals in the clause bodies. The guards of all clauses that respond to a call are executed in parallel: The first guard to be solved commits execution to the use of this guard's clause in order to process the invoking call, the other clauses becoming inoperative. This is the committed-choice (don't-care) form of nondeterminism, first incorporated in Dijkstra's (38) guarded commands. Conjoined calls to single-solution relations are executed in AND parallel, with their shared variables serving as two-way communication channels. It is the special quality of the logical variable, in its capacity to receive or transmit arbitrarily instantiated arguments representing messages, that confers the extra power that parallel logic languages have over most other languages designed for parallel processing. The combination of modes and committed-choice nondeterminism ensures that shared variables become bound to a single solution, facilitating parallel execution. These facilities in combination enable the programmer to express a wide range of "concurrent process" behaviors.

All-solutions relations provide for a form of OR parallelism (don't-know nondeterminism). An all-solutions call invokes an unmoded, unguarded clause set, and the resulting separate solutions are collected into a single-term aggregate solution. The combination of these AND and OR parallel features of PARLOG achieves less parallelism than that theoretically
possible for computing all alternative solutions to an arbitrary conjunction of calls.

Concurrent PROLOG offers comparable power but provides for read-only annotation of variables in place of mode declarations. As a result of such distinctions in language features, Concurrent PROLOG and PARLOG make differing demands upon their compilers, their host architectures, and their users' programming styles. They are similar, however, in requiring, for their effective use, considerable sophistication on the part of their users. Both have exerted a strong influence on the design of the guarded Horn-clause kernel language (39) being developed for the proposed Parallel Inference Machine in the Japanese Fifth Generation Project.

Relation with Object-Oriented Programming

Object-oriented programming (qv), as exemplified in such programming systems as SIMULA, Smalltalk, Actors, and Loops, offers an attractive paradigm for "programming in the large." Several attempts have been made, therefore, to understand the relationship between object-oriented programming and logic programming. Zaniolo, in particular, has shown how to implement object-oriented programming in PROLOG (40), and Shapiro and Takeuchi have shown how to do so in Concurrent PROLOG (41). These two implementations are based on different interpretations of object-oriented programming in logic programming terms. These interpretations are referred to as the abstract-data-type interpretation (Zaniolo) and the process interpretation (Shapiro and Takeuchi), respectively.

Object-Oriented Programming. Object-oriented programming can be viewed as combining three independent but related features: abstract data types, inheritance hierarchies (qv), and parallelism.

Knowledge is organized around "objects" which, like abstract data types, consist of individuals or classes of individuals.
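These features can be illustrated with a deliberately small Python sketch; the classes below are invented for illustration and correspond to none of the systems named above:

```python
# Illustrative sketch of encapsulated methods, inheritance, and overriding.
class List:
    def __init__(self, items):
        self.items = list(items)

    def length(self):                  # method encapsulated within the object
        return len(self.items)

class WordList(List):                  # subobject: inherits length from List
    def longest(self):                 # its own, more specific method
        return max(self.items, key=len)

ws = WordList(["logic", "programming"])
```

Here WordList answers length requests with the method inherited from List, while a more specific method (longest) is associated directly with the subobject, as in the frame-style inheritance described below.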
The methods used to determine attributes of objects or to perform operations on them are encapsulated within the object bodies and can be implemented or altered without affecting methods encapsulated within other objects. This facilitates using diverse methods for dealing with similar problems within different objects, and even using diverse formalisms.

Objects can be organized in hierarchies, and methods associated with objects can be inherited by subobjects. As in the notion of frames, inheritance of a method may be blocked by associating a more specific method directly with the subobject.

Computation is performed by objects sending messages to one another. A typical message is a request to solve a problem. Messages can be sent and executed in parallel.

Abstract-Data-Type Interpretation. In this interpretation arguments of relations are assigned a data type. For example, the first and second arguments of the relation Length(x y) might be assigned the types "list" and "nonnegative-integer," respectively. The single argument in the relation Sentence(x) might be assigned the type "word-list." The definitions of Length and Sentence would be associated with the objects, which are the data types of their arguments.

Objects, such as "word-list" and "list," can be arranged in
hierarchies, and subobjects can inherit methods from superobjects. For example, the object "word-list" can inherit the definition of Length from the superobject "list." Messages, which are problems to be solved, are sent to the objects, which have the methods to solve them.

Such a notion of object-oriented programming provides an abstract, overall structure within which lower level clauses can be organized. It can be superimposed on logic programs without affecting their logic in any way. A form of object-oriented programming based on the abstract-data-type interpretation has been incorporated in ICOT's systems programming language ESP, which is implemented in the PROLOG-like kernel language KL0 (42).

Process Interpretation. This is almost the opposite of the abstract-data-type interpretation. Objects are processes, and messages are data items flowing in I/O streams. Processes are represented by relations and I/O streams by relation arguments, as in PARLOG and Concurrent PROLOG. The process interpretation can be illustrated by the following example of two generators connected to a printer via a merger in a data-flow network:
Generate ──┐
           ├──▶ Merge ──▶ Print
Generate ──┘

Items to be merged and printed are sent as messages from one object process to another. The data-flow diagram can be represented by the collection of goals:

? Generate(x) & Generate(y) & Merge(x y z) & Print(z)

where x, y, and z range over possibly nonterminating lists that represent I/O streams. The subgoals can be executed as parallel processes. The programmer can control execution by I/O mode declarations in PARLOG or by read-only variable annotations in Concurrent PROLOG. An unannotated definition of "merge" is

Merge(NIL NIL NIL)
Merge(u.x y u.z) if Merge(x y z)
Merge(x u.y u.z) if Merge(x y z)

Both PARLOG and Concurrent PROLOG make a committed choice to one of the merge procedures. This commitment is time dependent: the first stream that has input available transmits its input to the output stream. The merge definition is also nondeterministic in the sense that if both input streams have items to be merged, which item will be transferred first to the output stream depends on whether the second or the third clause is used.

The process interpretation is well suited for concurrent programming applications. However, reliance on time dependency, committed-choice nondeterminism, and, possibly, infinite processes can sometimes compromise the declarative logic of the resulting programs. The process interpretation of object-oriented programming is the basis of ICOT's knowledge-representation language Mandala, implemented in KL1 and KL2, the successors to KL0.

Conclusion

Logic programming attempts to unify different formalisms in different areas of computing. Logic programming is generally regarded as a development of AI. However, it also has important links with formal methods of software engineering and with the field of databases. Another important relationship within the area of computer science, which has not been developed here, is with formal language theory and computational linguistics.

Logic programming is often identified with the language PROLOG. Indeed, to a large extent the success of logic programming is due to the success of PROLOG. This must not distract attention, however, from other logic programming languages under development, from the development of parallel architectures, or from other longer term developments. Logic programming is as much a research project for the future as it is a collection of results from the past.

BIBLIOGRAPHY

1. J. A. Robinson, "A machine-oriented logic based on the resolution principle," JACM 12, 23-41 (1965).
2. D. W. Loveland, "Mechanical theorem proving by model elimination," JACM 15, 236-251 (1968).
3. C. Hewitt, PLANNER: A Language for Proving Theorems in Robots, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 295-301, 1969.
4. R. A. Kowalski and D. Kuehner, "Linear resolution with selection function," Artif. Intell. 2, 227-260 (1971).
5. D. W. Loveland, A Linear Format for Resolution, Proceedings of the IRIA Symposium on Automatic Demonstration, Versailles, France, Lecture Notes in Mathematics, Vol. 125, Springer-Verlag, Berlin, pp. 147-162, 1970.
6. R. Reiter, "Two results on ordering for resolution with merging and linear format," JACM 18, 630-646 (1971).
7. P. J. Hayes, Computation and Deduction, Proceedings of the Second Symposium on the Mathematical Foundations of Computer Science, Czechoslovak Academy of Sciences, High Tatras, Czechoslovakia, pp. 105-118, 1973.
8. R. A. Kowalski, The Predicate Calculus as a Programming Language, Proceedings of the First Symposium on the Mathematical Foundations of Computer Science, Jablonna, Poland, 1972.
9. A. Colmerauer et al., Un Système de Communication Homme-Machine en Français, Research Report, Groupe Intelligence Artificielle, Université d'Aix-Marseille, Luminy, 1973.
10. R. S. Boyer and J. S. Moore, "The sharing of structure in theorem proving programs," Machine Intell. 7, 101-116 (1972).
11. M. Bruynooghe, The Memory Management of PROLOG Implementations, in K. L. Clark and S.-A. Tarnlund (eds.), Logic Programming, APIC Studies in Data Processing, Vol. 16, Academic Press, London, pp. 83-98, 1982.
12. R. A. Kowalski, Predicate Logic as a Programming Language, Proceedings of IFIP-74, Stockholm, Sweden, North Holland, Amsterdam, pp. 569-574, 1974.
13. R. Hill, LUSH resolution and its completeness, DCL Memo No. 78, Department of Artificial Intelligence, University of Edinburgh, U.K., 1974.
14. K. L. Clark and F. G. McCabe, The Control Facilities of IC-PROLOG, in D. Michie (ed.), Expert Systems in the Microelectronic Age, Edinburgh University Press, Edinburgh, U.K., 1979.
15. A. Colmerauer, PROLOG and Infinite Trees, in K. L. Clark and S.-A. Tarnlund (eds.), Logic Programming, APIC Studies in Data Processing, Vol. 16, Academic Press, London, pp. 231-251, 1982.
16. J. W. Lloyd, Foundations of Logic Programming, Springer-Verlag, Berlin, 1984.
17. M. H. van Emden and R. A. Kowalski, "The semantics of predicate logic as a programming language," JACM 23, 733-742 (1976).
18. R. Weyhrauch, "Prolegomena to a theory of mechanized formal reasoning," Artif. Intell. 13, 133-170 (1980).
19. K. L. Clark, Negation as Failure, in H. Gallaire and J. Minker (eds.), Logic and Data Bases, Plenum, New York, pp. 293-322, 1978.
20. J. Jaffar, J-L. Lassez, and J. W. Lloyd, Completeness of the Negation as Failure Rule, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, 1983.
21. A. Bundy and B. Welham, Using Meta-Level Inference for Selective Application of Multiple Rewrite Rules in Algebraic Manipulation, Proceedings of the Fifth Conference on Automated Deduction, Les Arcs, France, Springer-Verlag, Berlin, pp. 24-38, 1980.
22. K. A. Bowen and R. A. Kowalski, Amalgamating Language and Metalanguage in Logic Programming, in K. L. Clark and S.-A. Tarnlund (eds.), Logic Programming, APIC Studies in Data Processing, Vol. 16, Academic Press, London, pp. 153-172, 1982.
23. K. L. Clark, Predicate Logic as a Computational Formalism, Ph.D. Thesis, Imperial College of Science and Technology, University of London, 1979.
24. C. J. Hogger, "Derivation of logic programs," JACM 28, 372-392 (1981).
25. K. L. Clark and J. Darlington, "Algorithm classification through synthesis," Comput. J. 23, 61-65 (1980).
26. S. O. Aanderaa, "On the decision problem for formulas in which all disjunctions are binary," Proceedings of the Second Scandinavian Logic Symposium, Oslo, Norway, 1970, pp. 1-18.
27. J. A. Robinson and E. E. Sibert, Logic Programming in LISP, Research Report, School of Computer and Information Science, Syracuse University, New York, 1980.
28. C. C. Green, "Theorem proving by resolution as a basis for question-answering systems," Machine Intell. 4, 183-205 (1969).
29. J. W. Lloyd and R. W. Topor, "Making Prolog more expressive," J. Logic Program. 1, 225-240 (1984).
30. D. Harel, "Review of the book Logic and Databases," Comput. Rev. 21, 367-369 (1980).
31. P. Hammond, Micro-PROLOG for Expert Systems, in K. L. Clark and F. G. McCabe (eds.), Micro-PROLOG: Programming in Logic, Prentice-Hall, Englewood Cliffs, NJ, pp. 294-319, 1984.
32. M. J.
Sergot, A Query-the-User Facility for Logic Programming, in P. Degano and E. Sandewall (eds.), Integrated Interactive Computer Systems, North Holland, Amsterdam, pp. 27-41, 1983.
33. P. Hammond and M. J. Sergot, A PROLOG Shell for Logic Based Expert Systems, Proceedings of the British Computer Society Conference on Expert Systems, Churchill College, University of Cambridge, U.K., pp. 95-104, 1983.
34. J. Darlington and M. Reeve, ALICE: A Multi-processor Reduction Machine for the Parallel Evaluation of Applicative Languages, Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, Portsmouth, NH, pp. 65-75, 1981.
35. K. L. Clark and S. Gregory, "PARLOG: Parallel programming in logic," ACM Trans. Progr. Lang. 8(1), 1-49 (January 1986).
36. E. Y. Shapiro, A Subset of Concurrent Prolog and Its Interpreter, ICOT Technical Report TR-003, Institute for New Generation Computing Technology, Tokyo, Japan, 1983.
37. K. L. Clark and S. Gregory, A Relational Language for Parallel Programming, Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, Portsmouth, NH, pp. 171-178, 1981.
38. E. W. Dijkstra, A Discipline of Programming, Prentice-Hall, Englewood Cliffs, NJ, 1976.
39. K. Ueda, Guarded Horn Clauses, ICOT Technical Report TR-103, Institute for New Generation Computing Technology, Tokyo, Japan, 1985.
40. C. Zaniolo, Object-Oriented Programming in PROLOG, Proceedings of the International Symposium on Logic Programming, Atlantic City, NJ, IEEE Computer Society Press, New York, pp. 265-270, 1984.
41. E. Y. Shapiro and A. Takeuchi, Object Oriented Programming in Concurrent Prolog, in T. Moto-oka (ed.), New Generation Computing, Vol. 1, Springer-Verlag, Berlin, pp. 25-48, 1983.
42. T. Chikayama, Unique Features of ESP, Proceedings of the International Conference on Fifth Generation Computer Systems, pp. 292-298, 1984.

General References

J. A. Campbell (ed.), Implementations of PROLOG, Ellis Horwood, Chichester, U.K., 1984.
K. L. Clark and F. G. McCabe, Micro-PROLOG: Programming in Logic, Prentice-Hall, Englewood Cliffs, NJ, 1984.
K. L. Clark and S.-A. Tarnlund (eds.), Logic Programming, APIC Studies in Data Processing, Vol. 16, Academic Press, London, 1982.
W. F. Clocksin and C. S. Mellish, Programming in PROLOG, Springer-Verlag, Berlin, 1981.
R. Ennals, Beginning Micro-PROLOG, Ellis Horwood, Chichester, U.K., and Heinemann Computers in Education, London, 1983.
C. J. Hogger, Introduction to Logic Programming, APIC Studies in Data Processing, Vol. 21, Academic Press, London, 1984.
R. A. Kowalski, Logic for Problem Solving, Elsevier North Holland, New York, 1979.
D. H. D. Warren and M. van Caneghem (eds.), Logic Programming and Its Applications, Ablex, Norwood, NJ, 1985.

R. A. Kowalski and C. J. Hogger
University of London
LOGIC, PROPOSITIONAL

Propositional logic is the study of inferences that can be made from propositions. Roughly, propositions are the "meanings" or "thoughts" expressed by declarative sentences (1,2). Secondarily, it is also the study of the representation of information by propositions. Other names for it are propositional calculus, sentential logic, and, when its subject matter is taken to be those things that can have truth values (i.e., that are either true or false), it is often called truth-functional logic. The bearers of truth values are sometimes considered to be propositions, (declarative) sentences, or truth functions. Typically, propositional logics are distinguished from first-order logics by their lack of internal analysis of propositions (e.g., they do not distinguish between subject and predicate; see Logic, predicate). Such logics also exist for other types of sentences (such as imperatives) and for more than two truth values. [For details, see the logics of imperatives presented in Refs. 3 and 4, the many-valued logics discussed in Ref. 5, and the various
LOG|C, PROPOSIilONAI
Proceedings of the International Symposium on Multiple-Valued Logic].

The representational system of propositional logic is its underlying language. This consists of propositions and propositional (or, in the case of truth-functional propositional logic, truth-functional) connectives; a syntax that specifies the grammar of propositions; and a semantics that provides the "meanings" of propositions in terms of their truth conditions. The deductive system of propositional logic consists of rules that only permit inferences that lead from truths to truths (thereby preventing inferences that would lead from truths to falsehoods). Its (deductive) syntax consists of such rules and axioms, and its semantics characterizes these rules in terms of truth values. (This entry is concerned primarily with truth-functional propositional logic, except where indicated.)

Language of Propositional Logic

There are two kinds of propositions: atomic and molecular (also called simple and compound). Molecular propositions are formed from (one or more) atomic ones by means of truth-functional connectives (or truth-functional operators). For instance,

Andrea is a philosopher

is atomic; but

Andrea is a philosopher and Mike is not French

is molecular, formed from the two atomic propositions

Andrea is a philosopher

and

Mike is French

by the connective and and the connective (or operator) not. But

Ruth believes that Marvin is a logician

is not molecular (rather, it is atomic), since the operator Ruth believes that is not truth-functional. [A branch of logic that does treat the latter proposition as molecular is modal logic, in particular, doxastic modal logic (see Modal logic; Belief systems).] In propositional logic atomic propositions are considered to be unanalyzable; the branch of nonmodal logic that analyzes atomic propositions is called predicate logic (see Logic, predicate).

In truth-functional propositional logic a molecular proposition must be such that its truth value is a function of the truth values of its atomic parts. The particular function is determined by the connectives. In addition to the one-place "connective" or operator, negation (usually expressed in English by not or it is not the case that), there are 16 two-place truth-functional connectives. Of these, the most common are given in Table 1. For a list of others, see Refs. 1 and 11. There are, of course, other n-place connectives (for n ≠ 2), for example, the three-place connective if . . . then . . . else (see Ref. 7); but these are either trivial (when n = 1) or (for n > 2) expressible in terms of two-place connectives, as discussed below (Minimal Sets of Connectives). (For interesting generalizations of these connectives as operators on sets of propositions, see Ref. 8 for a discussion in an AI context and Ref. 9 for a discussion in a linguistic context.)

Table 1. Common Two-Place Truth-Functional Connectives

Conjunction                      (and)
Inclusive disjunction            (or)
Material conditional             (if . . . then . . .)
Material biconditional           (if and only if)
Exclusive disjunction            (xor; i.e., either . . . or . . . but not both)
Joint denial                     (nor)
Disjoint (or alternative) denial (nand)

Syntax. A formal syntax for a language of propositional logic can be presented by a recursive definition of well-formed proposition. One such definition, using the most common connectives, follows.

1. The letters A, . . . , Z and these letters subscripted with positive-integer numerals (e.g., A1, B21) are well-formed (atomic) propositions.
2. If P and Q are well-formed propositions, then so are

¬P      (the negation of P)
(P ∧ Q) (the conjunction of P with Q)
(P ∨ Q) (the inclusive disjunction of P with Q)
(P → Q) (the material conditional whose antecedent is P and whose consequent is Q)
(P ↔ Q) (the material biconditional of P with Q)
(P ⊕ Q) (the exclusive disjunction of P with Q)
(P | Q) (the disjoint, or alternative, denial of P with Q)
(P ↓ Q) (the joint denial of P with Q)

3. Nothing else is a well-formed proposition.

The boldface letters in the second clause of this recursive definition are metavariables ranging over propositions. The outer parentheses in clause 2 prevent ambiguity. For instance, P ∧ Q ∨ R is ill-formed; instead, one must write either ((P ∧ Q) ∨ R) or (P ∧ (Q ∨ R)), depending on which is wanted. On occasion, parentheses can be dropped for purposes of readability. Other systems can be defined by using different symbols or by using different conventions for disambiguation, such as precedence of connectives.

There are several ways to express the connectives in English, many of which have non-truth-functional connotations; however, propositional logic studies only the truth-functional properties of such phrases. Thus, propositional logic ignores the important distinctions between

Marie is a vice-president and Ben is a clerk

and

Marie is a vice-president but Ben is a clerk

(the latter suggesting, perhaps, that Ben is merely a clerk) as well as the important distinctions between

I got into bed and I fell asleep
and

I fell asleep and I got into bed

(the latter suggesting, perhaps, that I sleepwalk).

There are also several other families of symbols used for the connectives, most notably Polish (or prefix) notation. Polish notation has the advantage of not requiring parentheses for disambiguation. The first five connectives of clause 2 expressed in Polish notation are

NP
KPQ
APQ
CPQ
EPQ

As an example, the ambiguous proposition in infix notation discussed earlier cannot be written in Polish notation. Instead, one is forced to write either AKPQR or KPAQR. (It is standard practice in Polish notation to use the letters N, K, A, C, and E instead of the connective symbols ¬, ∧, ∨, →, and ↔, respectively. For a thorough discussion of these notational issues, consult a standard introductory textbook, such as Refs. 10 and 11, or, especially, Ref. 6.)

It is also important to recall that propositional logic does not provide an analysis of the internal structure of propositions; thus, it does not provide a way to represent or reason about individuals or classes; that is the province of first-order (or predicate) logic.

Semantics. The semantics of molecular propositions can be given by means of the equivalences in Table 2.

Table 2. Semantics of Molecular Propositions

¬P is true if and only if P is false.
(P ∧ Q) is true if and only if both of P and Q are true.
(P ∨ Q) is false if and only if both of P and Q are false.
(P → Q) is true if and only if P is false or Q is true (or both).
(P ↔ Q) is true if and only if P and Q have the same truth value.
(P ⊕ Q) is true if and only if P and Q have opposite truth values.
(P | Q) is false if and only if both of P and Q are true.
(P ↓ Q) is true if and only if both of P and Q are false.

The semantics can also be given by means of truth tables. Typically, these have two sets of columns, one for the atomic propositions (the "input") and one for the molecular proposition (the "output"), and 2ⁿ rows, where n is the number of distinct atomic propositions, one for each possible combination of truth values of the atomic propositions. Samples are given in Table 3 (T and F stand for true and false, respectively).

Table 3. Sample Truth Tables

Input   Output
P       ¬P
T       F
F       T

Input   Output
P  Q    (P ∧ Q)
T  T    T
T  F    F
F  T    F
F  F    F

Input   Output
P  Q    (P ∨ Q)
T  T    T
T  F    T
F  T    T
F  F    F

Input   Output
P  Q    (P → Q)
T  T    T
T  F    F
F  T    T
F  F    T

Truth tables can also be used to compute the truth values of more complicated molecular propositions. Sometimes this is done using a third set of columns for intermediate computations of "subpropositions," as in Table 4. (Algorithms for computing with truth tables are given in Ref. 11.)

Table 4. Truth Table with Intermediate Computations

Input   Computations                        Output
P  Q    (P ∧ Q)   ¬(P ∧ Q)   (P ∨ Q)        ((P ∨ Q) ∧ ¬(P ∧ Q))
T  T    T         F          T              F
T  F    F         T          T              T
F  T    F         T          T              T
F  F    F         T          F              F

Two propositions are logically equivalent if they have the same truth values for all possible combinations of truth values of their atomic parts. Table 5 lists some of the important logical equivalences.

Minimal Sets of Connectives. The choice of which connectives to use depends on one's purposes. Generally, if the language of propositional logic is to be used in a representational system, especially one for natural language, then a large set of connectives is appropriate. This permits distinguishing between distinct but logically equivalent propositions. However, for deductive purposes, a smaller number of connectives is better, both because fewer inference rules are then needed and because metatheoretic proofs about propositional logic then become easier.

It can be shown that all n-place truth-functional connectives can be expressed using only negation and conjunction, or else negation and disjunction, or else negation and the material conditional. They can also all be expressed using only one connective, either joint denial or disjoint denial. (For further discussion and proofs, see, e.g., Refs. 10 and 12.) Usually, a compromise is found between the extremes of using all of the connectives (for representational adequacy) and only one or two (for elegance or metatheoretic simplicity). It is common to use negation, disjunction, and conjunction to express a proposition in either conjunctive normal form (CNF) or disjunctive normal form (DNF). In the former a proposition is expressed as a (logically equivalent) conjunction of disjunctions of atomic propositions and negations; in the latter, as a (logically equivalent) disjunction of conjunctions of atomic propositions and negations. For example, the proposition

(((P ↓ Q) ∨ Q) → (R ∧ Q))

is logically equivalent to the CNF proposition

(¬P ∨ ¬Q ∨ R) ∧ (P ∨ ¬Q ∨ R) ∧ (P ∨ Q ∨ ¬R) ∧ (P ∨ Q ∨ R)

as well as to the DNF proposition

(¬P ∧ Q ∧ R) ∨ (P ∧ ¬Q ∧ ¬R) ∨ (P ∧ ¬Q ∧ R) ∨ (P ∧ Q ∧ R)

Algorithms for converting a proposition into a logically equivalent proposition in CNF or DNF may be found in Refs. 10 and 11.

Table 5. Some Important Logical Equivalences

Double negation       ¬¬P             is logically equivalent to  P
Idempotency           P               is logically equivalent to  (P ∧ P)
                      P               is logically equivalent to  (P ∨ P)
Commutative laws      (P ∧ Q)         is logically equivalent to  (Q ∧ P)
                      (P ∨ Q)         is logically equivalent to  (Q ∨ P)
Associative laws      (P ∧ (Q ∧ R))   is logically equivalent to  ((P ∧ Q) ∧ R)
                      (P ∨ (Q ∨ R))   is logically equivalent to  ((P ∨ Q) ∨ R)
Distributive laws     (P ∧ (Q ∨ R))   is logically equivalent to  ((P ∧ Q) ∨ (P ∧ R))
                      (P ∨ (Q ∧ R))   is logically equivalent to  ((P ∨ Q) ∧ (P ∨ R))
De Morgan's laws      ¬(P ∧ Q)        is logically equivalent to  (¬P ∨ ¬Q)
                      ¬(P ∨ Q)        is logically equivalent to  (¬P ∧ ¬Q)
Contraposition        (P → Q)         is logically equivalent to  (¬Q → ¬P)
Material conditional  (P → Q)         is logically equivalent to  (¬P ∨ Q)
                      (P → Q)         is logically equivalent to  ¬(P ∧ ¬Q)
Exportation           (P → (Q → R))   is logically equivalent to  ((P ∧ Q) → R)

Tautologies, Contradictions, and Contingent Propositions. Propositions that are true for all possible combinations of truth values of their atomic parts are called tautologies; those that are false for all possible combinations are called contradictions; and the others are said to be contingent. For example, ((P ∧ (P → Q)) → Q) is a tautology; (P ∧ ¬P) is a contradiction; and (P → Q) is contingent. Because of the semantics of negation, the negation of a tautology is a contradiction, and vice versa. All tautologies are logically equivalent to each other, as are all contradictions. This fact is of some significance for representational issues since all tautologies clearly do not "say" the same thing. For example, (P ∨ ¬P) and ((P ∧ (P → Q)) → Q) are both tautologies and hence logically equivalent; yet, in an important sense, they do not "mean" the same thing.

The Paradox of the Material Conditional. There are other limitations on the use of the language of propositional logic as a representational system for natural-language sentences. Most notably, the semantics of the material conditional do not correspond to the ordinary English use of if-then. For instance, ((P ∧ ¬P) → Q) is a tautology simply because its antecedent is a contradiction and, hence, false. But a corresponding English sentence such as "If 1 + 1 = 2 and 1 + 1 ≠ 2, then Bertrand Russell is the Pope" does not seem to be true even though it is a tautology. For this reason, such phenomena are called "paradoxes of the material conditional." Attempts to overcome these "paradoxes" have generally taken the form of introducing new, non-truth-functional operators and connectives whose semantics are closer to their natural-language counterparts. The two main kinds of logic that have been developed along these lines are modal logic and relevance logic. (For the former see Logic, modal, and Ref. 13; for the latter see Ref. 14.)

Deductive Systems of Propositional Logic

Syntax. A deductive system for any logic can be presented in one of two ways: as an axiomatic system or by means of a natural deduction system.

Axiomatic Propositional Logic. An axiomatic system typically has several axioms (which ought to be tautologies) and a minimal number of rules of inference (which ought to lead from truths to truths). To present propositional logic axiomatically, the well-formed propositions (WFPs) are restricted here to those whose only connectives are ¬ and →. All WFPs of the following three tautological forms, called axiom schemata, may be taken as axioms (others are possible; note, again, that boldface letters are metavariables ranging over propositions):

(A1) (P → (Q → P)) (confirmation of the consequent)
(A2) ((P → (Q → R)) → ((P → Q) → (P → R))) (self-distribution)
(A3) ((¬P → ¬Q) → (Q → P)) (contraposition)

There is one rule of inference:

(MP) From P and (P → Q), infer Q (modus ponens)

A proof of a WFP Pn is a sequence P1, . . . , Pn of WFPs such that for each k (1 ≤ k ≤ n), either Pk is an axiom or there are i, j < k such that Pj = (Pi → Pk). (Note: Pj is not merely logically equivalent to (Pi → Pk); it is (Pi → Pk).) A theorem of our propositional logic is a WFP P such that there is a proof of P (viz., P1, . . . , Pn, where Pn is P); the notation for this is: ⊢P. (Sometimes, the turnstile, ⊢, is subscripted by the name of the system of logic of which P is a theorem.)

As an example, a proof of (P → P) is given in Table 6. Comments, preceded by semicolons, are not formally part of the proof. Note, however, that they would be formally part of a proof that the proposition is a theorem of propositional logic, that is, of a proof that ⊢(P → P).

The notion of "proof" can be extended to "proof from hypotheses," where the hypotheses are nonlogical principles (or postulates) typically belonging to some particular subject matter (e.g., laws of physics or "world knowledge"). They would
LOGIC, PROPOSITIONAL
not usually be tautologies. Formally, a sequence of WFPs P1, . . . , Pn is a proof of Pn from a set of hypotheses H iff for all k (1 ≤ k ≤ n), either Pk is an axiom, or Pk ∈ H, or Pk is inferred by (MP) from previous WFPs in the sequence. The notation for this is: H ⊢ Pn; if H = {H1, . . . , Hm}, then the notation is H1, . . . , Hm ⊢ Pn. (For complete details of an axiomatic propositional logic, see, e.g., Refs. 12 and 15.)

Table 6. A Proof from Axioms

1. ((P → ((P → P) → P)) → ((P → (P → P)) → (P → P)))
       ; this is an axiom, since it is a WFP with the form of axiom
       ; schema (A2), with P and R both replaced by P, and Q replaced
       ; by '(P → P)'
2. (P → (P → P))
       ; (A1), with Q replaced by P
3. (P → ((P → P) → P))
       ; (A1), with Q replaced by '(P → P)'
4. ((P → (P → P)) → (P → P))
       ; from 3, 1 by (MP)
5. (P → P)
       ; from 2, 4 by (MP)

A Natural Deduction System for Propositional Logic. A natural deduction system typically has no axioms but has several rules of inference, and it allows for the possibility of introducing assumptions in the middle of a proof. These can be viewed as "temporary axioms" that are "discharged" when no longer needed. To present propositional logic in this fashion, the WFPs are restricted here to those whose only connectives are ¬ and ∧. The following may be used as rules of inference:

(∧ Introduction)
(a) From P and Q, infer (P ∧ Q).
(b) From P and Q, infer (Q ∧ P).

(∧ Elimination)
(a) From (P ∧ Q), infer P.
(b) From (P ∧ Q), infer Q.

(¬ Introduction) If both Q and ¬Q can be inferred from P, then infer ¬P.

(¬ Elimination) From ¬¬P, infer P.

A notion of subproofs is needed for the last two rules, along with rules allowing propositions to be "sent" into the subproofs and "returned" from them under certain restrictions. Subproofs can be indicated by prefixing stars to the lines of a subproof, where the number of stars indicates the level of nesting of the subproof (for details, see Ref. 11). As an example, Table 7 contains a natural deduction proof of the argument ¬(A ∧ B), A ⊢ ¬B. Each comment, following the semicolon, explains the corresponding line of the proof, and subproofs have "begin" and "end" comments; these are not formally part of the proof, but they play the same role that comments do in computer programs. [For details of rules of inference for other connectives, see Refs. 10, 11, and 16. Natural deduction systems have been extensively investigated (17).]

Table 7. A Natural-Deduction Proof

1. ¬(A ∧ B)      ; this is the first premise
2. A             ; this is the second premise
                 ; BEGIN subproof using ¬ Introduction to prove ¬B
* 3. B           ; an assumption for use by ¬ Introduction
* 4. A           ; sent in from line 2 of main proof (similar to
                 ; parameter passing in procedures)
* 5. (A ∧ B)     ; ∧ Introduction using lines 3, 4
* 6. ¬(A ∧ B)    ; sent in from line 1 of the main proof
* 7. ¬B          ; ¬ Introduction, from lines 3, 5, 6
                 ; END of subproof that proved ¬B
8. ¬B            ; returned from line 7 of subproof (similar to a
                 ; procedure returning a value)

AI and Propositional Logic. Newell, Shaw, and Simon's Logic Theorist program (18), considered by some to be the first AI program, used a breadth-first search procedure to prove theorems of propositional logic. It successfully proved 38 of the first 52 theorems of Whitehead and Russell's Principia Mathematica (19). A successor program, the General Problem Solver (20,21), used means-ends analysis to solve problems in a variety of domains, including propositional logic. (Discussions of these programs may be found in Refs. 22 and 23.)

Another important propositional logic program is Wang's algorithm, which is a more efficient method for determining whether a given argument is valid than using truth tables. This algorithm attempts to interpret the premises of the argument as all true and the conclusion as false. If it succeeds in this attempt, the argument is shown to be invalid; if it fails, the argument is shown to be valid. (For details, see Ref. 11.)

A rule of inference that has proved to be of importance in AI and automated theorem proving, in part because most of the introduction and elimination rules can be shown to be instances of it, is:

(Resolution) From (¬P ∨ Q) and (P ∨ R), infer (Q ∨ R).

(For a discussion of AI systems that use propositional logic inference techniques based on Resolution, see Resolution; Theorem proving; as well as such AI texts as Refs. 7 and 24-28.)

Semantics. An argument is any inference from hypotheses (or premises) to a conclusion. Thus, rules of inference are essentially forms of argument. A rule of inference or an argument should never lead from truth to falsehood. To say that a rule of inference or an argument is valid means: if the hypotheses are true, then the conclusion must be true. Validity, thus, can be construed as a notion of truth relative to the premises. An argument is said to be sound (in one sense) iff it is valid and its hypotheses are, in fact, true.

Truth tables can be used for semantic inference, as opposed to the syntactic inference discussed in the previous section. Here, a truth table is constructed whose "input" columns contain the premises and whose "output" column contains the conclusion. The argument is valid iff there is no line of the truth table with T in all premise columns and F in the conclusion column.

Propositional logic is also said to be sound in the sense that all of its theorems are tautologies. It is also complete: all propositional tautologies are theorems of propositional logic. There is a link between a proposition's being a tautology and an argument's being valid: For any argument P1, . . . , Pn-1 ⊢ Pn, there corresponds the material-conditional proposition ((P1 ∧ . . . ∧ Pn-1) → Pn). The former is valid iff the latter is a tautology (for details on these topics, see Refs. 12 and 15). This follows (by soundness and completeness) from the Deduction Theorem, which states that the former is valid iff the latter is a theorem. Finally, propositional logic is also decidable: There is an algorithm such that for any WFP, the algorithm decides whether the WFP is a theorem (i.e., one can use a truth table to determine whether the WFP is a tautology). However, the decidability of propositional logic is an NP-complete problem and hence computationally "intractable"; this fact has been employed in philosophical arguments to the effect that such logics are not well-suited to computational models of rationality (see Ref. 29).
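The truth-table procedures described above, classifying a WFP as tautology, contradiction, or contingent, and testing an argument for validity, can be sketched in a few lines. The encoding of propositions as Python Boolean functions is my own illustration, not the article's notation:

```python
from itertools import product

# The material conditional: (P -> Q) is false only when P is true and Q false.
implies = lambda a, b: (not a) or b

def classify(prop, n_atoms):
    """Truth-table classification of a proposition over n_atoms atoms."""
    rows = [prop(*vals) for vals in product([True, False], repeat=n_atoms)]
    if all(rows):
        return "tautology"
    if not any(rows):
        return "contradiction"
    return "contingent"

def is_valid(premises, conclusion, n_atoms):
    """Valid iff no row makes every premise true and the conclusion false."""
    return not any(
        all(p(*vals) for p in premises) and not conclusion(*vals)
        for vals in product([True, False], repeat=n_atoms)
    )

# ((P and (P -> Q)) -> Q) is a tautology; (P and not P) is a contradiction.
print(classify(lambda p, q: implies(p and implies(p, q), q), 2))
print(classify(lambda p: p and not p, 1))

# Modus ponens (P, P -> Q, therefore Q) is valid, and, as the link stated
# above predicts, its material-conditional proposition is a tautology.
print(is_valid([lambda p, q: p, lambda p, q: implies(p, q)],
               lambda p, q: q, 2))
```

Exhaustive enumeration of rows is exponential in the number of atoms, which is exactly the intractability noted in the preceding paragraph.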
BIBLIOGRAPHY

1. A. Church, Introduction to Mathematical Logic, Princeton University Press, Princeton, NJ, pp. 23-31, 1956.
2. R. M. Gale, Propositions, Judgments, Sentences, and Statements, in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 6, Macmillan and Free Press, New York, pp. 494-505, 1967.
3. N. Rescher, The Logic of Commands, Routledge and Kegan Paul, London, and Dover, New York, 1966.
4. H. N. Castañeda, Thinking and Doing, D. Reidel, Dordrecht, 1975.
5. N. Rescher, Many-Valued Logic, McGraw-Hill, New York, 1969.
6. M. L. Schagrin, The Language of Logic: A Self-Instruction Text, 2nd ed., Random House, New York, 1979.
7. Z. Manna, Mathematical Theory of Computation, McGraw-Hill, New York, Chapter 2, 1974.
8. S. C. Shapiro, The SNePS Semantic Network Processing System, in N. V. Findler (ed.), Associative Networks, Academic Press, New York, pp. 179-203, 1979.
9. J. D. McCawley, Everything that Linguists have Always Wanted to Know about Logic but were ashamed to ask, University of Chicago Press, Chicago, IL, 1981.
10. I. M. Copi, Symbolic Logic, 5th ed., Macmillan, New York, 1979.
11. M. L. Schagrin, W. J. Rapaport, and R. R. Dipert, Logic: A Computer Approach, McGraw-Hill, New York, 1985.
12. E. Mendelson, Introduction to Mathematical Logic, 2nd ed., Van Nostrand, New York, 1979.
13. G. E. Hughes and M. J. Cresswell, An Introduction to Modal Logic, Methuen, London, 1968.
14. A. R. Anderson and N. D. Belnap, Jr., Entailment: The Logic of Relevance and Necessity, Princeton University Press, Princeton, NJ, 1975.
15. S. C. Kleene, Introduction to Metamathematics, Van Nostrand, Princeton, NJ, 1950.
16. D. Kalish, R. Montague, and G. Mar, Logic: Techniques of Formal Reasoning, 2nd ed., Harcourt Brace Jovanovich, New York, 1980.
17. M. E. Szabo (ed.), Collected Papers of Gerhard Gentzen, North-Holland, Amsterdam, 1969.
18. A. Newell, J. C. Shaw, and H. Simon, Empirical Explorations of the Logic Theory Machine, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 109-133, 1963.
19. A. N. Whitehead and B. Russell, Principia Mathematica, 2nd ed., Cambridge University Press, Cambridge, U.K., 1927.
20. A. Newell, J. C. Shaw, and H. Simon, A Variety of Intelligent Learning in a General Problem-Solver, in M. C. Yovits and S. Cameron (eds.), Self-Organizing Systems, Pergamon, New York, pp. 153-189, 1960.
21. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, 1969.
22. J. R. Slagle, Artificial Intelligence: The Heuristic Programming Approach, McGraw-Hill, New York, 1971.
23. A. Barr and E. A. Feigenbaum (eds.), The Handbook of Artificial Intelligence, Vol. 1, William Kaufmann, Los Altos, CA, 1981.
24. N. J. Nilsson, Problem-Solving Methods in Artificial Intelligence, McGraw-Hill, New York, 1971.
25. N. J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, CA, 1980.
26. B. Raphael, The Thinking Computer: Mind Inside Matter, W. H. Freeman, San Francisco, 1976.
27. E. Rich, Artificial Intelligence, McGraw-Hill, New York, 1983.
28. P. H. Winston, Artificial Intelligence, 2nd ed., Addison-Wesley, Reading, MA, 1984.
29. C. Cherniak, "Computational complexity and the universal acceptance of logic," J. Philos. 81, 739-758 (1984).
General References

A. E. Blumberg, Logic, Modern, in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 5, Macmillan and Free Press, New York, pp. 12-34, 1967.
R. C. Jeffrey, Formal Logic: Its Scope and Limits, 2nd ed., McGraw-Hill, New York, 1981.
W. Kneale and M. Kneale, The Development of Logic, Oxford University Press, Oxford, 1962.
W. V. O. Quine, Mathematical Logic, rev. ed., Harper & Row, New York, 1951.
W. V. O. Quine, Elementary Logic, rev. ed., Harvard University Press, Cambridge, MA, 1980.
W. V. O. Quine, Methods of Logic, 4th ed., Harvard University Press, Cambridge, MA, 1982.

W. J. Rapaport
SUNY at Buffalo
LOGO

LOGO is a programming language in the spirit of LISP, invented in the MIT AI Lab to teach mathematical concepts to little children (see Computers in education) [see S. Papert, "Teaching children to be mathematicians versus teaching about mathematics," Int. J. Math. Educ. Sci. Technol. 3, 249-262 (1972)]. A program in LOGO manipulates a little device called the "turtle." This turtle moves on a large flat surface. With two commands, PENUP and PENDOWN, it is possible to create a trace of the turtle's movements. The goal of a program is usually to draw a certain figure; therefore, programming in LOGO is referred to as "turtle geometry." Typical commands would be FORWARD 100, RIGHT 60, and BACK 100. It is possible to define procedures. In later research LOGO has been used to help people learn about powerful ideas and to study the acquisition of computational skills by young children (see C. J. Solomon and S. Papert, "A Case Study of a Young Child Doing Turtle Graphics in LOGO," Proceedings of the National Computer Conference, AFIPS, pp. 1049-1056, 1976). More information can be found in the MIT AI Lab LOGO Memos.

J. Geller
SUNY at Buffalo
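The turtle semantics described in the entry above can be mimicked in a few lines. This is my own illustrative sketch (the coordinate system and starting heading are my conventions), not the MIT implementation; Python's standard turtle module provides the graphical equivalent:

```python
import math

class Turtle:
    """Toy model of LOGO's turtle: a position, a heading, and a pen trace."""
    def __init__(self):
        self.x = self.y = 0.0
        self.heading = 90.0        # degrees; start facing "up"
        self.pen_down = True
        self.trace = []            # line segments drawn while the pen is down

    def forward(self, dist):
        rad = math.radians(self.heading)
        nx = self.x + dist * math.cos(rad)
        ny = self.y + dist * math.sin(rad)
        if self.pen_down:
            self.trace.append(((self.x, self.y), (nx, ny)))
        self.x, self.y = nx, ny

    def back(self, dist):
        self.forward(-dist)        # BACK 100 is FORWARD -100

    def right(self, degrees):
        self.heading -= degrees

    def penup(self):
        self.pen_down = False

    def pendown(self):
        self.pen_down = True

# FORWARD 100 then RIGHT 60, repeated six times, traces a regular hexagon
# and brings the turtle back to where it started.
t = Turtle()
for _ in range(6):
    t.forward(100)
    t.right(60)
print(len(t.trace))
```

After the loop the trace holds six segments and the turtle is back at the origin, which is the sense in which the program "draws a certain figure."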
LOOPS

LOOPS is a programming environment developed at Xerox PARC that combines procedure-oriented programming, as in LISP; object-oriented programming, as in Smalltalk; access-oriented programming, which is useful for monitoring programs; and rule-oriented programming, which allows the construction of production systems like OPS-5 (qv). LOOPS allows the integration of these programming paradigms to build knowledge-based systems [see M. Stefik, D. G. Bobrow, S. Mittal, and L. Conway, "Knowledge programming in Loops: Report on an experimental course," Artif. Intell. 4(3), 3-18 (1983); see also D. G. Bobrow and M. Stefik, The LOOPS Manual, Technical Report KB-VLSI-81-18, Knowledge Systems Area, Xerox Palo Alto Research Center (PARC), 1981].

J. Rosenberg
SUNY at Buffalo

LUNAR

An information retrieval (qv) system with an ATN-based natural-language I/O front end, LUNAR was designed by Woods in 1972 at BBN [see W. Woods, R. Kaplan, and B. Nash-Webber, The LUNAR Sciences Natural Language Information System: Final Report, BBN Report No. 2378, Bolt Beranek and Newman, Cambridge, MA, 1972].

A. Hanyong Yuhan
SUNY at Buffalo
MACHACK 6

Also known as the Greenblatt Chess Program, MACHACK 6 was written by Richard Greenblatt, Donald Eastlake, and Stephen Crocker at MIT. It was the first computer chess program to play well against humans in a chess tournament. It uses a method of forward pruning to variable depths along with the alpha-beta algorithm to determine its next move (see R. D. Greenblatt, D. E. Eastlake, and S. D. Crocker, The Greenblatt Chess Program, Proceedings of the Fall Joint Computer Conference, 1967, AFIPS Press, Montvale, NJ, pp. 801-810, 1967; also M. Newborn, Computer Chess, Academic Press, New York, 1975).

J. Rosenberg
SUNY at Buffalo
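The alpha-beta algorithm mentioned in the entry above cuts off search below moves that cannot affect the minimax value. The following is a generic sketch under my own conventions (a game tree encoded as nested lists, with leaf values as the evaluation); MACHACK 6's actual search added forward pruning and a chess-specific evaluation function:

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta cutoffs over an abstract game tree."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break              # beta cutoff: the minimizer avoids this line
        return value
    value = float("inf")
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break                  # alpha cutoff: the maximizer avoids this line
    return value

# A two-ply toy tree: the maximizer chooses between min(3, 5) and min(2, 9).
tree = [[3, 5], [2, 9]]
children = lambda n: n if isinstance(n, list) else []
evaluate = lambda n: n
best = alphabeta(tree, 2, float("-inf"), float("inf"), True, children, evaluate)
print(best)  # 3; the leaf 9 is never examined
```

In the toy tree, once the second subtree yields 2 (worse than the 3 already in hand), its remaining leaf is pruned, which is the saving that made tournament-depth search feasible.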
MACHINE TRANSLATION

Relationship of AI to Machine Translation: Two Views

Machine translation (MT) is a specter that returns to haunt AI work on natural-language processing. MT was the first large-scale enterprise in natural-language computation, one that appeared to many to collapse after the 1966 negative report on its future (1) as an intellectually and commercially viable enterprise, though it is now clear it did not in fact do so. MT continues to provide a touchstone in AI work and is a source of dispute about the relation of language understanding to knowledge of the world (see Natural-language understanding; Representation, knowledge).

At the one extreme there are those who argue that MT cannot be "solved" until AI has achieved full understanding of
natural language with programs, because any of the well-known ambiguities of sense and reference, such as who is referred to by "several" in The soldiers fired at the women and I saw several fall, not only require AI methods for their solution but can also be the basis for a translation problem. So, in the example just given, if one had to translate that sentence into a language with genders, such that soldiers and women had different genders and were referred to by different forms of the translation of several, then AI techniques would be required for that example, such as knowledge of what happens, as a causal consequence, when things are fired at (see Ref. 2 for treatment of this type of example). On that view, for any ambiguity of word-sense or -reference, there may be a language that will need its resolution for MT from English to that language. This argument is analogous in form, though not in conclusion, to Katz and Fodor's claim (3) that any fact whatsoever about the world could be the basis of a linguistic ambiguity.

The opposing view, within AI, is that levels of knowledge and processing may suffice for MT that would not constitute "language understanding" in the sense normally used in AI. On this view, MT is not an appropriate realization for a language-understanding program, as compared with dialogue (see Discourse understanding), question answering (qv), or the expectation of commands in a real environment. The difference of opinion is precisely the commonsense one of whether one needs to understand something in order to translate it, a question that turns on what precisely, if it can be made precise, one means by "understand."

Role of the Term "Computational Linguistics"

The last issue is made more complex by the related problem of what computational linguistics (CL) (qv) is, and what its relationship is to AI. CL effectively started with MT for, as noted
above, MT was the first large-scale, widely funded enterprise in language and machines, largely because of its strategic importance: the prize it offered of being able to read Pravda daily without having to read Russian. There can be no doubt that MT is within CL, even if that subject has expanded to other areas such as concordances and spelling correctors and has focused at times exclusively on parts of the translation process, such as parsing from English to a representation or generating from a representation to, say, French, without discussing the fact that putting the two together gave one MT. Subterfuges like that were necessary both for funding and intellectual support during the times (discussed below) when MT became known as a disreputable activity.

However, many areas of CL do not involve any kind of processing of "world knowledge" of the kind that has become the distinctive feature of AI for many observers: A clear case would be the theory and use of Augmented transition networks (qv) (4). This is why the taxonomy problem is more difficult than just the status of MT. These were certainly CL, not knowledge-based systems in any sense (unless one adopts the linguistic trick of speaking of "knowledge of language," in which case all CL becomes AI, simply by definition), and yet were widely reported and discussed in AI journals and conferences. The point of this section is that it is not only MT but many areas of CL that suffer from problems of classification with respect to the term AI. It would be quite possible to argue that MT is AI, even if it does not require knowledge processing, on the ground that many areas of CL to which that applies have traditionally been considered AI. However, the normal defense of the claim that in its general form as a model of a human skill MT is within AI has been based on the need for knowledge representation and processing to carry out the task.
MT as an Initial Source of AI Workers, Problems, and Theories

MT was one of the two strands of research that fed workers, and unsolved problems, directly into the stream that has become AI work on natural language. The other, of course, was mechanical logic, or theorem proving (qv) by machine. The notion that logic (qv) provides an appropriate structure for whatever information language contains in a loose, less explicit form predates Aristotle. In his own work logic appears as a language formal enough to express deduction but is still recognizable as a natural language. By the time of Leibnitz (1646-1716), the goal had clarified: Logic was the "universal language" that could function as a natural language, even for communication, but was also wholly formal and separate from any existing language. Moreover, the link between that formal, artificial language and mechanical computation had become explicit. Only the technology was missing.

With the advent of a formal syntax for logic with Russell and Whitehead (5) and then a formal semantics (6), the wrangle about the applicability of advances in logic to the problem of natural language continued, from Tarski's denial that his semantics for logic had consequences for natural language to Reichenbach's (7) and others' detailed efforts to give representations of phenomena like belief, cause, and time within standard, or suitably extended, logics.

The distinctive, and distinguished, work in computational logic had its high point in the work of Robinson (8), who set out a format for theorems and a procedure of proof that was one of the few clear advances in the field of mechanical theorem proving. Until relatively recently, though, there were few or no connections between the AI tradition of mechanical theorem proving and work on the understanding of natural language, although, interestingly enough, the theorem-proving and MT streams of input to AI correspond almost exactly, as far as individuals are concerned, to the two opposed views set out above on the relationship of MT as a task to the necessity of a knowledge representation. There are three points given below in which the logic-representation and theorem-proving traditions impinged on language understanding and, on one view of its nature, on MT.

1. The indirect influence of Montague grammar on AI natural-language work has been something like it has been on generative linguistics, particularly the pressure to provide an explicit semantics of concepts like quantification rather than leaving them to ad-hoc processes. Given the existing influence of the predicate calculus in AI, this additional influence might have been thought egregious, but it has been real.

2. There has been a revival of interest in an explicit logical semantics for belief. Work on dialogue and models of dialogue partners (also needed for MT on the broad view adopted earlier) seemed to require both this and speech-act theory; the logic of belief (see Belief systems) seemed ready-made and available to provide subtheories that could be incorporated into AI work. In fact, of course, the demands of processes made any such importation impossible, but the influence was real.

3. The programming language PROLOG, whose origin was as a representation of predicate logic together with a proof algorithm based on the resolution principle, has had two effects on AI and natural-language processing. First, it provided (as a universal programming language) a method of bringing proofs of truths in a knowledge structure into direct relationship with the parsing (qv) of natural language.
Second, and as an expansion of the last point, it provided a natural and perspicuous form for the expression of grammars, provided they could be contained within context-free grammars. This last clause was vital, and its extent is well known to be a material point of contention within linguistics. Colmerauer had argued for PROLOG as a language for grammars, long before the recent revival of interest in the adequacy (9) of such context-free grammars, on the grounds that in a declarative language like PROLOG, to provide the grammatical rules was also to provide the parsing mechanism, i.e., the structure of PROLOG itself. The history of PROLOG provides a magnificent irony in terms of the real opposition posed here between logic-based and MT-based influences on early AI: Although PROLOG is now the standard-bearer for the logic-based approach, it was for MT (with the Montreal-based TAUM system) that Colmerauer originally devised the language.

History of MT in Its Relationship to AI

There is no space here for any kind of adequate history of MT. What will be mentioned is only those aspects of MT that can be brought into relationship with AI. The aim here is to show that the intellectual relationship of MT to AI is somewhat closer than might have been thought, certainly closer intellectually than the provision of some research workers for retraining and a budget of unsolved problems about the nature of language itself. Those who require surveys of the field should consult Refs. 10 and 11 (including the bibliographies).

MT was always the intellectually serious part of CL and was always defined by a concrete task, looming and oppressive, rather than by theories or schools. Its history is very complex and quite different from the popularly accepted versions. Those indeed embrace both the views (of some newspapers) that MT has been solved and (of some theoreticians) that it is impossible. It is a unique distinction of MT that some people appeared at times to hold both positions.

The Golden Age of MT was from Warren Weaver's memorandum (12) to the 1966 memorandum (1), which caused the end of much public funding in the West. In that time a vast amount of money was spent by the governments of the United States, U.K., France, and USSR, and it is widely believed that nothing came of it. The work, it is said, began with no theories, produced none in the course of its development, and its final failure was powerful, if unwilling, evidence supporting Chomsky's position (13) that there could be no serious CL work until a proper theory had been established, by which he meant transformational grammar (qv). This cartoon-style history is in many ways false to the facts.

First, the 1966 memorandum did not make any claims about the possibility of high-quality MT. What it said was that, given the 1966 costs of human input of text to computers and of the revision that contemporary MT required, the cost per word of MT compared badly with human translation. The falling cost of computation and the advent of optical character readers have totally outdated that judgment.
Second, as a glance at the proceedings of a meeting like the 1961 Teddington Conference (14) will show, a great deal of high-quality theoretical work on the nature of natural language went on under the aegis of MT, much of which has been forgotten or reinvented but not refuted.

Third, although it was certainly the case that most U.S. support of MT from public funds was withdrawn as a result of Ref. 1, much was not. Moreover, other substantial chunks of the work went underground, seeking private funding or non-U.S. public funding, and only emerged into public view again in the seventies. Only the U.K. mindlessly cut off support, largely because the United States did so, though, of course, the reality was far more complex. France continued its support of the Grenoble project. And Canada, under growing pressures of bilingualism, began public support of MT after 1966. Part of the U.S. government MT program simply continued from sheer inertia. The Oak Ridge and Federal Translation Division at Dayton, Ohio, continued to produce large volumes of Russian-English technical translation daily, and the Dayton installation, at least, was regularly upgraded from copies of the original programs undergoing development by private companies (see below).

One of the major shifts in the perception of MT, at least among those who knew all this was still going on, was that the large daily outputs of installations like Dayton continued to get large numbers of satisfied consumers, e.g., scientists reading raw English versions of Russian texts in their own field as a way of keeping abreast of their subject. This work underwent a number of stringent evaluations and clearly passed tests based on usability (15). Translations obtained more rapidly and cheaply than by any other method should (if they believed a version of Ref. 1) simply have confused or misled the readers. But this was not the case, and the fact should come as no
surprise to any reader well disposed to the original principle of the role of knowledge in language understanding; the scientists understood bad-to-mediocre translations of Russian physics texts because they knew physics. Problems of ambiguity and even incomprehensibility were simply pushed forward onto the end users, who could, in general, solve them, while functioning as knowledge- or semantics-driven parsers. A crucial factor here was that there was a far greater and more informed demand for the product of mediocre MT systems than had been envisaged by those who wrote and read the 1966 document.

Fourth, the early MT systems that were criticized (1) have reemerged on the market after years of sustained effort, having developed, in the meantime, a number of features of organization that might reasonably be called AI features. Again, this should be no surprise; if the core of AI is heuristic programming, under pressures of time the MT entrepreneurs have developed the skills of heuristic programming (flexibility, modularity, and robustness) that are no field's own proprietary methods.

Was There Any Theoretical Basis to Early MT?

In spite of the above considerations, there were intellectual elements in the MT crash of the 1960s. The work had begun without the aid of linguistic theories; the ruling structuralism neither offered nor claimed to offer much to computational processing, except through the work of Harris (16), and it was hoped that after each stage of linguistic analysis, in the classical sense of morphology, syntax, etc., a theory could be constructed to deal with the next stage. This hope proved quite false. Syntax-analysis programs were indeed built [of which the best known was the Harvard analyzer of Oettinger (17)], but they yielded enormous numbers of valid parsings for quite simple sentences, and no principled methods were derived for deciding among them. The point here is much like that of the example given above, but it can be put more simply.
A sentence could hardly be simpler, or clearer, as to its natural reading than He drove down the road in a car, where there are possible dependencies of in a car on road or drove, corresponding to the interpretations of a person in a vehicle on a road that is situated inside a car or the normal one of driving in a car on a road. The second is the natural one because there are not roads in cars. But there are rivers in Brazil, and so the natural structure for He canoed down a river in Brazil would have the opposite dependency. The latter example is important in showing that one is not dealing here with what some (18) have called principles of association or attachment. The matter is a function of knowledge of how things are, no matter how one chooses to express it.

The failures to provide an appropriate semantic theory within MT were more widespread than the above. There were above all the widely publicized issues of word-sense ambiguity and pronoun reference (the former giving rise to all the well-known joke translations supposedly produced by MT programs), and, less publicized, but more structurally important, the problem of preposition ambiguity and its role in the attachment of prepositional phrases.

It was easy to show, e.g., that a preposition like out of in English can be translated into French in many different ways (as de, hors de, dans, par, etc.) and that the choice is not simply a function of the verb used but of the semantics of the object. Without a "solution" to this problem, the attachment or dependency of the phrase containing out of cannot be determined, as it could not in the car and canoe examples above.

It was clear to everyone that semantics was in some sense required, but the problem was in what sense. The semantics of markers (3) was becoming known in linguistics, but it rapidly became clear (more rapidly, perhaps, than in linguistics itself) that that technique had little to offer. It is churlish, these days, to offer further demonstrations of that inadequacy, so much has been done in so many fields, but the point is perhaps worth making once more from the point of view of MT. The following is not a sentence, just a (real and attested) complex noun phrase: Analyse d'une methode dynamique specifique d'etablissement de balance materiel d'une installation de retraitement de combustion nucleaire par simulation. It cannot be given a syntactic structure, let alone translated, without a decision on the dependence of par simulation. There are a number of candidate nouns, analyse, etablissement, even balance, installation, etc., of which the first two are prime candidates and the first the correct one. The relevance of the example is seen if one considers what it would be like to attach semantic markers to the dictionary entries of the items in the noun phrase in order to settle the issue on the basis of any kind of relevance overlap/intersection determined by the presence of the markers. In the absence of any highly detailed, technical vocabulary of markers (and that exclusion may be significant for later considerations of knowledge and expertise), the only natural marker is PROCESS. But, of course, that applies equally well to analyse, etablissement, and simulation and hence renders its application vacuous. That is clearly not a general panacea for the established semantic problems.
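The attachment ambiguity in the car and canoe examples can be exhibited mechanically: a small context-free grammar licenses both attachments of the final prepositional phrase, and a chart parser finds them all. The grammar and lexicon below are my own toy illustrations, not taken from any MT system described here:

```python
from collections import defaultdict

# A toy grammar in Chomsky normal form: each rule is A -> B C.
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),    # PP attaches to the noun phrase ("the road in a car")
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),    # PP attaches to the verb phrase ("drove ... in a car")
    ("PP", "P", "NP"),
]
LEXICON = {
    "he": ["NP"], "drove": ["V"], "down": ["P"], "in": ["P"],
    "the": ["Det"], "a": ["Det"], "road": ["N"], "car": ["N"],
}

def count_parses(words, root="S"):
    """CYK chart that counts the distinct parse trees for each span."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        for cat in LEXICON[w]:
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            k = i + length
            for j in range(i + 1, k):
                for a, b, c in RULES:
                    chart[i][k][a] += chart[i][j][b] * chart[j][k][c]
    return chart[0][n][root]

print(count_parses("he drove down the road in a car".split()))  # 2
```

The two parses are exactly the verb attachment and the noun attachment discussed above; nothing in the grammar prefers one, which is the point the early syntax-analysis programs made in enormous quantity.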
One could summarize the discussion of this section by saying that MT was nevertheless a unique and beneficial experience in highlighting and exposing, in enormous quantity and detail, the real on-the-ground, in-the-text problems of computer language processing. This is always important to stress, since so much of the classic work in natural-language processing in AI has been on toy problems and toy, unrealistic, English sentences. Real sentences, or even noun phrases like the French one above, often come as a shock when explicitly considered, even to those who read and write thousands of them a day.

One effect on AI of the problems mentioned above (the extent of word-sense and preposition ambiguity and the uselessness of the output of early syntactic analyzers) was a powerful motive in the setting up of AI systems of language understanding based not on syntactic theory (of Chomsky's or any other type) but on semantic codings more complex than those of Fodor and Katz and designed to settle exactly the budget of problems laid out above. These approaches within AI can be grouped under a category such as "semantics-based parsing," and some of them (such as the work of Schank and Wilks (21,22), to be discussed below) did actually produce small-scale MT output.
Bar-Hillel (19) argued for the impossibility of high-quality MT. He presented the following example, one which presents no problems of understanding (at least not to any speaker of American English who takes pen to mean not only the writing instrument but also what the British would call a playpen):

Little Johnny was unhappy. He had lost his box. Suddenly he found it. The box was in the pen. He was happy again.
Bar-Hillel's point was that any translation of the above (given that a target language will be unlikely to use pen with just the ambiguity noted above, and so the ambiguity must be resolved for MT) requires that the translator, machine or human, correctly identifies pen with the container for children, as any casual reader does. To do that, Bar-Hillel argued, requires knowledge of the world and in particular the relative sizes of stereotypical boxes and pens (both senses). That is real knowledge, but also vague, since one could not specify the size of a typical box, although sure that it would not go into a writing pen. But, argued Bar-Hillel, a machine could not possess such knowledge in coded form, and hence MT is impossible; QED. The interesting point to note is that the first premise is exactly what has been characterized as the standard AI position. The difference is in the conclusion: Bar-Hillel assumes such knowledge cannot be coded; the AI researcher assumes it can and sets about the task. Both accept its necessity. Bar-Hillel extended his argument to physics, using examples containing the word force, precisely denying, therefore, the possibility of a computational representation of physics, whether classical or naive (see Physics, naive).

It is interesting to note, too, that the knowledge required for the pen example is not the knowledge of standard expert-system or AI examples, i.e., simple Boolean functions of predicate expressions. The knowledge required for Bar-Hillel's MT example is vague, to say the least. The point can be seen by considering a computational solution to the pen example proposed by Enea: he proposed that all objects, from molecules to galaxies, be assigned a size number in the range 1-10 (the range chosen not being the issue). Then, if boxes were 5, playpens would be 6, hopefully, and writing pens 4. Thus, the inclusions could be read off from a simple rule and the example disambiguated.
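Enea's scheme is easy to sketch. What follows is a toy reconstruction, not his code; the rank values and the containment rule are just the assumptions stated in the text:

```python
# Toy reconstruction of the size-number idea: ranks on a 1-10 scale,
# with "X was in Y" selecting a sense of Y ranked larger than X.
SIZE = {
    "box": 5,
    "pen(playpen)": 6,
    "pen(writing)": 4,
}

def container_sense(contained, senses):
    """Return the senses of the container word big enough to hold the object."""
    return [s for s in senses if SIZE[s] > SIZE[contained]]

# "The box was in the pen" resolves to the playpen sense:
print(container_sense("box", ["pen(playpen)", "pen(writing)"]))
```

Any such scheme hides a modeling claim: that containment reduces to a total order of fixed sizes. The text's point is that the knowledge involved resists exactly this reduction.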
It is easier to see that such a numerical "solution" to a problem of vague knowledge is wrong than it is to say precisely why. It is partly what Wittgenstein meant when he said (20) that "stand roughly there" is a precise command, though to have given a range within which to stand would not have expressed what his command did. No solution is available to this problem [and certainly not from "fuzzy logic," which falls precisely into the trap Wittgenstein set], and it is only faith and optimism that carries AI researchers past such examples at the moment.

Role of the "Knowledge-Based" Argument in MT

Many of those in AI faced with the sorts of problems like the French nuclear-combustion example of the last section would say that the matter was to be settled by understanding the text, i.e., having a knowledge structure available for the world of nuclear engineering. This point was first put clearly, not in AI, but in MT discussions; it was in fact a crucial premise of Bar-Hillel's (19) argument for the impossibility of high-quality MT, as in the pen example presented above.

Bar-Hillel also made clear that the problem posed by his example was not to be solved by resorting to marker semantics; he changed the example so that an inkwell was lost, but the rest remained the same. The invariance of the point of the example under such a change, he said, would refute any attempt to assign markers like CHILD to Johnny and (play)pen and WRITING to (writing)pen and inkwell, in an attempt to
select (play)pen because only it shared a marker with Johnny. In the "inkwell version" there are recurrences of both markers, and so deadlock results, even though Bar-Hillel claimed the inkwell should be located in the (play)pen by the utterance
The inkwell was in the pen.

For once, a theorist may have (unintentionally) assisted, rather than refuted, Fodor and Katz's (3) marker techniques, since it can be argued that the "inkwell-pen" sentence is highly anomalous and does not resolve to (play)pen in the straightforward way Bar-Hillel (19) seems to have thought. The general moral here, though, is the same as in the PROCESS marker's application to the French example above: there is no hope of a solution to the problems of MT from any straightforward use of semantic markers or primitives (qv). The only claims that have been made for the adequacy of primitives for MT have been in connection with more distinctive AI techniques (21,22).

Early MT Resurrected: Has It Become AI?

Here is a last look at the later fortunes of the style of MT that crashed in 1966, referred to above as going underground at that time. The surprise was its resurrection in the 1970s and particularly the success of the SYSTRAN system, based on the original generation of work at Georgetown University (10).

The SYSTRAN system is essentially the core of the Georgetown Russian-English system adapted over several years to a number of other language couples. Very little is available describing SYSTRAN, and much of what follows is informed speculation. What is not speculation is its commercial success and the fact that it has passed so many tests by disinterested observers.

SYSTRAN appears to be a multipass system (up to 16 passes, searching for entities like noun phrases, subjects, etc.) based on a very large dictionary (350,000 Russian word stems) and semisentences (350,000 for Russian-English). Its rapid adaptation to other couples is said to suggest that large amounts of the core program developed for Russian are carried over, which may account for the intractability of many of its errors and for the fact that the core program is not of much use in achieving the success that SYSTRAN undoubtedly has. SYSTRAN seems to possess little or no semantics and no intersentential capacity and produces nothing analogous to a complete syntactic analysis. The dictionary of idioms, or semisentences, contains very long patterns. In the French-English dictionary, e.g.,

SEE ALSO FOLLOWING ABSTRACT

and a second, closely related pattern (illegible in the source)

are separate entries even though there is clearly a generative relationship between them, and SYSTRAN does also appear to possess the capacity to insert variables (expressed, usually, as microprograms in macro functions) within dictionary patterns.

The heart of SYSTRAN's method is the use of feedback from corrections. When the system is bought, a service is also bought that feeds all the errors found by the user back into the dictionary as semisentence patterns (the core program is hardly ever touched, and only under extreme pressure from errors). In this way the system grows rapidly, improving strikingly as it does so. From the evidence available and from the known impossibility of editing the central routines of the system (now over 20 years old), one may speculate that those routines do very little to achieve the final translation. The work is nearly all done by the long dictionary idioms and the routines that apply them, for they express layer upon layer of actual errors made by the system in its initial functioning. The skill lies in minimizing the degree to which these patches interfere with one another and cancel out each other's benefits. Studies of SYSTRAN error feedback (14) have shown that there is a (monotonically increasing) effect of degradation on output from editing, one that is at a lower level than the improvements but rises at a slightly faster rate. It is that, in the end, which would put a limit on the ability of a SYSTRAN system to improve indefinitely by editing.

All this has something of the feel of a natural process, with built-in limits to survival, like an organic system. A more apt metaphor is probably that of tree growth, of expanding bands of life about a center (the core routines) that may be effectively dead.

A number of general points follow. First, the defects of SYSTRAN have a certain paradoxical appropriateness to features of real text, ones that are often ignored by conventional, more theoretically motivated approaches. Consider the following "sentence":

First, the argument about Ph.D. completion rates having become embroiled in the parallel and rather more sophisticated debates about the educational value of the traditional Ph.D. by thesis as opposed to other types of Ph.D. and alternative types of postgraduate study, and about its functional value as an academic apprenticeship at a time when employment prospects for young academics have been very much diminished.

At first sight this seems just a very long sentence of journalese, whose sense is perfectly clear to anyone who takes an interest in such matters. But that is not so; it is in fact an anacoluthon, a would-be sentence whose writer has lost the thread, and so it is, by default, just a very long noun phrase. Now it is clear that any system that requires a successful syntactic parsing of such a sentence in order to achieve a translation will not produce one. SYSTRAN, of course, will: neither the anacoluthon nor the length and complexity are barriers to its methods. It is indeed arguable that, provided its product is at least acceptable, that feature is a definite advantage.

Second, one could suggest that SYSTRAN has in fact developed certain AI features without that being the explicit intention of its programmers. One might cite the robustness noted above (which is a weak form of what is described as the domain of flexible parsing) and the skill of the programming of SYSTRAN, particularly the efficiency of the pattern matching (qv) and the modularity of the parts of the program, which enable it to translate a whole book in a few minutes. One could mention, too, the fact that there is no distinction between linguistic and programming labor (those who program, or rather tend, SYSTRAN represent both), and routines are written directly into dictionary entries. Above all, there is the treatment of linguistic phenomena as problems to be solved. It is, of course, an old program, now effectively out of control. But there, too, is a feature of what will soon be AI, if it is not now [and certainly if Michie's view (23) of the future role of AI is correct]: that is, the servicing, maintaining, and rendering comprehensible and usable on a continuing basis of very old, large programs (kluges) of no perspicuous internal structure. The number of these will grow like a natural genus in the years to come, as large institutions are unable or unwilling to abandon the enormous investments in software they have made.

The implicit challenge offered by SYSTRAN, and systems like it, is to both theoretical linguistics and AI. For Chomskyan linguistics (qv) it raises the practical question of just how much of real language performance might be covered by a dictionary of semisentences that approaches the old possibility of a sentence dictionary, finite and nongenerative. It is no longer a sufficient answer to say it does not matter how closely such performance approaches human performance because no consequences can possibly follow for cognitive science (qv). More generally, SYSTRAN-like systems remain a benchmark and an implicit challenge to AI and theoretical linguistics because their growth and success seem both nontheoretical and, in some way, organic. All theoreticians retain some form of belief in the existence of a simplifying, reductive theory underneath the complexity of human language behavior. The persistent challenge of SYSTRAN and commercial systems like it is to weaken the support for the position that real performance will only be achieved with the aid of such theories.

The above digression has in no way been intended as a survey of the state of the art in MT. On the contrary, there are or have been at least three major MT systems with far more interesting theoretical structure and assumptions than SYSTRAN [GETA (24), TAUM (25), and EUROTRA (26)]. The relevance of SYSTRAN, at least for the points made here, has lain precisely in its lack of theoretical interest and the problem its success therefore poses to both theoretical linguistics and AI.

From First Generation to Second and On to AI-Based MT

Although this entry is in no way a complete history of MT, some mention must be made of the more linguistically based approaches to the problem that succeeded the so-called first-generation approaches like the Georgetown system (10), from whose code SYSTRAN and other commercial systems were developed. In the wake of the report of Ref. 1, attempts were made to program systems with rules closer to systems of contemporary syntactic theory: these included GETA (24), a system at Grenoble, France, attributed to the late B. Vauquois and based on a version of valency theory, and TAUM (25), a descendant of that system, at Montreal, Canada, by Kittredge and co-workers. Both these systems contained tree-structure representations that allowed complex structures to be attached to nodes and, in that sense, were richer than those then available in syntactic theory in the Chomskyan tradition.

This type of work, begun in the late 60s, is often referred to as second-generation MT, but one must be very careful with that phrase; as with Nth-generation computing (see Computer systems), its role is essentially rhetorical rather than descriptive and is used to claim novelty for the product of the speaker. Such usage is full of historical irony in that, e.g., the broad theoretical basis of second-generation MT as noted above (the adequacy of a family of phrase-structure grammars for MT and natural-language processing generally) is identical with the much more recent developments in AI and computational linguistics in which a resurrected form of phrase-structure grammar (9) has seized the center stage from more semantic and knowledge-based methods in natural-language processing.

Later work following in this tradition of using more perspicuous and context-free syntax rules for analysis in MT included Melby's work on "junction grammar" at Brigham Young University (27) and Slocum's METAL system at Austin, Texas (28). A later addition was the EUROTRA system (26), under development for the European Community in Luxembourg since 1982. This attempted initially to blend a GETA-like syntax with some of the insights from AI-based natural-language understanding, at its semantic levels, of the sort described in the next section. However, it now seems that this has been abandoned and that the system to be implemented will be some variant of definite-clause grammar (qv) (33), which is to say, in MT-historical rather than AI-historical terms, staying firmly within second-generation techniques even though those now happen (as noted above) to be fashionable again in linguistics and areas of AI. Perhaps the surviving distinctive feature of EUROTRA among MT systems is the drive for well-defined, perspicuous software. This has almost always been lacking in the past, SYSTRAN being perhaps the extreme example of the opposite tendency, even though it is arguably the best-performing MT system. It is hoped that EUROTRA does produce reasonable output within the next few years, or it will tend to establish an opposition in MT between software design and actual performance.

A principal and misleading feature of the "generation" analogy, when applied to MT, is that it suggests successive time segments into which the different methods fall. As already seen, that can be highly misleading; the SYSTRAN system, for example, is a surviving form of first-generation system, existing alongside second, and later, generational developments. Evolutionary phyla would be a much better metaphor here because earlier systems, like sharks, survive perfectly well alongside later developments, like fish and whales.

Along with early MT work and the more linguistically sophisticated work that succeeded it in France and Canada, there grew another tradition, closer in many ways to the preoccupations of AI, at least as those were set out earlier in contrast to conventional computational linguistics: that is, an emphasis on the role of meaning and knowledge structures in MT. It is not important whether the use of such methods in MT work is called another generation.

The early roots of this movement were in Britain, with the work of Masterman (29) and the Cambridge Language Research Unit, though similar contemporary developments can be traced in the USSR (30). The work was based on the assumption that an interlingua could be defined for MT: a formal language independent of the structure of any particular language and adequate for the intermediate stage of coding between source and target language. This language was assumed to be highly semantic, content-directed, and not identical with any interlingua of the sort that formal logic was traditionally taken as providing (though traditional theorists, like Leibniz, would admit no distinction between those two, of course).
Development of this style of work became significant in AI in the United States during the 70s, particularly with the work of Schank and his school and that of Wilks. Schank's MARGIE (21) system took as input small English sentences, translated them into a semantic-network-based interlingua for verbs called conceptual dependency (qv), massaged those structures with inference rules, and gave output in German. In Wilks's system (22), called preference semantics, there was also an interlingua (based on 80 primitives in tree and network structures) between input in English and output in French, but there the emphasis was less on the nature of the representation than on the distinctive coherence algorithms (preferences) for selecting the appropriate representation from among candidates. The reality and ubiquity of word-sense and structural ambiguity were a driving force behind that system. Both systems shared the assumption that traditional syntax-based methods would not be able to solve that class of problems, and neither had separate syntactic components; the work of a syntactic component was performed under a semantic description. Again, both used MT only as a testbed or application of more general claims about AI and natural-language processing.

There were also other significant differences between the two systems from an MT point of view; in particular, in Schank's system the interlingua carries the whole weight of translation. In the MARGIE system the generator of German has access only to the conceptual dependency representation and not at all to the original English, nor to the range of possible translations for any given English word or structure. In Wilks's system the interlingua is used as a heuristic to make all semantic choice decisions, but only between sets of alternatives available from a bilingual dictionary.
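The architectural contrast can be caricatured in a few lines. Everything below (the words, the semantic codings, the overlap rule) is invented for illustration; neither system worked in anything like this form:

```python
# Toy contrast: generation from the interlingua coding alone, searching the
# whole target vocabulary, vs. choice restricted to a bilingual dictionary's
# listed targets. All codings and words are illustrative assumptions.
INTERLINGUA = {"wall": {"BARRIER", "VERTICAL", "INDOOR"}}
GERMAN = {
    "Wand": {"BARRIER", "VERTICAL", "INDOOR"},    # interior wall
    "Mauer": {"BARRIER", "VERTICAL", "OUTDOOR"},  # exterior wall
    "Tisch": {"FURNITURE"},                       # irrelevant noise word
}
BILINGUAL = {"wall": ["Wand", "Mauer"]}

def overlap(word, target):
    return len(INTERLINGUA[word] & GERMAN[target])

def interlingua_only(word):
    """Generator sees only the semantic coding; searches all German words."""
    return max(GERMAN, key=lambda g: overlap(word, g))

def dictionary_restricted(word):
    """Generator chooses only among the bilingual dictionary's targets."""
    return max(BILINGUAL[word], key=lambda g: overlap(word, g))

print(interlingua_only("wall"), dictionary_restricted("wall"))
```

The outcomes coincide here; the difference is in what each generator must inspect, the whole target vocabulary in the first case and only the dictionary's candidate set in the second.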
So, e.g., given a translation of hammer into the interlingua, Schank's system would have to decide between the German for hammer and for mallet while looking only at a semantic coding of the object itself (which might or might not contain the significant feature of its source substance). In the Wilks system the program would know that it was heading only toward output words that could have come from just one of those sources (hence it would not need access to the English lexeme, only to the range of its associated targets). This type of approach has continued its peregrinations and is now most prominent in Japan, where a semantics-based approach may be most suitable for MT taking Japanese as its source.

Another strand of MT work done in close association with AI has been that of Martin Kay at Xerox PARC (31) (he was a pupil of Margaret Masterman). He has emphasized the role of morphology, of machine-aided translation, and of the structure of dictionaries in MT, but his most recent theme has been that of functional grammar, a formalism for syntax rules usable in both analysis and generation, and hence part of the extensions of the linguistically based movement in MT that began with GETA and TAUM, though now, once again, a subject of independent interest in AI.

Conclusion

The role of MT in AI, and natural-language processing in general, remains what it always was: a permanent reminder of the importance and difficulty of the tasks posed by real texts as opposed to the simplicities of the usual toy examples, and
that word-sense ambiguity, vagueness, structural ambiguity, etc., are essential and not accidental features of natural language.

A feature of MT that may be worth mentioning in conclusion is that it is perhaps the only area of AI that has any kind of effective evaluation procedures, i.e., those with any well-established statistical basis of significance. That is a matter of the distinctive procedures that have been tried and tested (e.g., the Cloze test) and of the large samples of output that MT has been capable of providing for testing. It is sometimes said that the evaluation of MT is now a better-specified field than MT itself (32).
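The Cloze procedure mentioned above is simple to state: delete every nth word of a text and score the fraction of blanks that readers restore correctly. A minimal sketch, with the n = 5 choice and sample text picked only for illustration:

```python
# Minimal Cloze-style scoring sketch. Blank every nth word of a reference
# text, then score exact restorations. n=5 and the text are illustrative.
def cloze_blanks(text, n=5):
    """Return the deleted words (every nth word) and their positions."""
    words = text.split()
    positions = list(range(n - 1, len(words), n))
    return [words[i] for i in positions], positions

def cloze_score(answers, guesses):
    """Fraction of blanks restored exactly."""
    return sum(a == g for a, g in zip(answers, guesses)) / len(answers)

answers, _ = cloze_blanks("the box was in the pen and Johnny was happy")
print(answers)
print(cloze_score(answers, ["the", "happy"]))
```

Applied to MT output, higher restorability by readers is taken as evidence of more natural, more predictable text, which is what gives the procedure its statistical footing.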
BIBLIOGRAPHY

1. ALPAC, "Languages and machines: computers in translation and linguistics," National Academy of Sciences, National Research Council Publication 1416, Washington, DC, 1966.
2. Y. Wilks, "Making Preferences More Active," Artificial Intelligence 11, 1978; reprinted in Associative Networks (N. V. Findler, ed.), Academic Press, New York, pp. 239-265.
3. J. Katz and J. Fodor, "The structure of a semantic theory," Language 39, 170-210 (1963).
4. W. Woods, "Transition network grammars for natural language analysis," Communications of the ACM 13(10), 591-606 (1970).
5. B. Russell and A. N. Whitehead, Principia Mathematica, Cambridge University Press, Cambridge, UK, 1905.
6. A. Tarski, Logic, Semantics and Metamathematics, Oxford University Press, Oxford, UK, 1956.
7. H. Reichenbach, Elements of Symbolic Logic, Macmillan, New York, 1947.
8. J. A. Robinson, "A machine-oriented logic based on the resolution principle," J. of the ACM 12, 23-41 (1965).
9. G. Gazdar, "English as a Context-Free Language," mimeo, Cognitive Studies Programme, School of Social Sciences, University of Sussex, Sussex, UK, April 1979.
10. H. Bruderer, Handbook of Machine Translation and Machine-Aided Translation: Automatic Translation of Natural Languages and Multilingual Terminology Data Banks, North-Holland, Amsterdam, 1977.
11. J. Slocum, "METAL: The LRC Machine Translation System," presented at the ISSCO Tutorial on Machine Translation, Lugano, Switzerland, 2-6 April 1984. Also available as Working Paper LRC-84-2, Linguistics Research Center, University of Texas, Austin, April 1984.
12. W. Weaver, "Translation," reprinted in W. Locke and A. Booth (eds.), Machine Translation of Languages, Wiley, New York, pp. 15-23, 1955.
13. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
14. Teddington Conference, Proceedings of the First International Conference on Machine Translation, Teddington, Middlesex, 1961 (Her Majesty's Stationery Office, London, 1961).
15. Y. Wilks and Latsec Inc., "Comparative Translation Quality Analysis," Final Report of AFOSR Contract F 33657-77-C-0695, 1978.
16. Z. Harris, "Co-occurrence and Transformation in Linguistic Structure," Language 33, 283-340 (1957).
17. A. G. Oettinger, Automatic Language Translation, Harvard University Press, Cambridge, MA, 1960.
18. L. Frazier and J. D. Fodor, "The sausage machine: a new two-stage parsing model," Cognition 6, 291-325 (1978).
19. Y. Bar-Hillel, "Some linguistic obstacles to machine translation," Proceedings of the Second International Congress on Cybernetics (Namur), pp. 197-207, 1960; reprinted in Y. Bar-Hillel, Language and Information, Addison-Wesley, 1964.
20. L. Wittgenstein, Philosophical Investigations, Blackwell, Oxford, UK, 1953.
21. R. Schank (ed.), Conceptual Information Processing, North-Holland, Amsterdam, 1975.
22. Y. Wilks, "Good and Bad Arguments for Semantic Primitives," Communication and Cognition 10(3/4), 181-221 (1977).
23. D. Michie, On Machine Intelligence, Edinburgh University Press, Edinburgh, 1974.
24. B. Vauquois, La traduction automatique à Grenoble, Dunod, Paris, 1975.
25. R. Kittredge, R. Ayotte, G. Stewart, J. Dansereau, G. Poulin, A. Ambrosi, and I. Bellert, "TAUM 73," Projet de Traduction Automatique de l'Université de Montréal, publications internes, Jan. 1973, pp. 1-262.
26. M. King and S. Perschke, "EUROTRA," presented at the ISSCO Tutorial on Machine Translation, Lugano, Switzerland, 2-6 April 1984.
27. A. K. Melby, M. R. Smith, and J. Peterson, "ITS: Interactive Translation System," Proceedings of the Eighth ICCL (COLING 80), Tokyo, September 30-October 4, 1980, pp. 424-429.
28. J. Slocum, "Machine Translation: Its History, Current Status, and Future Prospects," Proceedings of COLING 84: the 10th International Conference on Computational Linguistics and the 22nd Annual Meeting of the Association for Computational Linguistics, Stanford University, Stanford, CA, July 2-6, 1984, pp. 546-561. Also available as Working Paper LRC-84-3, Linguistics Research Center, University of Texas, Austin, May 1984.
29. M. Masterman, "Essential Mechanism of Machine Translation," presented at BCS 79, London, January 4-6, 1979.
30. I. Mel'cuk, "Grammatical Meanings in Interlinguas for Automatic Translation and the Concept of Grammatical Meaning," in V. J. Rozencvejg (ed.), Machine Translation and Applied Linguistics, Athenaion Verlag, Frankfurt am Main, FRG, 1974, vol. I, pp. 95-113.
31. M. Kay, "The Proper Place of Men and Machines in Language Translation," Technical Report CSL-80-11, Xerox Palo Alto Research Center, Palo Alto, CA, Oct. 1980.
32. Z. Pankowicz, "Commentary on ALPAC Report," Rome Air Development Center, Rome, NY, 1967.
33. D. Arnold and R. Johnson, "Robust Processing in Machine Translation," Proceedings of COLING 84: the 10th International Conference on Computational Linguistics and the 22nd Annual Meeting of the Association for Computational Linguistics, Stanford University, Stanford, CA, July 2-6, 1984, pp. 472-475.

Y. Wilks
New Mexico State University

MACHINES, SELF-ORGANIZING. See Robotics.

MACLISP. See LISP.

MACSYMA

A system for solving problems in symbolic mathematics, such as integration and algebraic manipulation, MACSYMA was designed in 1968 by J. Moses and co-workers and written in 1971 by Martin and Fateman at MIT (see Lambda calculus). [See W. A. Martin and R. J. Fateman, "The MACSYMA System," Proceedings of the Second Symposium on Symbolic and Algebraic Manipulation, ACM SIGSAM, New York, pp. 59-75, 1971; see also J. Moses, A MACSYMA Primer, Mathlab Memo No. 2, Computer Science Laboratory, MIT, Cambridge, MA, 1975.]

M. Tarp
SUNY at Buffalo

MANIPULATORS

A manipulator is defined as a mechanical device having the ability to skillfully manage or handle objects. The human arm might be considered the ultimate goal for a pattern of a manipulator. Mechanical manipulators were initially developed in an attempt to produce a mechanical arm. The rationale for doing so is based on many factors, including the need to manipulate objects in hazardous environments and the desire to relieve humans of tedious and tiresome jobs.

The basic elements of a robot (see Robotics) consist of interconnected links arranged such that relative motion produced between successive links can allow the manipulation of an object at the end link (see Fig. 1). A power source to drive the links is also necessary. In terms of the human arm the links are bones connected at joints, with muscles and tendons supplying the power necessary to provide relative motion at each joint. It is obvious, however, that a sophisticated control strategy is needed to coordinate the joint motions. The brain controls the human arm, whereas a computer controls the manipulator. The existence of this controller is another key necessity since it allows the manipulator to be programmable, a key feature of a robot.

However, the human arm does not act blindly, i.e., it does not act without the aid of an external sensing system (sight) that is interconnected via the brain to the hand. Thus, one looks at an object and proceeds to pick it up, avoiding obstacles in the process and adapting to an ever-changing environment. Current manipulators are not nearly as sophisticated as the human arm; however, the analogy clearly defines the key components of a robot: a linkage arrangement, a drive system, external sensing, and a controlling system. The remainder of this entry focuses on robot kinematics, including robot configuration, forward- and inverse-position solution, velocity and acceleration, trajectory planning, and robot calibration.

Figure 1. Manipulator reaching for a block.
Robot Configuration
The kinematic structure of robots deals with the way in which the links and joints are arranged in order to allow dexterous motion at the end effector (see Refs. 1-8). A robot is essentially a series of rigid links interconnected by constraining joints. For example, a hinge connecting two links constrains the former to rotate about the latter in a plane orthogonal to the axis of the pin, as shown in Figure 2. Adding additional links and joints increases the dexterity of the device. Figure 3 shows a two-link device that can only move in a plane but can access any point within its work space. The concept of work space is very important since it dictates the reach of a robot. As shown in Figure 3, the robot's reach is the area within two concentric circles. If a third link is added to the device in Figure 3, the capability to operate in 3-D space can be added, as shown in Figure 4. The work-space configuration is now spherical since the circle from Figure 3 is rotated through 360°. The key point is that it is possible to position the end effector in three-dimensional space with three links and joints.

So far only pin joints have been considered. Another possibility is translational or sliding joints. An example of this type of joint is shown in Figure 5: link a can only translate with respect to link b. Figure 6 shows a configuration that is essentially the same as the device in Figure 3; however, its work space is only limited by the stops within the slider joint. Figure 7 shows a 3-D device having one prismatic (1) and two revolute (2 and 3) joints. Its work space is a hollow cylinder. These simple examples show it is possible by means of a linkage system to move an end effector about in space. These mechanical linkages can be classified according to whether they form closed or open chains, how many links and joints they have, and how dexterous they are or how many degrees of freedom (DOFs) they provide. Typically, robots are configured as open-loop devices, as shown in Figure 7.
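The annular work space of the two-link planar device can be checked numerically; the link lengths below are assumed purely for illustration:

```python
import math

L1, L2 = 2.0, 1.0  # assumed link lengths for a Figure 3-style planar device

def reachable(x, y):
    """A planar point is reachable iff its distance from the base lies
    between the inner circle |L1 - L2| and the outer circle L1 + L2."""
    r = math.hypot(x, y)
    return abs(L1 - L2) <= r <= L1 + L2

print(reachable(2.5, 0.0))  # between the two circles
print(reachable(0.5, 0.0))  # inside the inner circle
```

The two radii fall out of the geometry directly: full extension gives L1 + L2, and folding one link back on the other gives |L1 - L2|.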
Closed-loop devices, such as the device shown in Figure 8, also exist; however, they are less common. In general, an object in 3-D space has six global degrees of freedom: three Cartesian coordinates x, y, and z and three
Figure 3. Planar robot with indicated work space. R1 and R2 are vectors attached to links 1 and 2. RB is a vector from the origin of the coordinate frame on the ground to the end effector. θ1 and θ2 give the angular orientation of R1 and R2, respectively.

orientation coordinates that may, for example, be defined as roll, pitch, and yaw. A manipulator that can position and orient an object in 3-D space is called a six-axis device and has six actuators to drive it. A planar-type manipulator needs only three actuators to drive its three axes, which in turn enable it to position in the x-y plane and orient about the z axis. Up to this point a few simple robot configurations have been described and work space has been defined. The purpose of a robot is to position an object somewhere within its work space. Now consider how a mechanical linkage can position an object
Figure 2. Pin joint between two links.
Figure 4. Spherical robot configuration.
and what relative motions are needed at the joints for the object to move along a specified path.

Position Solution

Once a manipulator configuration has been established, the next step is to relate the motions at each joint to the motion of the end effector. Much work has been done in this area and can be found in Refs. 9-13. Given a 6-DOF robot, two position problems can be defined. First, what position can be reached, given values for the joint angles? Second, what should the joint angles be in order to reach a given position with a specified orientation? The first problem, direct kinematics, involves solving for the end-effector position given the joint variables. The second problem, inverse kinematics, involves solving for the joint angles given the end-effector position and orientation. There are two approaches that are very similar. The first approach utilizes a vector loop and is shown on a planar device to illustrate the idea. The second is a transformation approach as applied to a 6-DOF device. The vector-loop approach (14) essentially attaches a vector of known magnitude (link length) to each link from joint to joint and then writes an equation describing the addition of these vectors to form a path from ground to the end effector, as shown in Figure 3. This equation can be written as
Figure 5. Slider-type joint.
RB = R1 + R2    (1)
Figure 6. Two-link manipulator having a pin and slider.
and separated into X and Y components as

XB = R1 cos θ1 + R2 cos(θ1 + θ2)
YB = R1 sin θ1 + R2 sin(θ1 + θ2)    (2)
With θ1 and θ2 given, it is an easy matter to compute the position of the end effector by direct substitution. If the X and Y positions are given, the joint angles are found by solving a nonlinear algebraic equation (nonlinear due to the sine and cosine terms). The solution for the simple two-link device is
θ1 = tan⁻¹(YB/XB) + cos⁻¹[(A² + R1² − R2²)/(2 R1 A)]

θ2 = tan⁻¹[(YB − R1 sin θ1)/(XB − R1 cos θ1)] − θ1    (3)

where

A = √(XB² + YB²)
Figure 7. Cylindrical 3-D robot.

Note that two possible solutions exist. If a velocity solution is desired, Eq. 2 can be differentiated to yield
ẊB = −R1 θ̇1 sin θ1 − R2(θ̇1 + θ̇2) sin(θ1 + θ2)

ẎB = R1 θ̇1 cos θ1 + R2(θ̇1 + θ̇2) cos(θ1 + θ2)    (4)

The solution here is linear and, in fact, it can be set up in terms of a Jacobian matrix:
Figure 8. Closed-loop two-input planar device.
[ẊB]   [−R1 sin θ1 − R2 sin(θ1 + θ2)    −R2 sin(θ1 + θ2)] [θ̇1]
[ẎB] = [ R1 cos θ1 + R2 cos(θ1 + θ2)     R2 cos(θ1 + θ2)] [θ̇2]    (5)
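The position, inverse, and velocity solutions above (Eqs. 2, 3, and 5) can be sketched in a few lines of code. The link lengths and the elbow-selection flag below are illustrative assumptions of this sketch, not values from the text:

```python
import math

R1, R2 = 2.0, 1.5  # link lengths (arbitrary illustrative values)

def forward(theta1, theta2):
    """Direct kinematics, Eq. 2: end-effector position from joint angles."""
    x = R1 * math.cos(theta1) + R2 * math.cos(theta1 + theta2)
    y = R1 * math.sin(theta1) + R2 * math.sin(theta1 + theta2)
    return x, y

def inverse(x, y, elbow=+1):
    """Inverse kinematics, Eq. 3; elbow = +1 or -1 selects one of the
    two possible solutions noted in the text."""
    A = math.hypot(x, y)  # distance from the base to the end effector
    # interior angle at the base joint, from the law of cosines
    alpha = math.acos((A * A + R1 * R1 - R2 * R2) / (2.0 * R1 * A))
    theta1 = math.atan2(y, x) + elbow * alpha
    theta2 = math.atan2(y - R1 * math.sin(theta1),
                        x - R1 * math.cos(theta1)) - theta1
    return theta1, theta2

def jacobian(theta1, theta2):
    """Eq. 5: linear map from joint rates to Cartesian end-effector velocities."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [[-R1 * s1 - R2 * s12, -R2 * s12],
            [ R1 * c1 + R2 * c12,  R2 * c12]]
```

Running `forward` on either branch returned by `inverse` reproduces the requested (x, y), which is a convenient check of Eq. 3.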
Accelerations can also be determined. One now has a position, velocity, and acceleration solution and can proceed to actually moving the robot. Notice that, unlike the position solution, there exists in Eq. 5 a linear relationship between the end-effector velocities (Cartesian coordinates) and the joint rates (joint coordinates). This simple example illustrates the main concepts of direct and inverse kinematics and the velocity problem. As the robot becomes more complex and additional links and joints are added to provide more degrees of freedom at the end effector, the vector-loop approach increases dramatically in complexity. Consider an alternative approach based on coordinate transformations. Homogeneous transforms were applied to closed-loop kinematic chains by Denavit and Hartenberg (15). Roberts (16,17) used homogeneous transforms to represent objects in vision work, and Pieper (18) applied homogeneous transforms to robots. A transform defines the relationship between two coordinate frames. Hence if these frames are attached to each of two links where a joint exists, a relationship between the two coordinate frames can be defined that will indicate the relative position of one link with respect to the other. Figure 9 shows a general transform between two spatial coordinate frames. This relationship can be expressed as
v = [A]w    (6)

where v and w are the coordinates of a point in the two frames and A represents a combined translation and rotation between the two systems. If the two coordinate frames are on successive links, Aij represents a transform from link i to link j. The matrix A can be defined using Denavit and Hartenberg notation (15,19), or it can be split into a joint matrix Qi and a shape matrix Tij, as done by Uicker (20,21). Thus

Aij = Qi Tij    (7)

where Qi defines the type of joint and Tij defines the shape of the link. Once these matrices are defined for each joint and link, they can be multiplied together to relate the end-effector coordinates to the base coordinates:

T16 = A12 A23 A34 A45 A56 A67    (8)

where T16 defines the relationship between the end effector and the base frame for this six-link manipulator. In addition, the end-effector frame can be defined by giving its position and orientation via a TM matrix. Then the following equation holds:

TM = T16 = A12 A23 · · · A67    (9)

which is similar to the vector equation given in Eq. 1. Figure 10 illustrates this relationship. The direct kinematics problem involves solving Eq. 9 by substituting the joint angles into each A matrix and multiplying out the right side to obtain the position and orientation of the end effector. The inverse kinematics problem involves solving for the joint angles given TM, a much more difficult problem. For more information see Refs. 22-27.
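Equations 6-8 can be illustrated with a small sketch that builds homogeneous link transforms in the common Denavit-Hartenberg parameterization and chains them per Eq. 8. The function names and the pure-Python matrix multiply are conveniences of this sketch, not part of the original treatment:

```python
import math

def dh_matrix(theta, d, a, alpha):
    """One A matrix in the Denavit-Hartenberg convention: rotation theta
    about z, translation d along z, translation a along x, rotation alpha
    about x."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[ct, -st * ca,  st * sa, a * ct],
            [st,  ct * ca, -ct * sa, a * st],
            [0.0,      sa,       ca,      d],
            [0.0,     0.0,      0.0,    1.0]]

def matmul(A, B):
    """4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def direct_kinematics(dh_rows):
    """Eq. 8: chain the link transforms A12 A23 ... to obtain the transform
    from the base frame to the end-effector frame."""
    T = [[float(i == j) for j in range(4)] for i in range(4)]
    for row in dh_rows:
        T = matmul(T, dh_matrix(*row))
    return T
```

For the planar two-link arm of Figure 3, the rows (θ1, 0, R1, 0) and (θ2, 0, R2, 0) give a chain whose translation column reproduces Eq. 2.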
Trajectory Planning. In order for a robot to do a useful task, such as moving an object from point A to point B, it is necessary to actually plan the motion of the robot. This phase is known as trajectory planning and effectively plans the path of the robot end effectors. Since the inputs to the robot arm are joint changes, it would be simplest to determine the values of the joint variables at points A and B and linearly change the joint values until the motion is complete. This type of path planning is called joint interpolated motion and is the easiest to implement.

Unfortunately, there exists a nonlinear relationship between joint angle motion and end-effector motion. In other words, a linear change in joint variables will not in general produce a linear motion of the end effector in Cartesian coordinates; simply stated, the robot will not necessarily follow a straight line between points using joint interpolated motion. To predictably follow a desired path, additional points
Figure 9. Coordinate transform.
Figure 10. Coordinate transform graph for a six-link manipulator (TM, T16, A12, . . . are coordinate transforms; see Eqs. 7, 8, and 9).
along the path must be defined and the corresponding joint angles resolved. The robot joints are then incremented in a sequence such that the end effector passes through the additional points along the path. This strategy is called Cartesian interpolated motion and is more complicated since additional points are involved. In addition, the degree to which the end effector follows the path is a direct function of the number of additional points. For additional information on this topic of trajectory planning see Refs. 28-33.

Summary

The thrust of this article has been to provide an overview of manipulator mechanics, including robot linkage arrangement, work-space limits, position solution strategy, and trajectory generation. It illustrates various robot configurations and shows how the joint positions can be determined given the gripper location. Then a trajectory generator can be used to plan the robot motion history in order to follow a specified path. Once it is determined what the joint angles for the manipulator should be, it is necessary to actually move and control the manipulator. The subjects of drive systems, controllers, and sensors do not fall within the scope of this entry; however, they are vital to proper robot motion. Additional related topics, such as interfacing robots with a task, end-effector design, and higher level programming of robotic devices, address some of the issues of actually applying manipulators. The proper integration of robots with their environment is a complex task; however, it is extremely important in order for manipulators to be productive. The overview of robot mechanics presented here provides the first step toward an understanding of manipulators and their usefulness.
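The joint interpolated scheme described under Trajectory Planning can be sketched on the two-link arm of Figure 3 (link lengths are this sketch's illustrative values). Sampling the resulting path shows that linearly interpolated joints do not trace a straight Cartesian line:

```python
import math

R1, R2 = 2.0, 1.5  # illustrative link lengths

def forward(t1, t2):
    """Direct kinematics of the two-link arm (Eq. 2)."""
    return (R1 * math.cos(t1) + R2 * math.cos(t1 + t2),
            R1 * math.sin(t1) + R2 * math.sin(t1 + t2))

def joint_interpolated(start, goal, steps):
    """Linearly interpolate each joint variable between its start and goal
    values, returning the end-effector positions along the way. The path
    is, in general, NOT a straight line in Cartesian coordinates."""
    path = []
    for k in range(steps + 1):
        s = k / steps
        t1 = start[0] + s * (goal[0] - start[0])
        t2 = start[1] + s * (goal[1] - start[1])
        path.append(forward(t1, t2))
    return path
```

Cartesian interpolated motion would instead interpolate (x, y) along the straight line and invoke the inverse-kinematics solution at every intermediate point; the deviation from the desired path shrinks as the number of via points grows.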
BIBLIOGRAPHY
1. J. Duffy, Analysis of Mechanisms and Robot Manipulators, Wiley, New York, 1980.
2. J. F. Engelberger, Robotics in Practice, AMACOM, New York, 1980.
3. V. Milenkovic and B. Huang, Kinematics of Major Robot Linkages, Proceedings of the Thirteenth International Symposium on Industrial Robots, Chicago, IL, pp. 16-31 to 16-47, 1983.
4. R. P. Paul, Robot Manipulators: Mathematics, Programming and Control, MIT Press, Cambridge, MA, 1981.
5. Y. Stepanenko and M. Vukobratovic, "Dynamics of articulated open-chain active mechanisms," Math. Biosci. 28, 137-170 (1976).
6. V. D. Scheinman, Design of a Computer Manipulator, Artificial Intelligence Laboratory Memo AIM-92, Stanford University, Palo Alto, CA, 1969.
7. W. E. Snyder, Industrial Robots: Computer Interfacing and Control, Prentice-Hall, Englewood Cliffs, NJ, 1985.
8. J. C. Colson and N. D. Perreira, Kinematic Arrangements Used in Industrial Robots, Proceedings of the 13th International Symposium on Industrial Robots, Chicago, IL, pp. 20-1 to 20-18, 1983.
9. C. S. G. Lee, "Robot arm kinematics, dynamics and control," IEEE Comp. 15(12), 62-80 (1982).
10. K. S. Fu, R. C. Gonzalez, and C. S. G. Lee, Robots: Control, Sensing, Vision, and Intelligence, McGraw-Hill, New York, 1987.
11. R. P. Paul, B. E. Shimano, and G. Mayer, "Kinematic control equations for simple manipulators," IEEE Trans. Syst. Man Cybernet. SMC-11(6), 449-455 (1981).
12. D. L. Pieper and B. Roth, The Kinematics of Manipulators under Computer Control, Proc. II Int. Cong. Theor. Mach. Mechan. 2, 159-168 (1969).
13. Y. Koren, Robotics for Engineers, McGraw-Hill, New York, 1985.
14. A. S. Hall, Notes on Mechanism Analysis, BALT, West Lafayette, IN, 1981.
15. J. Denavit and R. S. Hartenberg, "A kinematic notation for lower-pair mechanisms based on matrices," J. Appl. Mechan. 77, 215-221 (1955).
16. L. G. Roberts, Machine Perception of Three-Dimensional Solids, Report No. 315, Lincoln Laboratory, MIT, Cambridge, MA, 1963.
17. L. G. Roberts, Homogeneous Matrix Representation and Manipulation of N-Dimensional Constructs, Document No. MS1045, Lincoln Laboratory, MIT, Cambridge, MA, 1965.
18. D. Pieper, The Kinematics of Manipulators Under Computer Control, Ph.D. Thesis, Stanford University, Stanford, CA, 1968.
19. J. Denavit, Description and Displacement Analysis of Mechanisms Based on (2 × 2) Dual Matrices, Ph.D. Thesis, Mechanical Engineering, Northwestern University, Evanston, IL, 1956.
20. J. J. Uicker, On the Dynamic Analysis of Spatial Linkages Using 4 × 4 Matrices, Ph.D. Dissertation, Northwestern University, Evanston, IL, 1965.
21. J. J. Uicker, Jr., J. Denavit, and R. S. Hartenberg, "An iterative method for the displacement analysis of spatial mechanisms," Trans. ASME J. Appl. Mechan. 31(Series E), 309-314 (1964).
22. C. H. Suh and C. W. Radcliffe, Kinematics and Mechanisms Design, Wiley, New York, 1978.
23. M. S. C. Yuan and F. Freudenstein, "Kinematic analysis of spatial mechanisms by means of screw coordinates (two parts)," Trans. ASME J. Eng. Ind. 93(1), 61-73 (1971).
24. D. E. Orin, R. B. McGhee, M. Vukobratovic, and G. Hartoch, "Kinematic and kinetic analysis of open-chain linkages utilizing Newton-Euler methods," Math. Biosci. 43, 107-130 (1979).
25. A. K. Bejczy, Robot Arm Dynamics and Control, Technical Memo 33-669, Jet Propulsion Laboratory, Pasadena, CA, 1974.
26. J. M. Hollerbach, "A recursive Lagrangian formulation of manipulator dynamics and a comparative study of dynamics formulation complexity," IEEE Trans. Syst. Man Cybernet. SMC-10(11), 730-736 (1980).
27. D. E. Whitney, "The mathematics of coordinated control of prosthetic arms and manipulators," Trans. ASME J. Dynam. Syst. Measur. Ctrl. 122, 303-309 (1972).
28. R. A. Brooks, "Planning collision-free motion for pick-and-place operations," Int. J. Robot. Res. 2(4), 19-44 (1983).
29. T. Lozano-Perez, "Robot programming," Proc. IEEE 71(7), 821-841 (1983).
30. J. Y. S. Luh and C. S. Lin, "Optimum path planning for mechanical manipulators," Trans. ASME J. Dynam. Syst. Measur. Ctrl. 102, 142-151 (1981).
31. J. M. Brady (ed.), Robot Motion: Planning and Control, MIT Press, Cambridge, MA, 1982.
32. R. E. Fikes, P. E. Hart, and N. J. Nilsson, "Learning and executing generalized robot plans," Artif. Intell. 3(4), 251-288 (1972).
33. K. S. Fu (ed.), Computer 15(12) (1982) (special issue on robotics and automation).

R. Cpne
Purdue University

MAN-MACHINE INTERACTION. See Human-computer interaction.
MATCHING, TEMPLATE

In its simplest form, template matching is an operation in which a stored fragment of an image (i.e., a template) is compared with all or part of an input image. The output of this process is an array that gives the degree of match at each point in the input image. Template matching is one of the oldest, simplest, and most widely applied techniques of computer vision; it has been used in, e.g., character recognition (qv), target detection, and industrial inspection. Over the years many variations on the basic technique have been developed to improve speed and/or reliability, and many instances of special-purpose template-matching hardware have been built.

Basic Mathematics

Two-dimensional functions can be used to represent the image f(x, y) and the template t(x, y). The principles of template matching are illustrated here with two-dimensional data. However, the techniques can readily be applied to n-dimensional data, for which two-dimensional images and one-dimensional signals are the most common cases. In template matching the template is superimposed on the image at a point (x0, y0), and a degree of match is computed. This computation is repeated at each point in the image to determine the location(s) where the match is best. The degree of match can be measured in many ways, e.g., by summing the squares of the differences of the brightness values at corresponding image and template points:

d(x0, y0) = Σ [f(x0 + x, y0 + y) − t(x, y)]²

where the sum is taken over all points (x, y) in the template. A more commonly used measure of match is cross-correlation, R, which is computed by summing the products of corresponding image and template values over the fragment of image covered by the template (1,2):

R(x0, y0) = Σ f(x0 + x, y0 + y) t(x, y)

The above two measures are equivalent if

Σ f(x0 + x, y0 + y)² = constant
i.e., if the local energy is uniform across the image. The cross-correlation measure R is maximum when the portion of the image under t is (most nearly) identical to t. Therefore, the point in the image that best matches the template can be computed by shifting the template over the image, computing R at each point, and remembering the maximum value encountered and its location. If the size of the image or template is sufficiently large, it may be more efficient to compute correlations in the Fourier domain. This is done by computing transforms of the image and template using a two-dimensional fast Fourier transform (FFT) algorithm, multiplying the transformed arrays together, and retransforming the product array back into the spatial domain (3). The resulting array contains the cross-correlation value at each point in the image. The above correlation measure is sensitive to characteristics, such as average image brightness and contrast and size of template, that are not usually criterial. An object in bright sunlight, e.g., will yield a higher match score than the same object in shade. Various normalized measures have been used
to ameliorate such effects. One of the most effective is normalized cross-correlation (2,4,5):
cor(x, y) = [E(f · t) − E(f) · E(t)] / [sd(f) × sd(t)]

where E(f) and sd(f) are the mean and standard deviation of f over the area being matched. The normalized correlation function has the property that it does not vary with the size of the correlation area or with the local average intensity level and level of contrast. It is, however, affected by signal-to-noise ratio: increasing noise increases the standard deviation, which decreases the correlation measure.

Extensions

Many extensions to the basic template-matching approach have been proposed to speed up computation and to make matching more robust in the presence of various types of noise and distortion as well as variations in image appearance.

Robustness. Once amplitude variations have been removed by normalizing the correlation measure, the principal remaining source of mismatch is geometric distortion. The basic matching algorithm presumes that the object of interest will appear with the same size and orientation in all images so that a single template suffices. When the target object may change its size or orientation (equivalently, when the camera distance and orientation are not fixed), extensions to the basic matching technique are called for. The simplest approach, albeit a computationally expensive one, is to have many templates depicting the object of interest under various viewing conditions. Alternatively, distortions can be applied to a single template, generalizing the match algorithm to search over parameters such as rotation and scale in addition to position. In cases of extreme geometric distortion or where an object is composed of parts whose spatial relationships may vary (e.g., a human face or body), it can be more effective to partition the template into subtemplates corresponding to significant features (e.g., eyes, nose, mouth).
The search for a match now proceeds in two phases: first find reasonable candidate matches for these parts and then check for a configuration of these matches that satisfies appropriate relational constraints (e.g., nose above mouth). Fischler demonstrated an efficient algorithm (complexity grows linearly with the number of parts) for this two-level search based on dynamic programming (6). Widrow took this concept to its logical limit, viewing a template as a continuously deformable rubber sheet. His search algorithm attempted to maximize the degree of match as it minimized distortion of the template (7). Widrow's approach is a two-dimensional analogue of the well-known dynamic time-warping procedures used in speech recognition (8). Sometimes a parametric model of geometric distortion may be available. For example, a camera model parameterized by location, orientation, focal length, etc., can be used to predict changes in appearance of a 3-D object resulting from different imaging geometries. In such cases template matching can proceed by searching over the model (e.g., camera) parameters to obtain a predicted image that best fits a given template. This variant of template matching is appropriately known as parametric correspondence (9). Template matching can be performed on images and templates that have each been preprocessed to enhance features of
interest (10). A simple example is to convolve the image and template with a local operator that detects edges (see Edge detection) and then match the resulting edge arrays. Such preprocessing is appealing, in principle, because it transforms the problem into matching shape (i.e., object boundaries) rather than brightness values, which are generally more variable. In practice, the match can be dramatically degraded by slight changes in edge location due to noise or spatial-quantization effects. Several approaches for overcoming this problem have been proposed. The simplest is merely to thicken the edges prior to matching so that the match score falls off more gradually. A better technique is to "chamfer" the edge image, producing an array of numbers giving the distance at each image point to the nearest edge (9). If the template is an array of edges, the simple sum-of-products correlation measure gives the total distance between the boundaries in image and template. (Of course, this measure must be minimized to find the best match.) This technique has been used in conjunction with parametric correspondence to compare aerial images with networks of linear features (roads, rivers, etc.) in a map, using a camera model to predict their image locations. A third approach to edge matching is to convolve the image with a difference-of-Gaussians (DOG) operator (11) and then threshold to determine regions above and below zero. This yields a binary image with boundaries corresponding to zero crossings of the convolution. Analogous processing is used to produce a binary template, which is then matched against the image (12).
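The measures defined under Basic Mathematics can be sketched in a brute-force form. The function and variable names below are this sketch's own, and it assumes no image patch is perfectly uniform, so that sd(f) is nonzero:

```python
import numpy as np

def match_scores(image, template):
    """Slide the template over the image, returning the sum-of-squared-
    differences measure d and the normalized cross-correlation cor at
    every valid offset (brute force; Fourier-domain correlation would be
    used for large images)."""
    H, W = image.shape
    h, w = template.shape
    t = template.astype(float)
    ssd = np.empty((H - h + 1, W - w + 1))
    ncc = np.empty_like(ssd)
    for y0 in range(H - h + 1):
        for x0 in range(W - w + 1):
            f = image[y0:y0 + h, x0:x0 + w].astype(float)
            ssd[y0, x0] = np.sum((f - t) ** 2)
            # normalized cross-correlation: invariant to local gain/offset
            ncc[y0, x0] = (((f * t).mean() - f.mean() * t.mean())
                           / (f.std() * t.std()))
    return ssd, ncc
```

The best match minimizes the first measure and maximizes the second; scaling or shifting the image brightness leaves the normalized measure's peak in place, as the text describes.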
Speed. Template matching, especially with large images and templates, can be very computationally intensive. A number of approaches have been pursued to achieve reasonable performance, involving both hardware and clever algorithms and combinations thereof. The simple computational structure of the correlation algorithm, involving just shifts, additions, multiplications, and comparisons, makes it amenable to efficient implementation in special-purpose hardware (13). Data can be streamed through a pipelined processor. Moreover, because correlation is computed independently at all points in an image, an array of processors can be used, each processor dealing with a different part of the image. Different templates can also be matched simultaneously using replicated hardware. Processing is particularly easy when the image and template are binary arrays (i.e., thresholded images); the sum-of-products measure reduces to a count of exclusive-or's. Indeed, a low-cost commercial system has been developed that can correlate a binary image with a large (e.g., 64 × 64) template at video-frame rates (14). Special-purpose boards have been developed for other variations to correlation discussed above, including calculation of FFTs and the DOG operator used in Nishihara's matcher. In certain applications optical processing provides an attractive alternative to digital computation of Fourier transforms (15).

On the algorithmic side, hierarchical template-matching techniques have been developed that rapidly restrict the search to promising areas of a large image (16,17). In essence, the original image and template are first reduced in size by sampling or averaging. The reduced template is then matched against the reduced image to find places where the match is reasonably good. The original image and template are then matched, but only in the neighborhoods of good preliminary matches. This coarse-to-fine matching process can be iterated with a progression of templates of increasing size and detail.

Multiple templates can be used to find image features that may appear at an unknown scale and orientation. A more efficient way to locate such targets is to use a technique known as the generalized Hough transform (qv) (18,19,20). The image is first transformed into an edge array by convolving it with a local operator. Each edge element is then used as incremental evidence for all instances of the desired feature that are consistent with it. For example, if the target were a simple straight line, an edge element would provide incremental evidence for the whole family of straight lines passing through that image location at various orientations. Subsequent edge elements will also provide votes for whole families of lines, but only lines that are actually present in an image will receive multiple votes. An array of accumulators (e.g., one per line) is used to compile these votes. In essence, the Hough technique considers all possible feature instances at once, rating each on how well it explains the data.

Applications

Template matching is one of the most versatile and effective computer-vision techniques yet developed. It is widely used for detecting the presence of objects in an image whose appearance is known but may be subject to noise or distortion. An attractive feature is that detailed interpretation of the image (e.g., segmentation into uniform regions) is not required. The earliest applications of template matching, dating back to the late 1950s, were in OCR machines to recognize printed characters and in aerial reconnaissance to detect targets, landmarks, and change. In the 1960s templates corresponding to local 2-D image features (e.g., edges, corners) were used in the feature-extraction (qv) stage of pattern-recognition and scene-analysis systems. During this period template matching was also used to find corresponding points in stereo images, for range determination (see Stereo vision). In the 1970s, when industrial applications became popular, template matching was used to determine the location and orientation of parts on assembly lines and to compare a manufactured part (e.g., an IC mask) against a known standard. Template matching was also used to inspect the results of assembly operations; questions such as where was the part placed, was it the right part, and was the part damaged could all be answered on the basis of match location and score. In the 1980s template matching and variations thereon continue to be the technique of choice in a surprisingly large fraction of machine-vision applications, particularly those involving images of 2-D subjects, where one knows a priori the appearance of objects or features. Even in more general cases, template matching can be a valuable first step in transforming the numeric image array into a symbolic description of the scene.

BIBLIOGRAPHY

1. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
2. D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, pp. 68-70, 1982.
3. W. Pratt, Digital Image Processing, Wiley-Interscience, New York, p. 288, 1978.
4. L. H. Quam, Computer Comparison of Pictures, Ph.D. Thesis, Computer Science Department, Report STAN-CS-71-219, Stanford University, Stanford, CA, 1971.
5. A. Rosenfeld and A. C. Kak, Digital Picture Processing, Academic, New York, p. 298, 1976.
6. M. A. Fischler and R. A. Elschlager, "The representation and matching of pictorial structures," IEEE Trans. Comput. C-22, 67-92 (January 1973).
7. B. Widrow, "The 'rubber mask' technique," Patt. Recog. 5, 175-211 (1973).
8. H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Trans. Acoust. Speech Sig. Proc. ASSP-26, 43-49 (February 1978).
9. H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf, Parametric Correspondence and Chamfer Matching: Two New Techniques for Image Matching, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 659-663, 1977.
10. Reference 5, pp. 303-305.
11. D. Marr, Vision, W. H. Freeman, San Francisco, 1982.
12. H. K. Nishihara, "Practical real-time imaging stereo matcher," Opt. Eng. 23(5), 536-545 (September 1984).
13. R. W. Berger, VLSI Structures for Real-Time Image Convolution, Proceedings of the IEEE Conference on Systems and Cybernetics, March 1985.
14. R. W. Berger, A High Speed Processor for Binary Images, Proceedings of IEEE 1983 CVPR, 1983.
15. J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill, New York, 1968.
16. D. I. Barnea and H. F. Silverman, "A class of algorithms for fast digital image registrations," IEEE Trans. Comput. C-21, 179-186 (1972).
17. H. P. Moravec, Artificial Intelligence Laboratory Memo 339, Ph.D. Dissertation, Stanford University, Stanford, CA, 1980.
18. Reference 2, pp. 128-131.
19. R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and curves in pictures," CACM 15(1), 11-15 (January 1972).
20. D. H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Patt. Recog. 13(2), 111-122 (1981).

J. M. Tenenbaum and H. Barrow
Computer Aided Systems Laboratory, Schlumberger Palo Alto Research

MATHEMATICAL INDUCTION. See Automatic programming; Inductive inference; Theorem proving.

MEANS-ENDS ANALYSIS

Means-ends analysis is a term that is quite descriptive. In the context of problem solving it refers to the process of comparing what is given or known to what is desired and, on the basis of this comparison, selecting a "reasonable" thing to do next. This definition is deliberately informal and general because it is intended to capture the essential nature of a number of different, but similar, problem-solving methods.

The use of means-ends analysis in computer programs that solve problems dates back to 1957, when it was first used in GPS (1). Since then means-ends analysis has been the topic of considerable research, only some of which was part of the GPS effort. This entry describes most of this research; it starts with a description of GPS, partly for historical reasons but also because it is quite easy to describe the others, given a technical description of GPS. Amazingly enough, the variant of means-ends analysis used in GPS is still one of the more elaborate and subtle problem-solving methods reported in the literature (i.e., methods that have applicability in several different domains, unlike a method that can only be used, for example, in chess).

Over the years research on GPS had at least three distinct goals. One was empirical exploration into problem solving and generality. This was important in the earlier years because little was known about how to get a computer to behave intelligently. The final version of GPS, which is the culmination of this research, is described in Ref. 2. A second goal was the simulation of cognitive processes for the purpose of understanding the extent to which GPS can be used as a model of human problem solving. A good reference to this research is Ref. 3, which also contains several other models of human problem solving. The remaining contribution of research on GPS is its problem-solving method; it is the only one described in this entry.

Another variant of means-ends analysis is used in FDS (4), which is described after GPS. This problem solver was designed for a certain class of theorem-proving problems. In the early seventies there was a significant research effort at Stanford Research Institute on the use of problem solving in robotics. This work was based on STRIPS (5), a computer program that used means-ends analysis. It and one of its successors, ABSTRIPS (6), are described after FDS.

The last variant of means-ends analysis in this entry is MPS (7). This research differs from the others in that it focuses on learning good strategies for solving problems. The strategies that are learned use means-ends analysis and are sufficiently powerful for MPS to solve difficult puzzles, such as Rubik's cube. The entry closes with a discussion of how to choose good differences for GPS. The differences are problem-dependent parameters whose purpose is to guide the search for a solution.
GPS

GPS is an acronym for general problem solver. The name stems from the fact that it was the first problem-solving program that separated the problem-dependent and the problem-independent parts of the system in a reasonably clean way. GPS was designed to solve state-space problems (8) in which there is an initial state, a set of goal states, and a set of operators. Each operator f is a partial function on states; dom(f) denotes its domain. A solution to a problem is a sequence of operators that transforms the initial state into a goal state. Each intermediate state produced by one of these operators must be in the domain of the next operator in the sequence.

To solve a problem, GPS creates a hierarchy of goals; the first goal is to transform the initial state into a goal state. Assuming that the initial state is not a goal state, GPS detects differences between them and then attempts to reduce one of these differences. For this kind of goal, GPS selects an operator that is relevant to reducing the difference and creates the goal of applying the operator. A separate goal is used for this because the initial state may not be in the domain of the operator, which gives rise to a difference and the goal of reducing it.
This very brief description of how GPS works contains the three different kinds of goals that GPS uses: transform a state into a set of states, reduce a difference possessed by a state, and apply an operator to a state. The method for achieving a transform goal tests if the state is in the set of states. If not, the goal of reducing the largest difference between them is created, followed by the goal of transforming its result into the set of states. GPS requires the differences to be totally ordered. The method for achieving a reduce goal selects a relevant operator and creates the goal of applying it. GPS requires a table that indicates which operators are relevant to which differences. The method for achieving an apply goal tests if the state is in the domain of the operator. If not, the goal of reducing the largest difference between them is created, followed by the goal of applying the operator to the result of the reduce goal. To summarize, GPS uses three different kinds of goals, and for each there is a single method for achieving it.

This is a slightly simplified picture of GPS. There was a select type of goal and other methods also. These were required for operators that mapped two states into a third state. GPS could handle such operators even though they are excluded by the state-space paradigm; the details are in Ref. 2. This entry only describes how GPS solved state-space problems.

Information about differences is a problem-dependent parameter to GPS; its purpose is to make the search more efficient. Most problem-solving methods designed for more than a single problem have such parameters; for example, h is such a parameter to the A* problem-solving method (8) (see A* algorithm). GPS requires the following information about differences: the differences to be used; an ordering on these differences; and, for each difference, the operators relevant to reducing it. Intuitively, the differences are just properties of states that are appropriate for the given problem.
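The three kinds of goals and their methods can be rendered as mutually recursive procedures. The sketch below is hypothetical and much simplified, not the original GPS: states are sets of facts, the toy painting problem and every name in it are invented, and only rudimentary backtracking is shown.

```python
# A minimal, hypothetical sketch of GPS-style means-ends analysis, not the
# original program.  States are sets of facts; a goal is a set of required
# facts; ORDER is the assumed difference ordering (hardest first); the toy
# painting problem and every name here are invented for illustration.

ORDER = ["painted", "sanded"]          # difference ordering: painted > sanded
OPS = {
    # operator name: (precondition facts = dom(op), facts the operator adds)
    "paint": ({"sanded"}, {"painted"}),
    "sand":  (set(),      {"sanded"}),
}
RELEVANT = {"painted": ["paint"], "sanded": ["sand"]}   # difference -> ops

def largest_difference(s, goal):
    missing = [f for f in ORDER if f in goal and f not in s]
    return missing[0] if missing else None

def transform(s, goal, depth=6):
    """Transform goal: return (state, plan) for a state containing `goal`."""
    d = largest_difference(s, goal)
    if d is None:
        return s, []                   # trivial: s is already in the goal set
    if depth == 0:
        return None
    for name in RELEVANT[d]:           # reduce goal: try relevant operators
        r = apply_op(s, name, depth)
        if r is not None:
            mid, plan = r
            rest = transform(mid, goal, depth - 1)
            if rest is not None:
                end, more = rest
                return end, plan + more
    return None

def apply_op(s, name, depth):
    """Apply goal: first transform s into dom(op) if needed, then apply op."""
    pre, adds = OPS[name]
    plan = []
    if not pre <= s:
        r = transform(s, pre, depth - 1)
        if r is None:
            return None
        s, plan = r
    return s | adds, plan + [name]

print(transform(set(), {"painted"})[1])   # ['sand', 'paint']
```

The two procedures correspond to the transform and apply goals, with the reduce goal folded into the loop over relevant operators; the RELEVANT table plays the role of GPS's problem-dependent difference information.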
Some of these are more difficult to remove than others, and thus they are ordered according to their difficulty. GPS employs the strategy of removing differences in order of their difficulty, the most difficult first. Any operator will be relevant to removing some differences but not others. These intuitive concepts can be used inside of GPS because it is parameterized by the information about differences, which varies from problem to problem.

A trace of GPS solving a simple problem will make the above description more concrete. The problem is the three-disk Tower of Hanoi, whose initial state has three disks of different diameters stacked on the first peg in ascending order; the other two pegs are empty. There is only one goal state, in which all the disks are on the third peg. The operators move the top disk from one peg to another, provided that the disk being moved is not placed on a smaller disk. Figure 1 shows how GPS solves this problem. As always, the first goal is to transform the initial state s0 into the goal states G. Comparing the two, GPS detects that all of the disks are in the wrong positions. The second goal is to reduce the largest of these differences; d3 indicates that the position of disk 3 is incorrect, and GPS notes in goal 2 that it should be on
 1. Transform s0 into G.
 2.    Reduce d3 of s0 to peg-3.
 3.       Apply disk-3 → peg-3 to s0.
 4.          Reduce d2 of s0 to peg-2.
 5.             Apply disk-2 → peg-2 to s0.
 6.                Reduce d1 of s0 to peg-3.
 7.                   Apply disk-1 → peg-3 to s0.   s1 = ((2 3) () (1))
 8.                Apply disk-2 → peg-2 to s1.   s2 = ((3) (2) (1))
 9.          Apply disk-3 → peg-3 to s2.
10.             Reduce d1 of s2 to peg-2.
11.                Apply disk-1 → peg-2 to s2.   s3 = ((3) (1 2) ())
12.             Apply disk-3 → peg-3 to s3.   s4 = (() (1 2) (3))
13.    Transform s4 into G.
14.       Reduce d2 of s4 to peg-3.
15.          Apply disk-2 → peg-3 to s4.
16.             Reduce d1 of s4 to peg-1.
17.                Apply disk-1 → peg-1 to s4.   s5 = ((1) (2) (3))
18.             Apply disk-2 → peg-3 to s5.   s6 = ((1) () (2 3))
19.       Transform s6 into G.
20.          Reduce d1 of s6 to peg-3.
21.             Apply disk-1 → peg-3 to s6.   s7 = (() () (1 2 3))
22.          Transform s7 into G. Success.
Figure 1. A trace of GPS solving the three-disk Tower of Hanoi problem.

peg 3. Goal 3 is to apply the operator that moves disk 3 to peg 3. Of course, this operator cannot be directly applied because s0 is not in its domain since the other two disks are not on peg 2. Therefore, goal 4 is created to reduce d2, the larger of these two differences. Since disk 2 cannot be moved in s0 (goal 5), disk 1 is moved to peg 3 (goals 6 and 7), which results in the new state s1. Disk 2 is then moved in s1 (goal 8), resulting in s2. Goal 9 is another attempt to move disk 3, but this cannot be done in s2, and GPS continues in a similar manner.

The indentation in Figure 1 is important because it shows the hierarchical relationship among the goals. Although this relationship is obvious for the first seven goals, note that goal 9 is a subgoal of goal 3. This is important because the operator of goal 3 is used for the operator of goal 9. Moving disk 3 in goal 12 causes goal 9 to be solved, which finally causes goal 3 to be solved, and goal 13 becomes the second transform goal, which is a subgoal of goal 1. The whole process stops with goal 22, which is trivially solved because s7 is an element of G. This causes goals 19, 13, and 1 to be solved because they are supergoals of goal 22.

The behavior depicted in Figure 1 is strongly dependent on the difference information that was used. The difference ordering is d3 > d2 > d1, and each difference indicates an incorrect position of a particular disk. An operator that moves a disk is relevant to reducing the difference that pertains to that disk. The difference information only contains difference types; in solving a problem, multiple instances or tokens of each difference may be encountered. For example, the difference (token) of goal 2 is not just that the position of disk 3 is wrong but also that its goal position is peg 3. In addition to the type of the difference, GPS uses its goal value to select a relevant operator. The version of GPS in Ref.
2 analyzed the specification of
operators to find one that produced the goal value of a difference. This version exhibits the behavior shown in Figure 1. Note that Figure 1 shows GPS at its best since it always selects the right operator for the right reason. Normally, GPS makes mistakes because its difference information is "weaker," and this gives rise to search.

Although Figure 1 gives an intuitively appealing picture of GPS, it does not show how GPS relates to other mechanical problem-solving concepts. Nilsson (8) noted that GPS is conceptually based on a form of AND/OR trees. A technical description of GPS based on this idea follows. This description is necessarily somewhat more complex than the one in Ref. 8. The original problem is divided into a number of subproblems of the form (s, D); s is the initial state of the subproblem, and D is the set of desired states. Such a subproblem is trivial if s ∈ D. GPS attempts to solve each nontrivial subproblem in the same way: for each operator f that is relevant to reducing the largest difference between s and D, two new subproblems are created: (s, dom(f)) and (f(r), D). A solution to the first of these results in a state r in the domain of f. The second subproblem is to transform the result of applying f to r into D. A complication is that each subproblem may have multiple solutions, and thus there may be many subproblems of the form (f(r), D), one for each different r produced by a solution of (s, dom(f)).

This decomposition of a subproblem can be represented by the AND/OR tree in Figure 2. The root is the subproblem of transforming s into D. The operators that are relevant to reducing the largest difference between s and D are f and g and perhaps others. Each r_i is in dom(f) because it is a result of solving (s, dom(f)); hence, the original subproblem can be reduced to any subproblem (f(r_i), D). Since some of these may not have a solution, GPS must be prepared to consider all of them.
Similarly, applying f may not lead to a solution, and it may be necessary to consider other relevant operators like g. As usual, the little arc on the branches of a node indicates that all of its subnodes need to be solved; only one subnode needs to be solved if the arc is missing. Since all of the terminal nodes are labeled with subproblems, this decomposition method can be applied recursively until trivial subproblems are encountered.

Figure 2. The AND/OR tree for the subproblem (s, D). [The root (s, D) has an AND branch for each relevant operator: applying f yields (s, dom(f)) together with (f(r_1), D), (f(r_2), D), . . . ; applying g yields (s, dom(g)) together with (g(t_1), D), (g(t_2), D), . . . .] Each r_i is a result of solving (s, dom(f)), and each t_i is a result of solving (s, dom(g)).

The example in Figure 3 clarifies this view of GPS. It does not have a physical interpretation, like the Tower of Hanoi, because such problems are either complicated or do not illustrate important features of GPS.

Figure 3. An example of GPS search viewed as an AND/OR tree. [The root (s, G) is attacked by applying f, giving (s, dom(f)) and, for each result of (s, dom(f)), a subproblem (f(g(s)), G) or (f(g(h(s))), G). (s, dom(f)) is attacked by applying g, giving (s, dom(g)) and (g(s), dom(f)), and by applying h, giving (s, dom(h)) and (h(s), dom(f)), the latter leading to (h(s), dom(g)) and (g(h(s)), dom(f)). (f(g(h(s))), G) is attacked by applying f again, giving (f(g(h(s))), dom(f)) and (f(f(g(h(s)))), G).]

The initial state is s, and G is the set of goal states, as indicated by the root of the tree in Figure 3. GPS attempts to apply the relevant operator f to s, which is not in its domain. To solve this subproblem (s, dom(f)), GPS attempts to apply g and h to s since they are relevant to the largest difference between s and dom(f). This example assumes that all of the subproblems at the terminal nodes in Figure 3 are trivial except (f(g(s)), G). Thus, g can be applied to s, and its result is in dom(f). This makes g(s) a result of solving (s, dom(f)), and the subproblem (f(g(s)), G) is created. However, (s, dom(f)) can also be solved by applying h to s. Since h(s) is not in dom(f), g is applied to it. This produces another result of solving (s, dom(f)) because g(h(s)) ∈ dom(f), and causes the subproblem (f(g(h(s))), G) to be created. To solve the latter, f is applied again, which produces a solution; that is, f(f(g(h(s)))) ∈ G. This example illustrates how OR nodes give rise to multiple subproblem results. Using the notation in Figure 2, the results of subproblem (s, D) are recursively defined as follows: s is a result of (s, D) if s ∈ D; and the results of subproblems (f(r_i), D) and (g(t_i), D) are also results of (s, D).

This description of GPS depicts one use of the difference ordering: to select the largest difference of a subproblem. Usually a subproblem (s, D) will have several differences between s and D. GPS only considers the largest of these differences and is prepared to apply all operators relevant to this
difference. This use of the difference ordering and operator relevance restricts the number of operators used on a subproblem to some fraction of the total number of operators. Although this makes the search more efficient, the difference ordering has another use that gives a more dramatic improvement to search. GPS rejects any subproblem that is more difficult than the subproblem for which it was created. The difficulty of a subproblem (s, D) is the largest difference between s and D; thus, the difference ordering essentially defines the difficulty of subproblems. Rejecting subproblems because they are too difficult is a very powerful heuristic because it prevents large portions of the search space from ever being explored.

A precise definition of this heuristic can be given in terms of the notation in Figure 2. Subproblem (s, dom(f)) must be strictly less difficult than (s, D); otherwise, the former will not be attempted. The same is true of (s, dom(g)). The subproblem (f(r_i), D) must be less difficult than or as difficult as (s, D); otherwise, search will be terminated at the former. The same is true for (g(t_i), D). The first of these rules controls GPS's use of recursion; that is, applying an operator so that another operator can be applied. In particular, it prohibits the use of f to transform s into dom(f). The second rule controls GPS's use of iteration; that is, once an operator is applied, the result must still be transformed into D. This rule allows multiple applications of an operator because its first application may not remove the difference, but a second application of the same operator may remove the difference. Note that an operator that is relevant to a difference is not guaranteed to remove the difference. This would be too strong since most operators in the various problems that GPS has solved do not possess such guarantees.

The above is a detailed description of the GPS problem-solving method and does not attempt to describe how to use GPS effectively. The difference information is very important to GPS's performance, which may vary from an exhaustive search to no search at all. It may also cause GPS to "miss" all of the solutions to a problem. What constitutes good difference information is discussed in the penultimate section of this entry.

FDS

FDS (4) is a program that was designed to solve a class of theorem-proving problems, such as proving algebraic identities. The version of means-ends analysis in FDS is in many respects similar to GPS. Differences are used to select operators, and the same kind of subproblems are created. Unlike GPS, FDS does not use a difference ordering. Instead, it orders the operators that are relevant to reducing the differences of a subproblem according to how well they remove differences and how difficult they are to apply. This is determined, not by applying the operators, but by analyzing their behavior in terms of the differences they introduce and remove. Another major deviation from GPS is that FDS uses the same set of differences for all problems. This essentially moves the difference information inside of FDS, which, consequently, has no problem-dependent parameters. The relevance of operators to differences is determined by FDS through an analysis of the operator specification. This has the good effect of making FDS self-contained since no external heuristic information is required. The performance is not appreciably degraded by this; FDS has the ability to solve reasonably difficult problems. However, for some problems other differences and a difference ordering have proven to be very useful. FDS does not appear to be suitable for such problems.

STRIPS

STRIPS (5) is a program that was designed for solving problems that a robot might encounter. An applied predicate calculus is used to represent problems in STRIPS. The representations used by other problem-solving programs are not discussed in this entry since the emphasis is on their problem-solving methods. However, in the case of STRIPS the representation seems to have had an impact on the way it solves problems. A state in STRIPS is represented by the conjunction of a set of literals, such as Inroom(Robot, Room-1), which indicates that the robot is in room 1. The goal states are represented in STRIPS by a formula in predicate calculus; any state that satisfies it is a goal state. Operators are robot actions and can best be described by an example: Push(x, y) is

Precondition: Pushable(x), Object(y), Nextto(Robot, x), ∃r(Inroom(Robot, r) & Inroom(y, r) & Inroom(x, r));
Deletions: At(Robot, $1, $2), At(x, $1, $2), Nextto(Robot, $1), Nextto(x, $1), Nextto($1, x); and
Additions: Nextto(x, y), Nextto(y, x), Nextto(Robot, x).

In this operator the robot pushes object x to object y, provided that x and y are in the same room with the robot. A state is in the domain of the operator if all of the formulas in the precondition are true. The output state of the operator is formed by modifying its input state; all of the literals in the deletion list are deleted, and the literals in the addition list are added. Variables $1 and $2 match anything.

Like GPS, STRIPS starts by comparing the initial state to the goal states to detect differences between the two. A difference is a formula in the goal state (or in the precondition of an operator) that is not satisfied by the given state. It then attempts to apply an operator that is relevant to a difference. If an operator has a literal in its addition list that is part of the difference, the operator is considered relevant to the difference. In attempting to apply an operator, its precondition may not be satisfied. In this case STRIPS creates the subproblem of transforming the given state into the domain of the operator and attempts to solve it in the same way that it attempts the main problem.

The main deviation of STRIPS from GPS is that it uses no difference ordering and is committed to the particular kind of differences described above. Thus, STRIPS needs no problem-dependent parameters, like GPS's difference information, since it can determine operator relevance by a simple analysis of the operators. Although no external heuristic information is given to STRIPS, the formulation of operators is very important because the differences are embedded in their preconditions. Reformulating the operators may thus change the differences, which may have a large effect on STRIPS's performance. Problems like Fool's Disks (see below) require differences that are more or less independent of the problem formulation. STRIPS has no provision for such differences and consequently would have difficulty with such problems. However, a modification of it uses a difference ordering in a very effective way, which is described next.
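The operator machinery just described can be sketched in a few lines. The following is a hypothetical rendering, not the original STRIPS: variable matching ($1, $2) is omitted, and the Push instance is pre-instantiated, with the room name and the coordinates (3, 4) invented for illustration.

```python
# Hypothetical sketch of STRIPS-style operator application, not the original
# program.  A state is a set of ground literals; an operator carries
# precondition, deletion, and addition lists.

from dataclasses import dataclass

@dataclass(frozen=True)
class StripsOp:
    name: str
    precondition: frozenset   # state is in dom(op) iff all of these hold
    deletions: frozenset      # literals removed from the input state
    additions: frozenset      # literals added to form the output state

    def applicable(self, state):
        return self.precondition <= state

    def apply(self, state):
        return (state - self.deletions) | self.additions

def differences(state, goal):
    """STRIPS differences: goal formulas not satisfied by the given state."""
    return goal - state

# A ground instance of Push(Box-1, Box-2); constants are invented
push = StripsOp(
    name="Push(Box-1, Box-2)",
    precondition=frozenset({
        "Pushable(Box-1)", "Object(Box-2)", "Nextto(Robot, Box-1)",
        "Inroom(Robot, Room-1)", "Inroom(Box-1, Room-1)",
        "Inroom(Box-2, Room-1)"}),
    # a ground subset of the deletion list; the $1/$2 patterns are omitted
    deletions=frozenset({"At(Robot, 3, 4)", "Nextto(Robot, Box-1)"}),
    additions=frozenset({"Nextto(Box-1, Box-2)", "Nextto(Box-2, Box-1)",
                         "Nextto(Robot, Box-1)"}),
)

state = push.precondition | {"At(Robot, 3, 4)"}
goal = frozenset({"Nextto(Box-1, Box-2)"})

# push is relevant: its addition list contains a literal of the difference
assert differences(state, goal) <= push.additions
state = push.apply(state) if push.applicable(state) else state
print(differences(state, goal))   # frozenset() -- no differences remain
```

Note how relevance falls directly out of the representation, which is why STRIPS needs no problem-dependent difference table.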
ABSTRIPS
ABSTRIPS (6) was designed for the same class of problems as STRIPS and uses a very similar representation for problems. Each literal in the precondition of an operator is assigned a criticality value whose purpose is to indicate how difficult it is to remove a difference that contains this literal. As in STRIPS, differences are formulas that are not satisfied by the given state. The criticality values are assigned in a semiautomatic way; a partial ordering on the predicates, reflecting their intuitive importance, must be given to ABSTRIPS, which then assigns the criticality values by analyzing the operator specifications.

ABSTRIPS starts by solving the problem at the highest criticality level; i.e., it ignores all those literals in the preconditions of operators that have a lower criticality value. This yields a solution that is correct in the most critical literals but not in other literals. Next, it solves the problem at the second highest criticality level. This involves finding a literal at this level that was not true in the top-level solution, creating the subproblem of making it true, and inserting its solution into the top-level solution. For example, suppose the problem is to move Box-1 to Box-2. At the top level the single operator Push(Box-1, Box-2) is a solution to the problem. But at the next level STRIPS notes that the two boxes are not in the same room. Moving Box-1 to Box-2's room is set up as a subproblem, and its solution together with Push(Box-1, Box-2) constitutes a solution to the main problem. ABSTRIPS solves all of the subproblems at a particular criticality level and then moves to the next lower criticality level. Backtracking occurs when no solution to a subproblem can be found. This causes ABSTRIPS to find an alternative solution at the higher levels.

This use of criticality values is essentially the same as GPS's use of the difference ordering. ABSTRIPS only considers the most difficult differences of a subproblem, those of the current criticality level; smaller criticality values are ignored, and larger ones have already been removed. An operator can be applied to a state that is not in its domain, but at lower criticality levels the subproblem of transforming the state into its domain must be solved. This corresponds to GPS's rule that the difficulty of transforming a state into the domain of an operator must be less than the difference the operator is supposed to reduce. The reason is that the subproblem must be solved at lower criticality levels. Thus, ABSTRIPS generates the same subproblems as GPS and uses the same rules for terminating search.

The new thing in ABSTRIPS is that the order of its search is different from that of GPS. Using the notation in Figure 2, ABSTRIPS delays (s, dom(f)) until after (f(r_i), D) has been solved. This is nontrivial because r_i, a result of the former, is unknown; i.e., r_i is whatever state a solution to (s, dom(f)) produces, and this subproblem has not yet been solved. ABSTRIPS is very clever about this because it does not care about the lower criticality literals in r_i; it only cares about the higher ones, which it already knows. Delaying (s, dom(f)) can be very prudent because (f(r_i), D) may not be solvable, in which case (s, dom(f)) will never be attempted. Of course, in some cases (s, dom(f)) will not be solvable, and the effort in solving (f(r_i), D) has been wasted.

Sacerdoti (6) views ABSTRIPS as solving a problem in a hierarchy of abstraction spaces, each criticality level being a different abstraction space. The solution at each level forms a plan for the solution at the next lower level. Newell and Simon (3) have proposed a similar method in which the planning space is defined in terms of the difference ordering (9). Basically, the difference ordering is cut in half; the more difficult half is used in the planning space, and the easier half is used in the problem space. As with ABSTRIPS, the same set of subgoals is encountered, but in a different order. This research shows that there is a definite connection between the difference ordering and planning and that such methods dramatically improve search. Empirical tests show that ABSTRIPS's search is much more efficient than that of STRIPS.

MPS

MPS (7) is a problem-solving method that is capable of efficiently solving some very difficult puzzles, such as Rubik's cube. MPS requires its problems to have a single goal state rather than a set of goal states. The central component of MPS is the macro table, in which each column is labeled by a state component and each row is labeled by a possible value for a state component. Each entry in a macro table is a macro operator, which is a sequence of operators. At each step in the problem-solving process, a macro operator is applied to the current state s by applying the first operator in the sequence to s, the next operator to the result of the first, etc. The resulting state is the input to MPS's next step. The macro operator m_ij that is applied to s is selected as follows: s and the goal state have the same values for the first j - 1 state components (which label the first j - 1 columns), and i is the value of the jth component of s. This is a very simple procedure; the trick is to use a macro table that leads to a solution.

Macro tables are required to have the following property: If m_ij is applied to any state in which the first j - 1 components have their goal values and component j has i as its value, the resulting state will have goal values for its first j components. In addition, the precondition of each operator in m_ij will be satisfied if m_ij is applied to a state with these values for its first j components. Needless to say, these are very strong conditions, but they can be satisfied for many problems. The first step of MPS produces a state whose first component has its goal value; after the second step the second component has its goal value; etc. Unlike the methods described above, MPS does not use any search.

The column labels correspond to the differences of GPS, and their order gives the difference ordering, the label on column 1 being the most difficult difference. Thus, the first step of MPS removes the most difficult difference, and none of the remaining steps reintroduce it. This is equivalent to GPS's rules for terminating search at more difficult subproblems. MPS contains two extensions to GPS: Macro operators are used instead of operators to reduce differences, and each macro is guaranteed to reduce the difference to which it is relevant. Banerji (10) argued that these extensions to GPS were needed, but he did not have a particular proposal for the implementation of the latter.

The first extension is necessary because the differences are not invariant over operators; i.e., in applying a macro operator, the operators that compose it temporarily reintroduce differences that have already been removed, but they are also removed before the end of the macro application. The second extension guarantees progress, unlike the relevant operators in GPS, which are only required to affect and not necessarily to fix the property to which a difference pertains.
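The selection rule for m_ij can be illustrated with a deliberately contrived toy problem (all names invented, not Korf's implementation): components are bits, the goal state is all zeros, and flipping bit j never disturbs the columns to its left, so each single flip is already a valid macro.

```python
# Hypothetical sketch of MPS's solving loop.  A macro table maps (column j,
# value i) to a macro operator, i.e., a sequence of primitive operators.

GOAL = (0, 0, 0)

def flip(k):
    def op(state):
        s = list(state)
        s[k] ^= 1
        return tuple(s)
    return op

# macro_table[(j, i)] = macro operator that fixes component j from value i
macro_table = {(j, 1): [flip(j)] for j in range(len(GOAL))}

def mps_solve(state):
    for j in range(len(GOAL)):                    # columns, hardest first
        if state[j] != GOAL[j]:
            for op in macro_table[(j, state[j])]: # apply macro operator m_ij
                state = op(state)
            # required property: first j + 1 components now hold goal values
            assert state[:j + 1] == GOAL[:j + 1]
    return state                                  # no search is performed

print(mps_solve((1, 0, 1)))   # (0, 0, 0)
```

The loop walks the columns once, left to right, which is why MPS performs no search: each macro application is guaranteed to make progress.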
A mechanical procedure for learning macro operators with the desired properties has been developed; space permits only a brief outline of it. A search is conducted from the goal state using the inverses of the operators; i.e., they are applied, first, to the goal state, then to the states produced by these applications, etc. For each state s in this space, there is a j such that the first j - 1 components of s have goal values and the jth component has some nongoal value i. Then the sequence of operators on the path from the goal state to s is the inverse of a macro operator for row i and column j of a macro table. Thus, each s identifies an entry for a macro table because the inverse of a macro is just the inverse of each of its elements in the reverse order. Examining enough of the search space should identify macros for an entire macro table. However, the ordering of state components (the column ordering) is an input parameter to this procedure; it is not learned.

MPS does not create subproblems of the form "transform a state into the domain of an operator." For problems like the Tower of Hanoi they would be useful. There are two major limitations of MPS: Multiple goal states are not allowed, and the differences must be state components. Fool's Disks, described below, is an example of a problem that violates both of these conditions and hence lies outside MPS's domain of applicability.
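The outline of the learning procedure can be sketched as a breadth-first search backward from the goal state. The toy bit-flip puzzle and all names below are invented; each flip is its own inverse, which keeps the inversion step trivial.

```python
# Hypothetical sketch of macro-table learning: breadth-first search backward
# from the goal state using inverse operators, recording a macro table entry
# the first time a state is reached whose first j - 1 components already hold
# goal values while component j holds a nongoal value i.

from collections import deque

GOAL = (0, 0, 0)

def toggle(state, k):
    s = list(state)
    s[k] ^= 1
    return tuple(s)

OPS = {f"flip{k}": k for k in range(len(GOAL))}   # primitive ops: toggle bit k

def learn_macro_table():
    table = {}
    seen = {GOAL}
    queue = deque([(GOAL, [])])            # (state, operator path from goal)
    while queue:
        state, path = queue.popleft()
        # j = index of the first component that differs from the goal
        j = next((k for k in range(len(GOAL)) if state[k] != GOAL[k]), None)
        if j is not None:
            i = state[j]
            # the macro is the inverse of the path, in reverse order;
            # flips invert themselves, so only the order is reversed
            table.setdefault((j, i), list(reversed(path)))
        for name, k in OPS.items():
            nxt = toggle(state, k)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return table

table = learn_macro_table()
print(sorted(table))   # [(0, 1), (1, 1), (2, 1)]
```

Because the search is breadth first, the first macro recorded for each table entry is also a shortest one; the column ordering, as the text notes, is supplied rather than learned.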
Figure 4. Initial state in the Fool's Disks puzzle.
Good Difference Information

The previous sections have pointed out the importance of difference information. This section addresses the question of what properties difference information should possess to efficiently guide search. For motivation, this discussion starts with an example. Fool's Disks is a puzzle in which each state has four disks that can be rotated independently of each other; Figure 4 gives the initial state. There are eight numbers on each disk, and the goal is to align the disks so that each of the eight columns radiating from the center sums to 12. Good differences for this problem are:

D3: the 16 numbers on the horizontal and vertical diameters do not sum to 48;
D2: the 8 numbers on a diameter do not sum to 24; and
D1: a radius does not sum to 12.

The difference ordering is D3 > D2 > D1. Using this difference information, GPS searches for a state that does not possess D3. Then, without reintroducing D3, it searches for a state that does not possess D2. And finally, it looks for a goal state without reintroducing either D3 or D2. Of course, GPS backtracks if necessary.

The invariances of the differences are important. A difference d is invariant over an operator f if, for any state s, both s and f(s) either possess d or both do not. In other words, f neither introduces nor removes d. For example, D2 and D3 are invariant over 180° moves that are relevant to D1. The invariances can be tabulated as follows:

        F3   F2   F1
  D3     1    0    0
  D2     1    1    0
  D1     1    1    1

F1 are the 180° moves; F2 are the 90° and 270° moves; F3 are the remaining moves. A 0 indicates that the difference is invariant over the operators. Note that this table is triangular: 0's above the diagonal. In general, it is this triangular property of the differences that is desirable. It is also important that the difference ordering is the row ordering in the triangular table.

Fool's Disks has two interesting features besides the triangular property. Its differences are much more complex than a single state component, such as the first number on the third disk. Most of the problem solvers in this entry (including the version of GPS in Ref. 2) cannot handle such differences. The second item has to do with operator relevance. Note that a 45° move is used only to reduce D3 even though it could also reduce D1. In general, only the 1's on the diagonal indicate operator relevance. Triangularity gives a global view of invariants, i.e., how different invariants relate to one another. Local invariants are not enough because, for example, using a 45° move to reduce D1 will only have the undesirable effect of reintroducing D3, which is not present when attempting to reduce D1. Triangularity also leads to a form of completeness (11): GPS will find any solution in which the differences never increase. This shows that the off-diagonal 1's should not be used in the definition of relevance.

Triangularity has been used as the basis of a method (12) for learning difference information. The basic idea is to find a set of differences that gives rise to a triangular table. The row ordering gives the difference ordering, and the diagonal entries give operator relevance. The method starts by looking for properties that are invariant over at least one operator. Then, the properties are combined to form properties all goal states possess. These properties are potential differences, and the method attempts to form a triangular table out of them. The details are given in Ref. 12.
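These differences are easy to state as predicates. The sketch below is a hypothetical encoding (the indexing of radii and all concrete numbers are invented; the numbers are merely chosen so that every radius of the starting configuration sums to 12):

```python
# Hypothetical encoding of the Fool's Disks differences.  A state is a tuple
# of four disks, each a tuple of eight numbers indexed by radius 0..7 (the
# horizontal and vertical diameters are taken to be radii 0, 2, 4, 6).
# Each predicate returns True when the difference is present.

def radius_sum(state, r):
    return sum(disk[r] for disk in state)

def d1(state):   # some radius does not sum to 12
    return any(radius_sum(state, r) != 12 for r in range(8))

def d2(state):   # the 8 numbers on some diameter do not sum to 24
    return any(radius_sum(state, r) + radius_sum(state, r + 4) != 24
               for r in range(4))

def d3(state):   # horizontal + vertical diameters do not sum to 48
    return sum(radius_sum(state, r) for r in (0, 2, 4, 6)) != 48

def rotate(state, disk, k):
    """Operator: rotate one disk by k * 45 degrees."""
    d = state[disk]
    return state[:disk] + (d[-k:] + d[:-k],) + state[disk + 1:]

# Contrived solved state: every radius sums to 12.
solved = ((1, 2, 3, 4, 5, 4, 3, 2),
          (6, 5, 4, 3, 2, 3, 4, 5),
          (2,) * 8,
          (3,) * 8)
moved = rotate(solved, 0, 4)        # a 180-degree move (relevant to D1)
print(d1(moved), d2(moved), d3(moved))   # True False False
```

The printed line matches the F1 column of the invariance table: a 180° move introduces D1 while leaving D2 and D3 invariant.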
Summary

A number of different problem-solving methods have employed some form of means-ends analysis. Empirical results show that it has been rather effective in controlling search. Most of these methods use a specialization of the mechanisms found in GPS in order to remove the requirement for external
information about differences. In addition to GPS's mechanisms, MPS has two important mechanisms it uses very effectively in solving difficult puzzles like Rubik's cube. Some guidelines are given for selecting differences that lead to efficient search.
BIBLIOGRAPHY
1. A. Newell, J. C. Shaw, and H. A. Simon, Report on a General Problem-Solving Program, Proceedings of the International Conference on Information Processing, UNESCO House, Paris, pp. 256-264, 1960.
2. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, 1969.
3. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
4. J. R. Quinlan and E. B. Hunt, "A formal deductive problem-solving system," JACM 15, 625-646 (October 1968).
5. R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artif. Intell. 2, 189-208 (1971).
6. E. D. Sacerdoti, "Planning in a hierarchy of abstraction spaces," Artif. Intell. 5, 115-135 (1974).
7. R. E. Korf, "Macro-operators: A weak method for learning," Artif. Intell. 26(1), 35-77 (April 1985).
8. N. J. Nilsson, Problem-Solving Methods in Artificial Intelligence, McGraw-Hill, New York, 1971.
9. Reference 3, pp. 428-435.
10. R. B. Banerji, GPS and the Psychology of the Rubik Cubist: A Study in Reasoning about Actions, in A. Elithorn and R. Banerji (eds.), Artificial and Human Intelligence, Elsevier Science, New York, 1984.
11. R. B. Banerji and G. W. Ernst, A Theory for the Complete Mechanization of a GPS-type Problem Solver, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977, pp. 450-456.
12. M. M. Goldstein and G. W. Ernst, "Mechanical discovery of classes of problem-solving strategies," JACM 29, 1-28 (January 1982).

G. Ernst
Case Western Reserve University

MEDICAL ADVICE SYSTEMS

For several decades, collaborating computer scientists and physicians have been building computer programs to diagnose medical illness and to recommend therapy. In the early 1970s four research groups developed programs that differed somewhat from the other medical decision-making programs in that they drew heavily on earlier AI research, such as DENDRAL, a program from the late 1960s that had used expert knowledge to derive chemical structure from mass spectral data (1). The resulting work helped define the field of AI in medicine (AIM) and seeded development of expert systems (qv) in other domains (i.e., fields of expertise) as well (2,3). Medical diagnosis and patient-management problems helped demonstrate the validity of an emerging AI principle: that domain-specific assertions and extensive knowledge about a problem area are generally more crucial to problem-solving performance than are domain-independent principles of reasoning. Simple reasoning techniques were shown to suffice for expert-level performance so long as the program had comprehensive and accurate knowledge of the domain. AIM research activities are important to medicine not only because medical advice systems will someday become routine tools in clinical practice but also because the education of doctors, which has traditionally emphasized memorization of knowledge, may increasingly emphasize the learning of effective problem-solving techniques, enhanced with the knowledge and advice provided by computer systems.
Theoretical Basis

Protocol Analysis. The theoretical foundation of AIM owes a great deal to psychological research carried out in the mid-1970s. In these experiments physicians were urged to verbalize their thoughts while they solved diagnostic problems. Researchers then analyzed transcripts of those sessions. Investigations of this type (4,5) identified a general problem-solving procedure common to both expert and novice physicians: the hypothetico-deductive approach. Hypotheses emerge quite soon after the physician begins gathering data, and these are tested as new data arrive. Questions may be generated solely to test an active hypothesis or to distinguish between hypotheses. Thus, early generation of hypotheses seems to provide leverage for the diagnostician. Building on those results, researchers at the University of Minnesota (6) examined the performance of both experts and novices and found differences not in their reasoning (regardless of experience, they shared the hypothetico-deductive approach) but in the richness and organization of medical knowledge. Novices had spotty knowledge of diseases, not yet full enough or sufficiently organized to optimize the hypothetico-deductive approach. These results agreed with the results of the expert systems research mentioned earlier in that performance seemed to be critically dependent on domain-specific knowledge.

Knowledge Representation. Two aspects of knowledge representation (qv) are of particular interest in considering the construction of medical advice systems. First, what knowledge do physicians use to make the diagnosis and to plan therapy? Second, what abstract data types are best for computer implementations of that knowledge? It became increasingly clear that the first-generation AIM programs captured only a small portion of the knowledge that physicians actually use in problem solving.
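The hypothetico-deductive cycle described above (early hypothesis generation from initial data, followed by questions that test the active hypotheses) can be sketched roughly as follows. The diseases, findings, trigger links, and scoring rule here are invented for illustration and belong to no real system:

```python
# Minimal sketch of a hypothetico-deductive diagnostic loop.
# All medical content below is illustrative, not clinically meaningful.

TRIGGERS = {  # finding -> hypotheses it evokes
    "jaundice": ["hepatitis", "gallstones"],
    "fever": ["hepatitis", "influenza"],
}
EXPECTED = {  # hypothesis -> findings that would support it if present
    "hepatitis": {"jaundice", "fever", "dark urine"},
    "gallstones": {"jaundice", "colicky pain"},
    "influenza": {"fever", "myalgia"},
}

def diagnose(initial_findings, answer):
    """answer(finding) -> True/False simulates questioning the patient."""
    known = dict.fromkeys(initial_findings, True)
    # 1. Early hypothesis generation: hypotheses evoked by the first data.
    active = {h for f in initial_findings for h in TRIGGERS.get(f, [])}
    # 2. Deductive phase: ask only questions that test an active hypothesis.
    for h in sorted(active):
        for f in EXPECTED[h]:
            if f not in known:
                known[f] = answer(f)
    # 3. Score each hypothesis by the fraction of expected findings present.
    return {h: sum(known.get(f, False) for f in EXPECTED[h]) / len(EXPECTED[h])
            for h in sorted(active)}

scores = diagnose(["jaundice", "fever"],
                  answer=lambda f: f in {"dark urine"})
```

Note that questions are generated only for findings relevant to an evoked hypothesis, mirroring the focused questioning the protocol studies observed.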
Typically, the medical knowledge represented consisted of weighted associations between findings (i.e., observable descriptors of a patient) and hypotheses or between two hypotheses. The underlying semantics of such associations were not always made clear, and there was generally no distinction made between causal and associational relationships. For example, a diagnostic system might represent a link between the hypothesis of breast cancer and the finding that the patient's mother had breast cancer. In this case the finding is a risk factor, not a clear causal relationship, as a skiing accident might be to a fractured leg. In recent years AIM research has explored various representations for causal knowledge and their integration into advice systems (see Reasoning, causal). Pure causal modeling is rarely applicable in medicine because medicine is an empirical science in which detailed mechanisms are often unknown. However, whenever cause-effect information is available to physicians, they use it in at least five ways. First, if one can confidently follow effect-to-cause
links (i.e., statements of what entities may cause an observed effect) from the patient's complaints back toward primary disorders, an intersection point provides the diagnostician with a common cause of multiple complaints. CASNET is a computer program developed at Rutgers for the diagnosis and treatment of glaucoma; that domain lent itself to this intersection-point technique (7,8). (For historical reasons the names of computer programs are written in uppercase, e.g., CASNET. Occasionally, the name is an acronym, but understanding the acronym seldom helps one understand the system. In this discussion many of the better known computer programs are referred to by an uppercase name, and the acronyms are explained only if necessary to understand the accompanying text.) Second, medical therapy is often unavailable either for the patient's complaints or for the elemental physiologic disorder (primary disease) at the beginning of the causal path. But effective therapy may indeed be available for intermediate states. For example, swollen, painful feet can be caused by abnormal retention of fluid in the body, which is in turn caused by cardiomyopathy. Current medical therapy cannot correct cardiomyopathy, and it would be suboptimal to simply give pain killers for swollen feet, but drug therapy can reverse the fluid retention (intermediate state) and thus relieve the patient of swollen feet. Third, physicians use causal models to interpret the temporal ordering of complaints. Leg cramps that occur during vigorous walking may be due to atherosclerotic disease, in which the leg muscles begin consuming more oxygen than the narrowed leg arteries can deliver. Leg cramps that are relieved by walking cannot be explained by this mechanism. Fourth, causal information can be used by a diagnostician to avoid treating two related findings as though they provide independent support for a hypothesis.
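The first use, tracing effect-to-cause links back from each complaint until the paths intersect at a common cause, can be sketched as a small graph search. The causal graph below is invented for illustration:

```python
# Sketch of the intersection-point technique: follow effect-to-cause links
# from each complaint back toward primary disorders; any state reached from
# every complaint is a candidate common cause. The graph is illustrative.

CAUSES = {  # effect -> possible causes
    "swollen feet": ["fluid retention"],
    "fatigue": ["fluid retention", "anemia"],
    "fluid retention": ["cardiomyopathy"],
    "anemia": ["iron deficiency"],
}

def ancestors(effect):
    """All states reachable by following effect-to-cause links, incl. effect."""
    seen, stack = set(), [effect]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(CAUSES.get(s, []))
    return seen

def common_causes(complaints):
    """Intersection of the causal ancestries of all complaints."""
    sets = [ancestors(c) for c in complaints]
    return set.intersection(*sets) - set(complaints)

shared = common_causes(["swollen feet", "fatigue"])
```

Here both complaints trace back through fluid retention to cardiomyopathy, so both intermediate and primary common causes are surfaced.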
For example, if there are known associations between findings f1 and f2 and hypothesis H, observation of both f1 and f2 might be interpreted as contributing independently to confidence in H. But if it is known that the causal path is H causing f1 causing f2, then f1 and f2 must be dependent. Cooper (9) uses causal models in this way to establish probability bounds that are consistent with knowledge about cause and effect. Finally, physicians use causal models to partition their knowledge into levels of abstraction. Diagnosis and explanation can then be performed at the clinical level (e.g., fatigue) or the pathophysiological level (e.g., serum partial pressure of carbon dioxide in blood is related algebraically to pH), depending on the complexity of the problem and the demands for explanation. ABEL, a computer program developed at MIT to deal with acid-base and electrolyte disorders, first demonstrated the advantages of using such levels of abstraction (10,11). Another area of increasing emphasis has been the representation of a taxonomy (i.e., hierarchic organization) for the diagnostic hypothesis space. For example, viral hepatitis and alcoholic hepatitis are both inflammatory diseases of the liver. A representation scheme that captures this type of hierarchic relationship might allow the system to begin reasoning at an appropriate level of abstraction, e.g., to identify a patient as having hepatitis before beginning to determine which subtype is present. Disease taxonomies, then, have been used to direct search. The MDX system, a liver disease diagnostic program developed at Ohio State University, contains a taxonomy of diseases that allows the system to direct the search as a progressive refinement of hypotheses, popping back to higher nodes in the hierarchy only when strong contradictions arise (12). Another control scheme that uses taxonomic knowledge
extensively can be found in the design for enhancements to INTERNIST, a diagnostic program for internal medicine developed at the University of Pittsburgh (10). The abstract data types used in AIM systems have been legion, but three classes predominate: production rules (see Rule-based systems), frames (see Frame theory), and semantic networks (qv). AIM researchers have not been uniform in their choice of knowledge representations. Four early AIM computer programs exemplified this diversity of representation schemes: MYCIN experimented with production rules; PIP and INTERNIST used disease frames; and CASNET represented causal relations in an associational network. An excellent discussion of knowledge representation in these four early AIM systems can be found in Ref. 13. Support for the definition of abstract data types is provided by "object-centered programming" languages (see Languages, object-oriented), which can be used to bind algorithms to the data structures on which they operate. Many feel that the development of large systems is more manageable with this encapsulation scheme, and it still allows designs that use production rules, frames, or networks. Several of the object-oriented languages facilitate the construction of taxonomies because the languages provide automatic inheritance of behavior from objects to their subtypes in a hierarchy.

Control. Separation of the knowledge base (data structures) and control (algorithms) is often cited as a central element in expert system design and is a goal of most AIM system designers because the technique preserves the ability to work with each component separately. Designers can experiment with new control schemes, keeping the knowledge base fixed, and observe performance changes (see Control structures).
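The benefit of this separation can be made concrete: the same unchanged rule set can be run under entirely different control regimes. In the sketch below (rules and facts are invented), a data-directed forward chainer and a goal-directed backward chainer both operate over one shared knowledge base:

```python
# Illustration of separating the knowledge base (data) from control
# (algorithm): two control schemes over the same, unaltered rule set.

RULES = [  # (premises, conclusion); content is invented
    ({"fever", "stiff neck"}, "meningitis"),
    ({"meningitis", "gram-negative"}, "treat-for-bacteroides"),
]

def forward_chain(facts):
    """Data-directed control: fire any rule whose premises are satisfied."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts):
    """Goal-directed control over the identical rule set."""
    if goal in facts:
        return True
    return any(all(backward_chain(p, facts) for p in premises)
               for premises, conclusion in RULES if conclusion == goal)

facts = {"fever", "stiff neck", "gram-negative"}
```

Swapping `forward_chain` for `backward_chain` changes the order in which questions would be raised, but requires no edit to `RULES`, which is the point of the design principle.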
For example, a new technique for combining evidence might be run on the MYCIN knowledge base, a collection of rules for making infectious disease diagnoses. Or a new INTERNIST differential diagnosis mode might be run on an otherwise unaltered knowledge base. Knowledge acquisition (qv), a primary concern of medical advice systems, can ideally be achieved by adding new instantiations of a data structure (e.g., a new rule or a new disease frame), thereby upgrading the knowledge base without changing the control structure. There are as many control schemes as there are systems and a large number of terms in use. MYCIN searches its rule set using a depth-first control strategy. The system uses backward chaining to invoke and link its rules so that a reasoning network is created dynamically. INTERNIST's control is initiated with a data-directed scheme but evolves into a hypothesis-directed approach after an initial set of hypotheses is invoked (14) (see Processing, bottom-up and top-down). The Serum Protein Diagnostic Program (Helena Laboratories), built with an expert system-building tool known as EXPERT (15), does not require hypothesis-directed control because question selection is not a problem; most of the information is obtained automatically from an electrophoresis instrument with which this program is packaged and sold. Thus, its control is predominantly data directed. The control strategy of Ohio State's MDX system (12) is a breadth-first search of a static tree. As MDX pushes deeper into this taxonomy tree, it refines hypotheses to be more specific. The ATTENDING system, developed at Yale to critique anesthesia management plans (16,17), searches a hierarchical planning network in order to identify alternatives to the user's proposed plan. Starting at the most detailed arcs of this augmented decision network (similar to augmented transition networks (see Grammar, ATN) used in natural-language (qv) research), ATTENDING compares the risks of the user's proposed arc (action) to the risks of parallel arcs.
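The taxonomy-directed, progressive-refinement control used by MDX can be sketched as a recursive descent through a disease tree, establishing a general category before trying its subtypes and popping back when a node is rejected. The disease tree and the establish/reject test below are invented for illustration:

```python
# Rough sketch of progressive refinement over a static disease taxonomy,
# in the spirit of MDX. The taxonomy and test predicate are illustrative.

TAXONOMY = {
    "liver disease": ["hepatitis", "cirrhosis"],
    "hepatitis": ["viral hepatitis", "alcoholic hepatitis"],
}

def refine(node, established):
    """established(d) -> True if the findings support hypothesis d."""
    if not established(node):
        return None                      # rejected: pop back to the parent
    for child in TAXONOMY.get(node, []):
        result = refine(child, established)
        if result:
            return result                # a subtype was established
    return node                          # most specific established node

best = refine("liver disease",
              established=lambda d: d in {"liver disease", "hepatitis",
                                          "alcoholic hepatitis"})
```

Because rejecting a high node prunes its entire subtree, the search never wastes questions on subtypes of a category already ruled out.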
Inexact Inference (Scoring Hypotheses). Inexact inference in this discussion refers to use of information that is probabilistic to some degree rather than purely categorical (see Reasoning, plausible). Medical evidence is such that most conclusions can be drawn only with a limited degree of certainty. This character of medical evidence and hypothesis assessment has driven AIM researchers to experiment with different scoring schemes. Few AIM systems have used classical probability theory to represent uncertainty. Systems developed in medical centers have tended to seek representations for uncertainty that reflect physician behavior, and several researchers have argued that probability theory and the use of Bayes' theorem (see Bayesian decision methods) do not model that behavior well (18). They further argued that the application of Bayes' theorem often requires so many simplifying assumptions that the theoretical foundations tend to be invalidated in any practical system using a probabilistic approach. Thus, more ad hoc approaches have become competitors for representation of uncertainty. The MYCIN experiments resulted in the certainty factor model (18). The INTERNIST project produced a calculus of evoking strength and frequency weights (14). These alternatives vary in their degree of formalism. It is expected that future work will better elucidate the features of these alternatives that were not seen in probability theory. The perceived differences between formal systems like probability theory and the alternatives may diminish as researchers identify how the advantages of each can be melded in medical-advice systems (see Reasoning, plausible).

Evaluation Functions. AI chess-playing programs (see Computer chess methods) use an evaluation function to assign scalar values to board positions. Advice systems in medical management face analogous situations, but the values of medical outcomes are difficult to assess. What are the relative values of chronic pain vs. a lifetime of paralysis vs. loss of life? The absence of a generally accepted "correct" therapy means that the physician will demand a reasoned argument that addresses the issues of costs and benefits in a convincing way. This issue is of growing importance to medical AI researchers because there is increasing interest in designing therapy systems. Diagnosis systems typically sidestep the difficulties of evaluation functions, except as they relate to test selection during diagnostic workup. Most of these systems consider information-gathering costs, but this does not constitute a comprehensive value theory for medical advice systems because it ignores the utility of acts, i.e., the cost of incorrect action. For example, assume that a medical-advice system concludes that an infection is most likely caused by organism 1 and much less likely by organism 2. Is it correct management to treat for organism 1 and not for organism 2? Perhaps not if organism 1 causes only discomfort, organism 2 can cause death, and the treatment for organism 1 may cause kidney damage. The cost of diagnostic misclassification drives the real-life diagnostic process. Medical cost containment pressures may force more explicit inclusion of cost-benefit considerations. Some evaluation techniques that AIM management programs have used are included in the discussion of example systems, below. Future research is likely to draw upon related disciplines such as operations research that provide a formal theory for evaluating the expected utility of actions.

Additional ongoing research topics for investigators building AIM systems for diagnosis or management advice include knowledge acquisition, explanation, temporal reasoning, and validation.

Research Themes
Knowledge Acquisition. A well-recognized bottleneck in building expert systems is acquiring knowledge from the expert. Work on TEIRESIAS, a program built to interface with MYCIN (19), demonstrated that a program might assist in the on-line transfer of knowledge from a human expert to the consultation program's knowledge base. The expert could disagree with a conclusion, and then the system would trace, step by step, back through the reasoning process until the erroneous rule (or missing rule) was identified. The SEEK program, which operated in concert with the EXPERT program mentioned earlier, also provided assistance in recognizing how a system's knowledge base should be altered (20). Focusing on actual cases, the system suggests refinements to the knowledge base, which take the form of adding or deleting entries from the "major findings" or "minor findings" of a disease.

Explanation. MYCIN was one of the first systems to demonstrate that explanation (qv) capabilities might be key to physician acceptance of computer-based decision support (21). MYCIN allowed users to ask "why?" when they were unclear about the purpose of the system's questioning and "how?" when they wanted to know how the system would (or did) reach certain conclusions. Researchers at MIT enriched the Digitalis Therapy Advisor (22) with causal models of heart rhythm disturbances and principles of antiarrhythmia therapy to create a computer program named XPLAIN (23), which could give the rationale behind a therapy. This work demonstrated that optimal explanation was facilitated by access to the more abstract principles, which do not always appear in the program code. The goals of the NEOMYCIN project at Stanford University are to provide explanation of the diagnostic process in terms of diseases and symptoms but also in terms of the overarching principles of medical diagnosis. This work has included a revision of MYCIN's rules and the addition of an explicit model of diagnostic strategy (24,25).
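The "how?" facility rests on a simple mechanism: record which rule produced each conclusion during inference, so the chain of reasoning can be replayed on demand. A minimal sketch, with an invented two-rule knowledge base:

```python
# Sketch of a "how" explanation in the MYCIN style: the prover records each
# rule firing, and the explanation replays that trace. Rules are invented.

RULES = {  # conclusion -> (premises, rule name)
    "meningitis": ({"fever", "stiff neck"}, "rule1"),
    "treat-ampicillin": ({"meningitis"}, "rule2"),
}

def prove(goal, facts, trace):
    """Backward-chain toward goal, appending (rule, conclusion) on success."""
    if goal in facts:
        return True
    entry = RULES.get(goal)
    if entry and all(prove(p, facts, trace) for p in entry[0]):
        trace.append((entry[1], goal))
        return True
    return False

def how(goal, facts):
    """Answer 'how was goal concluded?' as a list of rule firings."""
    trace = []
    if prove(goal, facts, trace):
        return [f"{rule} concluded {concl}" for rule, concl in trace]
    return ["not established"]

steps = how("treat-ampicillin", {"fever", "stiff neck"})
```

A "why?" facility is the mirror image: instead of replaying completed firings, it reports the rule currently under consideration when a question is asked.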
The ATTENDING system for anesthesia management planning first proposed the critiquing approach to explanation (16,17). Rather than simulating a physician's reasoning and generating a recommended action, critiquing systems center their analysis around the user's proposed management plan. In medical management there is often more than one defensible therapy, so an approach that highlights the pros and cons of each approach is more likely to meet acceptance by the physician. In addition, critiquing systems remain silent on the uncontroversial aspects of the plan.

Temporal Reasoning. Medical advice systems are usually designed with the assumption that data are gathered and inferences are made at one point in time. Since medical diagnosis and management actually take place over time, optimal medical advice systems would allow reevaluation of the patient, assessing the rate of disease progression or the therapeutic response to prior treatment. The Digitalis Therapy Advisor, VM, and ONCOCIN are unusual in that they have attempted to manage patients over time. The Digitalis Therapy Advisor (22) uses the results of previous treatment to alter its model of the patient. For example, if predicted body stores of digitalis are much higher than measured stores, the system adjusts the "oral absorption" parameter downward. VM, a program designed to assist with the management of patients on respiratory-support systems (ventilators), assumes that particular data are only valid for a certain period of time, and the system can represent temporal trends (26). An example of this is VM's ability to detect a rise in mean arterial blood pressure of 15 torr (2 kPa) over 10 min. ONCOCIN (27) follows patients through many cycles of cancer chemotherapy, each cycle lasting weeks. Some of its inference rules are based on the temporal trends of patient parameters (see also Reasoning, temporal).

Validation. Diagnosis systems are usually judged by the accuracy of their diagnosis when compared to some "gold standard." Credibility is gained by evaluating the program, informally at first and then in double-blind studies. Several groups have carried out formal evaluations of performance (14,28-30). Evaluation in a different clinical setting from that in which the system was built has the advantage of demonstrating generalizability. Fewer groups have evaluated the acceptability to users, and success in this area is notoriously difficult to achieve. Systems that will involve hands-on use by doctors face additional challenging design issues compared to those systems that analyze instrument data and produce a report. Objectives and guidelines for system validation are discussed in Ref. 31.

Example Systems

Several medical advice systems are now discussed in more detail. The theoretical issues mentioned above are illustrated in three programs designed for medical diagnosis and in four other programs concentrating on management.

Diagnosis Systems. MYCIN is an interactive program designed to be used as a consultant in difficult cases of meningitis or bacteremia. It suggests a set of likely organisms (bacteria) and then proposes therapy that will treat those organisms. MYCIN was one of the first and most significant rule-based expert systems (18). The domain knowledge of MYCIN is represented in a set of about 500 production rules. Most of these rules encode associations between findings and a hypothesis (Fig. 1). These if-then rules were easily understood by both the computer scientists and the physicians collaborating on program construction. Problem-solving behavior could be modified by altering a rule or adding new ones. However, valuable knowledge about disease taxonomy, cause and effect, and temporal ordering between disorders was represented only implicitly, somewhat buried in the rules. This frustrated attempts to use MYCIN's rules for intelligent computer-aided instruction (32). As described earlier in the discussion of control, MYCIN backward chains through the rule base, although for reasons of efficiency, rules are occasionally invoked in a data-directed fashion (33). Rules were meant to represent only domain knowledge, but ultimately they encoded a good deal of control logic as well. Later rule-based systems have attempted to achieve a cleaner separation between control and domain-level knowledge. The MYCIN research is well known for its production rule representation (9) and also for its innovative model of inexact reasoning, the calculus of certainty factors (18). MYCIN became a laboratory for investigations into knowledge acquisition (19), metalevel reasoning (34), intelligent computer-aided instruction (32), explanation (21), and knowledge-engineering tools (35). The evaluation of MYCIN's performance was a careful, blinded study, which demonstrated that MYCIN was competitive with expert clinicians (29).

PREMISE: ($AND (SAME CNTXT INFECT PRIMARY-BACTEREMIA)
               (MEMBF CNTXT SITE STERILESITES)
               (SAME CNTXT PORTAL GI))
ACTION: (CONCLUDE CNTXT IDENT BACTEROIDES TALLY .7)

IF: 1) The infection is primary-bacteremia, and
    2) The site of the culture is one of the sterile sites, and
    3) The suspected portal of entry of the organism is the gastrointestinal tract,
THEN: There is suggestive evidence (.7) that the identity of the organism is bacteroides.

Figure 1. Rule from the MYCIN knowledge base. The LISP code at top is dynamically translated to the prose explanation at bottom.

INTERNIST is designed to diagnose diseases, and combinations of diseases, within the extensive domain of internal medicine (14). This system has been developed over a 10-year period by workers at the University of Pittsburgh. Diseases are explicitly related to their clinical manifestations in data structures resembling frames (Fig. 2). The strengths of association are captured in two numbers: "evoking strengths" and "frequency weights." Evoking strength represents the degree to which the manifestation suggests the disease. Frequency weight represents the likelihood of finding that manifestation in the presence of the given disease. Patient data allow the program to contribute evoking strengths and frequency weights to diagnostic hypotheses. Then, a high-level control chooses one of four hypothesis-directed control schemes (i.e., conclude, pursue, rule out, and discriminate), depending on the number of active hypotheses and how closely they are clustered by weights of evidence. This higher level control scheme models the hypothetico-deductive approach mentioned earlier. The system can handle multiple coexisting diseases through a clever partitioning algorithm that allows it to focus on the differential diagnosis of subsets of findings while it holds the additional patient data in abeyance for later investigation. Question selection is driven by whichever control scheme INTERNIST has chosen. INTERNIST's inexact reasoning technique is a logarithmic system of weights that are additively combined by an algorithm that was empirically derived. INTERNIST uses a coarse cost-classification of findings to decide which test to request next. In descending order of cost they are invasive labs, noninvasive labs, physical exam, and history. In contrast to the MYCIN project, which emphasized knowledge acquisition and explanation techniques, INTERNIST has concentrated instead on the comprehensiveness of its knowledge base and the optimal strategic mode for a differential diagnosis. A careful evaluation of INTERNIST's performance demonstrated broad diagnostic abilities (14). INTERNIST has been the inspiration for a new program, called CADUCEUS, which is intended to address many of the inadequacies of INTERNIST. Plans for CADUCEUS include explicit modeling of disease taxonomies and cause-effect relationships (36).
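MYCIN's certainty-factor calculus combines evidence incrementally as rules fire. For the simplest case, two positive certainty factors supporting the same hypothesis, the standard combining function can be sketched as follows (the 0.7 and 0.5 values are illustrative; the full calculus also handles negative and mixed-sign factors, which are omitted here):

```python
# Sketch of MYCIN-style combination of two positive certainty factors
# supporting one hypothesis. Numeric values are illustrative only.

def combine_positive(cf1, cf2):
    """Combine two positive certainty factors (0 <= cf <= 1)."""
    # The second factor closes a fraction of the remaining uncertainty.
    return cf1 + cf2 * (1 - cf1)

cf = combine_positive(0.7, 0.5)  # two rules each lend partial support
```

The function is commutative and never exceeds 1, so the order in which supporting rules fire does not affect the final score.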
[Figure 2 shows a portion of the INTERNIST disease frame for alcoholic hepatitis: several dozen manifestations (e.g., ALCOHOL INGESTION RECENT HX, JAUNDICE, LIVER ENLARGED MODERATE, ABDOMEN TENDERNESS RIGHT UPPER QUADRANT, SGOT GTR THAN 400), each followed by two single-digit numbers, its evoking strength and frequency weight.]
Figure 2. A portion of one disease "frame" used by the INTERNIST computer program. The strength of association between the disease (alcoholic hepatitis) and its manifestations is captured by the two numbers following each manifestation. The first number, the evoking strength, represents the degree to which the manifestation suggests the disease. The second number, the frequency weight, represents the likelihood of finding that manifestation in the presence of the given disease.

CASNET, developed at Rutgers University, uses a Causal-ASsociation NETwork to represent cause-effect links between hypotheses as well as associational links between findings and hypotheses (7,8). An example of a cause-effect link in the system is elevated intraocular pressure causing visual field loss (Fig. 3). There are at least two advantages to this causal net representation scheme. First, the system can trace from activated hypotheses, backward along the cause-effect pathways, to identify starting nodes in the network. Starting nodes are hypotheses for which no causes have been defined and are thus primary disorders. Second, if any intermediate node along this path is known to be false (a "denied node"), this causal pathway can be ruled out as a candidate explanation for the patient's complaints. CASNET developed one of the more complex test-selection algorithms, in which a weighting scheme is used to select which test should be done next and to define when further tests or questions are unnecessary. For each CASNET state S, there are appropriate tests T1, T2, ..., Tn and a current weight of evidence W. Each Ti has an associated cost Ci. Weight is separate from "status," although both are measures of belief. Status derives from rules that conclude about S. Weight, on the other hand, is based solely on the fact that certain states "causally connected" to S have positive status. The weight of S is calculated by multiplying the status
of a state that is causally connected to S times the product of the "associative strengths" along the connecting links. One test selection strategy focuses on the state Sh that is currently considered the most likely state, surveys its possible tests Th1, ..., Thn, and selects the test Thi with the smallest cost Chi. If the quotient Wh/Chi exceeds a predetermined threshold Q, the program asks the user for the results of Thi. If not, it goes to the next best state, repeating the test selection procedure. If no weight-cost ratio exceeds Q, the system stops.

Management Systems. The Digitalis Therapy Advisor (22) is a program designed to help physicians prescribe a dose of the drug digitalis for particular patients. This program uses body weight, age, target serum concentration, and other parameters of a pharmacokinetic model and produces an initial dose estimate. Subsequent feedback about toxic and therapeutic states (qualitative information) then guides adjustments to that initial dose. One of the system's central data structures is the patient-specific model (PSM). The PSM includes not only clinical and laboratory data but also the reason for digitalis administration. This allows the system to evaluate therapeutic response from subsequent clinical information. For example, if atrial fibrillation (an abnormal heart rhythm) were the reason for using digitalis, the system would look for a heart rate decrease to determine therapeutic response, but if congestive heart failure were the reason, the system would look for signs of decreasing congestion, such as resolution of ankle edema. The Digitalis Therapy Advisor's combination of math modeling and AI techniques was novel, especially in a system that took advantage of feedback about the patient's response to earlier therapeutic actions.

The research issues in VM (26) lay in modeling the dynamic environment of the intensive care unit (ICU). VM is rule based, using four classes of rules.
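The weight-cost test selection loop just described can be sketched directly. The states, test names, weights, costs, and threshold below are invented for illustration:

```python
# Rough sketch of CASNET-style test selection: for the most likely state,
# take its cheapest test and ask for it only if weight/cost clears the
# threshold Q; otherwise fall through to the next best state.

def select_test(states, Q):
    """states: list of (weight, [(test, cost), ...]), best state first."""
    for weight, tests in states:
        test, cost = min(tests, key=lambda tc: tc[1])  # cheapest test
        if weight / cost > Q:
            return test
    return None  # no weight-cost ratio exceeds Q: stop questioning

states = [
    (6.0, [("tonometry", 2.0), ("visual-field exam", 5.0)]),
    (3.0, [("gonioscopy", 4.0)]),
]
chosen = select_test(states, Q=2.5)
```

Raising Q makes the system more reluctant to ask further questions, which is how the scheme defines when additional tests are unnecessary.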
A "status rule" defines general clinical states (e.g., "stable hemodynamics"). Certain vital signs may imply stable hemodynamics in one stage of ventilator management but not in another stage. The system generates expectations of what parameter values (e.g., blood pressure and pulse rate) should be found in a given clinical context. An example of a "transition rule" is one that identifies a return to the ventilator from another device called the "T-piece." This data-directed reasoning is needed because physicians do not always inform VM of what they have done. An example of an "instrument rule" is one that identifies potentially artifactual readings. "Therapy rules" make use of the other three rule classes to recommend therapy. The program monitors patients over time, iteratively analyzing instrument readings, making conclusions, and, if appropriate, printing messages to the physicians caring for the patient.

The ATTENDING system is best known for exposition of the critiquing approach to medical advice systems (16,17). ATTENDING is designed to critique an anesthetist's plan for premedication, induction, intubation, and maintenance of anesthesia. The system handles risk-benefit trade-offs in medical management through a technique called "heuristic risk analysis." The central data structure in ATTENDING is a hierarchy of augmented transition networks (ATNs). These networks consist of terminal and nonterminal arcs (Fig. 4). Terminal arcs can be traversed directly and represent a choice of drug or technique. Traversing a nonterminal arc necessitates dropping into a subnetwork and finding a path through that network before popping up and continuing in the upper network. A therapy plan is a path that starts at the top network's start node, traverses the network with varying degrees of descent into subnetworks, and ends at the top network's finish node. Analysis pivots around the physician's proposed plan, heuristically collecting alternative plans that are roughly equivalent or superior to the user's choice. Comparisons are made using a "risk magnitude," which is an aggregate of probabilistic information and information about the utility of possible outcomes. Then, "contextual preference rules" refine these comparisons with more case-specific knowledge. Finally, another ATN produces a prose explanation of the analysis.

Figure 3. Three-level description of a disease process in CASNET. Observations are direct evidence about a patient. Pathophysiological states are connected by causal links. Disease categories represent patterns of pathophysiological states. Reproduced with permission from S. M. Weiss et al., Artif. Intell. 11, 148 (1978).

The ONCOCIN system is designed to assist in the treatment of cancer patients (27). ONCOCIN is designed to help manage patients over time, interpreting the current session in light of past information whenever necessary. ONCOCIN's domain knowledge is separated from the control. A central goal of this research effort is that the system be used regularly in a busy clinical environment. This imposes constraints on response time, which ONCOCIN addresses by running two independent processes. One of the processes, the Reasoner, performs most of the inference, and the second process, the Interviewer, controls the data-gathering interaction with the physician. Another constraint imposed by clinical use is that the electronic format of data recording and display must not retard the physician. To this end, the ONCOCIN project has recently begun to transfer its program to LISP machines that use bit-mapped displays to duplicate the visual appearance of flowcharts traditionally filled out by physicians.

Although MYCIN is often described as a diagnostic program, its principal motivation was therapy planning. There are several goals to MYCIN's antibiotic therapy task, some of them conflicting with each other. One MYCIN project researcher found the rule-based format a difficult representation to work with when he designed a therapy selection algorithm; in Ref. 37 he articulates the motivations and design for a "revised therapy algorithm" that he added to the program.
Figure 4. An ATN from ATTENDING. A proposed management plan is traced in boldface. Courtesy of IEEE, 1983. POP = ascent to the upper network.
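The plan-as-path idea behind ATTENDING's ATN hierarchy can be sketched roughly as follows. The networks and drug names are a toy fragment loosely echoing Figure 4, not ATTENDING's actual knowledge base, and the matching routine is an illustration rather than ATTENDING's algorithm.

```python
# Sketch of checking a therapy plan against a hierarchy of networks.
# A network is a sequence of "slots"; each slot lists alternative arcs.
# An arc that names another network is nonterminal (descend into it);
# any other arc is terminal (a drug or technique chosen directly).
# The network contents below are invented for illustration only.

nets = {
    "top": [["premedication"], ["induction"], ["intubation"], ["maintenance"]],
    "induction": [["thiopental", "ketamine"]],
    "intubation": [["rapid-sequence", "normal-intubation"],
                   ["succinylcholine", "pancuronium"]],
}

def matches(network, plan, nets):
    """True if plan (a flat list of terminal choices) is a path from the
    network's start node to its finish node."""
    if not network:                       # finish node reached
        return not plan                   # valid only if the plan is used up
    slot, rest = network[0], network[1:]
    for arc in slot:
        if arc in nets:                   # nonterminal arc: drop into subnet
            for i in range(len(plan) + 1):
                if matches(nets[arc], plan[:i], nets) and matches(rest, plan[i:], nets):
                    return True           # pop back up and continue
        elif plan and plan[0] == arc:     # terminal arc: consume one choice
            if matches(rest, plan[1:], nets):
                return True
    return False

plan = ["premedication", "thiopental", "rapid-sequence",
        "succinylcholine", "maintenance"]
print(matches(nets["top"], plan, nets))   # -> True
```

The recursion mirrors the text's description: a nonterminal arc forces a complete traversal of the subnetwork before the upper network's traversal continues.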
Systems in Clinical Use. The CASNET research at Rutgers led to the first commercial application of AI in medicine, the Serum Protein Diagnostic Program (Helena Laboratories) (15). Two other AIM systems in clinical use are PUFF (30) and ONCOCIN (27). All three of these systems are used by practicing doctors. The design requirements of PUFF and the Serum Protein Diagnostic Program are quite different from those of ONCOCIN, however. Both of those systems acquire the needed information automatically from instruments so that data collection, analysis, and recommendation can proceed without direct interaction with the physician. This is quite different from ONCOCIN, where the physician's hands-on interaction with the computer is a major design consideration. In general, systems that will be used interactively face additional design challenges: response time must be short, data collection and analysis must be simple and intuitive to the physician, recommendations must be backed up with good explanations, and finally, system hardware and software must be reliably available.

Summary

Designing medical advice systems for clinical use has influenced the evolution of AI during the last decade. The mutually beneficial relationship between protocol analysis and expert systems research is discussed above, with emphasis on the role of AIM systems as laboratories for experiments in the
representation of causal and taxonomic knowledge, in explanation of reasoning, and in inexact inference. In more detail, example medical advice systems that have made important research contributions to AIM have been examined. The research challenges have not abated, but the future of AI research and applications in medicine promises to be a fruitful one.
BIBLIOGRAPHY

1. B. G. Buchanan and E. A. Feigenbaum, "DENDRAL and Meta-DENDRAL: Their applications dimension," Artif. Intell. 11(1), 5-24 (1978).
2. P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview, Boulder, CO, 1983.
3. W. J. Clancey and E. H. Shortliffe (eds.), Readings in Medical Artificial Intelligence, Addison-Wesley, Reading, MA, 1984.
4. J. P. Kassirer and G. A. Gorry, "Clinical problem solving: A behavioral analysis," Ann. Int. Med. 89, 245-255 (1978).
5. A. S. Elstein, L. S. Shulman, and S. A. Sprafka, Medical Problem Solving: An Analysis of Clinical Reasoning, Harvard University Press, Cambridge, MA, 1978.
6. Reference 3, Chapter 12.
7. Reference 2, Chapter 2.
8. Reference 3, Chapter 7.
9. G. F. Cooper, NESTOR: A Computer-Based Medical Diagnostic Aid that Integrates Causal and Probabilistic Knowledge, Ph.D. Dissertation, Stanford University, Stanford, CA, 1984.
10. Reference 2, Chapter 6.
11. Reference 3, Chapter 4.
12. Reference 3, Chapter 13.
13. Reference 3, Chapter 9.
14. Reference 3, Chapter 8.
15. Reference 3, Chapter 20.
16. P. L. Miller, "ATTENDING: Critiquing a physician's management plan," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-5(5), 449-461 (1983).
17. P. L. Miller, A Critiquing Approach to Expert Computer Advice: ATTENDING, Pitman, London/Boston, 1984.
18. B. G. Buchanan and E. H. Shortliffe (eds.), Rule-Based Expert Systems, Addison-Wesley, Reading, MA, Chapter 11, 1984.
19. Reference 18, Chapter 9.
20. Reference 3, Chapter 18.
21. Reference 18, Chapter 18.
22. G. A. Gorry, H. Silverman, and S. G. Pauker, "Capturing clinical expertise: A computer program that considers clinical responses to digitalis," Am. J. Med. 64, 452-460 (1978).
23. Reference 3, Chapter 16.
24. D. W. Hasling, W. J. Clancey, and G. D. Rennels, "Strategic explanations for a diagnostic consulting system," Int. J. Man-Mach. Stud. 20, 3-19 (1984).
25. Reference 3, Chapter 15.
26. Reference 18, Chapter 22.
27. Reference 18, Chapter 35.
28. D. H. Hickam, E. H. Shortliffe, M. B. Bischoff, A. C. Scott, and C. D. Jacobs, "A study of the treatment advice of a computer-based cancer chemotherapy protocol advisor," Ann. Int. Med. 103, 928-936 (Dec. 1985).
29. Reference 18, Chapter 31.
30. Reference 3, Chapter 19.
31. Reference 3, Chapter 30.
32. Reference 18, Chapter 26.
33. Reference 2, Chapter 3.
34. Reference 18, Chapter 28.
35. Reference 18, Chapter 15.
36. Reference 2, Chapter 5.
37. Reference 18, Chapter 6.

G. Rennels and E. Shortliffe
Stanford University

MEMORY ORGANIZATION PACKETS

A memory organization packet (MOP) is a unit of representation and memory organization in a theory proposed by Roger Schank of Yale University to explain the way episodic information is stored in human memory. The term is also used to refer to the entire theory of memory organization built around this unit. The theory was initially introduced in Ref. 1 and has been most fully explicated in Ref. 2. A number of computer programs that make use of MOPs have been developed to test the theory, primarily at the Yale Artificial Intelligence Project.

The basic idea behind the MOP theory is that representations of information are dynamic: knowledge structures are constantly being changed and created through learning and generalization. Episodes are indexed in memory in terms of knowledge structures that have been generalized earlier from examples. Episodes are always analyzed at a number of levels simultaneously. This allows information to be stored and generalizations to be made at a variety of concrete and abstract levels during the processing of a single example. When a new episode is understood, information is collected from all relevant knowledge structures and applied to the new example.

Scripts

Historically, MOPs developed out of the understanding theory proposed in Ref. 3, specifically from the knowledge structure known as a script (see Scripts). Scripts were designed to be used to explain events comprised of stereotypical sequences of actions, such as visits to restaurants and doctor visits. Although scripts were successfully used in several language-understanding programs, they did have certain problems, particularly when used for memory and learning.

The main problem in using scripts as defined in Ref. 3 for memory and learning is that they are too large and monolithic. Several psychological experiments (e.g., Ref. 4) showed that people would confuse events that occurred in similar local settings even if in different scripts. So, for example, a subject who read about an action that took place in a waiting room during a dentist visit might recall it as having taken place in a story about a visit to a doctor. In addition, learning in situations involving different scripts (such as different kinds of waiting rooms) would be difficult.

The solution to these problems was to develop a system made up of a number of much smaller structures. Each structure describes a small chunk of information about events. These chunks can be used by a variety of higher level structures, providing flexibility in memory organization and learning.

Scenes

The basic unit of memory in the MOP theory is the scene. A scene consists of actions that occur over a short period of time in service of a specific goal. MOPs organize scenes. In Ref. 2 Schank divides scenes into three basic classes: physical, societal, and personal. Physical scenes describe events that take place at a single location. Societal scenes are tied together by a social relationship between people. Personal scenes are unified by idiosyncratic goals unlikely to be shared by many people. MOPs can be broken down into the same three categories. In understanding, these classes lead to questions about what happened physically, what happened socially, and what happened to the participants. Note that all types of memory structures are idiosyncratic, not just personal scenes and MOPs. Physical and societal MOPs and scenes describe an idiosyncratic view that a person assumes to be shared by other people.

Most events that take place are, of course, not simply isolated scenes. Scenes occur together in common patterns. This information is captured by MOP memory structures. Each MOP is a stereotypical sequence of scenes or other MOPs tied together physically, societally, or by a personal goal. An event is usually understood in terms of three or more MOPs, at least one at each of the physical, societal, and personal levels. In understanding, for each MOP found to be relevant, the various scenes can be collected and used much as a script would be
Physical MOP: M-GROCERY-SHOP, with scenes Get-cart, Examine-fruit, Check-out.
Societal MOP: M-PURCHASE, with scenes Make-selection, Determine-availability, Provide-payment.
Personal MOP: M-MAKE-(MY)-DINNER, with scenes Preheat-oven, Get-TV-dinner, Eat-and-watch-TV.

Figure 1. Examples of different MOP types with some component scenes.
in the theory of Ref. 3. The use of the same scenes in a number of MOPs increases generality and the ability to learn. Figure 1 lists a typical MOP of each class, along with some component scenes. The physical MOP contains concrete, if stereotypical, information about a trip to a grocery store; the societal MOP involves social conventions about making a purchase; and the personal MOP is a very idiosyncratic one involving the preparation of a typical dinner for a specific person.

To illustrate how MOPs are applied, the way that a person (or program) might make use of some of the structures in Figure 1 to understand a story (see Story analysis) about a person going to a grocery store and coming home to make a TV dinner will be briefly considered. Understanding such a story would involve all three MOPs in Figure 1. The physical MOP would be used to understand, for example, why the patron took a shopping cart on the way into the store. The societal MOP might be used to understand why a patron who did not have enough cash was able to write a check. Such processing could occur even if the understander only knew about checks in other situations. The personal MOP might be crucial to understanding if the patron turned on the oven before leaving for the store.

The overall flow of understanding is to collect the various relevant physical, societal, and personal scenes and match incoming events against them. This provides the explanatory part of understanding. In the example above, an action in the story involving the patron at the checkout with a TV dinner will be understood in physical terms (the checkout scene is being carried out), societally (a payment is taking place), and in terms of the patron's personal goal of having a TV dinner to watch television by, which is being achieved.
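The collect-and-match flow just described can be sketched with the MOPs of Figure 1. This is a toy illustration in which incoming events are simply named after scenes; a real understander would of course match richer event descriptions.

```python
# Toy sketch of collecting scenes from the relevant MOPs and matching an
# incoming event against them, as in the grocery-store example above.
# Scene names follow Figure 1; the matching is deliberately simplistic.
mops = {
    "M-GROCERY-SHOP": ["get-cart", "examine-fruit", "check-out"],    # physical
    "M-PURCHASE": ["make-selection", "determine-availability",
                   "provide-payment"],                               # societal
    "M-MAKE-DINNER": ["preheat-oven", "get-TV-dinner",
                      "eat-and-watch-TV"],                           # personal
}

def explain(event, active_mops):
    """Return the (MOP, scene) pairs that account for an incoming event."""
    return [(m, s) for m in active_mops for s in mops[m] if s == event]

print(explain("check-out", list(mops)))
# -> [('M-GROCERY-SHOP', 'check-out')]
```

An event such as paying at the checkout would, in the full theory, match scenes in several MOPs at once (physical checkout, societal payment, personal dinner goal); here each event name matches only one scene for brevity.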
Crucial to the idea of MOPs is that understanding must include learning (qv), where the understander examines the cases where input was not adequately explained and determines how the relevant scenes or MOPs should be modified to enable later understanding to take place. Depending on whether earlier expectation violations were similar, the generalizations underlying one or more scenes or MOPs might be changed, or this violation will be indexed so that it can be found if there are similar violations in the future. Developing an algorithm for determining exactly which structures should be modified is one of the most difficult aspects of implementing a MOP-based computer system. In people, accurately determining which structures should be modified seems to be an important component of intelligence.

The first two computer experiments that made use of MOPs were CYRUS (5), developed by Kolodner, and IPP (6), developed by Lebowitz. CYRUS used MOPs to store detailed descriptions of episodes about a single individual. Due to the rich nature of the descriptions and the generalizations made from them, CYRUS was able to answer a variety of questions about its memory. CYRUS was also used to study the reconstructive nature of memory retrieval and question answering (qv). IPP used a MOP-type memory structure to organize information
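The modify-or-index decision described above can be sketched as follows. The index structure and the similarity test are hypothetical simplifications for illustration, not Schank's actual algorithm.

```python
# Sketch of the learning step described above: when input violates a scene's
# expectations, either generalize (if a similar violation was seen before)
# or index the violation so it can be found later. Hypothetical structure.
from collections import defaultdict

violation_index = defaultdict(list)   # scene -> past expectation violations

def handle_violation(scene, violation, similar):
    """Record a violation; report 'generalize' once a similar one is indexed."""
    for past in violation_index[scene]:
        if similar(past, violation):
            return "generalize"       # modify the underlying scene or MOP
    violation_index[scene].append(violation)
    return "indexed"                  # remember it, to be found next time

# 'similar' here means the same expectation slot was violated
same_slot = lambda a, b: a[0] == b[0]
print(handle_violation("check-out", ("payment", "barter"), same_slot))  # indexed
print(handle_violation("check-out", ("payment", "IOU"), same_slot))     # generalize
```

The second, similar violation triggers generalization, mirroring the text's point that repeated similar violations change the stored structures while one-off violations are merely indexed.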
taken from news stories about international terrorism. The generalization-based memory created from the articles was useful both in studying the cognitive process of organizing information and as a prototype intelligent information system. The use of dynamic memory structures in text understanding was also a major part of the development of IPP.

MOPs have subsequently been used in a number of other computer experiments. MOPs have been used to assist language understanding in Lebowitz's intelligent information system, RESEARCHER (7), and Lytinen's translation system, MOPTRANS (8). They have also been used in a number of case-based problem-solving systems such as Kolodner's MEDIATOR (conflict mediation) (9), Bain's JUDGE (criminal sentences) (10), and Hammond's WOK (cooking) (11). All of these systems are described in Ref. 12.
BIBLIOGRAPHY

1. R. C. Schank, "Language and memory," Cog. Sci. 4(3), 243-284 (1980).
2. R. C. Schank, Dynamic Memory: A Theory of Reminding and Learning in Computers and People, Cambridge University Press, New York, 1982.
3. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977.
4. G. H. Bower, J. B. Black, and T. J. Turner, "Scripts in text comprehension and memory," Cog. Psychol. 11, 177-220 (1979).
5. J. L. Kolodner, Retrieval and Organizational Strategies in Conceptual Memory: A Computer Model, Lawrence Erlbaum, Hillsdale, NJ, 1984.
6. M. Lebowitz, "Generalization from natural language text," Cog. Sci. 7(1), 1-40 (1983).
7. M. Lebowitz, RESEARCHER: An Experimental Intelligent Information System, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, 1985, pp. 858-862.
8. S. L. Lytinen, Frame Selection in Parsing, Proceedings of the Fourth National Conference on Artificial Intelligence, Austin, TX, 1984, pp. 222-225.
9. J. L. Kolodner, R. L. Simpson, and K. Sycara-Cyranski, A Process Model of Case-Based Reasoning in Problem Solving, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, 1985, pp. 284-290.
10. W. M. Bain, Assignment of Responsibility in Ethical Judgments, in J. L. Kolodner and C. K. Riesbeck (eds.), Memory, Experience and Reasoning, Lawrence Erlbaum, Hillsdale, NJ, 1985, pp. 127-138.
11. K. J. Hammond, Planning and Goal Interaction: The Use of Past Solutions in Present Situations, Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, 1983, pp. 148-151.
12. J. L. Kolodner and C. K. Riesbeck (eds.), Memory, Experience and Reasoning, Lawrence Erlbaum, Hillsdale, NJ, 1986.

M. Lebowitz
Columbia University
MEMORY, SEMANTIC

The term semantic memory gained currency in AI with the publication of Quillian's memory models in the late sixties (1,2). The term remains associated primarily with Quillian's models and their direct descendants, whereas the broader term semantic networks (qv) (or nets) is preferred for the full range of networklike memory models and knowledge-representation (qv) formalisms. (The terms associative memory (qv) or associative networks are also used.) In cognitive psychology (qv), semantic memory is often distinguished from episodic memory (qv), where the former serves as a long-term store of knowledge needed for language understanding (see Natural-language understanding) and the latter serves as a long-term store of information about specific episodes and events, especially personal experiences (see Ref. 3 for the original formulation of this distinction). This distinction has been less important in AI research on language understanding and question answering (qv), where both kinds of knowledge are usually represented in a formally uniform, structurally integrated fashion (however, see Ref. 4 for a sophisticated system incorporating the distinction).

Quillian's Models

Quillian's models were foreshadowed in networklike representations of sentence meanings developed in the late fifties and early sixties by researchers in Mechanical Translation (see Machine translation) (e.g., M. Masterman of the Cambridge Language Research Unit and S. Ceccato of Milan University). However, this work lacked many of the most important features of Quillian's approach, perhaps most crucially his emphasis on the role of a large body of associatively interconnected knowledge about language and the world in language understanding. In essence, Quillian's models consisted of nodes representing word senses or their properties and links interconnecting these nodes and providing associative pathways for processes presumed to underlie language comprehension.
In his first model the nodes and links were layered into planes (1). Each plane was headed by a type node for a word sense and contained a set of token nodes accessible from the type node through a series of links. The first such token node supplied the superclass of the word sense, whereas the remaining token nodes supplied its additional properties (where a property could be specified by any number of token nodes interconnected by links of certain fixed types). For example, the plane for PLANT1 (a living plant) specified A STRUCTURE as its superclass and included LIVE, not ANIMAL, WITH3 LEAF, and GETS FOOD FROM3 AIR OR WATER OR EARTH as additional properties. All token nodes in a plane were linked by interplane links to the type nodes they referenced (such as LIVE, ANIMAL, WITH3, etc.).

As a first step toward simulating the human language-understanding process, Quillian constructed programs for comparing and contrasting pairs of words, such as cry and comfort, plant and live, or plant and man. The programs produced simple verbalizations of the relationships they discovered, expressed in terms of disambiguated senses of the given words. They relied on intersection searches (see Search, bidirectional) to accomplish their task: beginning at the word-sense nodes of the given words, they propagated "tags" outward along links in pseudoparallel fashion, keeping track of the paths traversed by the tags. Nodes at which the "spheres of spreading activation" intersected were noted, and the paths to them were used to generate the descriptions of similarities and contrasts. For example, comparison of plant and man led to intersections at the ANIMAL and PERSON nodes. The paths to ANIMAL allowed generation of the contrasting sentences "PLANT1 IS NOT A ANIMAL STRUCTURE" and "MAN1 IS ANIMAL," and the paths to PERSON allowed generation of the comparison sentences "TO PLANT3 IS FOR A PERSON TO PUT SOMETHING INTO EARTH" and "MAN3 IS PERSON."

Quillian suggested that intersection searches underlie the elimination of lexical ambiguity in language understanding. He also noted the importance of memory pathways for inference (qv). In particular, a path connecting successively higher level concepts can mediate property inheritance; i.e., lower level concepts lying on such a path inherit the properties of the higher level concepts (see Inheritance hierarchy).

In a follow-up project, called the Teachable Language Comprehender (TLC), Quillian slightly revised his memory model, replacing type nodes by "units" and token nodes by "properties" (2). Units provided explicit slots for a superset pointer and pointers to refining properties. Properties in turn provided slots for a pointer to an attribute (some predicative concept), a pointer to an attribute value (some particular concept), and possibly further "subproperties" augmenting the attribute-value specification. The goal of the TLC project was to expand the knowledge stored in semantic memory by having the system read text. Its success was limited by its nearly exclusive reliance on intersection searches, with little syntactic analysis or inference, and no pragmatic analysis.
Direct Extensions

In a series of papers during 1969-1972, Collins and Quillian clarified the notion of a property inheritance hierarchy, the semantic distance (number of intervening links) between two concepts, and the relevance of these notions to the theory of human memory, as indicated by reaction-time studies (5,6). Collins and Loftus refined Quillian's ideas about intersection searches (spreading activation) and evaluated their psychological import in detail (7).

Carbonell implemented a CAI program called SCHOLAR (specializing in the geography of South America) around a TLC-like semantic memory (8). The memory used a refined measure of semantic distance, with "irrelevancy tags" serving to increase this distance where appropriate. Another notable extension was McCalla and Sampson's MUSE (9), in which a nontrivial syntactic component was used to improve both the range of input sentences the system could convert into TLC memory format and the quality of output sentences.

Further Developments: Semantic Networks

Quillian-like memory models and knowledge-representation (qv) formalisms evolved along several dimensions from 1970 onward. First, their expressive power was augmented to permit representation of episodic information, information about the knowledge, beliefs, etc., of other agents (see Belief systems), and logically compounded and quantified information. Second, ideas about the structure of knowledge at levels "above" the logical level were incorporated into them, including nested subnets or partitions, framelike or schemalike structures, and taxonomies of parts and topics (in addition to the original taxonomies of concepts). Third, they were augmented with knowledge in the form of procedures directly associated with stored concepts. And finally, possible ways of implementing semantic-memory models as active parallel networks were studied, with a view toward building practical intelligent systems or advancing the theory of human memory. The first three lines of development are covered in several collections of articles (10-12) and surveys (13-15), and the last in a collection by Hinton and Anderson (16).
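The property-inheritance and semantic-distance ideas discussed in this entry can be sketched as a climb up an ISA chain, with the number of links climbed as a crude distance measure. The data below are a textbook-style toy, not the contents of Quillian's or Collins's actual memory.

```python
# Sketch of property inheritance in a Quillian-style hierarchy: a concept
# inherits properties from its superclass chain, and the number of links
# climbed is a rough "semantic distance". Toy data for illustration only.
isa = {"canary": "bird", "bird": "animal", "animal": None}
props = {"canary": {"color": "yellow"},
         "bird": {"can-fly": True},
         "animal": {"breathes": True}}

def lookup(concept, attribute):
    """Climb the ISA chain; return (value, links climbed), or None."""
    distance = 0
    while concept is not None:
        if attribute in props.get(concept, {}):
            return props[concept][attribute], distance
        concept = isa.get(concept)     # follow the superclass pointer
        distance += 1
    return None

print(lookup("canary", "breathes"))    # -> (True, 2)
```

The returned distance mirrors the Collins-Quillian reaction-time prediction: facts stored farther up the chain (canary breathes) take longer to verify than facts stored at the concept itself (canary is yellow).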
BIBLIOGRAPHY

1. M. R. Quillian, Semantic Memory, Report AD-641671, Clearinghouse for Federal Scientific and Technical Information, 1966. Abridged version in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968, Chapter 4.
2. M. R. Quillian, "The Teachable Language Comprehender," CACM 12, 459-475 (1969).
3. E. Tulving, in E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, 1972.
4. W. G. Lehnert, M. G. Dyer, P. N. Johnson, C. Y. Yang, and S. Harley, "BORIS: An experiment in in-depth understanding of narratives," Artificial Intelligence 20, 15-62, 1983.
5. A. M. Collins and M. R. Quillian, Experiments on Semantic Memory and Language Comprehension, in L. W. Gregg (ed.), Cognition in Learning and Memory, Wiley, New York, pp. 117-187, 1972.
6. A. M. Collins and M. R. Quillian, How to Make a Language User, in E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, pp. 309-351, 1972.
7. A. M. Collins and E. F. Loftus, "A spreading activation theory of semantic processing," Psychol. Rev. 82, 407-429 (1975).
8. J. R. Carbonell, "AI in CAI: An artificial intelligence approach to computer-aided instruction," IEEE Trans. Man-Mach. Sys. MMS-11, 190-202 (1970).
9. G. I. McCalla and J. R. Sampson, "MUSE: A model to understand simple English," CACM 15, 29-40 (1972).
10. E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, 1972.
11. D. G. Bobrow and A. Collins, Representation and Understanding, Academic Press, New York, 1975.
12. N. V. Findler (ed.), Associative Networks, Academic Press, New York, 1979.
13. A. Barr and E. A. Feigenbaum, Handbook of AI, Vol. 3, William Kaufmann, Los Altos, CA, pp. 36-64, 1982.
14. R. J. Brachman, On the Epistemological Status of Semantic Networks, in N. V. Findler (ed.), Associative Networks, Academic Press, New York, pp. 3-50, 1979.
15. G. D. Ritchie and F. K. Hanna, "Semantic networks: A general definition and a survey," Inf. Technol. Res. Dev. 2, 1983.
16. G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Lawrence Erlbaum, Hillsdale, NJ, 1981.

L. K. Schubert
University of Alberta

MENU-BASED NATURAL LANGUAGE

Menu-based natural-language understanding (NLMenu) is an approach to natural-language interfaces (qv) that combines the expressive power of natural language with the ease of use of menus (1-3). The NLMenu approach is unique in that 100% of the queries entered through NLMenu will be understood, it provides a convenient way of combining textual and graphical input, and its simplicity allows interfaces to many applications to be automatically generated.

NLMenu grew out of research on building conventional natural-language interfaces, the kind where users are invited to type whatever questions they have and the natural-language-understanding system will do its best to decipher what the user means. However, the performance and usability of conventional natural-language systems is limited. NLMenu is an attempt to overcome these limitations. NLMenu also provides some opportunities that are not possible with conventional natural-language systems.

Problem with Conventional Natural-Language Systems

A conventional natural-language system is one in which the user is presented with a blinking cursor and the opportunity to type in whatever question he has. It is then the natural-language system's problem to understand what the user wants and respond appropriately. A number of problems with this approach are described (4). Discussion of these problems helps to clarify the benefits of NLMenu.

First, there are mechanical problems. Many users do not know how to type or do not type well. Users often have considerable difficulties with spelling, which can cause problems for language-understanding systems. Finally, users often have trouble getting started. They can find it difficult to articulate what they want to say despite having very explicit problems to solve.

Next, there are problems with understanding language. It is not uncommon to ask a question in a way that conventional systems do not understand; but if properly rephrased, these questions can be understood. This is called exceeding the linguistic coverage of the system. With lots of hard work, system developers might anticipate every possible synonym, paraphrase, metaphor, or point of view and prepare the natural-language system for them all. Thus, with enough hard work, the problem of linguistic coverage could be effectively eliminated. Notice, however, that this could be difficult: imagine providing all possible synonyms for all of the database values and keeping them current with a dynamically changing database.

A problem related to exceeding the linguistic coverage is exceeding the conceptual coverage of the system. If one were to ask "How many trucks did we ship in January?" he might be told that the system did not understand the query. He may assume that he had exceeded the linguistic coverage and rephrase, "How many January truck shipments did we have?" He might again be told to rephrase, and this could go on until he ran out of patience. The problem could be that the system does not know about truck shipments. If so, the questions have exceeded the conceptual coverage of the system.

The limits of coverage, both linguistic and conceptual, are difficult for users to infer. They tend not to learn quickly what is acceptable and what is not. Part of the problem is that natural-language systems fail in very different ways from human understanding, so the strategies for making oneself understood in person-to-person conversations do not apply to person-to-computer conversations.

The last major set of problems relates to the implementation of natural-language systems. Conventional natural-language systems tend to be quite large. Indeed, they must anticipate every likely synonym and paraphrase of questions from users. If they are to provide access to large databases, they must at least be large enough to accept the database values and synonyms for those values. Large natural-language systems require computers with large memories.
Figure 1. The NLMenu interface with a query under construction; menus offer selectable phrases (e.g., <specific vendor>, <specific part>, <specific shipment>) and commands (Rubout, Save Query, Re-start, Show Query, Retrieve Query, Execute, Delete Query, Suspend).

Figure 2. The active menus after "Find all features of" has been selected.

Figure 3. Menus for restricting parts by attributes such as part name, part number, part weight, and color.

Figure 4. The completed query, with the retrieved part data displayed below it.
NLMenu solves the problems with conventional natural-language systems outlined above. In this section NLMenu is illustrated; in the last section the solutions to the various problems of natural-language systems are discussed, as well as the unique advantages of NLMenu.

Users build questions with NLMenu by selecting words and phrases from menus. Figure 1 shows the process in progress. The user has selected "find" and "all features of" from two successive menus and is about to select "parts," the boxed word in the white-background menu. Notice that the sentence he is building appears in the window near the middle of the screen. When he selects "parts," several things will happen. First, the sentence under construction will be updated to "Find all features of parts." Second, other menus will become active (indicated with white backgrounds). Third, the contents of those newly activated menus will be restricted to only those phrases that make sense following "Find all features of parts." This last point is important because this is why all sentences entered into NLMenu will be understood by NLMenu. For example, the large menu at the right of the screen will be made active, and such phrases as "who ship" and "who supply," which do not make sense following "Find all features of," will not appear, as shown in Figure 2.

In Figure 3 the user is about to select "(name)," indicating that he will specify specific part names, which are specific database values. One advantage of the NLMenu approach is that the system always knows when the user intends to enter specific values and so can provide assistance in expressing those values. The assistance can be as simple as a menu of all relevant values (in this case all part names). Another application could present an opportunity for graphical input such as a map. The user can enter latitude and longitude values by pointing at the area of interest on the map. In this way, NLMenu provides a convenient way to combine the expressive power of natural language with the ease of expressing spatial relationships with graphical input.

Figure 4 shows the completed query "Find all features of parts with part name cam or bolt." The data satisfying this command are shown at the bottom of the screen. The user still has an opportunity to further restrict his query by selecting

with language understanding itself. As noted above, all queries entered through the NLMenu interface are accepted; the user gets no opportunity to compose a question that would not be accepted. As a result, the problem of linguistic coverage disappears. Similarly, one cannot exceed the conceptual coverage of the system; that problem disappears as well. Notice that the problem of exceeding the coverage has disappeared not because of the massive work of finding all possible paraphrases but because of elimination of the need for paraphrases.

NLMenu interfaces require less memory and processing than do conventional natural-language systems. They do not need to sift through large grammars and dictionaries to analyze sentences. There are also ways of expressing database queries in such a way that the interfaces can be generated automatically from a description of the database (5). In fact, the example interface for this entry was automatically generated.

As discussed above, NLMenu has many advantages over conventional natural-language systems. It has the same expressive power as conventional systems but solves the biggest problems that natural-language systems have. It also provides opportunities such as mixing textual and graphical input and automatically generating new interfaces from a description of an application.

One question that is frequently asked is whether NLMenu understands language. There are two answers. If conventional natural-language systems understand language, then NLMenu must also. Behind the menus it uses the same technology as they do, representing and translating questions in the same way that they do. Behind the menus one cannot tell the difference between conventional and NLMenu interfaces. Assuming that conventional systems understand language, the answer is yes. The other answer to the question is "Who cares?" This is a technology, and the appropriate forum for evaluating technology is in solving problems. If it provides a flexible, mnemonic, and powerful interface, what difference does it make if it is declared that it does or does not understand language?

BIBLIOGRAPHY
lnterfaces Natural-Language Menu-Based
"and" or "or" and adding other clauses. of NLMenu Performance NLMenu interfaces provide the same expressivepower as conventional natural-langu agesystems,but the problems of conventional systems are largely eliminated. First, the mechanical problems: typing, spellitg, and articulating questions. With an NLMenu interface there is no typing. Sentencesare built through menu selection.With an NLMenu interface, the user is presented with words and phrases from which he sees what types of questionscan be asked.Instead of composinga question, one can think of it as recognizinghis question-an easier task-one phrase at a time. In addition, if the system provides useful but unexpectedcapabilities, such as a graphing option for numerical data, the existenceof those capabilities is revealed through the menus. In fact, the full extent of the coverage of an NLMenu system is revealed through the menus. With a conventional syst€ffi, the user must guessthe coverageof the system or find it through trial and error. The most dramatic advantage of the NLMenu interface is
BIBLIOGRAPHY

1. H. R. Tennant et al., "Menu-Based Natural Language Understanding," Proceedings of the Conference of the Association for Computational Linguistics, Cambridge, MA, pp. 151-158, 1983.
2. H. R. Tennant, K. M. Ross, and C. W. Thompson, "Usable Natural Language Interfaces through Menu-Based Natural Language Understanding," Proceedings of the Conference on Human Factors in Computing Systems, Cambridge, MA, 1983.
3. C. W. Thompson, Using Menu-Based Natural Language Understanding to Avoid Problems Associated with Traditional Natural Language Interfaces to Databases, Ph.D. Dissertation, Department of Computer Science, University of Texas at Austin, 1984.
4. H. R. Tennant, Evaluation of Natural Language Processors, Ph.D. Dissertation, Department of Computer Science, University of Illinois, 1980.
5. C. W. Thompson et al., "Building Usable Menu-Based Natural Language Interfaces to Databases," Proceedings of the Ninth International Conference on Very Large Databases, Florence, Italy, pp. 43-45, 1983.

H. Tennant
Texas Instruments
MERLIN

A system that implemented Newell's data-flow graphs for heuristic search (see Heuristics) encoded as schemas, MERLIN was developed around 1971 by Moore at Carnegie-Mellon University. It represented Newell's Logic Theorist program, with the generators and tests derived by hand. MERLIN could prove theorems (see Theorem Proving) by executing the schema (see J. Moore and A. Newell, "How Can MERLIN Understand?" in L. Gregg (ed.), Knowledge and Cognition, Erlbaum Associates, Hillsdale, NJ, pp. 253-285, 1974).

K. S. Anone
SUNY at Buffalo
META-KNOWLEDGE, META-RULES, AND META-REASONING

AI research involves building computer systems capable of reasoning (qv) and acting in a variety of environments. For example, these computer systems, or cognitive agents as they are sometimes called, should be capable of talking with other cognitive agents, advising people in complex tasks, and interacting with the world by perceiving situations and carrying out actions. The nature of knowledge is crucial for this research. When building these systems, one must think in terms of what they have to know to perform these tasks. Similarly, one must analyze the performance of cognitive agents in terms of their knowledge. Thus, a system knows about the objects in its domain of application, about how to perform a certain activity, or about the events that take place during that activity. Research on knowledge representation (qv) in AI concerns the search for models of knowledge that will enable systems to behave intelligently. A particular representation for knowledge is a combination of data structures and procedures that, if represented and used adequately in a program, will lead to intelligent behavior. The knowledge contained in an intelligent system is, for the most part, embodied in these data structures, generally called the knowledge base, and represents the propositions that the system knows or believes. Some of the propositions are represented explicitly, whereas others can be derived from those by applying inference rules. The process of deriving new propositions is done by the inference system (see Inference), either when new information is added to the knowledge base (forward inference) or when a query is posed to the knowledge base (backward inference). Forward inference enables the cognitive agent to make new deductions with information perceived from the world, and backward inference enables the cognitive agent to find out answers to its problems (see Processing, bottom-up and top-down).
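The two inference directions can be made concrete with a minimal sketch. The propositions and rules below are invented for illustration, and real inference systems are far richer: each rule pairs a set of antecedent propositions with a consequent, forward inference saturates the knowledge base when new facts arrive, and backward inference proves a query by recursively proving rule antecedents.

```python
# Illustrative rules: (set of antecedents, consequent) over string propositions.
RULES = [
    ({"bird(tweety)"}, "can_fly(tweety)"),
    ({"penguin(tweety)"}, "bird(tweety)"),
]

def forward(kb):
    """Forward inference: derive all consequences when facts are added."""
    changed = True
    while changed:
        changed = False
        for ants, cons in RULES:
            if ants <= kb and cons not in kb:   # all antecedents believed
                kb.add(cons)
                changed = True
    return kb

def backward(goal, kb):
    """Backward inference: prove a query by proving rule antecedents."""
    if goal in kb:
        return True
    return any(all(backward(a, kb) for a in ants)
               for ants, cons in RULES if cons == goal)

assert "can_fly(tweety)" in forward({"penguin(tweety)"})
assert backward("can_fly(tweety)", {"penguin(tweety)"})
```

Forward chaining here corresponds to deducing consequences of perceived information as it arrives; backward chaining corresponds to answering a posed query on demand.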
Although the inference system allows the cognitive agent to perform reasoning, it does not allow it to act in the world. To do this, the cognitive agent needs an acting system that executes actions. Since most actions are not trivial, they must be planned first; thus, the cognitive agent needs, in addition, a planning system that derives appropriate plans to be given to the acting system. The issues involved here are the subject of research in planning (qv), another field of AI. The problem in trying to formulate a plan of action to achieve some goal comes from the multiple interactions that can exist between the subactions that constitute the plan and from the fact that the agent may not have enough information to formulate the plan. It is often necessary to reason about what knowledge is needed to carry out a plan and how that knowledge can be obtained. A typical action that these cognitive agents may need to perform is to conduct a dialogue with other cognitive agents. This action, besides the problems common to other actions, has problems of its own. They are the subject of research in natural-language understanding (qv), another field of AI. One of these problems is that a cognitive agent engaging in a dialogue has to take into account the knowledge possessed by the other cognitive agent. This usually requires having a model of that agent's knowledge and reasoning about what that agent knows. Thus, the main question to be answered by these fields of AI is: What kinds of data structures and procedures must the agent know about, and how should they be used by the agent in order to make it behave intelligently? Research in these fields soon led to the conclusion that, among other things, a cognitive agent must know about objects, states, and actions. In addition, it is now strongly believed that knowledge about the extent and organization of its own and others' beliefs, about how to use its own reasoning rules, about how to perform an action, and about its own and others' performance are important aspects of intelligent behavior. Several researchers have suggested the use of meta-knowledge, meta-rules, and meta-reasoning to accomplish the integration of all these features in a single cognitive agent. In a general sense, meta-knowledge is knowledge about knowledge as opposed to knowledge about "things in the world" (1). It enables a reasoning system to "know what it knows" and to make multiple use of its knowledge (2).
In addition to using its knowledge directly, the system may have other abilities: knowing what it knows and what it does not know (consciousness) (1-10); knowing where and how to use knowledge to infer other knowledge (planning reasoning, or meta-reasoning) (5,6,9-14); knowing where and how to use knowledge to perform actions (planning acting) (9,10,15-19); explaining how and why it used its knowledge (explanation) (5,6,9,10,11); and examining its own knowledge, modifying it, abstracting and generalizing it, and acquiring new knowledge (learning) (2,4-6).
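The first of these abilities, knowing what it knows and what it does not know, can be illustrated with a toy sketch (the beliefs below are invented): keeping negative beliefs explicit lets a system distinguish believed falsehood from mere absence of knowledge.

```python
# Illustrative beliefs; a real system would derive these, not enumerate them.
believed_true = {"loves(john, jane)"}
believed_false = {"loves(bill, jane)"}

def ask(p):
    """Answer yes / no / don't-know, reporting absence of belief as such."""
    if p in believed_true:
        return "yes"
    if p in believed_false:
        return "no"
    return "I don't know"

assert ask("loves(john, jane)") == "yes"
assert ask("loves(bill, jane)") == "no"
assert ask("loves(henry, jane)") == "I don't know"
```

The "I don't know" case is the minimal form of the consciousness ability: the system's answer depends on inspecting the extent of its own beliefs, not just their content.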
Foundations in Human Cognition

The idea of incorporating meta-knowledge in knowledge-based systems has its foundations in human cognition. Although simulation of human cognition by a machine is not needed or even desired, AI researchers continue to search for answers in human cognition. This is reasonable for two reasons. If one considers that the goal of AI is to better understand human cognition, one must test the theories developed in psychology through the use of computer models. If, on the other hand, one considers that the goal of AI is to develop machines to help humans in activities requiring intelligence, those machines must reason and act like humans so that they can interact smoothly. Barr (1) describes some studies of human behavior that demonstrate people's ability to reason about what they know and about how they reason, suggesting that meta-level knowledge and reasoning are an integral part of common cognitive activity in human experience. In his words:
The concept of meta-level knowledge captures intrinsic, commonplace properties of human cognition that are central to an understanding of knowledge and intelligence.

Several cognitive phenomena illustrate the importance of meta-knowledge and meta-reasoning in human experience. For example, the tip-of-the-tongue phenomenon suggests that one has knowledge about one's knowledge. This phenomenon happens when one knows that one knows some fact even though one cannot recall it. Another phenomenon that is common in human cognition is the knowing-not phenomenon studied by Kolers and Palef (7). It is illustrated when one knows rapidly and reliably that one does not know something. Kolers and Palef collected data that suggest people know what they do not know without having to search their positive knowledge and that negative knowledge is accessed as directly as positive knowledge and sometimes even more rapidly. This phenomenon is not easily captured by common searching models of memory, where negative judgments are made only as a result of a search of positive instances that ends in failure. The fact that some negative knowledge can be accessed more rapidly than some positive knowledge suggests that not even parallel processing can accommodate this fact. Another interesting phenomenon is what one can conclude from the fact that one does not know something. Barr (1) views this phenomenon as directly related to meta-knowledge and the knowing-not phenomenon since such reasoning presumes some awareness of not knowing some fact. In the lack-of-knowledge inference the fact that one would know some fact if it were true, but one does not remember it, makes one believe that it is not true. For example, if one were asked if the President had died one month ago, in normal circumstances it would not be a reasonable answer to say "I don't know." Although one could not find either a positive or a negative answer, the death of the President is such an important fact that, if it had occurred, one should have known about it. Therefore, since one does not know about it, one can conclude that he did not die. Collins (3) studied this phenomenon and pointed out some conditions that increase certainty in these beliefs: one's own expertise in the topic area and the relative importance of the fact. That is, the more important the fact is and the greater one's own expertise in the topic area, the more certain one is that something not remembered is not true.

These phenomena constitute strong evidence that people have an intuitive knowledge of the extent and importance of their own knowledge. The concept of meta-knowledge captures this property of human cognition, and it seems that meta-knowledge could improve the behavior of AI reasoning systems.

Motivations

There were also several problems faced by AI knowledge-based systems, namely expert systems (qv), that motivated the use of meta-knowledge and meta-reasoning in those systems. One problem was how to do acquisition and maintenance of knowledge. Other problems concerned the reasoning process: how to control or plan the reasoning process in those systems and how they could explain their reasoning behavior in an intelligible manner. These problems with the reasoning process apply to any kind of activity, not just to reasoning. Finally, a more general motivation was to give these systems the capability of reasoning about knowledge that was not available previously. Each of these motivations is discussed below in greater detail.

Acquisition and Maintenance of Knowledge. The development of expert systems, i.e., programs that are skillful in a specific domain of application, emphasized the importance of large stores of domain-specific knowledge as a basis for high performance (20). Assembling and modifying the required knowledge base is a complex process that involves great expertise and careful maintenance. This is usually an ongoing task that often extends over several years and, due to the high dependency of related facts and rules, is often error prone. A key element of this process is the transfer of expertise from a human expert to the program. Due to the expert's lack of knowledge about programming, this usually requires the mediation of a human programmer, called the knowledge engineer. However, this transfer of knowledge through the knowledge engineer has some problems. First, the knowledge engineer is not an expert in the specific domain of application. Second, since most of the expert knowledge is heuristic and experimental, the expert is not capable of conveying it directly to the knowledge engineer. The process usually extends over many sessions in which the knowledge engineer struggles to extract the knowledge from the expert. This suggests that the expert should be able to interact directly with the program. Of course, the program then has to supply the same kind of assistance the knowledge engineer would provide, and if possible in a more efficient and flexible way. Davis and Buchanan (2,4,5) suggested the use of meta-knowledge to enable the system to provide this kind of assistance. Management of knowledge presents a real problem since the internal organization of the data structures and their interrelationships with other data structures are very complex. It is difficult for the expert to keep all these in memory, especially when they are constantly changing, as occurs during the initial phases of development, when the refinement of successive prototypes takes place.

A second problem is that documentation is usually not well organized and updated, and consequently, changing the system is not a trivial task. Another problem is that since the expert does not know about all the knowledge stored in the knowledge base, it is not easy for him to discover what knowledge should be added to the system to increase its performance. As the size of the domain-specific knowledge increases, maintenance becomes a more and more complex task. Systems that allow the explicit declaration of meta-level data structures in their representation schemes, i.e., formalisms that allow the encoding of data structures that "describe" other data structures, will possibly be a solution for this problem (2,4-6). The system can then assist and advise the user in modifying its knowledge and can even provide expectations concerning what knowledge should be acquired next.
Planning the Reasoning Process. A second motivation for using meta-knowledge in knowledge-based systems is to control or plan the process of reasoning (5,6,8,11-14). At each cycle of the reasoning process, the system must reason about how to reason, i.e., must do meta-reasoning. At a certain point, adding more object-level knowledge to the system will no longer improve performance. What is needed is some knowledge about how to use the object-level knowledge selectively. In fact, part
of the definition of intelligence includes appropriate usage of information, not just brute force; so even if the amount of object-level knowledge is small, it is important to use it wisely (5). Also, a main weakness of reasoning systems comes from the fact that they use a severely limited and predetermined subset of reasoning strategies. Sacerdoti (21) suggests that a significant number of strategies should be integrated into a single system. Generally, current AI paradigms have only one strategy, and even that one is embedded in the inference processor. This implicit inclusion makes the systems inflexible and hard to modify and expand. Therefore, it would be convenient to represent these strategies explicitly by meta-rules, i.e., by rules that indicate how to use other rules. One could then change the strategy of the system very easily just by changing the rules in its knowledge base. One could also write rules describing different strategies and have meta-rules of even higher order to decide which strategy to choose in each particular situation (6,11-13).

Explaining the Reasoning Process. An essential aspect of the interaction between cognitive agents is the explanation of their reasoning. Explanation (qv) and meta-knowledge are generally associated since both constitute a trend toward declaratively representing knowledge that previously was encoded procedurally. Moreover, if meta-rules encode strategies to plan the act of reasoning, an explanation facility gives an account of the planning decisions during reasoning. However, much more research has to be done to allow the user to model his own explanation facility in the same sense that he can model his own strategies, i.e., to use meta-knowledge to explain reasoning. According to Davis (6), the fundamental goal of an explanation facility is to enable a program to display a comprehensible account of the motivation for all of its actions.
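An explanation facility of this general kind can be given a minimal sketch. The goal tree and goal names below are invented, not taken from any actual system; the point is only that recording which subgoals established each goal supports both ascending ("why") and descending ("how") explanations, of the kind this section goes on to describe.

```python
# Illustrative goal tree: each goal maps to the subgoals (rule premises)
# that were used to establish it during reasoning.
GOAL_TREE = {
    "approve_loan": ["income_adequate", "credit_good"],
    "credit_good": ["no_defaults"],
}
# Invert the tree so "why" can ascend from a goal to the goal that needed it.
PARENT = {c: g for g, cs in GOAL_TREE.items() for c in cs}

def why(goal):
    """Ascend the goal tree: what higher goal needed this one?"""
    p = PARENT.get(goal)
    return (f"{goal} was pursued to establish {p}" if p
            else f"{goal} is the top-level goal")

def how(goal):
    """Descend the goal tree: which subgoals established this one?"""
    subs = GOAL_TREE.get(goal)
    return (f"{goal} was established from {', '.join(subs)}" if subs
            else f"{goal} was given or observed")

assert why("credit_good") == "credit_good was pursued to establish approve_loan"
assert how("credit_good") == "credit_good was established from no_defaults"
```

Issuing "why" repeatedly climbs to the top-level goal; issuing "how" repeatedly descends to given facts, which is the consecutive traversal described below.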
It is not easy, even for an experienced programmer, to find out how a complex process of reasoning got to where it is. Trying to account for past behavior is even more difficult when dealing with an audience assumed to know nothing about programming. Comprehensibility, then, has to be defined in terms of the application domain rather than in the language of computation. Current explanation facilities are one of the main reasons for the success of expert systems. They use a goal tree built during reasoning as a basis for explanation. Since the goal tree models the control structure of reasoning, it provides a single and easy model for the system's reasoning behavior. Explanation is then viewed in terms of traversal of the goal tree and is generally activated by two commands, "why" and "how," that allow ascent or descent traversal of the tree, respectively. These commands can in general be issued consecutively to allow the entire traversal of the tree. In some systems, like TEIRESIAS (6), the command "why" has an integer argument that allows the explanation of several levels of the tree to take place in a single step, and the command "how" has an argument that can refer to the number of the rule clause to be explained. TEIRESIAS also has the capability of directly examining the rules in the knowledge base to determine which clauses have already been established and which have not yet been tested. In this case the explanation facility interprets the same piece of knowledge that the inference facility is about to use. The explanations are thus expressed in terms of the contents of the rule. Morgado (9) suggested that the goal tree, or an equivalent
data structure representing the ongoing reasoning process, should be represented in the knowledge base itself, so that this knowledge could be reasoned about as any other kind of knowledge. A system must be able to explain the course of action taken during reasoning in terms of the knowledge that was used during that reasoning and taking into account the previous interaction with the user. In order to give explanations, a system must understand what it knows and what it is doing. So, knowledge about the specific domain of application and knowledge about the ongoing reasoning activity should be encoded uniformly to allow the system to reason about them equally (9). This allows the system to use rules to reason about its own reasoning behavior and therefore to explain it. Reasoning about a previous or ongoing activity is also a precondition to dealing with dialogues in natural-language understanding. One must make use of what has gone on to help interpret what is coming.

Planning and Explaining Activities. What was said about reasoning can be applied to any activity in general. The interaction between knowledge, planning, and action has been the subject of much research (9,10,15-18,22). A cognitive agent must integrate a belief model with an acting model to form a single model (9,10). It must have a uniform representation for beliefs and actions to reason effectively about the interaction between knowledge and action. In particular, the system should be able to reason about what knowledge it must have to perform an action, what knowledge it may acquire by performing an action, and what knowledge it needs to plan an action (15-18,22). These are all aspects of meta-knowledge. In other words, the system must have knowledge about its own knowledge and about acting.

Pushing the Declarative Approach to Representing Knowledge. Finally, the contribution of meta-knowledge to reasoning and acting can be looked at as the ultimate move toward representing most knowledge declaratively (9).
This gives the system the capability of reasoning about knowledge that it could not reason about previously.

What Are Meta-knowledge, Meta-rules, and Meta-reasoning?

Now that the background and the motivations have been presented, the main concepts talked about are defined, as well as how they relate to each other (9,10).

Knowledge and Meta-knowledge. Meta-knowledge, like object knowledge, is composed of assertions (meta-assertions) and rules (meta-rules). Meta-assertions are beliefs about beliefs, and since a rule that is believed to hold is a belief, meta-assertions include beliefs about rules. For example, the belief that John loves Jane is an assertion, whereas the belief that Henry believes that John loves Jane is a meta-assertion representing a belief about a belief. Similarly, the belief that Henry believes that all men are mortal is a meta-assertion representing a belief about the rule that all men are mortal. Other meta-assertions that can be represented in a system are the beliefs that John loves Jane is a belief about John; all men are mortal is a rule about men; Bill doesn't know whether John loves Jane; I (the system) don't know about the fishing industry in Venezuela.

Rules tell how to derive beliefs from other beliefs. Since a rule that is believed to hold is a belief, one may have rules
about rules as well. These rules are called meta-rules. There are two types of meta-rules: deduction meta-rules and planning meta-rules. Deduction meta-rules are rules that use rules to derive beliefs or that derive rules from beliefs. For example, the rule A → (B → C) is a meta-rule that enables the system to derive the rule B → C if the belief A holds. Similarly, the rule (A → B) → C is a meta-rule that enables the system to derive the belief C in case the rule A → B holds. Both meta-rules are represented by a proposition that has a proposition representing a rule (B → C and A → B, respectively) appearing in the consequent and the antecedent position of the meta-rule. The second type of meta-rules, planning meta-rules, are rules that encode reasoning strategies. The distinction between deduction rules and planning rules, i.e., between reasoning and meta-reasoning, is discussed below.

Reasoning and Meta-reasoning. Believing is a state of knowledge representing the propositions that the system assumes to be true. Reasoning is the process of inference to form beliefs from other beliefs using deduction rules. Davis proposed the use of meta-rules as a means of encoding strategies for reasoning (5,6,11). Meta-rules specify which rules should be considered and in which order they should be invoked. For example, the two rules from Ref. 11 appearing in Figure 1 are of this type. Planning meta-rules have to be used differently from all the other rules (deduction object rules and deduction meta-rules) since they do not express how to derive beliefs but how to plan the reasoning process. They are inference rules that specify how the deduction rules should be used. Davis proposed a layered control structure to handle reasoning in the TEIRESIAS system (5,6,11). The basic execution cycle in TEIRESIAS consists of selecting the inference strategy to use (backward inference, forward inference, etc.) and applying it to invoke all rules that are relevant to the goal.
But before invoking the rules at one level, the system checks for rules at the next higher level that specify which rules should be selected and in what order they should be used. Morgado and Shapiro (9,10) see this process as a particular case of a more general acting-planning process such as the one proposed by Sacerdoti (23). Acting is the process of executing a plan. Any complex action has to be planned before being performed. Planning is the process of composing a sequence of actions to be executed to achieve a predetermined goal from a given situation; it is reasoning about how to act to achieve
Meta-rule 1: If
1. you are attempting to determine the best stock to invest in,
2. the client's tax status is nonprofit,
3. there are rules that mention in their premise the income-tax bracket of the client,
then it is very likely (.9) that each of these rules is not going to be useful.

Meta-rule 2: If
1. the age of the client is greater than 60,
2. there are rules that mention in their premise blue-chip risk,
3. there are rules that mention in their premise speculative risk,
then it is very likely (.8) that the former should be used before the latter.

Figure 1. Selecting and ordering planning meta-rules.
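The effect of the two meta-rules in Figure 1 can be sketched as follows. The object rules and their representation are invented for illustration, and the certainty factors are treated as simple prune and reorder decisions rather than being propagated numerically: meta-rule 1 removes likely-useless rules from consideration, and meta-rule 2 imposes a partial order on those that remain.

```python
# Hypothetical object-level rules, tagged with the concepts their premises mention.
object_rules = [
    {"name": "rule27", "premise_mentions": {"income-tax bracket"}},
    {"name": "rule9",  "premise_mentions": {"speculative risk"}},
    {"name": "rule40", "premise_mentions": {"blue-chip risk"}},
]

def apply_meta_rules(rules, client):
    # Meta-rule 1: for a nonprofit client, rules mentioning the client's
    # income-tax bracket are very likely (.9) not useful -- prune them.
    if client["tax_status"] == "nonprofit":
        rules = [r for r in rules
                 if "income-tax bracket" not in r["premise_mentions"]]
    # Meta-rule 2: for a client over 60, rules mentioning blue-chip risk
    # should very likely (.8) be tried before those mentioning speculative risk.
    if client["age"] > 60:
        rules = sorted(rules,
                       key=lambda r: "blue-chip risk" not in r["premise_mentions"])
    return rules

client = {"tax_status": "nonprofit", "age": 65}
result = [r["name"] for r in apply_meta_rules(object_rules, client)]
assert result == ["rule40", "rule9"]   # rule27 pruned; blue-chip rule first
```

In tree-search terms, the pruning step narrows the search space and the sorting step reorders the branches, which is exactly how the effect of meta-rules in TEIRESIAS is characterized later in this article.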
that goal. The basic planning cycle in NOAH (qv) consists of looking for a plan to achieve the goal and of an iterative process in which new refinements of the plan are continuously expanded and criticized until a final plan is derived. The expansion phase produces a new, more detailed plan. The criticism of the new plan consists of any necessary reordering or elimination of redundant operations to ensure that the local expansions make global sense. After being constructed, a plan of actions may be executed. Reasoning can be looked at as the sequence of actions performed in applying rules (plans for reasoning) to derive beliefs from other beliefs. Since reasoning is itself an action, and an action has to be planned before being performed, then before reasoning, the system must first plan the reasoning. Since planning is reasoning about acting, and in this case the acting is the act of reasoning, this planning of the act of reasoning is reasoning about how to reason, or meta-reasoning, and Davis's meta-reasoning cycle can be seen as a special case of the general planning cycle. Morgado and Shapiro conclude, then, that if an acting-planning-reasoning system uses its acting component to carry out its reasoning, its planning component will automatically perform meta-reasoning.

Connecting Theories. In philosophy there is a substantial literature on the logic of knowledge and belief (24-26) and on the theory of reasoning and acting (27-29). These topics (15,22,30), as well as the topics of meta-knowledge and meta-reasoning (1,6,9,10,12-14,31,32) and the interaction between knowledge and acting (15-18), have also received considerable
Morgado and Shapiro present a thesis (10) that provides an insight into the relations among these issuesin AI knowledge-basedsystems: In a knowledge-representation(KR) systemin which assertions and rules are representedin the same way as any other concepts,no special mechanism is neededto representmeta-knowledge, where this is understood to include beliefs about beliefs, rules about beliefs,beliefsabout rules, and rules about rules. In a knowledge representationsysternwhich has an acting-planning componentand which can representactions and plans, no other mechanism is needed to handle meta-reasoning,where this is understood in include rules about the order of using rules, and reasoningabout the processof reasoning.The dtfference betweenmeta-knowledgeand meta-reasoningas formulated aboueis that the former dealsprimarily utith beliefswhile the latter deals with acting. We thereforeconcludethat, besides the conceptualdistinction betweenthe objectleueland the metaIeuel,a ualuable distinction to focus on when building KR systems which can haue mets,-knowledgeand can do meta-reasoning is that betweenbelieuing and acting.
Systems with Meta-knowledge

Two systems that have meta-level components, TEIRESIAS and MOLGEN, are briefly described.

TEIRESIAS. Davis and Buchanan (2,4-6,11) explore in TEIRESIAS the concept of meta-level knowledge in several different forms, each of them supporting one or more of the tasks of acquisition, accumulation, and maintenance of knowledge. Schemata and rule models were built to support acquisition and accumulation of knowledge via interactive transfer of expertise from the human expert to the knowledge base. Function templates and schemata support maintenance of knowledge by giving the system a "picture" of its own knowledge and the way that knowledge is organized. Schemata encode knowledge about the representation of objects and about their relationships. Knowledge about inference rules is encoded in the rule models. A rule model is an abstract description of a subset of rules, built from empirical generalizations about those rules, and it is used to characterize a "typical" member of the subset. Finally, function templates are list structures indicating the order and the type of the arguments in a typical call of a function. They are used for code dissection and generation.

According to Davis and Buchanan (5,6,11), meta-rules embody strategies, that is, knowledge that indicates how to use other knowledge. They show how meta-rules can be used to encode strategies and to define control regimes. They see strategies from the perspective of deciding which knowledge (rule) to invoke next when more than one rule may be applicable. Meta-rules in TEIRESIAS draw conclusions about object-level rules in two ways: they can make deductions about the likely utility of certain object rules, or they can indicate a partial ordering between two subsets of object-level rules. Davis and Buchanan stress that meta-rules should make conclusions about the utility of object-level rules, not their validity. They claim that it is because of this fact that it makes sense to distribute knowledge in object-level and meta-level rules. Otherwise, it would only be necessary to add another premise clause to each of the relevant object-level rules.

Adding meta-rules to the TEIRESIAS system requires only a minor addition to the control structure. The system retrieves the entire list of rules relevant to the current goal. But before trying to invoke those rules, the system first looks for any meta-rules relevant to that goal.
If it finds any, these are invoked first. They may draw conclusions about the likely utility and relative order of those rules. The list of object rules may be shortened and reordered by the meta-rules, and only then are the rules used. Viewed in tree-search terms, the implementation of meta-rules in TEIRESIAS can either prune the search space or reorder the branches of the tree. This process is generalized in TEIRESIAS; i.e., there can be an arbitrary number of levels of knowledge, each one guiding the use of the knowledge at the next lower level.

Finally, Davis defends meta-rules because they enable one to use content-directed invocation. This technique allows users to define their own invocation criteria, offering them a richly expressive language. Meta-rules also have strong validity, since descriptions are made via direct reference to the content of the knowledge source itself. In meta-rules, then, the two ideas of generalized invocation criteria and content-directed retrieval are combined. The former gives meta-rules high expressiveness, since it allows invocation of any knowledge source that fits a given description. The latter gives meta-rules a strong degree of validity, because there is a formal link between the knowledge source and its description. Besides this, content-directed invocation offers a strong degree of flexibility in a program, since acquisition and maintenance of knowledge become easier. Editing or adding an object-level rule does not require meta-rules to be edited to make sure they still apply, since the meta-rules will adjust to the changes found in the edited rule. On the other hand, editing or adding a meta-rule does not cause problems either, since one does not have to find all the object rules to which the meta-rule applies in order to mention them in the code. Indeed, as invocation is made by a
description of the code of the object rule itself, this entire operation becomes transparent to the user because, again, this burden of system upkeep has been transferred to the program. This idea of replacing reference by name with reference by description has its problems, as pointed out by Davis. First, it is not always clear how to generalize from a specific procedure to a general description of the capabilities desired. Second, the overhead in computer time must be considered.

MOLGEN. Stefik (14) recognizes that most of the decisions a planner makes are about the reasoning process, as opposed to decisions about the problem, and that this raises a variety of decisions that are usually made implicitly in planning programs with rigid control structures. It is this fact that leads him to propose a layered approach for meta-planning, that is, for planning about planning. His meta-planning model uses operations for hierarchical planning with constraints and integrates two strategies generally used independently in planning programs: the least-commitment (conservative reasoning) and the heuristic (plausible reasoning (qv)) strategies [namely in Sacerdoti's NOAH (23) and Sussman's HACKER (33) (qv), respectively]. By integrating these techniques, MOLGEN makes sense of the use of guessing, but only as a last resort; thus, Stefik considers bugs inevitable (as in HACKER), but only when one guesses. Guessing is used to compensate for the lack of knowledge to solve a problem.

The control structure in MOLGEN is composed of an interpreter and three layers, called planning spaces. Each space has operators, objects, and steps and controls the creation and scheduling of steps in the next lower layer in the hierarchy. The lowest layer in the hierarchy, the domain space, is called the laboratory space. This is the space that has knowledge about the objects and operations of the specific domain, a genetics laboratory in MOLGEN. It is not a control level at all; it plays merely an execute role.
The next layer in the hierarchy is called the design space. It is the space charged with designing the plans; i.e., it is this layer that creates and schedules steps in the laboratory space. This is the first control layer in MOLGEN. This space defines a set of operators for designing plans abstractly and for propagating constraints among the refined subproblems in the laboratory plan.

The top layer of the hierarchy is called the strategy space. The organizational idea behind the strategy space is the distinction between least-commitment and heuristic modes of reasoning. It relies on cooperation between subproblems via constraint propagation to stay in the least-commitment cycle as long as it can (conservative reasoning), resorting to guessing (plausible reasoning) only as a last choice. This is the space that has knowledge about strategy. Whereas the design operators plan by creating and scheduling laboratory steps, the strategy operators meta-plan by creating and scheduling design steps. Communication between spaces is done by using control messages that invoke procedures at the next lower level without knowing their names. This guarantees that the communication is uniform, but these procedures redundantly represent the knowledge about operators. Another problem is that scheduling is based on numeric priorities rather than on content-directed invocation.

In summary, MOLGEN uses layers as a way of creating abstraction. Although meta-knowledge is used to combine the least-commitment and the heuristic strategies, meta-knowledge is embedded in the interpreter in the form of two cycles that invoke the strategy operators. Therefore,
MOLGEN has in the strategy space the tools to create several different control regimes, but the way they are combined to specify a particular strategy is controlled by the interpreter. To obtain other strategies, the interpreter would have to be modified.
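The layered control structure can be sketched as follows. This is an illustrative Python caricature with invented names: the real MOLGEN's interpreter also alternates least-commitment and guessing cycles, which are omitted here.

```python
# Hypothetical sketch of MOLGEN's planning spaces: each layer
# creates and schedules steps for the layer below it; the
# laboratory space merely executes and is not a control level.

class Space:
    def __init__(self, name):
        self.name = name
        self.agenda = []               # steps scheduled by the layer above

    def schedule(self, step):
        self.agenda.append(step)

def interpreter(strategy, design, laboratory):
    # Strategy operators meta-plan: they schedule design steps.
    for design_step in strategy.agenda:
        design.schedule(design_step)
    # Design operators plan: each design step refines into lab steps.
    for design_step in design.agenda:
        for lab_step in design_step():
            laboratory.schedule(lab_step)
    # The laboratory space plays merely an execute role.
    return list(laboratory.agenda)

strategy = Space("strategy")
design = Space("design")
laboratory = Space("laboratory")

# A design step is a procedure yielding concrete laboratory operations.
strategy.schedule(lambda: ["merge(vector, insert)", "screen(culture)"])

plan = interpreter(strategy, design, laboratory)
print(plan)
```

The point of the sketch is the direction of control: the strategy layer never touches laboratory operations directly; it only schedules design steps, which in turn schedule laboratory steps.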
Conclusions

Meta-knowledge is knowledge about other knowledge as opposed to knowledge about things in the world. Meta-reasoning is planning the act of reasoning. Meta-rules are rules that "talk" about other rules. They can be deduction meta-rules or planning meta-rules. The planning meta-rules are rules to do meta-reasoning. Recent work suggests that besides the conceptual distinction between the object level and the meta-level, a valuable distinction to focus on when building knowledge-based systems that can have meta-knowledge and can do meta-reasoning is that between believing and acting. The theories of knowledge and belief and of knowledge and action may shed some light on the issues of meta-knowledge, meta-rules, and meta-reasoning.
BIBLIOGRAPHY

1. A. Barr, Meta-knowledge and Cognition, in Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 31-33, 1979.
2. R. Davis, Knowledge Acquisition in Rule-Based Systems: Knowledge about Representations as a Basis for System Construction and Maintenance, in Pattern-Directed Inference Systems, Academic Press, New York, 1978.
3. A. Collins, Fragments of a Theory of Human Plausible Reasoning, TINLAP-2, pp. 194-201, 1979.
4. R. Davis, Interactive Transfer of Expertise, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977.
5. R. Davis and B. G. Buchanan, Meta-level Knowledge: Overview and Applications, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977.
6. R. Davis and D. Lenat, Knowledge-Based Systems in Artificial Intelligence, McGraw-Hill, New York, pp. 227-490, 1982.
7. P. A. Kolers and S. R. Palef, "Knowing not," Mem. Cog. 4(5), 553-558 (1976).
8. S. C. Shapiro, On Representing About, Extended Abstract, Computer Science Department, SUNY at Buffalo, 1980.
9. E. Morgado, Believing and Acting: An Approach to Meta-Knowledge and Meta-Reasoning, Ph.D. proposal, Department of Computer Science, SUNY at Buffalo, 1980.
10. E. J. Morgado and S. C. Shapiro, Believing and Acting: A Study of Meta-Knowledge and Meta-Reasoning, in Proceedings of EPIA-85 (Encontro Portugues de Inteligencia Artificial), Oporto, Portugal, pp. 138-154, 1985.
11. R. Davis, Generalized Procedure Calling and Content-Directed Invocation, in Proceedings of the AIPL Conference, August 1977.
12. H. Gallaire and C. Lasserre, Controlling Knowledge Deduction in a Declarative Approach, in Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, 1979.
13. M. Genesereth, An Overview of Meta-Level Architecture, in Proceedings of the Third AAAI Conference, Washington, DC, pp. 119-129, 1983.
14. M. Stefik, Planning and Meta-Planning, MOLGEN: Part 2, Computer Science Department, Stanford University, Stanford, CA, 1980.
15. D. A. Appelt, A Planner for Reasoning About Knowledge and Action, in Proceedings of the First AAAI, Stanford, CA, pp. 131-133, 1980.
16. R. Moore, Reasoning About Knowledge and Action, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 473-477, 1977.
17. R. Moore, Reasoning About Knowledge and Action, Technical Note 191, AI Center, Computer Science and Technology Division, SRI International, Menlo Park, CA, 1980.
18. B. Smith, Knowledge Representation Semantics, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 987-990, 1977.
19. L. Morgenstern, A First Order Theory of Planning, Knowledge, and Action, in Proceedings of the Theoretical Aspects of Reasoning About Knowledge, Monterey, CA, pp. 99-114, 1986.
20. E. Feigenbaum, The Art of Artificial Intelligence: I. Themes and Case Studies of Knowledge Engineering, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 1014-1029, 1977.
21. E. D. Sacerdoti, Problem Solving Tactics, in Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 1077-1085, 1979.
22. S. Amarel, On Representations of Problems of Reasoning About Actions, in D. Michie (ed.), Machine Intelligence 3, American Elsevier, New York, pp. 131-171, 1968.
23. E. D. Sacerdoti, A Structure for Plans and Behavior, Elsevier, New York, 1977.
24. J. Hintikka, Knowledge and Belief, Cornell University Press, Ithaca, NY, 1963.
25. J. Hintikka, Semantics for Propositional Attitudes, in Ref. 26, pp. 145-167.
26. L. Linsky (ed.), Reference and Modality, Oxford University Press, London, 1971.
27. B. Aune, Reason and Action, Reidel, Dordrecht, The Netherlands, 1977.
28. M. Brand and D. Walton, Action Theory, Reidel, Dordrecht, The Netherlands, 1976.
29. H.-N. Castaneda, Thinking and Doing, Reidel, Dordrecht, The Netherlands, 1975.
30. A. Maida and S. Shapiro, "Intensional concepts in propositional semantic networks," Cog. Sci. 6, 291-330 (1982).
31. K. Bowen and R. Kowalski, Amalgamating Language and Meta-Language in Logic Programming, Technical Report, School of Computer and Information Science, Syracuse University, New York, 1981.
32. R. Filman, Meta-Language and Meta-Reasoning, Computer Research Center, Hewlett-Packard Laboratories, Palo Alto, CA.
33. G. J. Sussman, A Computer Model of Skill Acquisition, American Elsevier, New York, 1975.

E. Morgado
SUNY at Buffalo
MICRO-PLANNER

MICRO-PLANNER is a subset of the programming language PLANNER (see C. Hewitt, PLANNER: A Language for Proving Theorems in Robots, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 295-301, 1969). PLANNER itself has never been implemented completely, but MICRO-PLANNER was implemented by Sussman, Winograd, and Charniak (see G. J. Sussman, T.
Winograd, and E. Charniak, MICRO-PLANNER Reference Manual, Artificial Intelligence Memo No. 208A, MIT, Cambridge, MA, December 1971). MICRO-PLANNER was intended to combine elements of a theorem prover with a normal LISP-like programming language. The mechanism used can best be described as pattern-directed procedure invocation. A theorem prover is a program that blindly searches through a database of assertions and theorems. On the other hand, a normal programming language has a fixed, prespecified, and inflexible flow of control. MICRO-PLANNER behaves like a theorem prover that makes use of additional procedural information. In this way it becomes possible to specify a goal to be reached instead of a detailed algorithm for how to reach it. Winograd's SHRDLU (qv) program is based on MICRO-PLANNER (see T. Winograd, Understanding Natural Language, Academic Press, New York, 1972). Deficiencies of MICRO-PLANNER resulted in the development of several other languages, most prominently CONNIVER (qv).

J. Geller
SUNY at Buffalo
MILITARY APPLICATIONS

During the last few years AI technology-related activity within the military has increased dramatically. This heightened interest and expanding investment in AI by the Department of Defense (DOD) and the individual services (Army, Navy, Air Force, and Marine Corps) may be attributed to a number of factors, in particular:

1. the very real progress AI technologies have been making and demonstrating at academic centers and in commercial applications;
2. the increasing complexity of modern-day military operations, brought about in great degree by significant advances in the speed and accuracy of sensors and weapons, coupled with the rapid growth in the amount of critical information to be processed, analyzed, and assimilated under severe time constraints with limited manpower; and
3. a growing awareness and acceptance by the military of the potential of AI technologies to help solve military problems.
The possible contributions of AI to defense span the breadth of military activities. Table 1 relates 14 basic AI technologies to a number of military-problem areas. Applicability to seven generic military problem areas as well as a number of more specific task domains is indicated as either major or minor. That the matrix is quite dense is not surprising; each AI technology is applicable to a wide variety of military task areas, and each problem area could profit from a number of AI techniques. Note also that the generic problem entry "operations" is rated as a potential major application area of almost all of the AI technologies considered. The more specific military task areas enumerated in the table are not only primarily operations oriented, but many are vital components in the critical operations area of command, control, communications, and intelligence (C3I). Indeed, military commanders are identifying C3I and its increasingly complex and difficult problems as perhaps the most significant areas for both near- and far-term AI technology applications (1).

Underneath the current surge of attention to military applications of AI lies a history of almost 20 years of DoD support, through agencies such as the Office of Naval Research (ONR) and the Defense Advanced Research Projects Agency (DARPA), to basic AI research at a number of universities. As the discipline has progressed and promising technologies such as expert systems and natural-language processing have emerged, interest has grown in applying these techniques to challenging real-world military problems. In the early 1980s the Navy took the lead among the services and established the Navy Center for Applied Research in AI at the Naval Research Laboratory to address the transition of basic AI research to naval applications. More recently, the Air Force has accelerated AI research and exploratory development at the Avionics and the Flight Dynamics Laboratories at Wright-Patterson Air Force Base and designated Rome Air Development Center at Griffiss Air Force Base as part of a long-range AI effort that includes a consortium of seven New York universities and the University of Massachusetts. The Army, also, is investing in long-term AI research, exploratory development, and training of personnel, in part through liaisons with the University of Texas and the University of Pennsylvania. A new, far-reaching program involving a number of universities, defense research and development laboratories, and private industry is the Strategic Computing Initiative (SCI). Administered by DARPA and estimated to cost about $600 million for the first 5 years, SCI is aimed toward developing and applying a new generation of machine intelligence technology to critical defense problems (3). Three specific military areas targeted for initial technology applications are an autonomous land vehicle, an intelligent Pilot's Associate, and naval battle management.

Autonomous Land Vehicle. The development of the autonomous land vehicle, with active participation by the Army, will emphasize computer vision and image understanding technologies. Ultimately, the addition of advanced AI reasoning techniques may allow the vehicle to not only sense and react but interpret its environment and then adapt its mission strategy correspondingly. Initial work is concentrating on designing a vehicle that can automatically determine the path of a road and follow it. Eventually the vehicle must also be able to not only detect an obstacle in its path but also determine its nature (e.g., a shadow, a traversable log, or a large boulder requiring a detour) and react accordingly.

Pilot's Associate. In concert with the Air Force, the Pilot's Associate project is directed toward providing the pilot of a single-place fighter aircraft with the support and expertise of a "phantom flight crew." Rather than addressing the automation of conventional functions in an aircraft, the project is aiming toward providing logical expertise in specified task areas through the concept of an integrated cockpit. Initially, the system is being conceived as a construct of four major interactive expert subsystems: a situation assessment manager, a tactical-planning manager, a mission-planning manager, and a systems-status manager. Special emphasis is being placed on the pilot-vehicle interface, which will include advanced control, display, and automation techniques that utilize speech
MILITARYAPPLICATIONS
605
Table 1. Military Applications of AI Technologies. [Matrix not reproducible in this transcription. The table cross-tabulates 14 basic AI technologies against the following defense applications: Manufacturing; Operations; Maintenance; Logistics; Personnel; Training; Intelligence collection and surveillance; Intelligence processing; Intelligence analysis and situation assessment; Sensor resource allocation; Force allocation; Force command and control; Route planning and navigation; Battle tactics; Targeting; Autonomous and semiautonomous vehicles; Avionics; Electronic warfare; C3 Countermeasures; Communications; Network control; Information routing; Information management and retrieval; and Combat engineering and support. Symbols: O, major applicability; X, minor applicability. From Ref. 2. Courtesy of EW Communications, Inc.]
recognition (see Speech understanding), natural-language understanding (qv), and voice synthesis.
Naval Battle Management. A goal of the battle-management program, a joint effort with the Navy, is to demonstrate how AI technology, particularly expert systems and natural-language understanding, can contribute to the development of automated decision aids for the complex combat environment. Five battle-management functions have been identified as initial application areas within fleet-command center operations. They include force requirements, capabilities assessment, campaign simulation, operations planning, and strategy assessment. These functions are well defined yet complex, demanding, and labor intensive, requiring skill and expertise to perform, and are thus promising candidates for expert-system decision aids. As with the personnel they will support, expert
systems developed for these applications will need to interact and cooperate with each other. Emphasis is also being placed on natural-language understanding, both as an interface between the expert systems and their users and as a means of automating the processing of the ever-increasing command-center message traffic, which can expand 10-fold during a crisis.

Military operations, and in particular C3I, possess significant characteristics that have not always been prominent in other AI application domains. One such characteristic is the time-critical nature of tactical decision making: the need for appropriate, real-time response to dynamic situations. The deployment of increasingly complicated surveillance and weapons systems, both friendly and hostile, has compressed the time available for tactical decision making. Automated decision aiding (and ultimately automated decision making) under these conditions must emphasize efficient solution-space search and pruning techniques and consider finding the first solution that satisfies a given set of conditions or exceeds a specified threshold. In addition, vast amounts of diverse, often incomplete and uncertain data must be interpreted and integrated to form the tactical picture upon which situation assessments and consequent tactical actions are based. Therefore, effective techniques for reasoning under uncertainty will be crucial to automated decision support (see Reasoning, plausible).

The problem of information processing in the military is an enormous one, due both to the vast quantities of data to be handled and to the distributed nature of the generation and usage of the information. Huge databases must not only be maintained and updated but must also be quickly and efficiently accessible by a distributed hierarchy of military personnel with differing needs. The flood of incoming data must be analyzed, disseminated, integrated, stored, and presented appropriately and in a timely manner. Thus, methodologies for efficient distributed database management and information interpretation and integration, as well as man-machine interfaces that accommodate the specialized needs and personal preferences of the system user, will be required.

The geographic and functional distribution of both C3I assets and C3I decision-making authority and responsibility have led C3I to be described as an excellent example of distributed problem solving (qv). A new field of research within AI, called distributed AI, is addressing many of the difficult issues in this area (4).
For example, how may control be most effectively distributed across a network of semiautonomous problem-solving or decision-making nodes while still ensuring their cooperation in arriving at consistent and coherent problem solutions or strategies? How may tasks be assigned dynamically among often competing nodes? In what ways should nodes communicate with each other, and what kinds of information should be exchanged? Additionally, how will distributed problem-solving systems recover from the failure of one or more nodes? These questions are but an indication of the challenges in developing intelligent systems for distributed problem solving in domains such as C3I.

To date, military AI application systems are still in the prototype development stage. The following sections describe a small sampling of experimental systems that are demonstrating the feasibility of applying AI techniques to a variety of military needs. Application areas include sensor-information integration for situation description and assessment, combat-resource allocation, mission planning, maintenance and troubleshooting of military equipment, training, and automated natural-language understanding (qv) of military messages. A crucial issue that separates current prototype systems from operational systems is that of robustness: the ability of a system to "keep its head" and not fall apart when faced with input that is unfamiliar, violates internal-system constraints, or contains unresolvable ambiguities. Many AI researchers also feel that operational systems, particularly those that are expert-system based, will need a capability to learn in order to survive in the complex, dynamic military environment. Although current computer systems cannot adapt and improve themselves significantly on the basis of past mistakes or acquire new abilities through observation (e.g., by example or analogy), machine-learning research, which is receiving increased attention following its evolution from early network approaches to
present-day knowledge-intensive techniques, is making progress toward these goals (5).

Sensor Information Integration for Situation Description and Assessment

A central problem in military intelligence is the construction of coherent situation descriptions using sensor information. Situation descriptions provide crucial support to military decision makers over a wide range of activities, from local tactical operations to strategic planning. Sensor information comes from diverse sources in a variety of forms, such as intercepted communications, radar returns, intercepted radar emissions, aerial surveillance, sonar, etc. Such data are often incomplete and uncertain and may be time delayed, ambiguous, and in error. As the technology of warfare escalates, the information-integration problem grows along two dimensions: the quantity of sensor data is increasing at the same time that the variety of such information is proliferating. This combination creates a potentially overwhelming situation for the human analysts who must generate current, coherent situation descriptions and assessments under increasingly restrictive time constraints.

Perhaps the simplest form of sensor-information integration occurs when returns from successive sweeps of a single radar are correlated to produce a track of some distant object. Conventional computer algorithms have long been developed for this and other routine correlation tasks. More recently, however, the techniques of AI have been applied to sensor-information integration problems that normally require the attention of human analysts, since their solutions often involve reasoning with incomplete, uncertain evidence. This section describes two such applications of AI technology to sensor-information integration.

The ANALYST program illustrates the use of AI techniques to help generate tactical situation descriptions and assessments on the battlefield (6).
Developed in the early 1980s by the MITRE Corporation as a prototype expert system for the Army, ANALYST uses reports from multiple sensor sources to generate a real-time battlefield situation display for use by force commanders and their staffs. The premise on which the ANALYST program is based is that the existence of enemy units can be inferred from their basic war-making activities. Thus, the input to ANALYST is in the form of reports involving five types of intelligence: intercepted communications, indications from shooting sensors, photo interpretation, radar interceptions, and moving-target indications. The output of ANALYST is a situation map showing suspected locations of enemy units. The process of fusing the incoming stream of intelligence reports into a coherent situation map is performed in a deductive fashion using production, or if-then, rules. An important constraint of the project was the implementation of the software on computers small enough to be deployed at a battlefield command post.

Three basic types of entities are manipulated by ANALYST. Each of the three entity types is represented using a frame hierarchy (7) (see Frame theory). Thus, intelligence reports, groups of seemingly related intelligence reports (activity clusters), and hypothesized battlefield entities (tanks, command posts, etc.) are all stored as frames. The selection of frames as the basic data structure provides a convenient framework for storing information having taxonomic structure. For example, each photo-interpretation report is a specific instance of a generic intelligence report, and therefore it inherits certain properties from intelligence reports in general. In addition, frames provide for the attachment of demon functions, which supply values for slots that may be missing as a result of an incomplete report.

Although a good deal of information is inherent in the frame structures, the major portion of the domain knowledge available to ANALYST is stored in six distinct knowledge bases. Each knowledge base consists of a collection of if-then rules that operate on the frame entities. The first knowledge base serves to associate each incoming intelligence report with clusters of previously processed intelligence reports from the same general geographical area. Patterns among these activity clusters are recognized by the second knowledge base, which creates a frame representing a hypothesized battlefield entity whenever one of its pattern rules fires. A corresponding symbol is placed on the system's graphical situation map to represent the entity. One of the slots in the newly created battlefield-entity frame contains a likelihood that is used to indicate the strength of the evidence used to infer the entity's existence. The inference process of the first two knowledge bases is pursued in a parallel fashion for each of the five types of intelligence reports. Thus, it is possible for the pattern rules of the second knowledge base to create multiple frames representing the same battlefield entity when that entity's existence is supported by more than one form of intelligence. Such duplicate entities are merged into single composite entities by the merge rules of the third knowledge base. This merging process is a crucial step not only because it removes redundancies in the situation map but also because it allows information from diverse sources to be integrated into a coherent situation description.
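The frame machinery just described, inheritance from a generic report plus demon functions for missing slots, can be sketched in Python. The slot names and the demon below are invented for illustration; ANALYST's actual frames were considerably richer.

```python
# Hypothetical sketch of ANALYST-style frames: a frame inherits
# slot values from its parent frame, and an attached "demon"
# function supplies a value for a slot that is missing from an
# incomplete report.

class Frame:
    def __init__(self, parent=None, slots=None, demons=None):
        self.parent = parent
        self.slots = dict(slots or {})
        self.demons = dict(demons or {})

    def get(self, slot):
        if slot in self.slots:
            return self.slots[slot]
        if slot in self.demons:        # if-needed demon fires
            self.slots[slot] = self.demons[slot](self)
            return self.slots[slot]
        if self.parent is not None:    # inherit from the generic frame
            return self.parent.get(slot)
        return None

# Generic intelligence report: every report inherits a default reliability.
generic_report = Frame(slots={"reliability": "unconfirmed"})

# A photo-interpretation report arrives without its sector; a demon
# derives it from the coordinates the report does contain.
photo_report = Frame(
    parent=generic_report,
    slots={"coordinates": (52, 31)},
    demons={"sector": lambda f: "NE" if f.get("coordinates")[0] > 50 else "SW"},
)

print(photo_report.get("reliability"))   # inherited: unconfirmed
print(photo_report.get("sector"))        # supplied by the demon: NE
```

This captures the two properties the text attributes to frames: taxonomic inheritance (the reliability slot) and demon-supplied values for incomplete reports (the sector slot).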
Tactical and terrain data are used by the fourth knowledge base to refine the descriptions of the hypothesized battlefield entities. The rules of the fifth knowledge base reinforce the existence of hypothesized battlefield entities by examining those activity clusters that were not used by any pattern rules. If a stray activity cluster is sufficiently close to some hypothesized battlefield entity, a reinforcing rule may use that unclaimed cluster to reinforce the entity's existence. One way that the refinement and reinforcement of the fourth and fifth knowledge bases is accomplished is by adjusting the values of the likelihood slots contained in the various battlefield-entity frames. The sixth knowledge base serves to delete hypothetical battlefield entities that have persisted for a sufficient length of time without reinforcement.

ANALYST's rules are segregated into separate knowledge bases for control purposes. Each of the knowledge bases is applied to the incoming data in the order in which they have been described. Thus, all possible clusters are formed before the pattern rules are applied, all pattern rules are applied before any merging rules are applied, etc. The partitioning of ANALYST's rules into specialized knowledge bases facilitates controlling this sequential application process. ANALYST works from lower level data to higher level conclusions using a forward-chaining inference mechanism. Thus, conclusions made in the then portion of an if-then rule may be used later to satisfy the if portion of some other rule. Conflict resolution, the process of deciding which rule to select when more than one rule is applicable, is handled by applying the rules in their order of appearance in the knowledge base.

ANALYST was tested using data from a computer simulation of a battlefield environment. The simulation contained
models of enemy units performing some specific mission, and it simulated the intelligence observables the enemy units would produce as a result of their war-making activity. In addition, the simulation employed models of friendly sensors used to capture the intelligence observables. Two capture ratios were used in testing ANALYST: 35 and 20%. (A 35% capture ratio means that the intelligence-gathering apparatus captures only 35% of all the possible events.) In both cases ANALYST produced a quite comprehensive situation map, even given sparse intelligence. For example, even at the 20% capture level, approximately half of the simulated battlefield entities were correctly hypothesized. In some cases hypothesized locations were accurate enough to be used as targeting data for area weapons. In addition, it may be difficult to trick the ANALYST program using decoys since it employs information from diverse intelligence sources.

Just as battlefield entities produce observable information during their activities, ships and submarines produce observable features as they transit the ocean. The observable of interest here is an acoustical signature (energy in certain narrow bands of the sound spectrum and particular harmonics of these fundamental frequencies) that is produced by the propulsion system and other equipment in the vessel. To detect and classify the ships and submarines operating in a certain sector of the ocean, the Navy uses acoustical data collected by submerged hydrophone arrays located at the ocean's periphery. Each hydrophone of the array is directional, so that its sensitivity is concentrated in a cone that projects out into the ocean. The signals collected from the hydrophones are displayed in the form of a sonogram, a time-series display of the acoustical energy spectrum detected at the hydrophone.
Highly trained sonar analysts interpret the sonograms, and by using their knowledge of ship and submarine signature traits, sea-lane characteristics, underwater sound propagation, and intelligence information, they develop a situation board that describes the current state of activity in the ocean sector in question. The most straightforward situation for the analyst occurs when only one source presents itself on a given hydrophone channel. In that case the process of matching the incoming signature with a collection of stored references is complicated primarily by noise, changing acoustical propagation conditions, measurement errors, and the possible incompleteness of the signal data. A more difficult situation is one where radiations from several vessels are captured on the same channel and where several channels are active simultaneously. The process of disentangling these multiple signatures is challenging to even the most experienced analysts.

In order to investigate the feasibility of using automated knowledge-based reasoning to aid in this complicated signal-understanding task, DARPA initiated a research project in the early 1970s involving computer scientists at the Stanford Heuristic Programming Project and also at Systems Control Technology. The resulting programs, HASP (Heuristic Adaptive Surveillance Project) and SIAP (Surveillance Integration Automation Project) (8,9), were evaluated in the late 1970s with quite promising results. There are several superficial similarities between ANALYST and HASP/SIAP. For example, frames are used to store static knowledge about the characteristics of vessels, and entities hypothesized by HASP/SIAP have associated weights as a measure of the confidence in the hypothesis. These weights are used in a fashion similar to ANALYST's likelihoods.
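The first step of this kind of signature analysis, grouping detected sonogram line frequencies into harmonic sets (a fundamental plus lines near its integer multiples), can be sketched in a few lines. The tolerance and the greedy grouping rule below are simplifications invented for illustration, not the method HASP/SIAP actually used.

```python
def harmonic_sets(freqs, tol=0.02):
    """Group detected line frequencies (Hz) into harmonic sets: each set is
    a candidate fundamental plus lines within a relative tolerance of an
    integer multiple of it. A deliberately simplified toy."""
    remaining = sorted(freqs)
    sets = []
    while remaining:
        f0 = remaining.pop(0)          # lowest unexplained line = fundamental
        members = [f0]
        for f in remaining[:]:         # iterate over a copy while removing
            n = round(f / f0)
            if n >= 2 and abs(f - n * f0) <= tol * f:
                members.append(f)
                remaining.remove(f)
        sets.append(members)
    return sets

# Lines at 60 Hz and its (slightly smeared) harmonics, plus an unrelated line:
print(harmonic_sets([60.0, 120.5, 180.2, 97.0]))
```

Real sonogram lines drift and overlap, which is precisely why disentangling multiple signatures remains hard for automated systems and analysts alike.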
MILITARY APPLICATIONS
HASP/SIAP represents much of its domain knowledge as production rules. In addition, the information refinement process in HASP/SIAP is similar to that of ANALYST. ANALYST successively refines intelligence reports into activity clusters and then refines activity clusters into entities. The refinement process of HASP/SIAP begins by detecting harmonic relationships between sonogram lines and associating them into harmonic sets. Harmonic sets are further related to potential shipboard noise sources, and groups of sources may suggest a specific vessel, etc.

The underlying framework for problem solving used by HASP/SIAP, however, is quite different from that of ANALYST. The control strategy employed by ANALYST is to apply knowledge bases sequentially, each of which uses forward chaining and a straightforward conflict-resolution scheme. The control strategy of HASP/SIAP is a much richer one and is known as a blackboard architecture (10,11) (see Blackboard systems). In this implementation the production rules are divided into a hierarchy of knowledge sources. The lowest level in this hierarchy consists of specialist knowledge sources that contain domain knowledge about ocean surveillance. The higher levels of the knowledge source hierarchy contain strategic knowledge about how to solve ocean surveillance problems. These problem-solving strategy rules monitor a central data structure called the blackboard, where the current best hypothesis [e.g., at the highest level of analysis, the situation board postulating the most likely vessel(s) based on data available up to that time] is posted. The strategy rules determine opportunistically which of the lower level knowledge sources should be applied to the current best hypothesis in order to provide the most refinement. Thus, all of the knowledge sources operate on the same blackboard under control of the strategy knowledge source, whose job it is to provide focus of attention for the system.
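A blackboard control loop of the kind just described can be sketched minimally: specialist knowledge sources test whether they apply to the shared board, and a strategy layer opportunistically fires whichever applicable source promises the most refinement. The knowledge sources, refinement scores, and data below are invented stand-ins, not HASP/SIAP's actual components.

```python
# Toy blackboard: shared data structure plus opportunistic control.
blackboard = {"lines": [60.0, 120.0, 180.0], "harmonic_sets": [], "sources": []}

def group_lines(bb):      # specialist KS 1: raw lines -> a harmonic set
    bb["harmonic_sets"].append(bb["lines"][:])
    bb["lines"].clear()

def propose_source(bb):   # specialist KS 2: harmonic set -> candidate noise source
    bb["sources"].append(("propulsion?", bb["harmonic_sets"].pop()))

knowledge_sources = [
    # (name, applicability test, estimated refinement value, action)
    ("group-lines",    lambda bb: bool(bb["lines"]),         1, group_lines),
    ("propose-source", lambda bb: bool(bb["harmonic_sets"]), 2, propose_source),
]

while True:
    applicable = [ks for ks in knowledge_sources if ks[1](blackboard)]
    if not applicable:
        break
    name, _, _, act = max(applicable, key=lambda ks: ks[2])  # strategy: best first
    act(blackboard)

print(blackboard["sources"])  # -> [('propulsion?', [60.0, 120.0, 180.0])]
```

The essential contrast with ANALYST's fixed sequential scheme is that nothing here hard-codes the firing order; the strategy layer re-examines the board after every step.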
The lower level knowledge sources may be invoked in either an event-driven (forward-chaining) or an expectation-driven (backward-chaining) mode. As in ANALYST, event-driven inference combines incoming data to create hypotheses at higher levels of abstraction. For example, a newly found sonogram line might be combined with other lines to form a harmonic set. Expectation-driven inference takes a higher level hypothesis and searches for lower level information to support it. For example, suppose that the current best hypothesis contains a certain type of ship known to possess several noise sources. If not all of the expected sources are present, expectation-driven inference would direct the system to look for lower level information, such as the presence of certain previously unexplained sonogram lines, in order to reinforce further the higher level hypothesis. Thus, even though ANALYST and HASP/SIAP both address the military problem of integrating and interpreting sensor information to develop a situation description, they use dissimilar architectures to accomplish their goals.

The performance of HASP/SIAP was evaluated in a series of three experiments performed by the MITRE Corporation in the late 1970s. During these tests the expert system's performance was compared to that of two expert sonar analysts. In all cases HASP/SIAP developed situation descriptions of quality similar to that of the experts, and in one case it outperformed a human analyst.

Combat Resource Allocation

A critical element of battle management is the allocation of combat resources, both in anticipation of and in response to tactical situations. In particular, battlefield commanders have always been confronted with the problem of determining how to allocate their weapons resources so as to destroy desired targets most efficiently. A wide range of factors, pertaining both to the enemy and to friendly forces, can influence the success of a weapons-assignment strategy. For example, it may be important to consider the enemy's counterfire ability, vulnerability, etc., and at the same time, the state of readiness of friendly forces and the ease with which they can be resupplied. Furthermore, the allocation problem is compounded for modern commanders because of the ever-widening variety of weapons from which to choose.

The Marine Corps is addressing this problem with the introduction of the Marine Integrated Fire and Air Support System (MIFASS). Under MIFASS, fire and air support centers would be established to help solve weapon-to-target allocation problems. These centers would perform weapon allocation planning using information relayed from forward observers equipped with hand-held digital communications terminals. Originally, MIFASS used a heuristic algorithm for weapon-to-target assignment that approached the allocation problem in a sequential fashion by optimizing on successive weapons (12). However, this simple sequential scheme has several limitations: it does not consider the assignment problem as a whole; it does not allow for more than one weapon to be allocated to a target; and it ignores a significant number of battlefield factors. More recently, BATTLE, a prototype interactive decision support system that employs AI techniques to solve the weapon-to-target allocation problem, has been developed at the Naval Research Laboratory to remove these limitations (13,14).
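The flavor of BATTLE's two-phase approach, an effectiveness measure for each weapon-target pairing followed by a k-best search with upper-bound pruning (both phases are detailed in the following paragraphs), can be caricatured in a few lines. This toy assigns exactly one target per weapon and uses an obvious optimistic bound; every number and name is invented, and BATTLE's computation network and allocation trees are far richer.

```python
def k_best_allocations(effect, values, k=2):
    """Toy k-best weapon-to-target search with upper-bound pruning.
    effect[w][t] is the kill probability of weapon w against target t and
    values[t] the tactical value of target t. One target per weapon here,
    a drastic simplification of BATTLE's allocation trees."""
    n_w, n_t = len(effect), len(values)
    best = []                                      # up to k best (score, assignment)
    ub_per_weapon = [max(values[t] * effect[w][t] for t in range(n_t))
                     for w in range(n_w)]          # optimistic bound per weapon

    def search(w, assign, score):
        if w == n_w:
            best.append((score, tuple(assign)))
            best.sort(reverse=True)
            del best[k:]                           # keep only the k best plans
            return
        # prune: even an optimistic completion cannot beat the kth best plan
        if len(best) == k and score + sum(ub_per_weapon[w:]) <= best[-1][0]:
            return
        for t in range(n_t):
            search(w + 1, assign + [t], score + values[t] * effect[w][t])

    search(0, [], 0.0)
    return best

effect = [[0.9, 0.3],    # weapon 0 vs targets 0 and 1
          [0.2, 0.8]]    # weapon 1 vs targets 0 and 1
print(k_best_allocations(effect, values=[10, 5], k=2))
```

The pruning test is the heart of the scheme: a partial assignment is abandoned as soon as its optimistic completion cannot displace the kth best complete plan found so far.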
In its first phase of operation BATTLE examines each possible weapon-to-target pairing and calculates a measure of its effectiveness. This effectiveness calculation is performed by a computation network that is a generalization of the inference networks of PROSPECTOR (15). The network, which is prepared by a military domain expert, involves an extensive set of over 50 weapon, target, and battlefield situation factors. Data for a particular battlefield situation are entered interactively by the system user under BATTLE's guidance.

In its second phase BATTLE generates a weapon allocation tree using the effectiveness measures computed in the first phase together with a user-supplied set of tactical values of the targets. Instead of searching for the optimal solution only, BATTLE allows its user to specify a value, k, such that the best k plans will be found. Because the size of the weapon allocation tree becomes astronomical in complex battlefield situations, it is not computationally feasible for BATTLE to explore it exhaustively. Rather, a pruning algorithm is used so that only a selected portion of the tree is explored. The pruning algorithm works by applying a heuristic each time a new node is generated. The heuristic calculation finds an upper bound for the overall destructiveness of the current partial assignment. If the upper bound indicated by the heuristic is less destructive than the kth best complete assignment found so far, the current partial path is abandoned.

As in most command and control situations, there is a certain time criticality associated with solving the weapon-to-target assignment problem. As mentioned earlier, BATTLE's first phase considers a multitude of factors in calculating the potential effectiveness of each weapon against each target. It would be time-consuming and tedious if BATTLE always insisted upon asking its user all possible questions about the situation at hand, especially if some of the answers affected
the outcome only marginally. To prevent this problem and thereby accelerate the interrogation process, a new questioning strategy called merit was developed (16). The merit strategy ensures that BATTLE focuses its question asking so that the questions asked first are those whose answers will have the greatest effect upon the final outcome. A cutoff value may be set so that questions having a merit value below the cutoff will not be asked. Experiments have shown that a significant reduction in the number of questions asked occurs when the merit strategy is used to guide questioning.

Mission Planning

Another complex problem facing military commanders is the task of mission planning. As in other military problems, mission planning is performed at many scales, ranging from tactical to strategic. In all cases the planning process is a labor-intensive one, relying on both the common sense and the specialized training and experience of the commanders. The potential for applying AI to mission planning has long been recognized, and several existing efforts illustrate the range of applications.

Tactical air planning for the Air Force provides an example of an intermediate-level planning task facing the military. In this case the missions being planned are air strikes against designated targets. The problem is to design a plan in which aircraft and ordnance are assigned in such a way as to ensure the destruction of these targets within some predetermined probability. This planning process begins on receipt of an apportionment order issued by the Joint Task Force Commander. The resulting air tasking order, which may take 24 h to complete manually, specifies a detailed plan that satisfies the original apportionment order. To accelerate both the planning process and the replanning process, the Air Force has funded the development of a planning aid called KNOBS (17). KNOBS (the KNOwledge-Based System) was developed at the MITRE Corporation between 1978 and 1982.
Its specific domain of expertise is planning ground-strike counter-air missions in the European theater. Its knowledge base contains plausible (but, for security reasons, not necessarily accurate) information about a number of potential targets and friendly air bases, generic information about aircraft and ordnance capabilities, information about antiaircraft defenses, and Air Force tactical doctrine.

In a typical KNOBS interactive-planning session, the user would enter the desired target and the desired probability of destruction for that target. The user can then specify other particulars for the mission, such as the type and number of aircraft to be used, which air base should supply the aircraft, etc., or the user can have KNOBS make suggestions for each particular. An advantage of allowing KNOBS to make suggestions is that KNOBS will only make suggestions that result in a valid plan, and KNOBS will present its suggestions in an order of preference. At any point in the session the user can have KNOBS check for inconsistencies in the partially created plan. For example, the system can alert the user if the selected aircraft and airfield are too far from the designated target or if the ordnance selected cannot achieve the desired probability of destruction. The process of refining the details of the plan continues, with the user always having the option of letting KNOBS attempt to complete the plan on its own. When the plan is completely specified, KNOBS warns the user about possible antiaircraft defenses in the vicinity of the target, and the interaction is complete.

KNOBS approaches the planning process by treating each
plan as an instance of a prototypical plan. Plans, as well as nearly all other objects in KNOBS, are represented using frames. Thus, the construction of a valid plan consists of building an instance of a plan frame in which all slots have been filled in and the values contained in the slots define a valid plan. The validity of the plan is defined in terms of constraints that exist between the various particulars of the plan. For example, the fact that a given aircraft has only a fixed operating range defines a constraint on the distance between that aircraft's airfield and any potential target areas. In KNOBS the planning process is simplified greatly because all of the constraints are known a priori. Although the approach of generating a specific plan by elaborating on a template plan is a limited one, it has been found to be useful in a variety of domains that require somewhat stereotypical planning. By modification of KNOBS's domain-dependent code, the Navy has used KNOBS to plan certain specific categories of naval missions, and NASA is investigating several applications of KNOBS relating to planning for the space shuttle. A further refined planning system is now being developed for the Air Force (18).

The planning cycle in the Navy is not unlike that of the Air Force; operational mission planning is initiated by the arrival of an operational order document, which states a mission goal in very general terms. The resulting operational plan, which may take a team of commanders days or weeks to complete, specifies in detail a military plan the planners believe best satisfies the original order. To provide Navy commanders with a planning tool, a knowledge-based problem-solving system called OPPLAN-CONSULTANT is being developed at the Naval Research Laboratory (19). OPPLAN-CONSULTANT solves planning problems in the domain of naval operational planning.
Unlike KNOBS, OPPLAN-CONSULTANT is designed to be a general naval planning tool incorporating knowledge across the full spectrum of naval operational planning. The software system used to represent and operate on the domain knowledge is called CK-LOG (Calculus for Knowledge processing in LOGic). CK-LOG is a knowledge-processing system that uses a three-valued logic (true, unknown, and false) in building partial models of world states and a two-valued logic (true and false) for theorem proving. An important feature of CK-LOG is its ability to represent and reason about actions and the temporal dependencies between them. A pilot implementation of the system is currently under construction; however, it is expected that it will take several years before a sufficiently extensive knowledge base has been developed to demonstrate OPPLAN-CONSULTANT in a realistic planning environment.

Maintenance and Troubleshooting of Military Equipment

Since the early 1960s military equipment has increased steadily in complexity and variety, whereas at the same time the pool of trained technicians has been decreasing. A major cost of operations is in fault diagnosis and repair, the procurement of maintenance equipment, and the training of technicians and operators. Each of the services has problems that are unique to its mission, but all share problems of space, difficulty in providing logistics support, and limited technical manpower. These factors, coupled with the demands of operations, place heavy emphasis on speedy and accurate diagnosis and repair in the field. The various difficulties have created prime opportunities for the application of AI, and a number of efforts are underway. This discussion considers AI applications in three key military maintenance areas: automatic test equipment (ATE), built-in test (BIT), and interactive troubleshooting aids.

Automatic Test Equipment (ATE). In any maintenance application where only a limited pool of human experts is available, the application of an expert-system-based maintenance aid is an attractive option. In electronics equipment maintenance the possibilities for immediate benefit are even more apparent. This is particularly true for aircraft electronics (avionics) because of the large number of different systems involved, the heavy reliance of modern aircraft on avionics for mission accomplishment, and the premium placed on rapid turnaround.

In avionics maintenance the Navy and Air Force rely heavily on automatic test equipment (ATE) for diagnosis of faults. This reliance is especially evident in the Navy, where scarcity of space limits test equipment, manpower, and spares storage. Even though many items of avionics have an intricate built-in test (BIT) with automated testing, high false removal rates, excessive levels of fault ambiguity, and the continued need for human intervention are still problems. (False removal rates as high as 85% are found, and ambiguities involving three to five circuit cards are fairly common.) ATE makes use of test program sets (TPSs) that consist of an interface between the avionics and the ATE and software for fault diagnosis. For each different item of avionics a separate TPS must be provided. Test program generation is highly manpower intensive, and results are variable, with high costs and long delivery times common. A test sequence may take between 20 min and 12 h to diagnose system faults. Moreover, a typical Navy carrier, for example, requires over 600 different TPSs to support the avionics on its various aircraft.
In many cases TPSs are inadequate, either failing to identify faults in a reasonable time or producing a large ambiguity group of suspected faulty components. These factors, coupled with limited expert manpower, make ATE an especially attractive application for AI. Efforts underway in this area are applying expert systems technology toward the performance of efficient, accurate fault isolation either automatically or interactively with maintenance personnel (20-23). A more near-term application of this knowledge-based approach is directed toward the automatic generation of TPSs for execution on existing ATE configurations.

For electronics fault diagnosis an expert-system database typically consists of two kinds of information: detailed specifications for the equipment to be diagnosed and results of measurements. For electronics equipment the specifications consist of such information as a functional description, interconnections, nominal values for normal operating parameters, and component values and tolerances. This kind of information must be available for each piece of equipment and is equivalent to the manuals and performance specifications that a technician would use. The additional information in the database consists of symptoms and results of measurements. The rule base of the system consists of general diagnostic methods, rules associated with particular classes of equipment, and finally, rules unique to the specific equipment being tested. The key to the efficient utilization of expert systems in ATE is the automation of the rule and data acquisition process. This particular bottleneck to expert systems development in general is a prime candidate for automation in this application because the design data for military electronic equipment are already available in a computer-usable form from CAD/CAM databases. In addition, it is expected that at least some of the rules can be automatically captured from analysis of system functional descriptions and circuit topology. The possibility for automatic "knowledge compilation" is an important driver in applying expert systems to electronic diagnosis in ATE and should be useful in more conventional maintenance aids as well.

In fault isolation systems being developed by the Navy [such as FIS (Fault Isolation System) at the Naval Research Laboratory] and the Air Force, the concept of "functionality" is utilized to add a dimension of deep reasoning without resorting to detailed circuit analysis (23-25). Most existing expert systems are limited to knowledge bases that provide shallow reasoning capability within their area of expertise. Deep reasoning is not even a reasonable objective for many application areas because the level of theoretical understanding is inadequate to permit reasoning from basic principles, and even if it were, the process would be incredibly inefficient. Electronic systems offer a potential for effective deep reasoning because their functions are fully understood and documented, and convenient partitioning of functions can permit a mix of shallow and deep reasoning to be utilized as appropriate. Under the concept of functionality, electronic equipment subsystems are considered to be more than simple nodes in a circuit. In addition to producing a value of output under a given stimulus, a subsystem is considered to provide a specified transformation of information. By reasoning about the relationship of functional elements in addition to tests made concerning nominal measured values, ambiguity about the ultimate cause of faults is expected to be greatly reduced. A prime, long-range objective in using expert systems for diagnosis is to minimize the total testing time needed to unambiguously isolate the fault.
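In a simplified setting, minimizing testing time reduces to choosing, at each step, the test that best splits the current ambiguity group. The sketch below illustrates only that greedy idea; the probe names and fault model are invented, and nothing here reflects FIS's functionality-based reasoning.

```python
def isolate_fault(suspects, tests, observe):
    """Greedy fault isolation: repeatedly run the test whose pass/fail
    outcome most nearly halves the current ambiguity group. `tests` maps a
    test name to the set of suspects that would make it fail; `observe(t)`
    runs test t and returns True on failure. A toy stand-in for the far
    deeper reasoning of systems such as FIS."""
    suspects = set(suspects)
    sequence = []
    while len(suspects) > 1:
        test = min(tests,
                   key=lambda t: abs(2 * len(suspects & tests[t]) - len(suspects)))
        sequence.append(test)
        if observe(test):             # test failed: fault is among its suspects
            suspects &= tests[test]
        else:                         # test passed: exonerate its suspects
            suspects -= tests[test]
    return suspects.pop(), sequence

# Four suspect cards and two probe points, each implicating half of them.
tests = {"probe_A": {"card1", "card2"}, "probe_B": {"card1", "card3"}}
fault = "card3"
result, seq = isolate_fault({"card1", "card2", "card3", "card4"}, tests,
                            observe=lambda t: fault in tests[t])
print(result, seq)  # -> card3 ['probe_A', 'probe_B']
```

With well-chosen tests the ambiguity group shrinks roughly logarithmically in the number of suspects, which is exactly the behavior large ambiguity groups in fielded TPSs fail to achieve.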
An additional shorter-range goal, with substantial benefit in cost and timeliness, is the automation of TPS generation for existing ATE. A current Navy project, Intelligent Automatic Test Generation (IATG), is incorporating the features of FIS along with a performance improvement capability based on actual test fault detection success to generate conventional TPSs and eventually perform as an on-line controller for future ATE. The economic benefits are potentially very high for a successful diagnostic expert system, for it is estimated that TPS and ATE procurement costs could be reduced by 25-50%. This is no small matter since these costs are several billion (10^9) dollars a year for all the services. One key point for the application of expert systems in ATE is the already high level of commitment to automation and the fact that most of the equipment needed for immediate application exists and is designed for computer control. Significant benefits can be achieved in nonavionics applications as well, but in most cases stimulus and measurements require human intervention. The issue is addressed further in the discussion about interactive maintenance aids.

Built-in Test (BIT). For large weapons systems off-line testing using ATE and manual methods may be impractical and inadequate. Typical characteristics of such systems are high value, long operating times, and isolation from sources of spares and test equipment during normal operations. Some typical examples are submarine and surface ships, where replacement of "black boxes" is not practical owing to the nature of the equipment and the difficulty in providing sufficient spares to last for a full deployment. In such circumstances the equipment is typically diagnosed and repaired in place at either the module or component level. Similarly, diagnosis and repair or the activation of redundant systems on large, long-endurance aircraft must be done in flight to ensure adequate capability levels.

The Air Force has undertaken a project to develop "smart" BIT for digital systems, with the intent of minimizing false alarms, improving fault coverage, and identifying intermittent faults (26,27). The intent is to provide design concepts that individual designers would use to incorporate smart BIT. Although smart BIT improves performance of individual systems, two Air Force projects are looking at the overall operation of a vehicle and its systems. The integrated maintenance information system (IMIS) project is designed to provide flight-line personnel with access to all onboard diagnostic data available as well as access to the supply system, scheduling data, training, and maintenance records. The B-1B aircraft would be a likely candidate for a demonstration since this aircraft incorporates extensive BIT already. In addition, the Generic Integrated Maintenance Diagnostics project (GIMADS) proposes to use AI coupled with more conventional equipment and software to address the overall diagnostics problem in an integrated system. Also, the Navy is developing an expert-system-based radar maintenance aid for the AEGIS ship combat system, a modern missile-defense system for Navy cruisers and destroyers. The complexity of the systems involved makes conventional software approaches uneconomical; however, these systems are considered excellent applications for expert system technology.

Interactive Maintenance Aids. Many military systems are not adaptable to the approach taken in avionics and large electronics systems, either because they are largely mechanical and must be diagnosed in place, lack sufficient built-in sensors for diagnosis, or must be repaired in the field under austere conditions.
To address these cases, there is strong interest in maintenance aids that can interact with operators and technicians to guide the diagnosis, provide advice, or make technical information available in a readily usable form. Commercial work in this area has been successful (e.g., the DELTA system at GE), and the military is interested in exactly the same sort of easily transportable interactive system for field use. For direct aids to the technician, much more attention must be given to the human interface than is needed for ATE or BIT. Besides the more austere environment in which he must operate, the technician is likely to be less highly trained and less tolerant of system demands. Such aids must provide him with the information he requires in a readily usable form, in natural language, and with advanced graphics capabilities to be of significant utility. For maintenance aids the problem is not so much free-form communication as it is providing access to large bodies of information in a convenient way. Video disks under computer control are being explored as one solution, but the options are still open at this time. The IMIS, GIMADS, and Integrated Diagnostics projects all expect to develop some form of easily transportable aid of this sort. The Army in particular has need of this type of maintenance aid because of its austere operating environment and the emphasis placed on rapid and accurate repair in the field. The work done in diagnostic expert systems in ATE should be directly applicable, but voice response, natural-language capability, advanced graphics, and perhaps vision and image understanding will be critical elements.

The psychology of implementation is very important here, with several key issues to be considered beyond the performance of the AI system itself. To be effective, these maintenance aids must be more than simply intelligent; they must become a partner of the maintainer to a degree well beyond existing systems. To be specifically avoided is the "smart machine, dumb man" philosophy. This approach, perhaps justified in certain instances, can only lead to failure in service due to poor job satisfaction, wasted human capability, failure to capitalize on learning as a by-product of the aid, and outright sabotage. The various aspects of natural language, voice recognition, and reasoning needed to produce interactive maintenance aids are very similar to those needed in any interactive environment and need not be expanded on further here. The potential for use of AI-based systems to improve human performance is especially evident in the field of diagnosis and repair. For all branches of the military, complex equipment, high costs, the need for rapid and accurate diagnosis, and the relatively high turnover rate of manpower create prime opportunities for AI applications.

Training

An additional, and potentially very important, use of AI technology is in training. As military operations and combat systems increase in technical complexity and personnel resources both shrink in number and increase in turnover, the efficient, yet thorough, training of military personnel is crucial to all the services. Many of the same techniques used to aid decision making can be applied to provide guidance and instruction in the training process. An important example of an intelligent, computer-based military training system is STEAMER, developed at the Navy Personnel Research and Development Center (28). STEAMER's domain is propulsion engineering. Steam propulsion systems are an integral part of most Navy ships, and it is imperative that the engineers who operate them have a thorough understanding of their behavior.
Although these engineers must operate the systems routinely on a day-to-day basis, their understanding must be complete enough to enable them to anticipate the behavior of the system during mechanical failures. The cost of specialized training simulators is high for such systems, and the use of traditional simulators does not necessarily engender a deep understanding of the system being simulated. Since mathematical models exist for such systems, they may be simulated readily on a digital computer. In addition, the system in question is a physical one, making it a good target for experiments involving aiding humans in the construction of mental models. STEAMER combines these problem features to produce a graphics-oriented trainer whose underpinnings derive from computer simulation.

The user of the system is presented with a detailed digital simulation of a steam plant. Elaborate interactive computer graphics permit users to inspect the simulated plant's operation at many different hierarchical levels. The trainee may use a mouse interface to vary settings of valves and other plant controls and watch how the changes affect the overall system. In addition, for example, the trainee can manipulate fluid levels, which could not ordinarily be manipulated externally in such a plant. The latitude to introduce such changes and observe their effects may be important in developing an intuition about the system. The emphasis in the graphical depictions is to provide trainees with a display that enables them to develop a mental model of the plant's operation similar to the mental models used by experts.

The initial implementation of STEAMER relied heavily on mating a traditional digital simulation with newer AI programming techniques, such as object-oriented programming and active display icons. An important component of the overall system is an object-based graphics editor, which includes a wide range of predefined icons for displaying various levels or their rates of change. This editor has enabled nonprogrammers to build complicated steam-plant diagrams by combining primitive icons. In addition, the editor allows system builders to define their own unique display icons, if necessary. Future research will be in the area of the knowledge representations necessary to represent steam-plant operating procedures in terms of their primitive components. STEAMER has been used as a training aid at the Great Lakes Training Center and on Coronado Island. Preliminary results are quite encouraging; they indicate that personnel respond very positively to the interactive system and can learn the same material in a shorter period of time than with traditional instruction methods.

Automated Natural-Language Understanding of Military Messages

Enormous numbers of operational reports are generated and transmitted as part of daily military message traffic. These reports range from messages about employment schedules, equipment failures, and weather to messages concerning force deployment and readiness, tactics, and intelligence and are used at various levels throughout military command hierarchies. Typically operational reports obey strict formatting conventions but also contain important English narrative descriptions.
Although current message-handling systems process the formatted sections by entering the data into appropriate fields in the system's database, message narrative is usually treated as adjunct information and stored in the form of remarks or comments. However, many tasks, such as message dissemination and the recognition of message trends, require information in message narrative and consequently must rely on personnel performing keyword searches and visually scanning individual message narratives, a laborious and time-consuming process. Automation of these and other tasks in future military message systems will require computer interpretation of message content. One effort toward this end is an experimental system being developed at the Naval Research Laboratory that employs techniques of computational linguistics and AI to automatically extract information from Navy messages (29,30). Initially the system is addressing a class of operational reports about shipboard equipment failure called CASREPs (CASualty REPorts). CASREPs are an important message type, providing current information about ship readiness and equipment performance. They inform operational and support personnel about equipment casualties that could affect a unit's ability to perform its mission, as well as reporting the unit's need for technical assistance and for parts to correct the failure. The experimental system uses CASREP message content to assign a distribution list to each message and to generate a summary of the equipment failure (31). To process such messages, the system must provide a representation of message content that can be readily accessed and used for applications such as dissemination and summarization. This is accomplished by a message interpreter that initially decomposes the message to determine its overall structure and then performs narrative analysis to generate the
structures that enable automated interpretation of English narrative. Message decomposition of reports like CASREPs is straightforward because the overall structure is known and report formatting conventions can be used to extract pro forma (strictly formatted) information. However, narrative analysis, the extraction and representation of the particular types of information contained in the narrative portions of a message, is more difficult, principally because the structure of the information, and often much of the information itself, is implicit in the narrative. The experimental system uses an approach to narrative analysis called information formatting, originally developed at New York University (32,33). This technique employs an explicit grammar of English and a classification of the semantic relationships within a suitably restricted domain to derive a tabular representation of the information in a message narrative. Thus, in simplest terms, an information format is a large table, with one column for each type of information that can occur in a class of texts and one row for each sentence or clause in the text (see also Natural-language understanding). The implementation of this approach first requires the development of the information format structure through the identification of the classes of objects, and the relationships among them, discussed in message texts within the domain. For CASREPs about electronic equipment, the objects include the equipment items and their component parts, the signals and data operated on by the equipment, the people and organizations who operate and maintain the equipment, and the documents involved in the maintenance process. These various classes of objects and their semantic relationships then have their own "slots" in the data structure, so that information can be much more readily retrieved than from the original narrative.
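As a concrete, purely illustrative sketch of the idea, an information format can be modeled as a table with one column per information type and one row per clause. The column names and sample values below are invented for illustration; they are not taken from the NYU or NRL systems.

```python
# An illustrative sketch (not the actual implementation) of an
# information format: one column per information type, one row per
# clause.  Column names and sample values are hypothetical.

FORMAT_COLUMNS = ["equipment", "part", "status", "agent", "action"]

def make_row(**fields):
    """One format row; columns not mentioned stay empty."""
    row = {col: None for col in FORMAT_COLUMNS}
    for name, value in fields.items():
        assert name in FORMAT_COLUMNS, f"unknown column: {name}"
        row[name] = value
    return row

# Rows for a hypothetical two-clause narrative, e.g.
# "Antenna coupler failed.  Technician replaced coupler."
table = [
    make_row(equipment="antenna coupler", status="failed"),
    make_row(agent="technician", action="replaced", part="coupler"),
]

# Retrieval becomes a column lookup instead of a keyword scan:
failures = [r["equipment"] for r in table if r["status"] == "failed"]
print(failures)  # ['antenna coupler']
```

The point of the tabular form is visible in the last line: a question like "which equipment failed?" is answered by a column test rather than by scanning free text.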
The transformation of the narrative portion of each message into a series of tabular format entries involves three stages of automated processing: parsing, syntactic regularization, and mapping into the information format. Parsing essentially determines sentence structure and resolves lexical ambiguity, such as usage of the word "if" both as a noun abbreviation for "intermediate frequency" (a frequent occurrence in CASREPs) and as the more familiar subordinating conjunction. In the second stage the parse trees are syntactically regularized by a series of transformations to simplify the subsequent mapping into the information format. For example, passive assertions are transformed into simple active assertions, some elements missing from sentence fragments are filled in, and a subject-verb-object word order is created for sentences not having one. The third stage of processing moves the phrases in the syntactically regularized parse trees into the information format. The mapping process is controlled in large part by the sublanguage (semantic) word classes associated with each word. These classes, along with syntactic information about the word, are recorded in each word's dictionary entry, which is tailored to the domain. CASREP information formatting is currently being applied to two task areas: dissemination and summary generation. In each area the experimental system contains a knowledge base organized as a production system; productions operate on an initial database of working memory elements that includes data from both the pro forma set and the information formats. Some production rules reflect an understanding of the subject matter of the equipment failure reports, and others are based
on general principles of dissemination and summarization. Taken together, the productions address such matters as malfunction, causality, investigative action, uncertainty, and level of generality. Although production rules for the dissemination system act on data extracted from both the formatted portion (e.g., identity of malfunctioning equipment) and the narrative portion (e.g., in requests for assistance, the type of assistance and from whom) of the message, rules for summarization deal only with message narrative (34). Typically a summary consists of a single clause extracted from a section of text, thereby reducing significantly the material that must be read for such critical uses as detecting patterns of failures for particular types of equipment. Currently each summary is generated manually by reading the entire message and then selecting an appropriate clause from the "remarks" narrative as the summary. Using manual summarization expertise as the basis for its production rules, the experimental summarization system involves three steps: inference, scoring the information format entries for their importance, and finally the selection of the appropriate (highest rated) format entry as the summary. For example, words like inhibit, impair, and prevent trigger inference rules such that if part 1 impairs part 2, one can infer that part 1 causes part 2 to be bad, and one can also infer that part 1 is bad. In scoring the various format entries, the fact that bad is a member of the class of words signifying malfunction will cause entries associated with part 1 and with part 2 to be promoted in importance. In addition, the entry associated with part 1 will score even higher because it is a cause rather than an effect. In an early comparison of computer-generated summaries with those generated manually on a modest set of CASREPs, the summaries agreed on approximately 83% of the messages tested.
Sometimes the summarization system generated two summary lines (as a result of a tie between two format entries), although the manual summary consisted of only one sentence. Nonetheless, one of the two computer-generated summary lines was also the manual summary. On the other hand, the most significant discrepancies (except where the crucial status word in the narrative was not in the production rule system) involved the system actually selecting more specific causal information than was indicated in the manual summary. Issues yet to be addressed in experimental system development include refinement of the format, intersentential processing, and robustness. A future option for such message systems is to perform message analysis at the point of transmission so that the message sender can be aided by the system in resolving ambiguities and avoiding crucial omissions (35). This could also result in an improvement of message system capabilities by eliminating messages of little or no information content and upgrading overall message quality.
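The three summarization steps just described (inference, scoring, selection) can be sketched roughly as follows. The word classes, weights, and entry fields here are hypothetical stand-ins, not those of the experimental system; only the overall scheme follows the text.

```python
# A rough sketch of the three summarization steps: inference, scoring,
# and selection of the highest-rated format entry.  Word classes,
# weights, and entry fields are hypothetical.

MALFUNCTION_WORDS = {"bad", "failed", "inoperative"}
CAUSAL_TRIGGERS = {"inhibit", "impair", "prevent"}

def infer(entries):
    """If part1 impairs part2, infer that part1 causes part2 to be bad
    and that part1 itself is bad (as in the text's example)."""
    new = list(entries)
    for e in entries:
        if e.get("verb") in CAUSAL_TRIGGERS:
            new.append({"part": e["object"], "status": "bad",
                        "role": "effect"})
            new.append({"part": e["subject"], "status": "bad",
                        "role": "cause"})
    return new

def score(entry):
    s = 0
    if entry.get("status") in MALFUNCTION_WORDS:
        s += 1              # malfunction class promotes the entry
    if entry.get("role") == "cause":
        s += 1              # a cause scores higher than an effect
    return s

def summarize(entries):
    return max(infer(entries), key=score)   # highest-rated entry wins

entries = [{"subject": "power supply", "verb": "impair",
            "object": "transmitter"}]
print(summarize(entries))  # the inferred 'cause' entry for the power supply
```

With this toy rule set, "power supply impairs transmitter" yields two inferred entries, and the cause entry outranks the effect entry, so the summary names the power supply.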
BIBLIOGRAPHY

1. A. J. Baciocco (RAdm), "Artificial intelligence and C3I," Signal 36(1), 24-28 (September 1981).
2. B. P. McCune and R. J. Drazovich, "Radar with sight and knowledge," Def. Electron. (August 1983).
3. P. J. Klass, "DARPA envisions new generation of machine intelligence technology," Aviat. Wk. Space Technol. 122(10), 46-84 (April 22, 1985). Also Next-Generation Computing Technology: A Strategic Plan for its Development and Application to Critical Problems in Defense, DARPA, Arlington, VA, October 28, 1983.
4. R. G. Smith, "Report on the 1984 Distributed AI Workshop," AI Mag. 6(3), 234-243 (Fall 1985).
5. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983.
6. R. P. Bonasso, Jr., ANALYST: An Expert System for Processing Sensor Returns, MTP-83W 00002, The MITRE Corporation, McLean, VA, 1984.
7. M. Minsky, A Framework for Representing Knowledge, AI Memo 306, MIT AI Laboratory, 1974.
8. H. P. Nii, E. A. Feigenbaum, J. J. Anton, and A. J. Rockmore, "Signal-to-symbol transformation: HASP/SIAP case study," AI Mag. 3(1), 23-35 (Spring 1982).
9. H. P. Nii and E. A. Feigenbaum, Rule-Based Understanding of Signals, in D. A. Waterman and F. Hayes-Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York, pp. 483-501, 1978.
10. L. D. Erman, F. Hayes-Roth, V. D. Lesser, and D. R. Reddy, "The HEARSAY-II speech understanding system: integrating knowledge to resolve uncertainty," ACM Comput. Surv. 12(2), 213-253 (1980).
11. L. D. Erman, P. E. London, and S. F. Fickas, "The design and an example use of HEARSAY-III," Proceedings of the Seventh IJCAI, University of British Columbia, Vancouver, BC, Canada, pp. 409-415 (August 24-28, 1981).
12. K. E. Case and H. C. Thibault, A Heuristic Allocation Algorithm with Extensions for Conventional Weapons for the Marine Integrated Fire and Air Support System, School of Industrial Engineering and Management, Oklahoma State University, Stillwater, Oklahoma, September 1977.
13. J. R. Slagle, E. J. Halpern, H. Hamburger, and R. R. Cantone, A Decision Support System for Fire Support Command and Control, IEEE Trends and Applications Conference Proceedings, National Bureau of Standards, Gaithersburg, MD, pp. 68-75, May 25-26, 1983.
14. J. R. Slagle and H. Hamburger, "An expert system for a resource allocation problem," CACM 28(9), 994-1004 (September 1985).
15. R. O. Duda, P. E. Hart, K. Konolige, and R. Reboh, A Computer-Based Consultant for Mineral Exploration, Artificial Intelligence Center, SRI International, Menlo Park, CA, September 1979.
16. J. R. Slagle, M. W. Gaynor, and E. J. Halpern, "An intelligent control strategy for computer consultation," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-6, 129-136 (March 1984).
17. C. Engelman, J. K. Millen, and E. A. Scarl, KNOBS: An Integrated AI Interactive Planning Architecture, Computers in Aerospace IV Conference, American Institute of Aeronautics and Astronautics, Hartford, CT, October 24-26, 1983.
18. G. Courand, C. O'Reilly, and J. Payne, OCA (Offensive Counter Air) Mission Planning, Advanced Information and Decision Systems, AI/DS-TR-3050-1, Mountain View, CA, 1983.
19. C. V. Srinivasan, The Use of CK-LOG Formalism for Knowledge Representation and Problem Solving in OPPLAN-CONSULTANT: An Expert System for Naval Operational Planning, NRL Report in publication, Naval Research Laboratory, Washington, DC, 1985.
20. J. J. King, Artificial Intelligence Techniques for Device Troubleshooting, Computer Science Laboratory Technical Note Series CSL-82-9 (CRC-TR-82-004), Hewlett-Packard, Palo Alto, CA, August 1982.
21. W. R. Simpson and H. S. Balaban, The ARINC Research System Testability and Maintenance Program (STAMP), Proceedings of the 1982 IEEE Autotestcon Conference, Dayton, OH, October 1982.
22. R. R. Cantone, F. J. Pipitone, W. B. Lander, and M. P. Marrone, Model-Based Probabilistic Reasoning for Electronics Troubleshooting, Proceedings of the Eighth IJCAI, Karlsruhe, FRG, August 22-26, 1983, pp. 207-211.
23. K. DeJong, Applying AI to the Diagnosis of Complex System Failures, Proceedings of the Conference on AI, Oakland University, Rochester, MI, April 1984.
24. F. Pipitone, An Expert System for Electronics Troubleshooting Based on Function and Connectivity, IEEE First Conference on AI Applications, Denver, CO, December 1984, pp. 183-188.
25. F. Pipitone, "The FIS electronics troubleshooting system," Computer 19(7), 68-76 (July 1986).
26. K. A. Haller, J. D. Zbytniewski, K. Anderson, and L. Bagnall, Smart BIT, Rome Air Development Center Report RADC-TR-85, June 1985.
27. H. Lahore, Artificial Intelligence Applications to Testability, Rome Air Development Center Report RADC-TR-84-208, October 1984.
28. J. D. Hollan, E. L. Hutchins, and L. Weitzman, "STEAMER: An interactive inspectable simulation-based training system," AI Mag. 5(2), 15-27 (Summer 1984).
29. E. Marsh, J. Froscher, R. Grishman, H. Hamburger, and J. Bachenko, Automatic Processing of Navy Message Narrative, NRL report in publication, Naval Research Laboratory, Washington, DC, 1985.
30. E. Marsh, Utilizing Domain-Specific Information for Processing Compact Text, Proceedings of the Conference on Applied Natural Language Processing, 1983, pp. 99-109.
31. J. Froscher, R. Grishman, J. Bachenko, and E. Marsh, A Linguistically Motivated Approach to Automated Analysis of Military Messages, Proceedings of the 1983 Conference on AI, Oakland University, Rochester, MI, 1983.
32. N. Sager, Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base, in M. C. Yovits (ed.), Advances in Computers, Vol. 17, Academic Press, New York, pp. 89-162.
33. N. Sager, Natural Language Information Processing, Addison-Wesley, Reading, MA, 1981.
34. R. Granger, "The NOMAD system: Expectation-based detection and correction of errors during understanding of syntactically and semantically ill-formed text," Am. J. Computat. Ling. 9(3-4), 188-196 (1984).
35. E. Marsh, H. Hamburger, and R. Grishman, A Production Rule System for Message Summarization, Proceedings of the National Conference on Artificial Intelligence (AAAI-84), University of Texas, Austin, August 1984, pp. 243-246.

General References

Discussion and documentation of military applications of AI are appearing in an increasing variety of sources. Defense-oriented popular publications such as Signal, Defense Electronics, and Aviation Week and Space Technology provide general articles on current or proposed military AI applications. Papers and reports from defense-research laboratories, industrial defense contractors, and defense-funded university research groups remain the primary source for the more technical descriptions of these applications. However, as service-sponsored conferences and symposia on AI become more numerous, their published proceedings, along with those from the well-known AI national and international conferences, are providing additional valuable references for military applications. Examples include the following.

Proceedings of the Army Conference on Application of Artificial Intelligence to Battlefield Information Management, Battelle Columbus Laboratories, Washington, DC, April 20-22, 1983.
R. Shumaker and J. Franklin, Artificial Intelligence in Military Applications, Signal Magazine 40(10), 29 (June 1986).
The AI Magazine, an official publication of the American Association for Artificial Intelligence (AAAI), Menlo Park, CA, published quarterly.
Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), held biennially (odd-numbered years) since 1969, every 4k+1 year in the United States.
Proceedings of the American Association for Artificial Intelligence Conferences, since 1980, every 4k, 4k+2, and 4k+3 years.
J. Franklin
Planning Research Corp.
Laura Davis
Randall Shumaker
NRL
Paul Morawski
Mitre Corp.
MINIMAX PROCEDURE

At least two other entries in this volume (Game playing and Alpha-beta pruning) have discussed the idea of minimax search as it is commonly used and understood in AI. This entry places the idea of minimax in context (as a very convenient simplification useful in special cases) with the rest of the field of game theory (including the way it is used in economics) (1). To do this, it is necessary to go a little deeper into the discussion of the idea of a game.

General Model of Games

Although the use of game trees (see Game trees) to model games is almost universal in AI, the game tree cannot reflect all the aspects of games in general. Consider the game of bridge. Each team consists of two players (since their interests are the same), but since the two players in the team do not have the same information (each player knows what cards he holds but does not know what cards the partner holds), the game tree does not quite tell the whole story of the game. There is still a game tree, of course. The deal (a chance move) starts the game. The player on the move knows that the deal is one of the 39!/(13! 13! 13!) possible ones that give him the hand he is holding. What the bridge player decides to do is not based on the game state, as in chess, but on his imperfect knowledge of the game state. Imperfect knowledge does not come only from chance moves, as in bridge. In Kriegspiel, chess is played on two boards, one before each player. Neither player can see the position of the other player's pieces, although, if he makes an illegal move in view of the opponent's position, the umpire tells him. So his knowledge of the opponent's pieces is imperfect. Another phenomenon occurs in Kriegspiel that makes the game different from the kind normally encountered in AI: the players do not play alternately. If a player makes an illegal move and is so informed, he can try again.
The illegal move, then, yields information but allows him to play again: He can make more than one move in sequence, using the illegal moves as "probes." Returning to bridge, a player's knowledge of the opponent's
hands, as well as of his partner's hand, increases during the bidding, after the dummy is laid down, and as the cards fall during the plays. The point is that one decides on moves in a general game, not on the basis of the state, but on the basis of his knowledge that he is in one among a possible given set of states. The result of a player's move places him in one of a set of states. This set may be smaller than the set reachable from the original set. Extra information may be gained on the basis of what was learned during the move. The umpire decides, on the basis of the rules of the game, what the player is supposed to know.

Strategies and Payoffs

In the usual games considered in AI, a strategy is an initial decision by the player as to what moves would be made at which state. If the reader does not feel that the idea of a strategy is realistic (that the player has to make a decision on the move only when he is on move), he would do well to think of a strategy as a game-playing program itself. It is important to note that given the strategy of all players in the game, the outcome of the game is determined. The concept of strategy can be used in the general case also but with some modification. The strategy now determines a move on the basis not of the state but of the knowledge of the player on move about the state. One may object that the move chosen by a player may not even be a valid move, given his imperfect knowledge; but this can be countered by saying that the very invalidity of the move is a piece of information the player can use to enhance his knowledge. With Kriegspiel this point has already been made; however, for the mathematical ramification of the idea, see the rather complex set-theoretic discussion on which von Neumann and Morgenstern (1) based these concepts. One of the stipulations made by von Neumann and Morgenstern was that at the beginning of the game the player on move knows that it is the beginning of the game.
So his choice is with complete information, and the result of his move is known to the umpire. If the first move is a chance move, at least there is a known probability distribution. One can proceed by induction from here to show that once each player's strategy is known, including a probability distribution over all the chance moves, a probability distribution over all the leaves of the game tree is known. With the payoff to each player specified by the rules at each leaf, there is an expected payoff of each player known as a function of the n-tuple of strategies, one chosen by each player. Given the rules of a game, one can encapsulate its essential structure into what is called a game in normal form. This is a game where each player, instead of choosing a move when his turn arises, is asked at the beginning of the game what his strategy is, i.e., what procedure he will use to choose a move given any state of information that may arise in the game. He makes this choice without any knowledge of what strategies are chosen by all the other players. On the basis of these choices, the payoff to each player is determined. That is, the game is now encapsulated into n tables, one for each player, of his payoff as a function of n variables (the strategies), of which he can control only one. The sizes of these tables are enormous. The entry Game playing has discussed these sizes. But for the present ignore the practicality of this table and consider only what one would do with it if it were accessible.
Specializations: Zero-Sum Two-Person Games

Game theory has been studied in economics mostly in terms of the normal-form games. The problem that is posed is the following. Given n different n-dimensional matrices, each of size k1 x k2 x ... x kn, the first player chooses an integer between 1 and k1, the second player between 1 and k2, etc. The payoff to each player is given in the corresponding cell of his matrix. How would a player, given his matrix, decide what his choice should be? The last word has in no way been said with regard to the answer to this problem: It is not even clear as to what is meant by anybody's "best" choice. The reader can be given a whiff of what is involved from the well-known game Prisoner's Dilemma: Two men have robbed a bank and have been arrested on suspicion. Each of them has been given the option of confessing and bearing witness against his partner in return for full pardon. If neither confesses, there is not sufficient evidence against them for the entire crime but enough to send them to jail on lesser charges. If both confess, they are both convicted. If one of the men does not confess, he gets convicted if the other confesses. So, if the two courses of action chosen by the two are "confess" and "don't confess," the 2 x 2 payoff matrix for each is shown below (the entries may be read as the severity of the sentence each receives).

            Confess   Don't
Confess        30        0
Don't          30       10
In this matrix the two rows refer to the action of the prisoner whose payoff is seen here. Not confessing is a good way for the two to get lesser charges. But if the other prisoner, using that strategy, does not confess, there is great advantage to confessing and getting the full pardon due a state witness to a major crime. However, that is not a good idea if the other person confesses on the same argument; then total conviction is certain. Leaving aside the unsolved questions of game theory, return to the case where a few things are known or, at least, agreed upon. This is the case where there are two players and where one's gain is always the other's loss, i.e., a zero-sum game. To discuss zero-sum games, the matrix above is used again, but its interpretation is changed. The payoff shown is once again that of the player whose choices are shown as the rows. The payoff of the player whose choices are shown as columns, however, is the negative of the numbers shown on the matrix. Looking at the above matrix from that point of view, one notes that the opponent now has a vested interest in giving up as little as possible. So if the first row (which is called Confess in the previous discussion; the story is changed now) is chosen, the opponent's best move is Don't, yielding zero to the player. If the player's choice is Don't, the minimum he can have is 10. So his safest move is Don't since this choice gives him the greatest value of minimum to which he can be pushed by the opponent. Similarly, the opponent can lose 30 if he chooses Confess and only 10 if he chooses Don't. So he minimizes his maximum loss, playing Don't and losing 10. Thus, if both sides move conservatively, they would both play Don't. This is a rather stable situation, different from the previous one; this is because, unlike the previous case, no cooperation is possible between the two players: One gains exactly what the other loses. The discerning reader will see that this also is really a rather special case: The maximum over the row minima is exactly the minimum over the column maxima. The matrix has a saddle point. In matrices without saddle points the players (if they are to play more than one game) can play different strategies in different games. In this case one gets a mixed strategy, given by the chosen probability distribution over the different strategies. It can be shown in this case that there is a saddle point over the mixed strategies, i.e., that there are mixed strategies p and q of the two players such that the expected payoffs over the two strategies satisfy:
payoff(p', q) <= payoff(p, q) <= payoff(p, q')

for all mixed strategies p' of the first player and q' of the second. However, this is not needed for the kinds of games people consider in AI most of the time, when the strategy maps state to move rather than knowledge to move. It can be shown that in these cases the minimax value as calculated over the entire game graph is the same as the minimax value over all strategies, and this value is indeed the saddle point of the normal-form matrix. The strategy for which this saddle point is obtained over the matrix maps each state into exactly the same move dictated by the minimax search of the game tree. It may be worthwhile to illustrate the point by considering the three-step game shown in Figure 1. Here a, the first move, is the maximizing player's move (i.e., the move of the player whose payoff is given at the leaves of the game tree); A and B are the opponent's moves (the normal "alternating move"); and b, c, d, and e are the maximizer's moves again, leading to the leaves whose payoffs are as shown. The minimizer's strategies are specified by whether the left or the right branch is taken at the points A and B. A strategy where he would go left at A and right at B is denoted by LR. There are thus four possible strategies of the minimizing player. Similarly, the maximizer's strategies are given by the left and right choices
Figure 1. A three-step game. [The figure shows a game tree: the maximizer moves first at a (to A or B); the minimizer moves at A (to b or c) or at B (to d or e); the maximizer moves again at b, c, d, or e, reaching leaves whose payoffs are 1 and 2 under b, 3 and 4 under c, 5 and 6 under d, and 7 and 8 under e.]
made at the points a-e. There are 32 possible strategies, denoted by LLLLL to RRRRR. It can be seen that if the maximizer chooses the strategy LLLLL and the minimizer chooses LL, the game will be at A after the first move, at b after the second move, and so end up at the leaf with value 1 after the third move, following the strategy of choosing L at b. Similarly, the strategies LLLLL and RR would yield the value 3. Table 1 shows the 32 x 4 payoff matrix corresponding to the various strategy pairs. Notice that this matrix has various saddle points with value 6; they are at the intersections of the rows R--R- (with all choices at the nodes b, c, and e) and of the columns LL and RL. These are exactly the values and the strategies that the extended-form minimax would yield also: The maximizer chooses the right branch at the first move. It is to the minimizer's advantage to take the left branch, forcing the game to the lower values, after which the maximizer obtains the larger of the two payoffs by taking the right branch again. This latter analysis, well known in AI and discussed in the entry Game playing, can be clarified if the following is noted. The maximizer's values of the nodes b-e are 2, 4, 6, and 8, respectively. The minimizer's value at A, being the minimum of 2 and 4, is 2. Similarly, the minimizer's value at B is 6. So the maximizer's value at a, the larger of 2 and 6, is 6. So the maximizer in turn plays R to B, the minimizer answers with L to node d, and the maximizer gains 6 by playing R again.
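The backing-up of values just described is the standard minimax computation, and it can be sketched as a short recursive procedure. The nested-list encoding of Figure 1 below is an assumed representation: an integer is a leaf payoff to the maximizer, and a list holds a node's children.

```python
# A recursive sketch of minimax on the Figure 1 game tree.

def minimax(node, maximizing):
    if isinstance(node, int):
        return node                  # leaf: payoff to the maximizer
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

b, c, d, e = [1, 2], [3, 4], [5, 6], [7, 8]   # maximizer moves last
A, B = [b, c], [d, e]                          # minimizer's nodes
tree = [A, B]                                  # maximizer moves first at a

print(minimax(tree, True))  # 6
```

The backed-up values match the analysis in the text: 2 at A, 6 at B, and therefore 6 at the root, with the maximizer heading for B.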
Table 1. Payoff to the Maximizer for Each Strategy Pair

             Minimizer
Maximizer   LL  LR  RL  RR
LLLLL        1   1   3   3
LLLLR        1   1   3   3
LLLRL        1   1   3   3
LLLRR        1   1   3   3
LLRLL        1   1   4   4
LLRLR        1   1   4   4
LLRRL        1   1   4   4
LLRRR        1   1   4   4
LRLLL        2   2   3   3
LRLLR        2   2   3   3
LRLRL        2   2   3   3
LRLRR        2   2   3   3
LRRLL        2   2   4   4
LRRLR        2   2   4   4
LRRRL        2   2   4   4
LRRRR        2   2   4   4
RLLLL        5   7   5   7
RLLLR        5   8   5   8
RLLRL        6   7   6   7
RLLRR        6   8   6   8
RLRLL        5   7   5   7
RLRLR        5   8   5   8
RLRRL        6   7   6   7
RLRRR        6   8   6   8
RRLLL        5   7   5   7
RRLLR        5   8   5   8
RRLRL        6   7   6   7
RRLRR        6   8   6   8
RRRLL        5   7   5   7
RRRLR        5   8   5   8
RRRRL        6   7   6   7
RRRRR        6   8   6   8
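Table 1 can also be generated mechanically by simulating one play of the Figure 1 game for every strategy pair, which makes it easy to check that the maximum of the row minima equals the minimum of the column maxima. This is a sketch; the leaf payoffs of Figure 1 are written directly into the code.

```python
# Rebuilding the normal-form table by simulating every strategy pair.
from itertools import product

leaves = {"b": (1, 2), "c": (3, 4), "d": (5, 6), "e": (7, 8)}

def play(max_strat, min_strat):
    """Payoff to the maximizer for one pure-strategy pair."""
    at_a, at_b, at_c, at_d, at_e = max_strat   # choices at a, b, c, d, e
    at_A, at_B = min_strat                     # choices at A and B
    if at_a == "L":                            # maximizer goes to A ...
        node = "b" if at_A == "L" else "c"
    else:                                      # ... or to B
        node = "d" if at_B == "L" else "e"
    last = {"b": at_b, "c": at_c, "d": at_d, "e": at_e}[node]
    return leaves[node][0 if last == "L" else 1]

max_strats = list(product("LR", repeat=5))     # 32 rows, LLLLL..RRRRR
min_strats = list(product("LR", repeat=2))     # 4 columns, LL..RR

matrix = [[play(m, n) for n in min_strats] for m in max_strats]
maximin = max(min(row) for row in matrix)          # best row guarantee
minimax = min(max(col) for col in zip(*matrix))    # best column guarantee
print(maximin, minimax)  # 6 6 -- the saddle-point value
```

Both quantities come out to 6, agreeing with the saddle points of the table and with the tree search.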
Table 2. Payoff Matrix for the Maximizer

        1     2     3
LL      4    -2    10
LR      4    10    -2
RL      5    -2    10
RR      5    10    -2
An illustration of what the situation may be when the information is incomplete clarifies some of the ramifications of the von Neumann-Morgenstern formalism: a two-step game, started by the minimizer, who has three choices, called 1, 2, and 3. In answer, the maximizer can make one of two moves, called L and R. So there are six possible plays: 1L, 1R, 2L, 2R, 3L, and 3R. After the minimizer's choice, the game is restricted to the plays 1L and 1R if the minimizer plays 1 and to two other corresponding sets if she plays 2 or 3. However, the maximizer, in her turn, may not be informed as to what the minimizer played: If she played 1, the maximizer is told that. In the two other cases the maximizer is not informed what the move was, so the maximizer (from the fact that no information was given) can surmise that either 2 or 3 was played. So, even though the umpire would know that the game has been restricted to (say) 2L and 2R, the maximizer would only know that the game has been reduced to 2L, 2R, 3L, and 3R. After her move, the game reduces to a leaf, which may be either of 2L or 3L if the maximizer plays L or to 2R or 3R if she plays R. So if the minimizer does not play 1, the maximizer chooses her own move on incomplete knowledge. After her move the actual play is determined, however. In this case the minimizer has three strategies, each determining her first move. The maximizer has four strategies, choosing L or R depending on her state of knowledge. If the values of the six plays above are 4, 5, -2, 10, 10, and -2, respectively, the payoff matrix for the maximizer is as given in Table 2. The columns indicate the minimizer's choice. The rows correspond to the four maximizer's strategies. For example, the strategy LR corresponds to when the maximizer decides to play L in reply to 1 and to play R otherwise. Notice that the matrix has no saddle point. The maximum of the row minima occurs at each row as -2. The minimum of the column maxima is in column 1, at the value 5.
In a game with complete information the minimax value would clearly be 5. As said earlier, most of the ideas described here are not of direct applicability to AI. They appear here merely to place the AI work in context with the rest of game theory. Certain ideas of game theory applicable to AI use minimax only indirectly. These are described in Ref. 2. References 3-5 point out certain limitations of minimax and suggest alternatives and improvements.
BIBLIOGRAPHY

1. J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ, 1947.
2. R. B. Banerji, Artificial Intelligence: A Theoretical Approach, North-Holland, Amsterdam, 1980.
3. D. S. Nau, "Pathology on game trees revisited, and an alternative to minimaxing," Artif. Intell. 21, 222 (1983).
4. G. M. Baudet, "On the branching factor of the alpha-beta pruning algorithm," Artif. Intell. 10, 173 (1978).
5. G. A. Stockman, "A minimax algorithm better than alpha-beta?" Artif. Intell. 12, 179 (1979).

The preparation of this paper was supported by the National Science Foundation under grant MCS-8217964 and forms a part of an ongoing research on Knowledge Based Learning and Problem Solving Heuristics.

R. B. Banerji
St. Joseph's College

MODAL LOGIC

Modal logic goes back at least to Aristotle, but the current phase of its development begins with the work of the American logician C. I. Lewis (1883-1964) (1), who defined a number of axiomatic systems for it. Modal and related logics like temporal logic, dynamic logic, and the logics of knowledge and belief have recently acquired importance for computer science, largely because of potential applications in AI and program correctness.

Modal logic is concerned principally with the notion of necessity and its companion notion of possibility. A proposition is said to be necessarily true if it could not be the case that it was false. Thus, the propositions "everything that is green is colored" or "two plus two is four" could not possibly have been false (or so it seems) and hence are not just true, but necessarily so. By contrast, the proposition "Ronald Reagan is President of the United States in 1985" is true but might have been false since he might have lost the election to Mondale. In other words, it is not necessarily true but only contingently so. The formula □A indicates that A is necessary. A proposition is possible if it is not necessary that it be false. Then every true proposition is possible, but not vice versa, and every necessary proposition is true, but not vice versa. That A is possible is written ◊A. Note that ◊A is equivalent to ¬□¬A.

Lewis was interested in the notion of necessity because of his dissatisfaction with material, or truth-functional, implication. Using → to indicate material implication, it is true that (pigs lack wings) → (you are reading this entry). But there is no connection between the two facts, and Lewis felt that the intuitive notion of implication was not adequately expressed by →. Lewis proposed to remedy this defect by introducing strict implication, where A ⥽ B is an abbreviation for □(A→B). Thus, A strictly implies B if it is impossible that A be true and B false. However, there are paradoxes of strict implication analogous to those of material implication, and it is not clear that Lewis's attempt was wholly successful.

Formalisms for Modal Logic
The language of propositional modal logic is obtained from that of the propositional calculus by adding the operator □ for necessity. Then the set of formulas of propositional modal logic will be obtained from some propositional atoms P, Q, etc., by closure under truth-functional connectives (say ¬ and ∨) and the operator □. Note now that under this interpretation the formula □(P∨¬P) will be true, since P∨¬P is necessarily true, but the formula □P∨□¬P, which says that either P is necessarily true or it is necessarily false, may not be true, since P might have been contingently true.

Axiomatic Systems. The system T (sometimes called M) is due essentially to Gödel and Feys (2). It has the language just described above, with axioms consisting of all propositional tautologies (or enough of them) together with the two axiom schemes

(A1) □A → A
(A2) □(A→B) → (□A→□B)

These schemes say, in effect, that every necessary proposition is true and that if one proposition A necessarily implies another, B, the necessity of A implies that of B. The rules of inference are modus ponens (derive B from A and A→B) and necessitation (derive □A from A).

There is a subsystem K of T that lacks axiom scheme (A1) and is used as a basis for deontic logic. In deontic logic the symbol □ would not represent necessity but desirability. In this case the axiom scheme (A1) would be unsuitable since a proposition that is desired may well be false. Again, in a logic of knowledge, □ will stand for "is known," and such a logic will tend to have axiom scheme (A1) since a proposition cannot be known unless it is true. However, in a logic of belief, □ would stand for "is believed," and (A1) would not be wanted since false propositions may be believed. [In practice, these logics tend to use symbols other than □ and ◊, but a uniformity of notation makes the comparison easier (see Refs. 3-5 for a discussion of modal logics of knowledge and belief).] There are two systems stronger than T, namely the systems S4 and S5 of Lewis. S4 is obtained from T by adding the scheme
(A3) □A → □□A

so that all propositions that are necessary are necessarily necessary. It follows at once that all propositions that are possibly possible are possible, i.e.,

(A3') ◊◊A → ◊A

and indeed, (A3) and (A3') are equivalent. The yet stronger system S5 can then be obtained from S4 by adding the axiom scheme

(A4) ◊A → □◊A

There is a semantics for modal logics, due chiefly to Kripke (6,7), that brings out the differences between the systems in a striking way.

Kripke Models

A Kripke model is based on a frame ⟨W, R⟩, where W is a set of possible worlds or, more prosaically, a set of states. Individual states are denoted s, t, .... Here R is a binary relation on W called the accessibility relation, and sRt is read "t is accessible from s." Usually there is a special state r that stands for the real world or the start state. The model M is then obtained from the frame by assigning a truth value to each atom P at each state s. The truth value v(A, s) of an arbitrary formula A at each state s is then defined as

v(A∨B, s) = true iff either v(A, s) = true or v(B, s) = true
v(¬A, s) = true iff v(A, s) = false
v(□A, s) = true iff for all t such that sRt, v(A, t) = true

The models as described above are the K models. In T models the relation R is required to be reflexive. S4 models are obtained by restricting R further, to be also transitive, and for S5 models it must be an equivalence relation, i.e., be reflexive, symmetric, and transitive. Then each of the four axiomatic systems is complete for its particular models. That is, a formula A is a theorem of S, where S is any of the systems K, T, S4, and S5, iff it is true in all S models at all states (6,7).

Temporal Logic

Temporal logic (with linear time) is a case between S4 and S5. In it W is the set of instants of time, and R is the before-after relation, so that sRt holds just in case t either equals or comes after s. Since R is reflexive and transitive in this case, all theorems of S4 will hold. However, there will also be some additional laws that are not theorems of S4. For example, all formulas ◊□A → □◊A will be valid, but they are not theorems of S4. See Ref. 8 for a computer-science-oriented introduction to temporal logic.

All the systems K, T, S4, and S5 have the finite-model property: If S is any of the four systems, a formula A has an S model iff it has a finite S model. This fact, together with the completeness of the appropriate axioms, yields decidability. However, Ladner (9) has shown recently that the system S5 is NP complete, whereas the systems K, T, and S4 are PSPACE complete. Thus, any implemented decision procedure can only decide relatively short formulas.

First-Order Modal Logic

It is easy enough to obtain a language for first-order (or quantified) modal logic. One simply adds the modalities □ and ◊ to the usual language for first-order logic. One can also define a semantics for first-order modal logic as an enrichment of that for propositional modal logic. Frames are as before, but each state s is now a model of ordinary first-order logic, and the semantics of first-order logic can be easily extended to one for first-order modal logic. The new enriched states are now the possible worlds of first-order modal logic. Unfortunately, there is little agreement on what these possible worlds might be like and hence about the right first-order modal logic. See Ref. 7 for some of the issues that arise. Below is one that has some interest even for nonphilosophers.

The law a = b → (A(a) → A(b)), substitutivity of equals, is a fundamental principle governing identity. However, Quine (10) has pointed out that this law seems to fail in contexts involving modalities, knowledge, or belief. For example, it is true that the number of planets equals 9. It is also true that 9 is necessarily greater than 7. However, it is not true that the number of planets is necessarily greater than 7. Similar examples exist where "necessarily" is replaced by "John knows that" or "Mary believes that." Contexts in which the law fails are sometimes called referentially opaque, whereas those in which it holds are called referentially transparent.

There is an axiomatization of first-order modal logic in constant-domain models, i.e., models in which all worlds have the same individuals, which is due essentially to Ruth Barcan Marcus (7). The axiomatization includes axioms for first-order logic and laws inherited from propositional modal logic of the appropriate kind. It also includes the well-known Barcan formula and its converse. These last two say that the universal quantifier commutes with □, so that □∀xA(x) is equivalent to ∀x□A(x) (see Ref. 6 for details).
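The propositional Kripke semantics defined earlier is easy to prototype. In the sketch below the frame, valuation, and formula encoding are invented for illustration; it confirms the contrast noted earlier between □(P∨¬P), which holds everywhere, and □P∨□¬P, which can fail when P is contingent:

```python
# Miniature Kripke-model evaluator for propositional modal logic.
# Formulas: ("atom", p) | ("not", f) | ("or", f, g) | ("box", f)

def holds(f, s, R, val):
    """Truth of formula f at state s; R maps each state to its successors,
    val maps each state to the set of atoms true there."""
    op = f[0]
    if op == "atom":
        return f[1] in val[s]
    if op == "not":
        return not holds(f[1], s, R, val)
    if op == "or":
        return holds(f[1], s, R, val) or holds(f[2], s, R, val)
    if op == "box":   # v(box g, s) = true iff g holds at every t with sRt
        return all(holds(f[1], t, R, val) for t in R[s])
    raise ValueError(op)

# A two-state frame: from state 0 both states are accessible.
R = {0: [0, 1], 1: [1]}
val = {0: {"P"}, 1: set()}     # P is true only at state 0

P = ("atom", "P")
taut = ("box", ("or", P, ("not", P)))             # box(P or not P)
split = ("or", ("box", P), ("box", ("not", P)))   # box P or box not P

print(holds(taut, 0, R, val))   # True: P or not-P holds at every state
print(holds(split, 0, R, val))  # False: P is contingent as seen from state 0
```

Restricting R to be reflexive, transitive, or an equivalence relation turns the same evaluator into a checker for T, S4, or S5 models, respectively.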
Dynamic Logic

Consider a world W whose states s are the possible states of some actual or abstract computer. The formulas A of the language express properties of individual states. Now each program α, considered as an I/O relation, is a binary relation on W and generates a modality [α]. To be precise, let sRα t mean that there is an execution of the program α that begins in the state s and terminates in t. Then the state s satisfies the formula [α]A if, for every t such that sRα t, t satisfies A. Since there are infinitely many programs, there are infinitely many accessibility relations and hence infinitely many modalities. These modalities all satisfy the laws of the logic K and other laws that depend on the particular program α. Intuitively, [α]A is the property that has to hold now so that A must hold if and when the program α terminates. The formula ⟨α⟩A, where ⟨α⟩ is ¬[α]¬, says that the property A may hold after α terminates. Since programs are allowed to be nondeterministic, [ ] and ⟨ ⟩ are distinct. However, there are interactions not only between these modalities and the usual logical notions but also among themselves. For example, if a and b are two programs, and they are composed to form a third program c = a;b, the formula [c]A is equivalent to the formula [a][b]A. A more interesting example expresses a fundamental property of the "while do" construct. If B is a formula, and d is the program "while B do a," then the formula [d]A is equivalent to the formula
(¬B ∧ A) ∨ (B ∧ [a][d]A)

It is now possible to express the partial correctness assertion {A} a {B}, where A and B are formulas and a is a program. The assertion {A} a {B} says that if the formula A holds before the program a begins, then B must hold if and when a terminates. This fact can be expressed by the dynamic logic formula A → [a]B. Thus, dynamic logic becomes an effective tool for studying the properties of programs. However, it also has a potential for applications in the logic of actions and in formalizing legal reasoning, both of which are areas with relevance to AI. See Refs. 11 and 12 for further reading on dynamic logic and Ref. 13 for an application of dynamic logic to legal reasoning. References 14-16 contain a more detailed treatment of modal logic and various issues connected with it.
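The composition law [a;b]A ≡ [a][b]A can be checked mechanically on a small state space. In the sketch below the state space, the programs, and the property are invented for illustration; programs are represented as I/O relations, exactly as in the text:

```python
# Programs as input/output relations on a small state space W,
# with [r]A computed as in the text: s satisfies [r]A iff A holds
# at every t such that s r t.
W = range(4)
a = {(s, (s + 1) % 4) for s in W}        # a deterministic "increment" program
b = {(s, t) for s in W for t in (0, s)}  # a nondeterministic "maybe reset" program

def compose(r1, r2):
    """The I/O relation of the composed program r1;r2."""
    return {(s, u) for (s, t) in r1 for (t2, u) in r2 if t == t2}

def box(r, prop):
    """The set of states satisfying [r]prop."""
    return {s for s in W if all(prop(t) for (s2, t) in r if s2 == s)}

def A(s):            # an arbitrary property of states
    return s < 3

bA = box(b, A)
lhs = box(compose(a, b), A)        # [a;b]A
rhs = box(a, lambda t: t in bA)    # [a][b]A
print(sorted(lhs), lhs == rhs)     # [0, 1, 3] True
```

Because b is nondeterministic, [b]A and ⟨b⟩A differ here, which is the distinction between [ ] and ⟨ ⟩ made above.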
BIBLIOGRAPHY

1. C. I. Lewis and C. H. Langford, Symbolic Logic, Dover, Mineola, NY, 1932.
2. K. Gödel, Collected Works, S. Feferman et al. (eds.), Oxford University Press, New York, 1986, pp. 301-302. Originally published as "Eine Interpretation des intuitionistischen Aussagenkalküls," Ergebnisse eines mathematischen Kolloquiums 4, 34-38 (1933).
3. J. Hintikka, Knowledge and Belief, Cornell University Press, Ithaca, NY, 1962.
4. J. Halpern and Y. Moses, Knowledge and Common Knowledge in a Byzantine Environment, Proceedings of the Third Annual Symposium on the Principles of Distributed Computing, Association for Computing Machinery, New York, pp. 50-61, 1984.
5. R. Parikh and R. Ramanujam, Distributed Processes and the Logic of Knowledge, in Logics of Programs, Lecture Notes in Computer Science No. 193, Springer-Verlag, New York, pp. 156-168, 1985.
6. S. Kripke, "Semantical considerations on modal logic," Acta Philos. Fenn. 16, 83-94 (1963). Also in Ref. 15, pp. 63-72.
7. G. E. Hughes and M. J. Cresswell, An Introduction to Modal Logic, Methuen, London, 1968.
8. Z. Manna and A. Pnueli, Verification of Concurrent Programs: The Temporal Framework, in Boyer and Moore (eds.), The Correctness Problem in Computer Science, Academic Press, New York, pp. 215-273, 1982.
9. R. Ladner, "The computational complexity of provability in systems of modal propositional logic," SIAM J. Comput. 6, 467-480 (1977).
10. W. V. Quine, Reference and Modality, in From a Logical Point of View, Harper & Row, New York, pp. 139-157, 1961. Also in Ref. 15, pp. 17-34.
11. V. Pratt, Semantical Considerations on Floyd-Hoare Logic, Proceedings of the Seventeenth Annual IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, Piscataway, NJ, pp. 109-121, 1976.
12. D. Harel, First Order Dynamic Logic, Lecture Notes in Computer Science No. 68, Springer-Verlag, New York, 1979.
13. L. McCarty, Permissions and Obligations, Proceedings of the Eighth IJCAI, Karlsruhe, FRG, pp. 287-294, 1983.
14. M. Fitting, Proof Methods for Modal and Intuitionistic Logics, D. Reidel, Boston, MA, 1983.
15. L. Linsky, Reference and Modality, Oxford University Press, New York, 1971.
16. A. Prior, Logic, Modal, in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 5, Collier-Macmillan, New York, 1967.

R. Parikh
Brooklyn College, City University of New York
MORPHOLOGY

Morphology describes word formation, i.e., inflection, derivation, and compounding. A base form of a word, e.g., reach, can be inflected in a paradigm of forms (reaches, reached, reaching), and new words related to it can be produced using derivational affixes (reachable, reacher, unreachable, etc.) (1). Morphology relies on a lexicon, which contains entries for a set of words. It consists of rules for handling derived and inflected forms by relating them to existing entries in the lexicon.

Aspects of Word Formation
Any AI system using natural language has to recognize inflected words, but the effort required for implementing a morphological component varies greatly from language to language and also depends on the extent of the vocabulary used. English has an extremely simple inflectional system, and natural-language interfaces with a restricted English vocabulary often ignore morphology altogether by listing all distinct word forms in their dictionary (2). Even in the case of English, pronunciation of derivations is phonologically complex enough to deserve a careful treatment in speech synthesis and recognition (3). Other languages like German, Swedish, and Russian have more elaborate inflectional systems, and the listing of all derivations, compounds, and inflectional forms is impractical; in Finnish it is quite impossible because every noun can be
inflected in 2000-odd distinct forms and every verb in some 12,000 forms.

Word formation consists of three parts: specifying the meaning of the resulting entry from the meanings of the components; specifying the components (word roots, derivational and inflectional affixes) and the order in which they may be combined with each other; and the shape in which these components are realized in the actual written or pronounced word form. The meaning can be described in terms of features and values using templates and unification (4). The second task defines the morphotactic structure of words, and the third task consists of rules governing the phonological and morphophonological alternations. For example, in English, nominal stems (book, bus, sky) may optionally be followed by a plural suffix (+s):

book+s    books
bus+s     buses
sky+s     skies

The shape of the stem affects the realization of the plural ending (es after s, sh, ch, x, z, y; otherwise s), and the presence of the plural affects the realization of the stem (the final y shows up as i).

Morphological Analysis
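The plural alternations just described are easy to mimic with ad hoc string code of the kind language-specific procedures use. This is only a sketch of the surface alternations, not the two-level formalism itself, and it ignores exceptions such as vowel + y stems (day → days):

```python
def pluralize(stem):
    """Plural of an English nominal stem per the alternations described:
    -es after s, sh, ch, x, z; a final y surfaces as i before -es; else -s."""
    if stem.endswith(("s", "sh", "ch", "x", "z")):
        return stem + "es"
    if stem.endswith("y"):
        return stem[:-1] + "ies"
    return stem + "s"

for stem in ("book", "bus", "sky"):
    print(stem, "->", pluralize(stem))
# book -> books, bus -> buses, sky -> skies
```

Code of this kind illustrates why such procedures describe inflection "in rather artificial terms": each alternation is a hard-coded special case rather than a general rule.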
Morphological analysis is often carried out with language-specific procedures with little reference to linguistic theories. A straightforward method is to proceed by stripping endings from the end of the word form and by tentatively undoing morphological alternations until a stem in the lexicon is reached (5-7). The problem with these methods is that inflection has to be described in rather artificial terms. The treatment of ambiguous word forms may be defective, compounds written without an intervening space (as in German) need ad hoc procedures, etc. Inflectionally simple languages like English can successfully be handled with these methods, but complex languages such as Sanskrit and Arabic are clearly beyond their scope.

Within linguistics it has been taken for granted that morphology should be described according to principles universal enough to apply to all natural languages. The formalism of generative phonology satisfies this requirement (8). Generative phonology (see Phonemes) has proved to be difficult to implement as an efficient algorithm (see, however, Ref. 9).

General Methods for Analysis

A computationally feasible general approach to describing the process of word formation is provided by the two-level model (10). It consists of a lexicon system and a rule component. The lexicon system has a set of lexicons, some for the word roots and others for various classes of inflectional and derivational endings. In addition to lexical representations of the units, the system lists the correct sequences in which they may occur. A linkage mechanism using continuation classes handles compounding, derivation, and the stacking of various classes of endings. The rule component defines how the lexical representations are realized on the surface. All rules operate in parallel without intermediate stages. The realization of sky+s as skies on the surface is governed by rules like y:i <=> +:e.

Rules are compiled into finite-state machines by a compiler, or they can be hand coded directly as finite-state automata. There are several implementations of the two-level model, at least in Pascal and in most dialects of LISP. These programs accept a lexicon system and a set of rules as input and are then ready for analyzing word forms of that particular language. Descriptions exist for several languages (English, French, Finnish, Swedish, Old Church Slavonic, Romanian, Polish). The two-level model is bidirectional, i.e., the programs can both analyze and generate word forms using the same rule automata.

BIBLIOGRAPHY

1. P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, UK, 1974.
2. T. Winograd, Language as a Cognitive Process, Vol. 1: Syntax, Addison-Wesley, Reading, MA, pp. 544-549, 1983.
3. K. Church, Stress Assignment in Letter to Sound Rules for Speech Synthesis, Proceedings of the Twenty-Third Annual Meeting of the Association for Computational Linguistics, Morristown, NJ, pp. 246-253, 1985.
4. S. Shieber, The Design of a Computer Language for Linguistic Information, Proceedings of the Tenth International Conference on Computational Linguistics (COLING 84), Association for Computational Linguistics, Morristown, NJ, pp. 362-366, 1984.
5. H. L. Resnikoff and J. L. Dolby, "The nature of affixing in written English," Mechan. Transl. 8(3), 84-89 (June 1965); 9(2), 23-33 (June 1966).
6. J. B. Lovins, "Development of a stemming algorithm," Mechan. Transl. 10, 22-31 (1969).
7. M. Kay and G. Martins, The MIND System: The Morphological-Analysis Program, Memorandum RM-6265/2-PR, The RAND Corporation, Santa Monica, CA, 1970.
8. C. Sloat, S. H. Taylor, and J. Hoard, Introduction to Phonology, Prentice-Hall, Englewood Cliffs, NJ, 1978.
9. M. Kay, When Meta-rules Are Not Meta-rules, in K. Sparck-Jones and Y. Wilks (eds.), Automatic Natural Language Parsing, Ellis Horwood, Chichester, UK, pp. 94-116, 1983.
10. K. Koskenniemi, A General Computational Model for Word-Form Recognition and Production, Proceedings of the Tenth International Conference on Computational Linguistics (COLING 84), Association for Computational Linguistics, Morristown, NJ, pp. 178-181, 1984.

K. Koskenniemi
University of Helsinki
MOTION ANALYSIS

Determining the relative motion between an observer and his environment is a major problem in computer vision (qv). Its applications include mobile-robot (qv) navigation and monitoring dynamic industrial processes. For background material, the reader is referred to the two edited volumes of Huang (1,2), the pioneering and influential book of Ullman (3), several special journal issues (4-6), and proceedings of several workshops on motion (7-9). In this entry the various approaches to the determination of three-dimensional motion of a rigid body based on time-sequential perspective views (image frames) are reviewed. The
first three sections describe methods that use a monocular two-dimensional sensor (such as a television camera); then methods are discussed that use a stereo pair of sensors. Finally, there is a brief discussion of numerical accuracy, multiple objects, nonrigid objects, motion prediction, and high-level motion understanding.

Two-View Motion Analysis Using Feature Correspondences

Problem Statement. The basic geometry of the problem is sketched in Figure 1. The object-space coordinates are denoted by lowercase letters and the image-space coordinates by uppercase letters. Let the two perspective views (central projections) be taken at t1 and t2, respectively, with t1 < t2. The coordinates at t2 are primed, and the coordinates at t1 are unprimed. Specifically, consider a particular physical point P on the surface of a rigid body in the scene. Let (x, y, z) be the object-space coordinates of P at time t1, (x', y', z') the object-space coordinates of P at time t2, (X, Y) the image-space coordinates of P at time t1, (X', Y') the image-space coordinates of P at time t2, and

ΔX = X' - X,  ΔY = Y' - Y    (1)

the image-space shifts (or displacements) of P from t1 to t2.

Figure 1. Basic geometry for motion analysis. [In the figure, (x, y, z) and (x', y', z') are the object-space coordinates of the physical point P at times t1 and t2, (X, Y) and (X', Y') the corresponding image-space coordinates, and (ΔX, ΔY) the image-space shifts from t1 to t2.]

It is well known from kinematics that the object coordinates of P at time instants t1 and t2 are related by

[x'; y'; z'] = R [x; y; z] + T    (2)

where R represents a rotation and T a translation. To make the representation unique, the rotation is specified around an axis passing through the origin of the coordinate system. Let n̂ = (n1, n2, n3) be a unit vector along the axis of rotation and θ be the angle of rotation from t1 to t2. Then the elements of R can be expressed in terms of n1, n2, n3, and θ. Since n1² + n2² + n3² = 1, there are six motion parameters to be determined: n1, n2, θ, Δx, Δy, and Δz. However, from the two perspective views it is impossible to determine the magnitude of the translation; i.e., if the object size and position as well as the translation are scaled by the same factor, one gets exactly the same two image frames. One can therefore determine the translation to only within a scale factor. To summarize, the problem is: Given two image frames at t1 and t2, find the motion parameters T (to within a scale factor) and R. As shown below, the equations relating the motion parameters to the image-point coordinates inevitably involve the ranges (z coordinates) of the object points. Therefore, in determining the motion parameters, one also determines the ranges of the observed object points. It will be seen that the translation vector T and the object-point ranges can be determined to within a positive global scale factor. The value of this scale factor could be found if the magnitude of T or the absolute range of any observed object point is known.

Solution Using Point Correspondences. Consider a two-stage method to solve the posed problem. In the first stage one finds point correspondences in the two perspective views (images). A point correspondence is a pair of image coordinates (Xi, Yi), (X'i, Y'i), which are images at t1 and t2, respectively, of the same physical point on the object. Then, in the second stage, one determines the motion parameters from these image coordinates by solving a set of equations.

Finding Point Correspondences. In order to be able to find point correspondences, the images must contain points that are distinctive in some sense. For example, images of man-made objects often contain sharp corners that are relatively easy to extract (10). More generally, image points where the local gray-level variations (defined in some way) are maximum can be used (11). Other important approaches include Nagel (12) and Kories and Zimmermann (13). In any case, in each of the two images a large number of distinctive points are extracted. Then one tries to match the two point patterns in the two images using spatial structures of the patterns (14). The matching will be successful only if the amount of rotation (θ) is relatively small (so that the perspective distortion is small). For example, in Ref. 10 good matching results are obtained if θ < 5°. This restriction may be relaxed if there is some a priori information about the object (15).

Basic Equations. From Figure 1 there is the following relationship between the image-space and the object-space coordinates:

X = F x/z,  Y = F y/z    (3)

For simplicity, assume throughout that F = 1. The motion is described by Eq. 2. From Eqs. (2) and (3)

X' = [(r11 X + r12 Y + r13)z + Δx] / [(r31 X + r32 Y + r33)z + Δz]
Y' = [(r21 X + r22 Y + r23)z + Δy] / [(r31 X + r32 Y + r33)z + Δz]    (4)
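Equations 2 and 3, and the scale ambiguity discussed under Problem Statement, can be illustrated numerically. In the sketch below the rotation axis, angle, translation, and object point are arbitrary choices; the rotation matrix is built from (n1, n2, n3) and θ via the standard Rodrigues formula:

```python
import math

def rotation(n, theta):
    """Rotation matrix from a unit axis n = (n1, n2, n3) and angle theta
    (Rodrigues formula), as in the parameterization of R above."""
    n1, n2, n3 = n
    c, s, v = math.cos(theta), math.sin(theta), 1 - math.cos(theta)
    return [[c + n1*n1*v,    n1*n2*v - n3*s, n1*n3*v + n2*s],
            [n1*n2*v + n3*s, c + n2*n2*v,    n2*n3*v - n1*s],
            [n1*n3*v - n2*s, n2*n3*v + n1*s, c + n3*n3*v]]

def move(p, R, T):
    # Eq. 2: p' = R p + T
    return [sum(R[i][j]*p[j] for j in range(3)) + T[i] for i in range(3)]

def project(p):
    # Eq. 3 with F = 1: (X, Y) = (x/z, y/z)
    return (p[0]/p[2], p[1]/p[2])

R = rotation((0.0, 1.0, 0.0), 0.1)
T = [0.2, -0.1, 0.3]
p = [1.0, 2.0, 5.0]

image1, image2 = project(p), project(move(p, R, T))

# Scale the object point and the translation by the same factor k:
k = 10.0
pk, Tk = [k*c for c in p], [k*c for c in T]
assert project(pk) == image1                 # the first image is unchanged
scaled = project(move(pk, R, Tk))
print(all(abs(u - v) < 1e-9 for u, v in zip(scaled, image2)))  # True
```

The final check shows why only the direction of T is recoverable: scaling the scene and the translation together leaves both image frames identical.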
where the rij can be expressed in terms of n1, n2, n3, and θ. By elimination of z from Eq. (4),

(Δx - X'Δz){Y'(r31X + r32Y + r33) - (r21X + r22Y + r23)}
    = (Δy - Y'Δz){X'(r31X + r32Y + r33) - (r11X + r12Y + r13)}    (5)

Also,

z = (Δx - X'Δz) / {X'(r31X + r32Y + r33) - (r11X + r12Y + r13)}
  = (Δy - Y'Δz) / {Y'(r31X + r32Y + r33) - (r21X + r22Y + r23)}    (6)

Equation 5 is nonlinear in the six unknowns Δx, Δy, Δz, n1, n2, and θ. Also, it is homogeneous in Δx, Δy, and Δz. Therefore, as mentioned earlier, one can only hope to find T to within a scale factor. After finding T (to within a scale factor) and R, one can find zi for each observed point to within the same scale factor using Eq. 6. To fix ideas, let the translation sought after be the unit translation vector

T̂ = (Δx̂, Δŷ, Δẑ)    (7)

Then Eq. 5 can be considered as a nonlinear equation in the five unknowns Δx̂, Δŷ, n1, n2, and θ. Thus, with 5-point correspondences, there are five equations in five unknowns. Well-known iterative techniques can then be used to find solutions. In practice, because of noise in the image data, one tries to find more than 5-point correspondences and seeks a least-squares solution.

Alternative Formulation. The motion-parameter Eq. 5 was derived by eliminating z in Eq. 4. Alternatively, one can formulate equations in terms of the z coordinates of the points under consideration without containing any motion parameters (16). This can be done by using the principle of distance conservation for a rigid body. Assume N point correspondences are given, and let (xi, yi, zi) and (x'i, y'i, z'i) be the 3-D coordinates of the ith point at t1 and t2, respectively. Then one has

(xi - xj)² + (yi - yj)² + (zi - zj)² = (x'i - x'j)² + (y'i - y'j)² + (z'i - z'j)²    (8)

and from Eq. 3

(ziXi - zjXj)² + (ziYi - zjYj)² + (zi - zj)²
    = (z'iX'i - z'jX'j)² + (z'iY'i - z'jY'j)² + (z'i - z'j)²    (9)

For each pair of points, one Eq. 9 can be written. Thus, with 5-point correspondences one can write 10 equations that (if z1 = 1) contain nine unknowns: z2, . . . , z5, z'1, . . . , z'5. A least-squares solution for these unknowns can be found using iterative methods. Then the motion parameters are found by solving Eq. 2. Several methods for carrying out this last step are discussed under Motion from 3-D Feature Correspondences.

Disadvantage of Solving Nonlinear Equations. To find a least-squares solution of a small set of nonlinear equations 5 or 9 using iterative methods is not computationally expensive. However, unless there is a good initial-guess solution, the iteration may not converge, or it may converge to a local but not global minimum. Furthermore, with nonlinear equations it is very difficult to analyze the question of solution uniqueness. In fact, it is an open theoretical question: What is the minimum number of point correspondences that will ensure a unique solution for the five motion parameters Δx̂, Δŷ, n1, n2, and θ? With 5-point correspondences the number of equations becomes equal to or larger than the number of unknowns. However, since the equations are nonlinear, one would expect that the solution may generally not be unique. This has indeed been verified by computer simulations in which global searches were made. The results of such simulations indicated that with 5-point correspondences there may be more than one solution; with 6-or-more-point correspondences the solution is generally unique. It is to be noted that in the case of 5-point correspondences, even though the solution may not be unique, if the iteration is started at a guess solution that is close to the true solution, one will most likely converge to it. The conclusion is that the approach of solving nonlinear equations is viable if there is a good initial-guess solution. Otherwise, a better alternative is described in the next section: a linear algorithm that requires 8-or-more-point correspondences.

A Linear Algorithm. It turns out that by introduction of appropriate intermediate variables (which are functions of the motion parameters), Eq. 5 becomes linear (17,18). Define

E = [e1 e2 e3; e4 e5 e6; e7 e8 e9] = GR    (10)

where

G = [  0   -Δẑ   Δŷ ]
    [ Δẑ    0   -Δx̂ ]   (skew symmetric)    (11)
    [-Δŷ   Δx̂    0  ]

(Δx̂, Δŷ, Δẑ) is the unit translation vector defined in Eq. 7, and R is the orthonormal rotation matrix. Then Eq. 5 becomes

[X' Y' 1] E [X; Y; 1] = 0    (12)

which is linear and homogeneous in the nine new unknowns e1, . . . , e9. The algorithm consists of two steps:

Step 1. From 8 or more point correspondences determine E to within an unknown scale factor k.
Step 2. Decompose kE to obtain R and T̂.

Step 1 is relatively simple; it amounts to finding the least-squares solution of a set of linear equations 12. Step 2 is more complicated and is not discussed here. The reader is referred to Refs. 17-21 for several algorithms. It can be shown (20,22) that, except for degenerate cases, 8 or more point correspondences yield a unique solution for E and T̂.

Planar Patch Case. In many applications the points observed may all lie on a rigid planar patch in 3-D. In this case the linear algorithm shown above breaks down. One can go back to using the nonlinear equations 5 or 9. However, it turns out that a more computationally efficient, and in fact linear, algorithm exists for the planar patch case (23-25). This linear algorithm, described below, also throws light on the uniqueness question for the planar case.
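The constraint of Eq. 12 can be verified numerically. In the sketch below the rotation, translation, and object point are arbitrary choices (the translation is not normalized, since E is only determined up to scale anyway); it builds E = GR per Eqs. 10 and 11 and checks that a corresponding image-point pair satisfies Eq. 12:

```python
import math

# A rotation about the z axis by theta, and an arbitrary translation.
theta = 0.3
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
T = [0.5, -0.25, 1.0]

dx, dy, dz = T
G = [[0.0, -dz, dy], [dz, 0.0, -dx], [-dy, dx, 0.0]]        # Eq. 11

E = [[sum(G[i][k] * R[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]                                      # Eq. 10

def image(p):
    return (p[0] / p[2], p[1] / p[2])                        # Eq. 3, F = 1

p = [0.4, -1.2, 3.0]
q = [sum(R[i][j] * p[j] for j in range(3)) + T[i] for i in range(3)]  # Eq. 2

X, Y = image(p)
Xp, Yp = image(q)
u, v = [X, Y, 1.0], [Xp, Yp, 1.0]
residual = sum(v[i] * E[i][j] * u[j] for i in range(3) for j in range(3))
print(abs(residual) < 1e-9)  # True: Eq. 12 holds for the correspondence
```

The identity holds because Gw is the cross product T × w, so the residual is proportional to q · (T × Rp), which vanishes for q = Rp + T. Each correspondence thus contributes one linear homogeneous equation in e1, . . . , e9, which is what Step 1 solves by least squares.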
MOTION ANALYSIS
Let the 3-D points observed all lie on a plane whose equation at t1 is

ax + by + cz = 1   (13)

or, in matrix form,

[a, b, c][x, y, z]^t = 1

Later the notation g = [a, b, c]^t (the superscript t denotes transposition) is used. From Eqs. 2 and 13,

[x', y', z']^t = R[x, y, z]^t + T = R[x, y, z]^t + T[a, b, c][x, y, z]^t = (R + Tg^t)[x, y, z]^t = A[x, y, z]^t   (14)

where

A = R + Tg^t   (15)
From Eqs. 3 and 14,

X' = (a1X + a2Y + a3)/(a7X + a8Y + a9)   (16)
Y' = (a4X + a5Y + a6)/(a7X + a8Y + a9)   (17)

Some other useful formulas are, from Eq. 13,

1/z = aX + bY + c   (18)

and, from Eq. 14,

z'/z = a7X + a8Y + a9

The two-step linear algorithm is as follows:

Step 1. From 4 or more point correspondences, a set of linear homogeneous equations 16 is solved to find A to within a scale factor.
Step 2. From A, the parameters R, wT, and g/w are determined, where w is a positive scale factor.

Step 1 involves basically finding the least-squares solution of a set of linear equations. Step 2 is more complicated; an algorithm using singular-value decomposition is described in Ref. 24. It can be shown that, except for degenerate cases, given 4 or more point correspondences over two views, there are generally two solutions for R, T, and g. With 4 or more point correspondences over three views, the solution becomes unique (25).

Solution Using Straight-Line Correspondences. In the presence of image noise and/or due to spatial sampling, the coordinates of feature points cannot be determined accurately. This may make the estimation of motion parameters unreliable. Usually, it is easier to detect and determine the location of straight edges than feature points (26,27). Therefore, the question arises: Can one estimate 3-D motion parameters by using straight-line correspondences?

Finding Straight-Line Correspondences. Images of man-made objects often contain straight edges. These straight edges can be detected using edge-point detectors (such as the Sobel operator) followed by a Hough transform (qv) (28). One first detects straight edges in both image frames and then uses structural information to match the two straight-line patterns. The algorithm of Cheng and Huang (29) can be used to do the matching if the motion from t1 to t2 is small.

Two-View Nonuniqueness. By a straight-line correspondence over two frames, one knows the equations in the image plane at t1 and t2 of a 3-D line on the object:

t1:  αX + βY = 1   (19)
t2:  α'X + β'Y = 1   (20)

where (α, β) corresponds to (α', β'). Note that one does not assume any point correspondences on the two lines. Unfortunately, a little reflection convinces one that no matter how many straight-line correspondences are known over two frames, it is impossible to determine R and T uniquely. Heuristically, one can argue as follows: From the imaging-system geometry, expressions for α' and β' can be derived in terms of R, T, α, β, and some additional parameters that pin down the position of the 3-D line at t1. Given the 2-D image of a 3-D line, one needs two additional parameters (γ and δ, say) to determine the 3-D position of the line. Thus,

α' = α'(R, T, α, β, γ, δ)   (21)
β' = β'(R, T, α, β, γ, δ)   (22)

Each new straight-line correspondence gives two new equations 21 and 22 but also two new unknowns, γ and δ. Therefore, the number of equations is always smaller than the number of unknowns by five (the five motion parameters).

Three-View Case. With straight-line correspondences over three image frames (at t1 < t2 < t3), it is possible to determine the motion parameters R12, T̂12 (from t1 to t2) and R23, T̂23 (from t2 to t3). An equation involving R12 and R23 can be obtained as follows. Let the equations in the image plane at t1, t2, and t3 of a 3-D straight line be given by Eqs. 19, 20, and

t3:  α''X + β''Y = 1   (23)

Equation 19 implies, with the help of Eq. 3, that at t1 the 3-D straight line lies in the plane

αx + βy - z = 0   (24)

which has the normal

q = (α, β, -1)   (25)

Similarly, at t2 and t3, respectively, there are the normals

q' = (α', β', -1)   (26)
q'' = (α'', β'', -1)   (27)

Then it can be shown that the three vectors q', R12 q, and R23^-1 q'' are coplanar. Thus

q' . (R12 q × R23^-1 q'') = 0   (28)

Here a three-element array is considered as either a vector or a column matrix according to context. Equation 28 is nonlinear in the six unknown motion parameters (three from each rotation matrix). It has been found empirically that given seven or more
straight-line correspondences over three frames, one can determine a unique solution for R12 and R23 by finding the least-squares solution of the set of nonlinear equations 28 using iterative methods. Once the rotations are found, the unit translation vectors can be obtained by solving linear equations. An alternative treatment of the line-correspondence case was given by Mitiche, Seida, and Aggarwal (30).

Solution Using Planar-Curve Correspondences. In some cases it may be possible to track the projection of a planar contour (e.g., the boundary of a face of a polyhedron) from one image frame to the next. The change in the shape of the 2-D region (in the image plane) bounded by the contour contains information on the 3-D motion parameters as well as on the orientation of the plane in 3-D. More generally, if more than one region can be tracked, the change in the relative positions of these regions (in the image plane) can also be utilized. Gambotto and Huang (31) have shown in a simple example how this region-based method can be used in motion analysis. However, a general methodology, even for the one-region situation, is yet to be developed. In the following, two special (one-region) cases are described.

Small-Motion Case. Kanatani (32) has suggested a method using line (or surface) integrals. It is assumed that the amount of motion from t1 to t2 is small. Then

R ≈ [1, -θ3, θ2; θ3, 1, -θ1; -θ2, θ1, 1]   (29)

where

θi = ni θ,  i = 1, 2, 3   (30)

Let C1 and C2 be the images at t1 and t2, respectively, of a 3-D planar contour. The equation of the plane at t1 is

ax + by + cz = 1   (33)

Choose a function F(X, Y), and compute the contour integrals

I(t1) = ∮_C1 F(X, Y) dS   (31)
I(t2) = ∮_C2 F(X, Y) dS   (32)

Then it can be shown that

ΔI = I(t2) - I(t1) = K1 Δx + K2 Δy + K3 Δz + K4 θ1 + K5 θ2 + K6 θ3
   + K7 a Δx + K8 a Δy + K9 a Δz + K10 a θ1 + K11 a θ2 + K12 a θ3
   + K13 b Δx + K14 b Δy + K15 b Δz + K16 b θ1 + K17 b θ2 + K18 b θ3   (34)

where the Ki are constants obtained by evaluating contour integrals around C1 whose integrands involve F, ∂F/∂X, ∂F/∂Y, X, Y, dX/ds, and dY/ds, and where c = 1 has been set to fix the global scale factor. The detailed formulas for the Ki are given in Ref. 32.

Equation 34 is nonlinear in the eight unknowns Δx, Δy, Δz, θ1, θ2, θ3, a, and b. To find these unknowns, eight or more different functions F(X, Y) are first chosen. For each function one can calculate ΔI and the Ki to get one Eq. 34. Then one finds the least-squares solution of the set of eight or more equations 34. Whether a unique solution can be obtained by this method is yet to be answered.

Orthographic Projections. For orthographic projections, instead of Eq. 3, one has

X = x,  Y = y   (35)

Again, assume that the points observed lie on a plane in 3-D whose equation at t1 is

ax + by + cz = 1   (13)

Then, from Eqs. 2, 35, and 13 (see Ref. 33),

[X'; Y'] = A[X; Y] + D   (36)

where

A = [r11 - a r13, r12 - b r13; r21 - a r23, r22 - b r23]

and

D = [r13 + Δx; r23 + Δy]

(c = 1 has been set to fix the global scale factor). Thus, the relationship between (X, Y) and (X', Y') is an affine transformation. This should be contrasted with the case of central projections, where the relationship is given by Eqs. 16 and 17. One can attempt to find the motion and structure parameters (n1, n2, θ, Δx, Δy, a, and b) in two steps: First, from a contour correspondence over two frames, determine A and D in the affine transform Eq. 36 (a contour correspondence implies no point correspondences between the contour pair); second, determine the desired parameters from A and D.

Several techniques for carrying out step 1 have been proposed. Reference 34 describes a method that relates the moment tensors of the two regions bounded by the contours at t1 and t2, respectively; Ref. 35 describes a method that relates the Fourier coefficients of the two contours after a canonic parameterization. A related work is Ref. 36.

Unfortunately, step 2 is generally not possible. The unknown parameters cannot be determined from A and D without additional information. This is because there are six equations,

r11 - a r13 = a11,  r12 - b r13 = a12
r21 - a r23 = a21,  r22 - b r23 = a22
r13 + Δx = d1,  r23 + Δy = d2

but seven unknowns: n1, n2, θ, Δx, Δy, a, and b. Solution becomes possible if one is given, e.g., (a, b), i.e., the orientation of the plane at t1. To close this section, note the classical result of Ullman (3) for the orthographic-projection case: four point correspondences over three views determine motion/structure uniquely.

Motion from Optical Flow

Problem Statement. In the two-view case, if t2 - t1 = Δt is small, R ≈ I + S Δt, where

S = [0, -w3, w2; w3, 0, -w1; -w2, w1, 0]   (37)

and I is the 3 × 3 identity matrix. The symbol Ω is used to denote the vector (w1, w2, w3) of instantaneous angular velocities around the x, y, and z axes, respectively, at t1. Also,

T = u Δt
where

u = [ux, uy, uz]^t

are the instantaneous translational velocities along the x, y, and z axes at t1. Letting Δt → 0, Eq. 2 becomes

dp(t)/dt = S(t)p(t) + u(t)  (matrix equation)   (38)

or, equivalently,

dp(t)/dt = Ω(t) × p(t) + u(t)  (vector equation)   (39)

where

p(t) = [x(t), y(t), z(t)]^t   (40)

As Δt → 0, in the image plane,

Vx = dX/dt = lim(Δt→0) ΔX/Δt,  Vy = dY/dt = lim(Δt→0) ΔY/Δt   (41)

The image-plane velocity vector (Vx, Vy) is referred to as the optical flow. The problem of interest is to determine u (to within a scale factor) and Ω at time t1 from optical-flow information. One takes an approach similar to that of using point correspondences in the two-view case. Specifically, it consists of two steps: Find optical-flow vectors at N image points, [(Xi, Yi), (Vxi, Vyi)], i = 1, 2, ..., N, and solve the equations obtained from the optical-flow information to determine u and Ω.

Finding Optical Flow. Two approaches to finding optical flow are described. The first approach is to find point correspondences between two image frames at t1 and t2 (with t2 - t1 = Δt small) using the methods discussed above and then obtain the optical-flow vectors by

Vx ≈ ΔX/Δt,  Vy ≈ ΔY/Δt   (42)

The second approach is to relate temporal and spatial differences of the image brightness. Let f1(X, Y) and f2(X, Y) be the brightness at point (X, Y) in the two successive image frames (at t1 and t2, respectively). At any given image point (X0, Y0) the time (frame) difference is

Δf(X0, Y0) = f2(X0, Y0) - f1(X0, Y0)   (43)

Assume the image point (X0, Y0) at t1 and the image point (X0', Y0') at t2 correspond to the same physical point on the 3-D object, and let

ΔX = X0' - X0,  ΔY = Y0' - Y0   (44)

Then

Δf(X0, Y0) = f2(X0, Y0) - f2(X0', Y0')

if one makes the assumption that any given point on the 3-D object appears in the two image frames with the same brightness. If the motion is small, this brightness-constancy assumption is reasonable in many situations. Then f2(X0', Y0') = f2(X0 + ΔX, Y0 + ΔY) is expanded into a Taylor series around (X0, Y0), and only the linear terms are kept, to get

Δf(X0, Y0) ≈ -ΔX (∂f2/∂X)(X0, Y0) - ΔY (∂f2/∂Y)(X0, Y0)   (45)

This is an important equation, mentioned again in the section Motion Estimation by Direct Matching of Image Intensities. Here one can use it to find optical flow in the following way (37,38). If there are two or more image points (near each other) that one can assume to have the same (ΔX, ΔY), then by calculating Δf and ∂f2/∂X, ∂f2/∂Y (using a difference approximation) at each point, one gets a set of linear equations in the two unknowns ΔX and ΔY. Finally, the least-squares solution of these linear equations is found, and Eq. 42 is used to get Vx and Vy.

For general 3-D motion, (ΔX, ΔY) vary with (X, Y). Therefore, it may not be reasonable to assume that (ΔX, ΔY) are the same at several image points. Horn and Schunck (39) considered the case where (ΔX, ΔY) change slowly with (X, Y) and formulated a variational method for estimating (ΔX, ΔY). Other methods that are image-point-wise recursive are described in Refs. 40 and 41. Also, Nagel (42) attempted to improve the estimation of (ΔX, ΔY) by including the second-order terms in the Taylor-series expansion of f2(X0 + ΔX, Y0 + ΔY). For a recent insightful study on the determination of optical flow, see Hildreth (43). See also the pioneering work on optical flow by Prazdny (44).

Basic Equations. Differentiating Eq. 3 with respect to t and using Eq. 38, one gets

Vx = (ux - X uz)/z - XY w1 + (1 + X^2) w2 - Y w3
Vy = (uy - Y uz)/z - (1 + Y^2) w1 + XY w2 + X w3   (46)

whence

1/z = [Vx + XY w1 - (1 + X^2) w2 + Y w3]/(ux - X uz)   (47)

and

(ux - X uz)[Vy + (1 + Y^2) w1 - XY w2 - X w3] = (uy - Y uz)[Vx + XY w1 - (1 + X^2) w2 + Y w3]   (48)

Equation 48 is nonlinear in the six unknowns ux, uy, uz, w1, w2, and w3. Also, it is homogeneous in ux, uy, and uz. Therefore, u = (ux, uy, uz) can be determined only to within a scale factor. To fix ideas, let the sought-after translation be the unit translation vector

û = (ûx, ûy, ûz) = u/|u|   (49)

Then Eq. 48 contains five unknowns, e.g., ûx, ûy, w1, w2, and w3. If there are optical-flow vectors at five or more image points, [(Xi, Yi), (Vxi, Vyi)], i = 1, 2, ..., N, one can seek a least-squares solution to the set of N nonlinear equations 48. Note that Eq. 46 can also be derived from Eq. 4 by letting Δt → 0.
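Equation 46 can be checked numerically. The sketch below uses made-up velocities and a made-up point (no values are from the article): the flow predicted by Eq. 46 agrees with a finite-difference approximation of the perspective projection under the rigid motion of Eq. 39.

```python
import numpy as np

# Numerical check of Eq. 46 with synthetic values: the predicted flow
# should match a finite difference of the projection X = x/z, Y = y/z
# under dp/dt = Omega x p + u (Eq. 39).
u = np.array([0.2, -0.1, 0.3])     # translational velocity (ux, uy, uz)
w = np.array([0.05, -0.02, 0.04])  # angular velocity Omega = (w1, w2, w3)
p = np.array([0.8, -0.4, 3.0])     # a 3-D point at t1

x, y, z = p
X, Y = x / z, y / z
Vx = (u[0] - X * u[2]) / z - X * Y * w[0] + (1 + X ** 2) * w[1] - Y * w[2]
Vy = (u[1] - Y * u[2]) / z - (1 + Y ** 2) * w[0] + X * Y * w[1] + X * w[2]

dt = 1e-6
p2 = p + dt * (np.cross(w, p) + u)  # move the point for a short time dt
Vx_fd = (p2[0] / p2[2] - X) / dt    # finite-difference optical flow
Vy_fd = (p2[1] / p2[2] - Y) / dt
```

The rotational part of the flow is independent of depth z, which is why only the translational part of u is recoverable merely to within a scale factor.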
A Linear Algorithm. Similar to the two-view point-correspondence case, a linear algorithm is possible here (45). In fact, in Eq. 12, if one sets

R = I + S Δt   (50)
G = K Δt   (51)

where K is the skew-symmetric matrix formed from u in the same way that G is formed from T (cf. Eq. 10), and then lets Δt → 0, one gets

[Vx, Vy, 0] K [X, Y, 1]^t + [X, Y, 1] KS [X, Y, 1]^t = 0   (52)

Let

L = [l11, l12, l13; l21, l22, l23; l31, l32, l33] = KS   (53)

Then Eq. 52 is equivalent to

[X^2, Y^2, 1, XY, X, Y, Vx, -Vy, VyX - VxY] h = 0   (54)

where

h = [h1, ..., h9]^t = [l11, l22, l33, l12 + l21, l13 + l31, l23 + l32, uy, ux, uz]^t   (55)

From Eqs. 53 and 55, since K and S are the skew-symmetric matrices

K = [0, -uz, uy; uz, 0, -ux; -uy, ux, 0],  S = [0, -w3, w2; w3, 0, -w1; -w2, w1, 0]   (56)

the product L = KS gives

h1 = -(uy w2 + uz w3),  h2 = -(ux w1 + uz w3),  h3 = -(ux w1 + uy w2)
h4 = ux w2 + uy w1,  h5 = ux w3 + uz w1,  h6 = uy w3 + uz w2   (57)

The solution procedure is as follows: From eight or more optical-flow vectors [(Xi, Yi), (Vxi, Vyi)], determine h1, ..., h9 (to within a scale factor) from the set of linear equations 54. Then Eq. 55 gives u = (ux, uy, uz) = (h8, h7, h9) to within the same scale factor. Finally, the linear equations 57 are solved for Ω = (w1, w2, w3).

Planar Patch Case. The linear algorithm of the last section breaks down when all the image points under consideration correspond to 3-D points lying on a plane (46). However, similar to the two-view case, a different linear algorithm is available. Let the equation of the plane in 3-D be

ax + by + cz = 1   (13)

Then, as before (cf. Eq. 18),

1/z = aX + bY + c

Substituting in Eq. 46, one gets

Vx = k1 + k2 X + k3 Y + k7 X^2 + k8 XY
Vy = k4 + k5 X + k6 Y + k7 XY + k8 Y^2   (58)

where

k1 = c ux + w2,  k2 = a ux - c uz,  k3 = b ux - w3
k4 = c uy - w1,  k5 = a uy + w3,  k6 = b uy - c uz   (59)
k7 = -a uz + w2,  k8 = -b uz - w1

Given optical-flow vectors at four or more image points, one can determine k1, k2, ..., k8 from Eq. 58. Then û and Ω can be found from the ki as described in Ref. 46. Similar to the two-view case, generally there are two solutions for the motion parameters. Reference 46 discusses the physical meaning of the two solutions and the fact that in many cases one of the solutions can be ruled out.

Generalized Flow Fields

Basic Equations. In the discussions of optical flow so far, only the image-point velocities Vx and Vy have been used. A more general formulation using Vx and Vy as well as their derivatives (with respect to X and Y) up to the second order was proposed by Waxman and Ullman (47). Their approach is based on studying the deformation of a small neighborhood in the image and provides much insight into the relationship between the 3-D motion/structure of a rigid body and its 2-D perspective views. Specifically, consider the vicinity of the image origin (X, Y) = (0, 0), and assume that the object surface

z = z(x, y)   (60)

around the point (0, 0, z0), where z0 = z(0, 0), is smooth (twice differentiable). Then 12 observables can be defined that are expressible in terms of the six motion parameters (M1-M6),

ux/z0, uy/z0, uz/z0, w1, w2, w3

and five structure parameters (T1-T5): the slopes (∂z/∂x)0 and (∂z/∂y)0 and the scaled curvatures z0(∂^2z/∂x^2)0, z0(∂^2z/∂x∂y)0, and z0(∂^2z/∂y^2)0. The subscript 0 indicates that the derivative is evaluated at (0, 0, z0). Note that the five structure parameters give information on the slopes and the curvatures of the surface at (0, 0, z0).

The 12 observables (θ1-θ12) are Vx, Vy, ε11, ε22, ε12, w, ∂ε11/∂X, ∂ε11/∂Y, ∂ε22/∂X, ∂ε22/∂Y, ∂w/∂X, and ∂w/∂Y, where ε11, ε22, ε12, and w are defined as follows:

εij = (1/2)(∂Vi/∂ξj + ∂Vj/∂ξi),  wij = (1/2)(∂Vj/∂ξi - ∂Vi/∂ξj),  w = w21   (61)

with i, j = 1, 2; (V1, V2) = (Vx, Vy); (ξ1, ξ2) = (X, Y).
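The planar-patch flow model of Eqs. 58-59 above can be checked with a short linear least-squares fit. The sketch below uses made-up plane and motion parameters (none of the numbers are from the article): flows generated from Eq. 46 on the plane are exactly quadratic in (X, Y), and the fitted coefficients match Eq. 59.

```python
import numpy as np

# Hedged sketch of Eqs. 58-59 (synthetic data): on a planar patch with
# 1/z = aX + bY + c, the flow is quadratic in (X, Y) with eight
# coefficients k1..k8 that a linear least-squares fit recovers.
a, b, c = 0.1, -0.05, 0.25           # plane parameters (made up)
u = np.array([0.2, -0.1, 0.15])      # translational velocity
w = np.array([0.03, -0.02, 0.05])    # angular velocity

rng = np.random.default_rng(1)
X, Y = rng.uniform(-1, 1, (2, 50))
invz = a * X + b * Y + c             # 1/z on the plane (cf. Eq. 18)
Vx = (u[0] - X * u[2]) * invz - X * Y * w[0] + (1 + X ** 2) * w[1] - Y * w[2]
Vy = (u[1] - Y * u[2]) * invz - (1 + Y ** 2) * w[0] + X * Y * w[1] + X * w[2]

# Design matrix for Vx = k1 + k2 X + k3 Y + k7 X^2 + k8 XY and
# Vy = k4 + k5 X + k6 Y + k7 XY + k8 Y^2 (Eq. 58).
O, Z = np.ones_like(X), np.zeros_like(X)
M = np.vstack([np.c_[O, X, Y, Z, Z, Z, X ** 2, X * Y],
               np.c_[Z, Z, Z, O, X, Y, X * Y, Y ** 2]])
k = np.linalg.lstsq(M, np.r_[Vx, Vy], rcond=None)[0]

k_true = np.array([c * u[0] + w[1], a * u[0] - c * u[2], b * u[0] - w[2],
                   c * u[1] - w[0], a * u[1] + w[2], b * u[1] - c * u[2],
                   w[1] - a * u[2], -b * u[2] - w[0]])   # Eq. 59
```

The nonuniqueness mentioned in the text enters afterward, when the eight ki are decomposed into û, Ω, and the plane orientation.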
In terms of image deformation, εij is the rate-of-strain tensor and wij the spin tensor. The physical meaning of these quantities: ε11 is the rate of stretch of a differential image line oriented along the X axis; ε22 is the rate of stretch of a differential image line oriented along the Y axis; ε12 (= ε21) is one-half the rate of decrease of the angle between two differential line segments along the image axes; and w = w21 (= -w12) is the rate of rotation (i.e., the spin) of the differential neighborhood of the image about the origin. The basic flow equations relating the observables to the motion and structure parameters are derived in Ref. 47:

θ1 = -M1 + M5
θ2 = M2 - M4
θ3 = -M3 - M1T1
θ4 = -M3 - M2T2
θ5 = -(1/2)(M2T1 + M1T2)
θ6 = M6 + (1/2)(M1T2 - M2T1)
θ7 = 2(M3 + M5T1) - M1T3   (62)
θ8 = -M4 + M5T2 - M1T3
θ9 = M5 + M5T1 - M2T3
θ10 = 2(M5T2 - M4) - M2T4
θ11 = (1/2)(M4 - M5T2 - M2T3 + M1T5)
θ12 = (1/2)(M5 + M5T1 + M1T4 - M2T5)

These flow equations form a set of 12 coupled nonlinear algebraic equations with 11 unknowns. A method of solving these equations (given θ1-θ12) is described in Ref. 47.

Finding the Observables. The problem remains: How does one measure the observables from the image sequence? References 36, 48, and 49 suggest a method based on evolving contours in the image plane. The 12 observables are in terms of Vx^(i,j) and Vy^(i,j), i, j = 0, 1, 2 and i + j ≤ 2, where

Vx^(i,j) = [∂^(i+j) Vx / ∂X^i ∂Y^j]0   (63)

and similarly for Vy^(i,j). These derivatives can be obtained in the following manner. In the vicinity of (X, Y) = (0, 0), one can write

Vx(X, Y) = Σ(i+j≤2) Vx^(i,j) X^i Y^j / (i! j!)
Vy(X, Y) = Σ(i+j≤2) Vy^(i,j) X^i Y^j / (i! j!)   (64)

For curved surfaces Eqs. 64 are only locally (and approximately) valid, but for planes they are globally valid; see Eqs. 58. Assume a planar contour is tracked over two image frames separated by a small Δt. If one measures, at a point (X, Y) on the contour, the normal flow velocity Vn(X, Y) and the normal of the contour n(X, Y) = (nX, nY), one gets the equation

Vn(X, Y) = Σ(i+j≤2) [nX(X, Y) Vx^(i,j) + nY(X, Y) Vy^(i,j)] X^i Y^j / (i! j!)   (65)

Since there are 12 unknowns, one needs to measure the Vn and n of at least 12 points on the contour. Note that several separate contours can be used as long as they lie in the same plane in 3-D. For curved surfaces the problem is much more difficult. Reference 49 discusses the truncation errors incurred by using the approximate Eqs. 64.

Motion Estimation by Direct Matching of Image Intensities

All the techniques for 3-D motion determination described above fall into the category of two-step methods: First, correspondences or optical-flow vectors are found, and then equations are solved to obtain the motion/structure parameters. In this section a method based on direct matching of image intensities is described (also see Finding Optical Flow, above).

Determining 2-D Translation by Displaced Frame Differences. Consider first the simple case of 2-D translation, i.e., assume that (ΔX, ΔY) is constant for all image points corresponding to physical points on the rigid body. Again, let (40,41)

f1(X, Y) = brightness of the first frame (at t1)
f2(X, Y) = brightness of the second frame (at t2)

Then the approach is to match f1 and f2 directly: Find (ΔX, ΔY) to minimize D{f1(X, Y), f2(X + ΔX, Y + ΔY)}, where D is a distance measure. One commonly used distance measure is

D = Σ(X,Y) [f1(X, Y) - f2(X + ΔX, Y + ΔY)]^2   (66)

It is important to point out that this direct-matching approach makes the tacit assumption that the two image points at t1 and t2, respectively, corresponding to the same physical point on the object have the same brightness; i.e., the brightness of an image point corresponding to a fixed point on the object does not change after motion. This is called the brightness-constancy assumption.

Coming back to Eq. 66, one notes that D can be minimized by using standard optimization techniques. However, the computation can be simplified in the case where the motion (ΔX, ΔY) is small. Then one can expand f2(X + ΔX, Y + ΔY) in a Taylor series around (X, Y) and retain only up to the first-order terms, and Eq. 66 is reduced to

D = Σ(X,Y) (Δf + ΔX ∂f2/∂X + ΔY ∂f2/∂Y)^2   (67)

where

Δf(X, Y) = f2(X, Y) - f1(X, Y)

is the frame difference at (X, Y) (40,41). In practice, Δf and ∂f2/∂X, ∂f2/∂Y are calculated at N points (Xi, Yi), i = 1, 2, ..., N; then the summation in Eq. 67 is over these N points. Note that minimizing D in Eq. 67 is equivalent to finding the least-squares solution of the set of linear equations

-(Δf)i = ΔX (∂f2/∂X)i + ΔY (∂f2/∂Y)i,  i = 1, 2, ..., N   (68)

where a subscript i indicates that the quantity is evaluated at (Xi, Yi). This is the same as the method described in Finding Optical Flow, above.

Generalization to 3-D Motion. The method of the preceding section can in principle be extended to the general case of 3-D motion: Both ΔX and ΔY are expressed in terms of the 3-D motion parameters; then D in Eq. 66 is minimized with respect to the 3-D motion parameters. In practice, there are two difficulties. The first is computational: There must be searching in a high-dimensional space. The second is that (as shown below), without further assumptions, the number of solutions is infinite.

From Eqs. 1 and 4 one can get ΔX and ΔY in terms of, e.g., X, Y, z/Δz, n1, n2, θ, Δx/Δz, and Δy/Δz (assuming Δz ≠ 0). Then D in Eq. 66 is minimized with respect to these latter variables. Unfortunately, for each point (Xi, Yi) there is a new
unknown zi/Δz. Therefore, one always has five more unknowns (the motion parameters) than the number of terms in Eq. 66, and as a result one has infinitely many solutions to the minimization problem. One can hope to get a unique solution if one knows the form of the object surface to within a finite number of parameters. The simplest case is when the surface is a plane. Then it can be represented by

ax + by + cz = 1 (at t1)

and, from Eq. 18,

Δz/z = a'X + b'Y + c'   (69)

where

a' = a Δz,  b' = b Δz,  c' = c Δz   (70)

As a result, D in Eq. 66 can be expressed in terms of the eight unknown parameters a', b', c', n1, n2, θ, Δx/Δz, and Δy/Δz, independent of how many points (Xi, Yi) are used in the summation.

Now the computational problem: To search in an eight-dimensional space by standard optimization techniques is very time-consuming. The situation is better if the 3-D motion is small, so that all (ΔX, ΔY) are small. Then one can use the Taylor-series approach, and the problem of minimizing D is reduced to the problem of finding the least-squares solution of the set of Eq. 68, where ΔX and ΔY are now written in terms of the eight unknowns mentioned above. Note that the equations are now nonlinear (50,51).

To summarize: The method of determining the 3-D motion parameters of a rigid planar patch is to calculate Δf and ∂f2/∂X, ∂f2/∂Y at eight or more points and then find the least-squares solution (by some iterative method) of the set of eight or more nonlinear Eq. 68, where ΔX and ΔY are written in terms of the eight unknowns a', b', c', n1, n2, θ, Δx/Δz, and Δy/Δz by using Eqs. 4 and 69. Once again, note that the method assumes brightness constancy.

Linear Algorithm for Planar Patches. The nonlinear least-squares algorithm for determining the 3-D motion parameters of a rigid planar patch described in the preceding section can be reduced to a linear least-squares problem by introducing appropriate intermediate variables (23). Specifically, from Eq. 16,

ΔX = X' - X = (a1X + a2Y + a3 - a7X^2 - a8XY - a9X)/(a7X + a8Y + a9)
ΔY = Y' - Y = (a4X + a5Y + a6 - a7XY - a8Y^2 - a9Y)/(a7X + a8Y + a9)   (71)

Assuming the motion to be small, one can substitute Eq. 71 into Eq. 68 to get

(a1X + a2Y + a3 - a7X^2 - a8XY - a9X) ∂f2/∂X + (a4X + a5Y + a6 - a7XY - a8Y^2 - a9Y) ∂f2/∂Y + (a7X + a8Y + a9) Δf = 0   (72)

This equation is linear and homogeneous in the nine unknowns a1, ..., a9. If one calculates Δf and ∂f2/∂X, ∂f2/∂Y at eight or more image points (X, Y), one gets a set of eight or more equations 72. Then a1, ..., a9 can be solved to within a scale factor. Recall that the ai are related to the motion/structure parameters by Eq. 15 and that the latter can be obtained from the former by a method described in Ref. 24.

Motion from 3-D Feature Correspondences

The motion-estimation techniques described above are based on images taken by a monocular 2-D sensor such as a single television camera. With such an arrangement the 3-D translation and the range of the object can be determined only to within a scale factor. One can determine the absolute translation velocity and the ranges of object points if binocular vision is used (see Stereo vision), e.g., two television cameras with known relative positions and orientations. The binocular method has several other advantages, described below.

Binocular Procedure. A pair of stereo images is taken at t1 and another pair at t2, and then the following procedure is used:

1. From the two images taken at t1, feature points are extracted, the two point patterns are matched to find correspondences, and then by triangulation the 3-D coordinates of these points are found. The same is done for the two images taken at t2.
2. The two 3-D point patterns at t1 and t2 are matched to find 3-D point correspondences.
3. A set of equations involving the motion parameters is obtained from the 3-D point correspondences. These equations are solved to determine motion (52).

Note that the matching problems in 1 and 2 are usually easier than the matching problem in the monocular two-view case (see above) because in 1, for a fixed point in one image of the stereo pair, the corresponding point in the other image is restricted to lie on the so-called epipolar line, and in 2, the distances between pairs of 3-D points on a rigid body are invariant to motion. An algorithm for the maximal matching of two 3-D point sets is presented by Chen and Huang (53).

Motion from 3-D Correspondences. Once one has obtained 3-D point correspondences pi ↔ pi', i = 1, 2, ..., N, where

p = [x, y, z]^t and p' = [x', y', z']^t
how does one get the motion parameters R and T? A related question is: What is the minimum number of 3-D point correspondences needed for the unique determination of R and T of a rigid body? A basic fact is that R and T are determined uniquely by three 3-D point correspondences (assuming the three points are not collinear). This becomes obvious if one notes that two points will fix a rigid body in space except for a possible rotation around the axis formed by joining the two points. A third point then fixes the rigid body completely. Once one knows three 3-D point correspondences on a rigid body, one can generate any number of other 3-D point correspondences rigid relative to the original three points.

To describe algorithms for finding R and T, Eq. 2 is rewritten as

p' = Rp + T   (73)

There are six unknown parameters: n1, n2, θ, Δx, Δy, and Δz. Each 3-D point correspondence gives one matrix Eq. 73, or three scalar equations, which are nonlinear in the unknowns. An obvious method would be to find the least-squares solution (by some iterative technique) of the set of 3N coupled nonlinear equations obtained from the N three-dimensional point correspondences, where N ≥ 3. However, much simpler linear algorithms are available (54), one of which is described below.

Assume there are three 3-D point correspondences

pi ↔ pi',  i = 1, 2, 3

Let

m1 = p1 - p3,  m2 = p2 - p3,  m1' = p1' - p3',  m2' = p2' - p3'   (74)

Then, from Eq. 73,

m1' = Rm1,  m2' = Rm2   (75)

If

m3 = m1 × m2,  m3' = m1' × m2'   (76)

(consider the mi and mi' as vectors), then

m3' = Rm3   (77)

Combining Eqs. 75 and 77,

[m1', m2', m3'] = R[m1, m2, m3]   (78)

whence

R = [m1', m2', m3'][m1, m2, m3]^-1   (79)

and

T = pi' - Rpi  for i = 1, 2, 3   (80)

Note that the numerical accuracy of this algorithm is usually improved if normalized (to a magnitude of 1) versions of the mi and mi' are used in the formulation.

Two remarks are in order. First, the above algorithm can be used not only for 3-D point correspondences but also for 3-D straight-line correspondences and surface-normal correspondences; in the latter two cases only two correspondences are needed. Second, in the presence of noise in the data (3-D point coordinates), the matrix R̂ obtained from the above algorithm may not be a rotation (i.e., orthonormal and with a determinant equal to +1). In that case a rotation matrix R' can be found by using the algorithms in Refs. 55 and 56 to minimize

||R' - R̂||^2 = Σ(i=1..3) Σ(j=1..3) (rij' - r̂ij)^2

where r̂ij and rij' are the elements of R̂ and R', respectively.

Additional Topics

In the preceding sections the major approaches to determining the 3-D motion/structure of a rigid body are described in some detail. This last section is a brief comment on some important additional topics. These topics also represent areas where further research is needed.

Numerical Accuracy of Algorithms. The reader should be warned that computer simulations and experiments with real images (57,58) have indicated that, in order to estimate motion parameters reasonably accurately (around 10% error) from two perspective views using a single camera, the image resolution has to be quite high (typically 1000 × 1000 picture elements, assuming image-point features can be measured to within one picture element). Theoretical studies, or even systematic simulation studies, of how the estimation errors depend on various factors are yet to be made. The situation in the two-camera case is somewhat better (52). Some simulation results for the two-camera case are given below to indicate how redundant point correspondences can be used to improve estimation accuracy.

The algorithm of Motion from 3-D Correspondences (above) requires only three 3-D point correspondences. If more than three point correspondences are available, the redundancy can be used to improve estimation accuracy in several ways, two of which are adaptive least squares (52) and RANSAC (59). A hybrid of the two was used in Ref. 52, from which some computer simulation results are quoted. The imaging geometry is as follows: Two pinhole cameras with a focal length of 28 mm are used, and the two image planes are coplanar; each image is 38 mm × 50 mm and has a resolution of 512 × 512 picture elements. The baseline distance between the two cameras is 400 mm. The 3-D points are chosen randomly in a cube centered at a point 3 m from the cameras, each side of which is 0.75 m long. The true motion is a rotation of 35° about an axis through the origin with direction (0.9, 0.3, 0.316), followed by a translation of (0.8, 0.2, 0.6) m.

The simulation is done as follows. The 3-D points before and after the motion are projected onto the two images. The image coordinates of these points are quantized (with a resolution of 512 × 512). The quantized image points are then used in the method described in Motion from 3-D Feature Correspondences to estimate R and T. That is, triangulation is done using these quantized image points to obtain the 3-D coordinates of the points, which are then used in the algorithm described above. The errors in the estimated R and T are due to the inaccuracies in the 3-D coordinates of the points, which are in turn due to the quantization of the image coordinates. The results are: The average errors (in %) of θ, n1, n2, n3, Δx, Δy, and Δz are, respectively, 5.2, 2.3, 14.5, 8.1, 10.1, 30.7, and 10.7 with seven 3-D point correspondences and 2.2, 1.0, 7.1, 3.1, 4.8, 14.9, and 4.4 with fifteen 3-D point
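The linear algorithm of Eqs. 74-80 can be sketched in a few lines. The example below is a hedged illustration with synthetic, noise-free points (the rotation axis and translation mirror the simulation setup described in the text, but the three 3-D points themselves are made up):

```python
import numpy as np

# Hedged sketch of the linear algorithm of Eqs. 74-80 (synthetic data):
# recover R and T from three noncollinear 3-D point correspondences.
def rot(axis, ang):
    # Rodrigues' formula
    a = np.asarray(axis, float)
    a = a / np.linalg.norm(a)
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(ang) * K + (1 - np.cos(ang)) * (K @ K)

R_true = rot([0.9, 0.3, 0.316], np.deg2rad(35.0))  # motion as in the text
T_true = np.array([0.8, 0.2, 0.6])
P = np.array([[3.0, 0.2, 2.9], [2.8, 0.9, 3.4], [3.3, -0.4, 3.1]])
Pp = P @ R_true.T + T_true                         # the points after motion

m1, m2 = P[0] - P[2], P[1] - P[2]                  # Eq. 74
m1p, m2p = Pp[0] - Pp[2], Pp[1] - Pp[2]
m3, m3p = np.cross(m1, m2), np.cross(m1p, m2p)     # Eq. 76
M = np.column_stack([m1, m2, m3])
Mp = np.column_stack([m1p, m2p, m3p])
R = Mp @ np.linalg.inv(M)                          # Eq. 79
T = Pp[0] - R @ P[0]                               # Eq. 80
```

In practice the mi would first be normalized to unit length, as the text recommends, and with noisy data the result would be projected back onto the rotation group using the algorithms of Refs. 55 and 56.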
correspondences. For each of the two cases the averages are computed over 100 trials.
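The linear algorithm of Eqs. (74)-(80) is simple enough to state in a few lines of code. The following is a minimal NumPy sketch (the function name and the synthetic geometry are illustrative, not from the text); it includes the normalization noted above, and per the second remark, under noisy data the returned R would additionally need to be corrected back to an orthonormal matrix.

```python
import numpy as np

def motion_from_three_points(p, p_prime):
    """Linear algorithm of Eqs. (74)-(80): recover rotation R and
    translation T from three 3-D point correspondences p_i <-> p_i'
    related by p' = R p + T.  p, p_prime: (3, 3) arrays, one point per row."""
    p1, p2, p3 = p
    q1, q2, q3 = p_prime
    # Eq. (74): difference vectors cancel the translation T.
    m1, m2 = p1 - p3, p2 - p3
    n1, n2 = q1 - q3, q2 - q3
    # Normalizing to unit magnitude improves numerical accuracy (see text).
    m1, m2, n1, n2 = (v / np.linalg.norm(v) for v in (m1, m2, n1, n2))
    # Eq. (76): a third, linearly independent vector from the cross product.
    m3, n3 = np.cross(m1, m2), np.cross(n1, n2)
    # Eqs. (78)-(79): [m1' m2' m3'] = R [m1 m2 m3]  =>  R = N M^{-1}.
    M = np.column_stack((m1, m2, m3))
    N = np.column_stack((n1, n2, n3))
    R = N @ np.linalg.inv(M)
    # Eq. (80): T = p_i' - R p_i for any i.
    T = q1 - R @ p1
    return R, T
```

With noise-free data the recovery is exact; with noisy 3-D coordinates R drifts away from orthonormality, which is why the robust variants cited in the text (adaptive least squares, RANSAC) are preferred when redundant correspondences are available.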
Multiple Objects. The methods described in the earlier sections are for a single isolated rigid body. What if the scene contains several rigid bodies moving differently (this includes the special case of a single rigid body moving against a stationary but textured background)? Segmentation needs to be done somewhere along the way. If one is working with the two-view case described in Solution Using Point Correspondences (above) and if the motions of the rigid bodies are small from t1 to t2, the following approach can be tried.

Assuming the motions are small, one can still hope to get correct point correspondences. However, one does not know which points lie on which objects. This one attempts to find by a clustering technique. The basic idea is to take all possible octets from the point correspondences and for each octet compute R and T using the algorithm described above under A Linear Algorithm. Then clusters are found in the five-dimensional (n1, n2, θ, Δx, Δy)-space. Ideally, each rigid body will give one cluster. To save computation, one uses heuristics (qv) to reduce the number of octets to consider and perhaps does clustering in subspaces of the five-dimensional space. Obviously, the same approach can be used in the binocular case. Here, one only has to deal with triplets.

In order to handle the multiple-object case effectively, constraints on the scenario should be used wherever possible. A very impressive piece of work in that direction has been done by Adiv (60).

Nonrigid Objects. Two cases are of particular interest: an articulated object (i.e., an object comprising several rigid parts connected through various joints) and an elastic object. Some aspects of motion analysis of articulated objects have been studied by Asada, Yachida, and Tsuji (61); O'Rourke and Badler (62); and Webb and Aggarwal (63). In particular, Webb and Aggarwal investigated the case where the rotation axis can be assumed fixed in direction throughout the observed image sequence. The same authors (64) have also studied a special case of elastic objects where the object is assumed to be locally rigid, which implies an affine transformation between two image planes under local parallel projection. This approach is being extended by Chen (65) to handle general elastic bodies. Finally, Koenderink and Van Doorn (66) are investigating the special case of bending deformation. The class of bending deformations encompasses all deformations that conserve distances along the surface but not necessarily through space.

Motion Modeling and Prediction. This entry has been concerned mainly with estimating the motion parameters R and T of an object between two time instants t1 and t2 based on image frames taken at these time instants. In most practical problems one is more interested in predicting rather than just estimating motion. In order to predict, one needs a model of the motion that is valid over a number of image frames and contains a small number of parameters that remain constant over these frames. One can first estimate these parameters based on the first few frames and then use these estimated values to predict future motion and hence where the object will be in future frames. One such approach is described in Huang, Weng, and Ahuja (67), where the object has a precessional motion around its center of gravity, which is moving on a polynomial curve (e.g., a parabola) in space.

High-Level Motion Understanding. In many cases the ultimate goal of motion analysis is to come up with a symbolic description of the dynamic scene under study. A complete system can conveniently be thought of as comprising two modules. The first module extracts from the observed raw data (e.g., an image sequence) low/intermediate-level features such as motion and structure parameters. Then the second module arrives at a symbolic description of the dynamic scene by high-level reasoning based on the low/intermediate features as well as other a priori information about the scene.

One can find such complete dynamic scene-analysis systems in the literature in the biomedical area. Two excellent examples are Ref. 68, which describes a rule-based system for characterizing blood cell motion, and Ref. 69, which describes a system for analyzing the motion of left-ventricle walls. In both cases the "scenes" are basically 2-D in nature, and therefore the task of the low/intermediate-level module is greatly simplified.

For truly 3-D scenes a complete dynamic scene-analysis system is hard to construct. The main problem is that the low/intermediate-level features the high-level module needs for its reasoning may be very difficult, if not impossible, to extract from the raw data. In fact, the low/intermediate-level module will probably need help from high-level reasoning to improve its performance. Some impressive examples of high-level modules are Refs. 62, 70, and 71. Reference 70 describes a system that observes traffic scenes and produces natural-language descriptions of them. In particular, the system will recognize and verbalize interesting occurrences (events) in the scene, e.g., one car is overtaking another. Reference 71 describes an expert system for event identification. The applications considered are simple assembly-line tasks. However, in both systems the low/intermediate-level features needed by the high-level modules are furnished at least in part by human operators.

Future Research. To summarize, the following are important research topics in motion analysis:

1. to find robust algorithms for motion estimation,
2. to find algorithms for estimating motion of multiple objects,
3. to find algorithms for estimating motion of nonrigid objects,
4. to find algorithms for predicting motion, and
5. to link and coordinate low/intermediate-level and high-level motion analysis.
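The prediction scheme described under Motion Modeling and Prediction (fit a small, constant parameter set to the first few frames, then extrapolate) can be illustrated with a polynomial motion model for a tracked feature such as the center of gravity. The sketch below is illustrative NumPy code, not from the text; the function name and test trajectory are assumptions.

```python
import numpy as np

def predict_position(times, positions, t_future, degree=2):
    """Fit a polynomial motion model (degree 2 = a parabola in space)
    to observed feature positions, then extrapolate to a future time.
    times: (n,) array of frame times; positions: (n, 3) array."""
    positions = np.asarray(positions, dtype=float)
    # One polynomial per spatial coordinate; the coefficients are the
    # small parameter set assumed constant over the observed frames.
    coeffs = [np.polyfit(times, positions[:, k], degree) for k in range(3)]
    return np.array([np.polyval(c, t_future) for c in coeffs])
```

For a feature moving ballistically (constant acceleration), five frames suffice to recover the quadratic per-coordinate model exactly, so the predicted position in a future frame matches the true one; with noisy measurements the fit degrades gracefully in the least-squares sense.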
BIBLIOGRAPHY

1. T. S. Huang (ed.), Image Sequence Analysis, Springer-Verlag, Heidelberg, FRG, 1981.
2. T. S. Huang (ed.), Image Sequence Processing and Dynamic Scene Analysis, Springer-Verlag, Heidelberg, FRG, 1983.
3. S. Ullman, The Interpretation of Visual Motion, MIT Press, Cambridge, MA, 1979.
4. IEEE Trans. PAMI, special issue on Motion and Time-Varying Imagery, 2(6), 493-588 (November 1980).
5. IEEE Comput. Mag., special issue on Computer Analysis of Time-Varying Images, 14(8), 7-69 (August 1981).
6. Comput. Vis. Graph. Img. Proc., special issues on Motion and Time-Varying Imagery, 21(1 and 2), 1-293 (January and February 1983).
7. Proceedings of the Workshop on Computer Analysis of Time-Varying Imagery, Abstracts, University of Pennsylvania, Moore School of Electrical Engineering, Philadelphia, PA, April 1979.
8. Proceedings of the ACM Workshop on Motion: Representation and Perception, Toronto, Ontario, April 4-6, 1983.
9. Proceedings of the IEEE Workshop on Motion: Representation and Analysis, Kiawah Island, SC, May 7-9, 1986.
10. J. Q. Fang and T. S. Huang, A Corner-Finding Algorithm for Image Analysis and Registration, Pittsburgh, PA, August 18-20, 1982, pp. 46-49.
11. H. P. Moravec, Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover, Ph.D. Dissertation, Stanford University, September 1980.
12. H. H. Nagel, "Constraints for the estimation of displacement vector fields from image sequences," Proceedings of the Eighth IJCAI, Karlsruhe, FRG, August 8-12, 1983, pp. 945-951.
13. R. Kories and G. Zimmermann, A Versatile Method for the Estimation of Displacement Vector Fields from Image Sequences, Proceedings of the IEEE Workshop on Motion, Kiawah Island, SC, May 7-9, 1986.
14. J. Q. Fang and T. S. Huang, "Some experiments on estimating the 3-D motion parameters of a rigid body from two consecutive image frames," IEEE Trans. PAMI 6(5), 547-554 (September 1984).
15. W. K. Gu, J. Y. Yang, and T. S. Huang, Matching Perspective Views of a 3-D Object Using Composite Circuits, Proceedings of the Seventh ICPR, July 30-August 2, 1984.
16. A. Mitiche and J. K. Aggarwal, A Computational Analysis of Time-Varying Images, in T. Y. Young and K. S. Fu (eds.), Handbook of Pattern Recognition and Image Processing, Academic Press, New York, 1985.
17. H. C. Longuet-Higgins, "A computer program for reconstructing a scene from two projections," Nature 293, 133-135 (September 1981).
18. R. Y. Tsai and T. S. Huang, "Uniqueness and estimation of 3-D motion parameters of rigid bodies with curved surfaces," IEEE Trans. PAMI 6(1), 13-27 (January 1984).
19. B. L. Yen and T. S. Huang, Determining 3-D Motion Parameters of a Rigid Body: A Vector-Geometric Approach, Proceedings of the ACM Workshop on Motion, Toronto, Ontario, April 1983.
20. X. Zhuang, T. S. Huang, and R. M. Haralick, "Two-view motion analysis: A unified algorithm," J. Opt. Soc. Am. 3(9), 1492-1500 (September 1986).
21. T. S. Huang, Determining 3-D Motion/Structure from Two Perspective Views, in T. Y. Young and K. S. Fu (eds.), Handbook of Pattern Recognition and Image Processing, Academic Press, New York, 1985.
22. H. C. Longuet-Higgins, The Reconstruction of a Scene from Two Projections: Configurations that Defeat the 8-Point Algorithm, Proceedings of the First Conference on Artificial Intelligence Applications, Denver, CO, December 5-7, 1984, pp. 395-397.
23. R. Y. Tsai and T. S. Huang, "Estimating 3-D motion parameters of a rigid planar patch," IEEE Trans. ASSP 29(7), 1147-1152 (December 1981).
24. R. Y. Tsai, T. S. Huang, and W. L. Zhu, "Estimating 3-D motion parameters of a rigid planar patch, II: Singular value decomposition," IEEE Trans. ASSP 30(4), 525-534 (August 1982); correction, 31(2), 514 (April 1983).
25. R. Y. Tsai and T. S. Huang, "Estimating 3-D motion parameters of a rigid planar patch, III: Finite point correspondences and the three-view problem," IEEE Trans. ASSP 32(2), 213-220 (April 1984).
26. B. L. Yen and T. S. Huang, Determining 3-D Motion and Structure of a Rigid Body Using Straight Line Correspondences, in Ref. 2, pp. 365-394.
27. Y. C. Liu and T. S. Huang, "Estimation of rigid body motion using straight-line correspondences," Proceedings of the IEEE Workshop on Motion: Representation and Analysis, Kiawah Island, SC, May 7-9, 1986, pp. 47-52.
28. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973, p. 373.
29. J. K. Cheng and T. S. Huang, "Image registration by matching relational structures," Patt. Recog. 17(1), 149-160 (1984).
30. A. Mitiche, S. Seida, and J. K. Aggarwal, Line-Based Computation of Structure and Motion Using Angular Invariance, Proceedings of the IEEE Workshop on Motion: Representation and Analysis, Kiawah Island, SC, May 7-9, 1986, pp. 175-180.
31. J. P. Gambotto and T. S. Huang, Motion Analysis of Isolated Targets in Infrared Image Sequences, Proceedings of the Seventh ICPR, Montreal, Quebec, July 30-August 2, 1984.
32. K. Kanatani, "Detecting the motion of a planar surface by line and surface integrals," Comput. Vis. Graph. Img. Proc. 29, 13-22 (1985).
33. T. Y. Young and Y. L. Wang, "Analysis of 3-D rotation and linear shape changes," Patt. Recog. Lett. 2, 239-242 (1984).
34. J. A. Orr, D. Cyganski, and R. Yaz, Determination of Affine Transforms from Object Contours with No Point Correspondence Information, Proceedings of the ICASSP 85, Tampa, FL, March 26-29, 1985, pp. 24.10.1-4.
35. D. Cyganski and J. A. Orr, 3-D Motion Parameters from Contours Using a Canonic Differential, Proceedings of the ICASSP 85, Tampa, FL, March 26-29, 1985, pp. 24.9.1-4.
36. K. Kanatani, "Tracing planar surface motion from a projection without knowing the correspondence," Comput. Vis. Graph. Img. Proc. 29, 1-12 (1985).
37. F. Rocca, TV Bandwidth Compression Utilizing Frame-to-Frame Correlation and Movement Compensation, in T. S. Huang and O. J. Tretiak (eds.), Picture Bandwidth Compression, Gordon and Breach, London, 1972.
38. J. Limb and J. Murphy, "Estimating the velocity of moving images in TV signals," Comput. Graph. Img. Proc. 4, 311-327 (1975).
39. B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artif. Intell. 17, 185-203 (1981).
40. J. D. Robbins and A. N. Netravali, Recursive Motion Compensation: A Review, in Ref. 2.
41. C. Cafforio and F. Rocca, The Differential Method for Image Motion Estimation, in Ref. 2.
42. H. H. Nagel, "Displacement vectors derived from 2nd-order intensity variations in image sequences," Comput. Vis. Graph. Img. Proc. 21, 85-117 (January 1983).
43. E. C. Hildreth, The Measurement of Visual Motion, MIT Press, Cambridge, MA, 1984.
44. K. Prazdny, "Egomotion and relative depth map from optical flow," Biol. Cybernet. 36, 87-102 (1980).
45. X. Zhuang, R. M. Haralick, and J. S. Lee, "Rigid body motion and the optic flow under a small perturbation," IEEE Trans. PAMI, in press.
46. H. C. Longuet-Higgins, "The visual ambiguity of a moving plane," Proc. Roy. Soc. Series B 223, 165-170 (1984).
47. A. M. Waxman and S. Ullman, Surface Structure and 3-D Motion from Image Flow: A Kinematic Analysis, CAR-TR-24, CS-TR-1332, Center for Automation Research, University of Maryland, October 1983.
48. A. M. Waxman and K. Wohn, Contour Evolution, Neighborhood Deformation, and Global Image Flow: Planar Surface in Motion, CAR-TR-58, CS-TR-1394, Center for Automation Research, University of Maryland, April 1984.
49. A. M. Waxman and K. Wohn, Contour Evolution, Neighborhood Deformation and Image Flow: Textured Surfaces in Motion, in W. Richards and S. Ullman (eds.), Image Understanding 1984, Ablex, Norwood, NJ, 1984.
50. T. S. Huang and R. Y. Tsai, Image Sequence Analysis: Motion Estimation, in Ref. 1.
51. T. S. Huang, Three-Dimensional Motion Analysis by Direct Matching, Conference Digest, Optical Society of America Topical Meeting on Computer Vision, Incline Village, NV, March 20-22, 1985, pp. FA1-1 to 4.
52. T. S. Huang and S. D. Blostein, Robust Algorithms for Motion Estimation Based on Two Sequential Stereo Image Pairs, Proceedings of the Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 10-18, 1985.
53. H. H. Chen and T. S. Huang, Maximal Matching of Two 3-D Point Sets, Proceedings of the International Conference on Pattern Recognition, Paris, France, October 27-31, 1986.
54. S. D. Blostein and T. S. Huang, Estimating Motion from Range Data, Proceedings of the First Conference on AI Applications, Denver, CO, December 1984.
55. O. D. Faugeras and M. Hebert, A 3-D Recognition and Positioning Algorithm Using Geometrical Matching between Primitive Surfaces, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, August 1983, pp. 996-1002.
56. T. S. Huang, S. D. Blostein, and E. A. Margerum, Least-Squares Estimation of Motion Parameters from 3-D Point Correspondences, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22-26, 1986.
57. J. Q. Fang and T. S. Huang, "Solving 3-D small-rotation motion equations," Comput. Vis. Graph. Img. Proc. 26, 189-206 (1984).
58. J. Q. Fang and T. S. Huang, "Some experiments on estimating the 3-D motion parameters of a rigid body from two consecutive image frames," IEEE Trans. PAMI 6(5), 547-554 (September 1984).
59. M. A. Fischler and R. C.
Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," CACM 24(6), 381-395 (June 1981).
60. G. Adiv, "Determining 3-D motion and structure from optical flow generated by several moving objects," IEEE Trans. PAMI 7(4), 384-401 (July 1985).
61. M. Asada, M. Yachida, and S. Tsuji, "Understanding of 3-D motions in blocks world," Pattern Recognition 17(1), 57-84 (1984).
62. J. O'Rourke and N. Badler, "Model-based image analysis of human motion using constraint propagation," IEEE Trans. PAMI 2, 522-536 (1980).
63. J. A. Webb and J. K. Aggarwal, "Structure from motion of rigid and jointed objects," Artif. Intell. 19(1), 107-130 (1982).
64. J. A. Webb and J. K. Aggarwal, "Shape and correspondence," Comput. Vis. Graph. Img. Proc. 21, 145-160 (1983).
65. S. S. Chen, Shape and Correspondence of Nonrigid Objects, Proceedings of the IEEE Workshop on Computer Vision, Bellaire, MI, October 15-18, 1985.
66. J. J. Koenderink and A. J. Van Doorn, Depth and Shape from Differential Perspective in the Presence of Bending Deformation, Preprint, Department of Medical and Physiological Physics, Princetonplein 5, Utrecht, The Netherlands, 1985.
67. T. S. Huang, J. Weng, and N. Ahuja, 3-D Motion from Image Sequences: Modeling, Understanding, and Prediction, Proceedings of the IEEE Workshop on Motion: Representation and Analysis, Kiawah Island, SC, May 7-9, 1986, pp. 125-130.
68. M. D. Levine, P. B. Nobel, and Y. M. Youssef, A Rule-Based System for Characterizing Blood Cell Motion, in Ref. 2.
69. J. K. Tsotsos, J. Mylopoulos, H. D. Corvey, and S. W. Zucker, "A framework for visual motion understanding," IEEE Trans. PAMI 2(6), 563-573 (November 1980).
70. B. Neumann, Natural Language Description of Time-Varying Scenes, Bericht No. 105, FBI-HH-B-105/84, Fachbereich Informatik, University of Hamburg, FRG, August 1984.
71. G. C. Borchardt, A Computer Model for the Representation and Identification of Physical Events, Technical Report T-142, Coordinated Science Laboratory, University of Illinois, Urbana, IL, May 1984.

T. S. HUANG
University of Illinois

The preparation of this entry was supported by the Scientific Services Program, Battelle Columbus Laboratories contract DAAG29-81-D0100.

MS. MALAPROP

This is a natural-language-understanding (qv) system in which the inference (qv) process is directed by frame-structured knowledge. Knowledge about mundane situations is structured in a modular hierarchy of frames, thereby allowing sharing of information between frames (see Frame theory). Developed around 1977 by Charniak at the University of Geneva (see E. Charniak, Ms. Malaprop, a Language Comprehension Program, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 1-7, 1977).

K. S. ANONE
SUNY at Buffalo

MULTISENSOR INTEGRATION

Object-recognition systems using single sensors (typically vision) are still limited in their ability to correctly recognize different three-dimensional objects. By utilizing multiple sensors, in particular vision (qv) and touch, more information is available to the system. This entry is an attempt to show the utility of multiple sensors and explore the problems and possible solutions to converging disparate sensory data for object recognition (see also Color vision; Motion analysis; Proximity sensing; Sensors).

Humans are able to make use of multiple-sensor input to perform such tasks as object recognition very easily. In trying to recognize an object, one is able to integrate color, motion, touch, shape, and language. The disparate kinds of information supplied by these sensors are somehow able to converge into a coherent understanding of the objects perceived in a scene. Robotic systems (see Robotics) will eventually have to incorporate this multisensor capability (1). Single robotic sensors [e.g., vision and, even more so, touch (2)] are yet to be well understood and utilized on anything approaching a human scale. So why should one bother with more than one sensor? The answer is that multiple sensors can provide information that is difficult to extract from single-sensor systems. Further, multiple sensors can complement each other to provide better understanding of a scene. Similar issues are addressed by Henderson and Fai (3). The additional complexity posed by multiple-sensor environments is tempered by the great rewards in resolving the ambiguity that more than one sensor can bring.
The utilization of multiple sensors presents five important issues for object recognition. They are representations for object models, organization of the database of models, accessing the database of models, strategies for using sensors, and convergence of sensory data. This entry is an exploration of these issues involved in converging multiple-sensor data for object recognition. Needless to say, these issues are of interest to both the robotics and AI communities. Recent proceedings of conferences on AI, as well as on automation and robotics, are a manifestation of this point. The discussion focuses on possible solutions to these problems using as an example the limited domain of the kitchen.

Representations for Object Models

In model-based object recognition the sensed data must be related to the object models at hand. If a multiple-sensor environment is postulated, one needs to have multiple representations of objects. The nature of the data sensed from vision and touch is quite distinct and suggests different representational models at work (2,4). Many different object model representations have been used in the past, including generalized cylinders (qv) or cones (5-7), polyhedra (8), and curved-surface patches (3). These systems, in general, try to compute these primitives alone from the sensed data. A major difference between systems is the richness of the models in their databases. Systems that contain large amounts of information about object structure and relationships reduce the number of false recognitions. All of these systems are discrimination systems that attempt to find evidence consistent with a hypothesized
model and for which there is no contradictory evidence (9). Although the approach discussed here is similar, it is not based on a single primitive but on multiple features and surface properties that the sensors can derive. The models used contain geometric, topological, and relational information about the objects to be recognized. Semantic discrimination similar to the net described in Ref. 10 is also an important candidate for inclusion in such a multiple-representation system but is beyond the scope of this entry.

In human perception this discrimination is done in many different ways. One might perceive a unique feature, shape, or topology that will lead down a path of recognition (see Feature extraction; Shape analysis). It is not clear a priori what the path will be. For this reason one cannot rely on a single representation from a single sensor as the mechanism for recognition. Rather, one must leave the system open and available to follow any representational avenue presented to it from the multiple-sensed data. It is important that one tries not to impose arbitrary hierarchies on these representations that will limit the strategies that can be used with the sensors. One should be as aggressive and opportunistic as possible in exploring multiple paths toward recognition.

Sensor Environment. Multiple sensors provide the opportunity to discriminate between objects based on features that are derived from different sources. The sensing environment shown in Figure 1 and described in detail in Ref. 11 consists of a stereo pair of CCD cameras along with a robot manipulator containing a tactile sensor (see Stereo vision; Manipulators). The tactile sensor mounted on a finger and on a hand is shown in Figures 2a and 2b, respectively. Recognizing the strengths and weaknesses of each sensor system is important in order to effectively utilize them (see Fig. 3). The cameras are capable of extracting sparse three-dimensional (3-D) data from the scene.
The robot manipulator receives feedback from the touch sensor, allowing it to trace surfaces subject to varying sets of
Figure 1. System overview.
Figure 2. The tactile sensor on (a) a finger and (b) a hand.

constraints. The vision is passive in nature and fast, with large bandwidth, whereas the touch sensor is slow, has low bandwidth, and must be actively controlled. Vision is subject to the vagaries of lighting, reflectance, and occlusion. Touch, however, can feel occluded surfaces and report back 3-D world coordinates as well as surface normals.

Features for Discrimination. Given this sensing environment, one should be able to derive features for discriminating among objects. The desirable features for the model are gross shape descriptors and surface properties and topology. Gross shape descriptors are important because they limit the search space within the database (12). The most important gross feature to be distinguished is the planarity of an object. If an object appears to be planar, there are different representations and modeling techniques available for recognition than if it is three-dimensional. Determining if an object is planar is difficult with vision alone, but by tracing across the object and analyzing the 3-D data, one can easily determine planarity. A tactile trace across the contour of the object in two orthogonal directions allows one to interpolate a surface and test for planarity. In the kitchen domain planarity is important in distinguishing flatware and plates from 3-D objects such as cups and glasses. Gross shape descriptors can be further extended to 3-D objects by computing volumes, bounding parallelepipeds, and orthogonal slices across the object. Although these descriptors are by themselves not sufficient for recognition, their use with other sensed features allows further discrimination among competing models. Failure to find a gross shape description will cause a greater space to be searched but will not prevent recognition since multiple pathways of sensory recognition are available.

Surface properties are an extremely useful discriminator and are especially important with touch sensing. By modeling the objects as collections of surfaces, it is possible to match a rich set of surface descriptions. The surface characteristics to be computed are area, curvature (including surface cavities), and 3-D moments (if a closed surface). Area is a weak discriminator but will add support to hypotheses. The other measures are much stronger surface characteristics and will narrow the range of possibilities greatly. Holes are distinguishable features that can be measured with touch. Vision processing can hypothesize holes, and touch sensing can be used to verify their existence. Further, touch sensing can quantify the holes to aid in matching. This approach emphasizes computing as many features as possible from the different sources to come up with a consistent set of interpretations of the data. The features range from weak descriptors like gross shape and area to specific descriptors such as surface curvature, holes, and cavities. The conjunction of this sensed data will lead to a correct interpretation. It is important to note that the measures are three-dimensional. This allows one to utilize 3-D features of objects rather than projective features, as Lowe does (6).

Organization of the Model Database

If multiple representations are used as stated above, these representations need to be organized in a coherent way for access to the model information. The important points to consider here are the relationships between different representations and allowing these different representations to converge (as discussed below). The database of object models consists of object records. Each object record contains a vector of features that can be sensed. An object record contains the following information: object name, gross shape, list of surface descriptions that comprise the object, list of holes, list of cavities, and list of boundary curves joining the surfaces in the record.

Gross Shape. Gross shape properties in the object record include the volume of the object, a measure of its planarity, and a description of its bounding rectangular parallelepiped. These properties allow a coarse filtering of the objects to be recognized.

Surface Descriptions. The surface descriptions may be planar, quadric, or bicubic in nature, allowing a wide variety of surface models to be used (as from CAD/CAM systems). The objects in the database are modeled as collections of these surfaces. Besides a parameterized description of the surface, the surface's area and locations of curvature maximum and minimum are included. If the surface is a closed surface that contains a volume, a 3-D moment set is also included for the surface. An example would be the handle of a cup. This is a very powerful feature because it provides the center of mass of the enclosed volume of the surface and the moments of inertia, which form an orthogonal basis (13). The center of mass allows one to find the translational parameters that take one from
world to model coordinates, and the inertial axes can be rotated to conform to the model's axes and determine the rotational parameters of the transform. Surface descriptions are a good choice of primitive since surfaces can be interpolated from the combined visual and tactile data (14,15). Once these surfaces are built from the data, one can begin matching of objects against these descriptions.

Holes. Holes are modeled as having a center, an axis, and a diameter. The description of holes is useful for matching and determining the transform parameters from the model coordinates to sensed object coordinates. If a canonical upright position is assumed for objects, one needs only to find three translational and one rotational parameter to effect the transform. If an arbitrary position is assumed, one will need to find two extra rotational parameters (one assumes no scaling of objects). Holes are especially powerful in that they can be hypothesized by vision and explored by touch to determine their extent.

Cavities. Cavities are useful also. In this domain the size of a cavity can distinguish between a glass, cup, or bowl. This can be done with a measure of the cavity's depth versus diameter, both of which can be sensed by touch. This discrimination is especially useful in visually occluded parts of a scene.

Boundary Curves. The boundary curves represent the joining of surfaces. They are important in that they may be sensed visually and by touch. A curve discovered by vision may be nothing more than a lighting artifact; touch sensing can verify or contradict the ambiguous visual data. Finding a boundary curve will help in ascertaining relations between surfaces as modeled in the database. They may be modeled by segments or space curves.

Accessing the Database: A Search Problem

With multiple representation schemes the intractability of search (qv), the combinatorial explosion problem, becomes important. This has always been one of the central concerns of AI.
Berlin (16) has compressed the approaches to dealing with the combinatorial problem into three ideas: The first is the concept of abstraction. The second is the notion that some subproblems can constrain the number of solutions and hence should be tackled first, and the third is the idea that, if possible, one should always apply the best operator for getting from the initial state to the goal. If one accepts the view that the recognition process is problem solving (17), all of the problem-solving apparatus can be applied to battle the combinatorial explosion in accessing the database. In particular, in representing 3-D shapes, one must go above the pixel level toward more abstract concepts like surfaces, cavities, and holes/handles. Second, the database must be accessible via attributes, i.e., parts of the object, since frequently only partial views of the objects are available to the sensors. Furthermore, these partial views constrain the plausible interpretations. This accords with the idea of subgoals and how they constrain the search. The third point made by Berlin is applicable during the sensing strategies and is discussed below.

In accessing the database, one can index on the features that are found from the sensing. The object record is a rich data structure that tries to capture as much 3-D information
about the object as possible. It is necessarily broader than spatial data structures in which the multisensor data are indexed on a common thread, the thread being the 3-D coordinates of some types of data. This is clearly too limiting a data structure. Point data are too small in extent to achieve the kinds of higher level recognition needed. The primitives needed are surface descriptions (with a spatial extent much greater than a few points), areas, moments, curvatures, holes, and cavities. Further, relationships are needed that are more complex than nearest 3-D point neighbors, as has also been proposed. What this means is that the representations are disparate kinds of data linked back to the common thread of the object itself. This indexing mechanism contains pointers to all objects with commonality in terms of a particular feature. Indices have been built for every access method available from the sensed data. There will be access available through the attributes of surfaces, holes, cavities, and boundary curves as well as through gross descriptors in the object record.

Strategies

Given the multiple representations and the organization of the database above, how are these sensors strategically employed? Here one deals with vision and touch as the sensors. One must obviously begin with a bottom-up approach (see Processing, bottom-up and top-down) to recognition. The stereo algorithm used (10) supplies sparse sets of 3-D points that reflect changes in image intensity due to reflectance, lighting, and geometry. Attempting to segment this image will yield regions that are surfaces, holes, or background. Tactile inspection of these regions will help one determine their true nature. If a closed contour of a region from vision can be defined, this will allow one to trace across the interior of the surface and create a bicubic surface description (14). This description can be analyzed for curvature and area and for the location of possible cavities.
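The feature-indexing scheme described above can be sketched as follows. This is a hypothetical miniature, not the article's system: each index maps a coarsely quantized feature attribute to the set of model objects exhibiting it, and a query intersects the postings for every feature actually sensed.

```python
from collections import defaultdict

class FeatureIndex:
    """Toy feature-indexed object database: postings[feature][value]
    holds all model objects sharing that feature value."""
    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, obj, feature, value):
        self.postings[feature][value].add(obj)

    def candidates(self, sensed):
        """sensed: dict of feature -> quantized value. Returns the set
        of model objects consistent with every sensed feature."""
        sets = [self.postings[f].get(v, set()) for f, v in sensed.items()]
        return set.intersection(*sets) if sets else set()

# Illustrative models and feature values (invented for this sketch).
db = FeatureIndex()
db.add("cup",   "cavity_depth", "deep");    db.add("cup",   "hole_count", 1)
db.add("bowl",  "cavity_depth", "shallow"); db.add("bowl",  "hole_count", 0)
db.add("glass", "cavity_depth", "deep");    db.add("glass", "hole_count", 0)

print(db.candidates({"cavity_depth": "deep", "hole_count": 1}))  # {'cup'}
```

As in the text, each additional sensed feature (a hole found by touch, a cavity depth, a boundary curve) narrows the candidate set by intersection.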
This analysis can then generate search paths into the database. If a hole is seen by the vision sensing, the tactile sensor can quantify it and further index into the database. Edges that form surface boundaries can be seen, verified by tactile sensing, and again used to index into the database. It is important to note that the sensing is providing true 3-D data that are less prone to multiple interpretation than projective data. Orthogonal slice-tracing can be performed across the whole of the object to infer bounding volumes and planarity as discriminators.

The intersection of these different pathways into the database will give rise to candidate objects. If the intersections are few, one can proceed to verify the objects with appropriate sensors by following the relational pointers between parts of the objects under consideration. The idea is to have small amounts of bottom-up search invoke high-level knowledge to guide the remaining search. The high-level knowledge is useful as a guide and important because it is three-dimensional in nature; it allows one to ignore the problems posed by the distortion of the visual image that takes place as a result of projective transformation, and it can suggest what to look for with touch-sensing in occluded parts of the scene.

Another useful strategy is the detection of outliers. Outliers are regions detected by vision that are largely surrounded either by background or holes (e.g., the handle of a cup when imaged). Such a region is a candidate for being a closed surface that encompasses a volume. By tracing both the visible and nonvisible sides of the outlier, a closed surface can be interpolated
and from this surface 3-D moments computed. This moment set can then be matched against the database for correspondence. Finding a match will allow transform parameters to be calculated from object to world coordinates.

If this initial sensing is fruitless or confusing, another view of the object is possible. This can be done by moving the cameras or having the tactile sensor perform a search of the occluded part of the scene.

Convergence of Sensory Data

Given the multiple sensors, how does one converge on an object? Since one is able to access the database from many sensory avenues, one can eventually hope to arrive at a unique or possibly small number of interpretations of the data. A unique interpretation implies that convergence has been achieved with respect to the features sensed. It is entirely possible that an object with the same set of sensed features (but unknown to the database) has been sensed. Only convergence can be reported at this level of detail (18). This also implies refinement in sensing. There are geometric features that are small enough to escape the sensor's detection. One can only converge on an object space that is within a neighborhood of the sensory resolution of the feature.

How then is convergence measured? One way is to measure a unique convergence based on the number of features found and also on the number of visual views and tactile probes performed. If the measure is less than a predetermined threshold, one can use the object record to guide a further search to verify the existence or absence of other features. Another way is to view the features as partially ordered sets (lattice structure) and compute the intersection between those measured and those in the database (19). The cardinal number of the intersected set is the measure of convergence. Independent of the way in which the measure of convergence is obtained, it cannot be said with certainty that the object being sensed is the modeled object without a lengthy and painful verification of all modeled features. Therefore, convergence is defined to be a measure based on the number of features found, the granularity of the sensors, and the completeness of the model. If the environment is constrained to already modeled objects with exact object models, unique convergence of a feature set is 100% convergence. As the constraints are relaxed, the measure decreases.

With a small number of candidates, each one can be checked for properties predicted by the model database. The sensing will soon arrive at a single candidate that can then be verified. If one ends up with a large set of models, the view is lacking in discernible features and perhaps a new view is needed to converge the database access mechanisms. This will prevent one from wasting time on views that are not rich in feature data.

Convergence will be complicated by sensory errors. If there is a nonconvergence of sensory data, one can choose the largest uniquely converged subset of the feature data to try to eliminate the poor data. However, if convergence still is not possible, one can resort to new views and/or recomputation of sensory information to try to find the errors. In this case nonconvergence is useful in isolating sensing errors.

If a world is assumed where objects other than the modeled objects exist, one has a mechanism for partial matching (qv). The largest uniquely converged subset of the features will be an indicator of object similarity and will allow one to proceed with partial recognition of similar objects.

BIBLIOGRAPHY

1. M. Shneier, S. Nagalia, J. Albus, and R. Haar, Visual Feedback for Robot Control, in IEEE Workshop on Industrial Applications of Machine Vision, pp. 232-236, May 1982.
2. L. D. Harmon, "Automated tactile sensing," Int. J. Robot. Res. 1(2), 3-32 (1982).
3. T. Henderson and W. S. Fai, Pattern Recognition in a Multi-Sensor Environment, UUCS Technical Report 83-001, Department of Computer Science, University of Utah, Salt Lake City, UT, July 1983.
4. L. A. Cooper, Flexibility in Recognition Systems, in J. Beck, B. Hope, and A. Rosenfeld (eds.), Human and Machine Vision, Academic Press, New York, 1983, pp. 97-106.
5. P. Allen, Integration of Vision and Touch for Recognition of 3-D Objects, Ph.D. Dissertation, University of Pennsylvania, Philadelphia, September 1985.
6. D. G. Lowe, Perceptual Organization and Visual Recognition, Kluwer Academic, Boston, MA, 1985.
7. D. Smitley and R. Bajcsy, Stereo Processing of Aerial Images, Proceedings of the Seventh International Conference on Pattern Recognition, Montreal, August 1984.
8. R. Nevatia and T. Binford, "Description and recognition of curved objects," Artif. Intell. 8, 77-98 (1977).
9. M. Posner and A. Henik, Isolating Representational Systems, in J. Beck, B. Hope, and A. Rosenfeld (eds.), Human and Machine Vision, Academic Press, New York, 1983, pp. 395-412.
10. J. H. Connell and M. Brady, Learning Shape Descriptions, Proceedings of the Ninth International Joint Conference on AI, Los Angeles, August 1985, pp. 922-925.
11. A. Izaguirre, P. Pu, and J. Summers, A New Development in Camera Calibration: Calibrating a Pair of Mobile Cameras, Proceedings of the IEEE International Conference on Robotics and Automation, St. Louis, March 25-28, 1985, pp. 74-79.
12. L. Shapiro and R. Haralick, A Hierarchical Relational Model for Automated Inspection Tasks, Proceedings of the IEEE International Conference on Robotics, Atlanta, March 1984, pp. 70-77.
13. P. Allen, Surface Descriptions from Vision and Touch, in Proceedings of the IEEE International Robotics Conference, Atlanta, March 1984, pp. 394-397.
14. W. E. L. Grimson and T. Lozano-Perez, Model Based Recognition and Localization from Sparse Three-Dimensional Sensory Data, AI Memo 738, MIT AI Laboratory, Cambridge, MA, August 1983.
15. A. P. Reeves and B. S. Wittner, Shape Analysis of Three-Dimensional Objects Using the Method of Moments, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, June 1983, pp. 20-26.
16. D. L. S. Berlin, SPAN: Integrating Problem-Solving Tactics, Proceedings of the Ninth Intl. Joint Conference on Artificial Intelligence, Los Angeles, August 1985, pp. 1047-1051.
17. I. Rock, The Logic of Perception, MIT Press, Cambridge, MA, 1983.
18. J. R. Hobbs, Granularity, Proceedings of the Ninth Intl. Joint Conference on Artificial Intelligence, Los Angeles, August 1985, pp. 432-435.
19. T. Matsuyama and V. Hwang, SIGMA: A Framework for Image Understanding: Integration of Bottom-Up and Top-Down Analysis, Proceedings of the Ninth Intl. Joint Conference on Artificial Intelligence, Los Angeles, August 1985, pp. 908-915.

General References

M. Brady, Criteria for Representations of Shape, in J. Beck, B. Hope, and A. Rosenfeld (eds.), Human and Machine Vision, Academic Press, New York, 1983, pp. 39-86.
R. Brooks, "Symbolic reasoning among 3-D models and 2-D images," Artif. Intell. 17, 285-349 (1981).
R. B. Fisher, Using Surfaces and Object Models to Recognize Partially Obscured Objects, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, August 1983, pp. 989-995.
T. D. Garvey, J. D. Lowrance, and M. Fischler, An Inference Technique for Integrating Knowledge from Disparate Sources, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, August 1981, pp. 319-325.
M. Potmesil, Generating Three-Dimensional Surface Models of Solid Objects from Multiple Projections, IPL Technical Report 038, Image Processing Laboratory, RPI, Rensselaer, NY, October 1982.
L. Roberts, Machine Perception of Three-Dimensional Solids, in J. Tippett (ed.), Optical and Electro-Optical Information Processing, MIT Press, Cambridge, MA, pp. 159-197, 1965.
I. Weiss, 3-D Shape Representation by Contours, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, August 1985, pp. 969-972.

Ruzena Bajcsy and Peter Allen
University of Pennsylvania

MUSIC, AI IN

I merely report - I cannot verify - that composers already claim to have discovered musical applications of decision theory, mathematical group theory, and the idea of 'shape' in algebraic topology. Mathematicians will undoubtedly think this all very naive, and rightly so, but I consider that any inquiry, naive or not, is of value if only because it must lead to larger questions - in fact, to the eventual mathematical formulation of musical theory, and to, at long last, an empirical study of musical facts - and I mean the facts of the art of combination which is composition (Igor Stravinsky, Ref. 1).

Background

Elaborate "mathematical" formulations of compositional procedure had in fact appeared well in advance of Stravinsky's statement (2). Indeed, detailed codifications of "correct" and "incorrect" practices have formed an integral component of musical theory since the late Renaissance (3). It has been the advent of digital computers and AI, however, that has transformed Stravinsky's dream of "empirical study" into a practical reality, for it is now possible not only to formulate truly quantitative models of how composers work but also to subject these models to rigorous evaluation. AI techniques are by no means limited to modeling the process of composition; these techniques show equal promise for investigating the two other modes of human intellectual involvement with music: the decision making undertaken by performers as they "interpret" musical scores and the cognitive processes associated with the experience of listening.

Of the three areas (composition, performance, and listening), musical composition has so far been the area of most intense activity (4). Composing programs have served both as analytic tools for testing models of preexisting compositional practices and as creative tools for crafting new artworks. Such programs have been the only ones so far to exploit one of the most powerful of AI techniques, backtracking (qv), and it is for this reason that the bulk of the programs singled out for discussion in this entry will be composing programs.

Related to full-fledged composing programs are "interactive" compositional editors. Although the AI scope of these editors is reduced to the extent that decisions are undertaken by the human user rather than the program itself, AI techniques can help to isolate musical "objects" within a working score (by parsing (qv), pattern matching (qv), or other methods) for modification by the user (5).

The area of musical performance has recently seen several promising developments, including programs that combine analytic capabilities with "performance practice" knowledge for the purpose of "interpreting" musical scores for totally automated performance (6); programs that analyze acoustic data or keyboard input in order to adjust an automated accompaniment to a live soloist (7); and programs that acquire and analyze psychophysical data in order to determine correspondences between subjective musical attributes such as loudness or timbral similarity and objective synthesis parameters such as amplitude or frequency modulation index (8).

The programs that come closest to emulating human listening are analytic programs that reduce note-by-note descriptions of musical scores into stylistic and/or structural descriptions. The most elaborate program for analysis of musical key relationships is the 1968 EXPLAIN program by Winograd (9). EXPLAIN automatically parses chordal progressions in order to describe the key scheme and the role of each chord within this scheme. In determining these roles, the program draws on an extensive knowledge base of chordal types. Winograd's write-up of the program draws strong parallels between the nested key relationships of music and the generative grammars of Chomsky (see Ref. 9). Nesting is also an important feature of a program for phrase analysis of unaccompanied tunes developed by Tenney and Polansky (10). Tenney and Polansky's program applies heuristic guidelines (see Heuristics) derived from Gestalt psychology to dissect a tune into structural components.

There remain further applications of AI that are peripheral to direct musical experience but nonetheless significant. One is score printing, where automated decision making can greatly facilitate layout of musical symbols according to conventions of musical typography (11). Another is the use of pattern recognition (qv) and parsing algorithms (see Parsing) for score reading (12) and for transcribing performance data into score output (13).

Other areas include utilities that leave decision making completely under the user's control but employ intricate data management in order to adapt their memory requirements to users' changing needs. Of particular significance in this area have been sound synthesis utilities developed by Vercoe, such as the 1978 MUSIC11 program implemented with Hale and Howe (14). Whereas earlier synthesis utilities such as the MUSIC V program by Mathews et al. (15) fixed ceilings on the number of simultaneous notes a user could employ and the amount of memory available to each note (e.g., 32 notes, each filling 35 words of memory), MUSIC11 achieved much greater flexibility by allocating a variable-length record for each note and by invoking "garbage collections" whenever it ran out of available memory.

Contextual Pattern Selection

An effective way of making musical decisions is to work out conceivable situations in advance of specific decisions and to establish a repertory of appropriate responses to each situation. As simplistic as this approach might seem, it accurately
models the practices of many human musicians, especially improvisers responding to the real-time pressures of live performance. As an example of how this approach may be emulated by a machine, consider a tune-writing program conceived jointly by Hiller and Ames for the 1985 Tsukuba Exposition (16,17). Portions of this program drew upon repertories of characteristic jazz patterns such as passing chordal progressions or walking bass figures, which could be employed only in certain harmonic contexts. For instance, suppose the program wished to elaborate the basic chordal progression

     1      2      3      4
    CM  |      |  Dm  |  G7  |

(Chordal qualities are abbreviated as follows: M = major, m = minor, 7 = dominant, and o = diminished.) The progression from CM in measure 1 to Dm in measure 3 might be left alone; it might be elaborated by a single chord in measure 2 such as Am, A7, or Ebo; or it might be elaborated by two chords in measure 2 such as Am-A7, Em-A7, or Em-Eb7. Similarly, the progression from Dm to G7 might be left alone or elaborated by passing progressions such as D7, Ab7, and Am-D7. For each pair of potential "source" and "destination" chords, the program maintained a repertory of appropriate progressions. Once the program had determined an appropriate repertory, selection from within this repertory was further conditioned by stylistic criteria. For example, Ab7 would only be considered between Dm7 and G7 if a global flag encouraging "substitute dominant" progressions had been set.

Figure 2. Heuristic scheduling with backtracking. (The flowchart schedules pitches according to preference, tests each scheduled pitch against the rules, advances to the next note on success, and backtracks to the most recent conflict when the schedule is exhausted.)
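The repertory lookup described above can be sketched in a few lines. This is an invented miniature, not Hiller and Ames's actual tables: each (source, destination) chord pair keys a repertory of passing progressions, and a stylistic flag filters out "substitute dominant" entries when they are not wanted.

```python
import random

# Illustrative repertories; "" means leave the progression unelaborated.
REPERTORY = {
    ("CM", "Dm"): ["", "Am", "A7", "Ebo", "Am-A7", "Em-A7", "Em-Eb7"],
    ("Dm", "G7"): ["", "D7", "Ab7", "Am-D7"],
}
SUBSTITUTE_DOMINANTS = {"Ab7", "Eb7"}  # hypothetical stylistic class

def elaborate(source, dest, allow_substitutes=False, rng=random):
    """Pick a passing progression appropriate to the harmonic context,
    filtered by the substitute-dominant stylistic flag."""
    options = [p for p in REPERTORY[(source, dest)]
               if allow_substitutes
               or not (set(p.split("-")) & SUBSTITUTE_DOMINANTS)]
    return rng.choice(options)
```

With the flag off, Ab7 is never offered between Dm and G7; setting `allow_substitutes=True` restores it, matching the global-flag behavior described in the text.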
Constrained Selection

Constrained selection models the behavior of a composer who is released from the pressures of live performance and therefore free to evaluate potential choices on the basis of generalized criteria. The kernel of a constrained selection process appeared in composing programs as early as Hiller and Isaacson's 1957 Illiac Suite for string quartet (18). In this work Hiller and Isaacson used the mechanism illustrated in Figure 1 to select pitches for notes according to 16 simple "melodic" and "harmonic" rules drawn from eighteenth-century counterpoint. Examples of their seven "melodic" rules are a rule against any one pitch repeating three or more times and a rule requiring any "skip" to be followed by a "step"; among their nine "harmonic" rules are restrictions against dissonant intervals such as seconds and sevenths (restrictions that eighteenth-century pedagogues never imposed), rules forbidding motion in parallel perfect consonances, and requirements for contrary motion. The element of randomness in their decision mechanism ensured that choices would be unbiased within the universe of "correct" music defined by the rules.

Figure 1. Decision mechanism of the Illiac Suite (18). (The mechanism selects a pitch at random, tests it against the rules, and advances to the next note only when the pitch passes all rules.)

Hiller and Isaacson's procedure remained the primary mechanism for constrained selection in computer composition for almost two decades, being applied not only to pitches but also to other musical attributes such as rhythm and timbre. By the late 1970s, however, two serious defects had become apparent: potential conflicts between rules often gave rise to elaborate qualifications and special cases, and the mechanism was incapable of dealing with situations in which none of its available options (e.g., pitches) passed all of the rules. Remedies for both defects have since been incorporated into programs developed by Ebcioglu (19,20) and independently by the author (16,17,21,22). Figure 2 illustrates basic improvements in Hiller and Isaacson's original method. This improved approach has served Ebcioglu as an empirical means for testing rule-based descriptions of traditional styles (23) (see Rule-based systems); for the author, the same approach has served directly as a tool for creating new works in contemporary style.

One major improvement shown in Figure 2 is that although the least flexible criteria are still implemented as rules, many qualifications and special cases are eliminated by converting more flexible criteria into heuristic guidelines. These guidelines enable recent composing programs to organize options into schedules so that the most desirable options will be considered first. (For some problems it is more efficient to apply the rules prior to scheduling.) Examples of the inflexible criteria, which for Ebcioglu's chorale-writing program (20) are expressed strictly as rules, include downward resolution of dissonant tones and avoidance of parallel perfect consonances; criteria that he expresses heuristically include admonitions encouraging upward resolution of leading tones and discouraging similar motion. Ebcioglu's program also incorporates capabilities for short- and long-term harmonic planning using procedures developed by the Austrian musical theorist Schenker (20).

Figure 3. Hierarchic generation of a musical structure in Ames's Crystals. Each successive diagram describes the structure at an additional level of compositional detail. The horizontal dimension of each rectangle indicates duration, the vertical dimension indicates register, and the letters indicate degrees of a 36-tone scale (from Ref. 29).

Rules, heuristic guidelines, and form-generating procedures in Ames's composing programs of this type (16,17,21,22) have been designed freshly with the aesthetic goals of each specific piece in mind. Many of his rules are determined by the instrumental media he has selected; the style of a work will often be partly described through lists of intervallic relationships designated by him as appropriate. Among his heuristic procedures is a mechanism that maintains pitches in statistical balance by favoring the least-used pitch in any decision.

A second and highly critical improvement is that, should the program exhaust an entire schedule without finding any pitch that passes all of the rules, it has the capability to backtrack, revise one or more earlier notes that might be causing difficulties, and try again (24). This capability substantially increases the ability of a composing program to cope with very stringent constraints, and it clearly models the behavior of a human composer in the act of applying a pencil eraser.

Both Ebcioglu's and the author's programs address the demands of contextual sensitivity by representing polyphonic textures as linked networks. An instance of a polyphonic network has been described (17); each description of a note includes pointers to the note's predecessor and successor in the same contrapuntal part, along with a pointer to a list of simultaneous notes.
This list may be reordered "on the fly" by the program, so that those parts with the most urgent tendencies (e.g., resolution of dissonances) may be given greatest priority. Similar use of pointers can help a composing program to coordinate melodic details with broader relationships such as harmonic progressions.

Comparative Selection

An alternative to constrained selection is to evaluate every possible combination of options for the purpose of choosing the "best" solution to a problem. This approach is practical only when the number of combinations is manageable for a computer. The set of programs used to create Protocol for solo piano (25) illustrates the types of conditions under which this approach is feasible. As an example, consider the stage where the description of Protocol had been refined to the point where each segment of the composition had been allocated a repertory of chords, and the next task at hand was to determine in what order these chords should occur. When consecutive chords had many tones in common, the resulting progression was inappropriate to the work's aesthetic purposes. Procedures were therefore designed to evaluate each progression by tallying pairs of consecutive chords with no common tones, pairs with one common tone, etc. The program combined these tallies into a heuristic measure that most greatly emphasized those pairs of chords with the most tones in common, and it evaluated every possible progression of chords by systematically permuting the repertory. The best progression was the one for which this heuristic measure was minimal.
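The exhaustive permute-and-score procedure described for Protocol can be sketched as follows. The chords and the squared-overlap weighting are invented for illustration; the point is the technique: every ordering of the repertory is scored by how many tones consecutive chords share, with large overlaps emphasized, and the minimally scored ordering wins.

```python
from itertools import permutations

def score(progression):
    """Heuristic penalty: sum of squared common-tone counts between
    consecutive chords, so large overlaps dominate the measure."""
    return sum(len(set(a) & set(b)) ** 2
               for a, b in zip(progression, progression[1:]))

def best_progression(repertory):
    """Systematically permute the repertory and keep the minimal score."""
    return min(permutations(repertory), key=score)

chords = [("C", "E", "G"), ("D", "F", "A"), ("E", "G", "B")]
print(best_progression(chords))
```

Because all n! orderings are examined, this is feasible only for the small per-segment repertories the text describes; for larger sets one would fall back on constrained selection.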
Hierarchic Processes

Hierarchic procedures for musical composition begin with a general description of a musical form and then deduce the work's specific content by applying productions capable of acting on their own results. Although such an approach was first advocated by Roads as a compositional analog of Chomsky's "context-free grammars" (26), it was quickly generalized to the "context-sensitive" case in a utility developed by Holtzman (27). The user of Holtzman's utility describes forms and productions using alphabetic tokens; the program expands "nonterminal" tokens by applying productions and ultimately imparts acoustic significance to the results by mapping "terminal" tokens into "sound objects" such as notes or larger aggregates. Results may be modified by applying rudimentary musical "transformations" such as transposition or retrograde. Procedures developed by Jones (28) and independently by Ames (29) differ from Holtzman's utility by incorporating musical awareness directly into the production mechanisms. Figure 3 illustrates how the productions operated for Crystals (1980) for string orchestra, the first complete musical score composed using hierarchic productions.

Hierarchic analysis programs reverse the top-down process (see Processing, bottom-up and top-down) described above, applying reductions first to the musical details and then to their own results until finally a general description of the form has been obtained. Instances of such processes include the analysis programs of Winograd (9) and Tenney/Polansky (10); one component of Ebcioglu's chorale-writing program (20) is a routine that uses Schenker's procedures to isolate short- and long-term cadences in a user-supplied chorale melody.
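The token-rewriting scheme described for Holtzman's utility can be sketched as a toy recursive expander. The grammar and note mapping below are invented for illustration: nonterminal tokens are rewritten by productions acting on their own results until only terminal "sound object" tokens remain.

```python
# Hypothetical productions: nonterminals expand to sequences of tokens.
PRODUCTIONS = {
    "PIECE":   ["PHRASE", "PHRASE"],
    "PHRASE":  ["motif", "CADENCE"],
    "CADENCE": ["dominant", "tonic"],
}
# Terminal tokens are mapped to "sound objects" (here, just note names).
NOTES = {"motif": ["C", "E", "G"], "dominant": ["G"], "tonic": ["C"]}

def expand(token):
    """Recursively rewrite a token down to its terminal sound objects."""
    if token in PRODUCTIONS:  # nonterminal: apply its production
        return [t for part in PRODUCTIONS[token] for t in expand(part)]
    return NOTES[token]       # terminal: map to sound objects

print(expand("PIECE"))
# One PHRASE yields C E G G C; the PIECE repeats it.
```

Jones's and Ames's mechanisms differ, as the text notes, by embedding musical knowledge (register, duration, scale degree) in the productions themselves rather than in a final token-to-note mapping.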
BIBLIOGRAPHY

1. I. Stravinsky and R. Craft, Expositions and Developments, University of California Press, Berkeley, CA, p. 99, 1959.
2. J. Schillinger, The Schillinger System of Musical Composition, 2 vols., Carl Fischer, New York, 1941.
3. G. Zarlino, The Art of Counterpoint (1558), G. Marco and C. Palisca (transl.), W. W. Norton, New York, 1968.
4. Plans for a comprehensive AI system embracing composition, analysis, and performance, with the ability to "learn from its failures," are described in O. Laske, "Towards a musical intelligence system: OBSERVER," Numus-West (3), 11 (1973). OBSERVER was never fully implemented.
5. The notion of an "intelligent composer's assistant" is discussed from an opposing viewpoint in C. Roads, "Research in music and artificial intelligence," ACM Comput. Surv. 17(2), 163 (1985).
6. M. Clynes, Secrets of Life in Music, Proceedings of the International Computer Music Conference, Paris, p. 225, 1984.
7. B. Vercoe and M. Puckette, Synthetic Rehearsal: Training the Synthetic Performer, ICMC Proceedings, Vancouver, B.C., p. 275, 1985. Also J. Bloch and R. Dannenberg, Real-Time Computer Accompaniment of Keyboard Performances, ICMC Proceedings, Vancouver, B.C., p. 279, 1985. Vercoe's and Puckette's is by far the more sophisticated system, for it responds to acoustic (rather than keyboard) data and "learns" with rehearsals.
8. W. Martens, PALETTE: An Environment for Developing an Individualized Set of Psychophysically Scaled Timbres, ICMC Proceedings, Vancouver, B.C., p. 355, 1985.
9. T. Winograd, "Linguistics and the computer analysis of tonal harmony," J. Mus. Theor. 12(1), 2 (1968).
10. J. Tenney and L. Polansky, "Temporal gestalt perception in music," J. Mus. Theor. 24(2),205,(1980). 11. D. Byrd, "An integrated computer music software system," Comput.Mus. J. L(2),55 (1977).SeealsoJ. Maxwell and S. Ornstein, MOCKINGBIRD: A Composer'sAmanuensis, videotape, Xerox Corporation,Palo Alto, CA, 1981. 12. A. Andronico and A. Ciampa, On Automatic Pattern Recognition and Acquisition of Printed Music,ICMC Proceedings,Rome,Italy, p.245, L982. 13. C. Chafe, B. Mont-Reynaud, and L. Rush, "Toward an intelligent editor of digital audio: Recognition of musical constructs," Comput.Mus. J. 6(1),30 (1982).Rhythmic pattern analysisas a means for verifying acquired scoredata is describedin B. Mont-Reynaud and M. Goldstein, On Finding Rhythmic Patterns in Musical Lines, ICMC Proceedings,Vancouver,8.C., p. 391, 1985. L4. B. Vercoe, MLISIC11 Manual, MIT Experimental Music Studio, Cambridg", MA, L979. 15. M. Mathews, J. Miller, F. R. Moore,J. R. Pierce,and J. C. Risset, The Technology of Computer Music, MIT Press, Cambridge, MA, 1969. 16. L. Hiller and C. Ames, "Automated composition:An installation at the 1985 International Exposition in Tsukuba, Japan," Persp. New Mus. 23(2),196(1985). 17. Technical aspects of the Tsukuba composing programs are detailed in C. Ames, Applications of Linked Data Structures to Automated Composition,ICMC Proceedings,Vancouver,8.C., p.25I,
1e85. 18. L. Hiller and L. Isaacson,Experimental Music, McGraw-Hill, New York, 1959.Reprinted, Greenwood,Westport, CT, 1979. 19. K. Ebcioglu, Computer Counterpoint, ICMC Proceedings,Flushirg, NY, p. 534, 1980. 20. K. Ebcioglu, An Expert System for Schenkerian Synthesisof Chorales in the Style of J. S. Bach,ICMC Proceedings,Paris, p. 135, 1984. 2L. C. Ames, "Stylistic automata in Gradient," Comput.Mus. J.7(4), 45 (1983). 22. C. Ames, "Notes on Undulaftt," Interface, L2(3) 505 (1983). 23. A useful and detailed breakdown of a more recently developed
NATURAL-LANGUAGE GENERATION

Natural-language generation is the process of deliberately constructing a natural-language text in order to meet specified communicative goals. The term "text" is intended as a general, recursive term that can apply to utterances or parts of utterances of any size, spoken or written. In people, whether a text is spoken or written has implications for the amount of deliberation and editing that may have gone on; if "spoken" language is identified with a lack of revision, most programs today "speak" even though nearly all only display words on a display screen. Since the choice of whether to revise or whether to use print or voice is usually not an option for a generation program today, these particulars are only mentioned when they are an issue in a program's design.
C. AMES
Eggertsville, NY
MUSICAL COGNITION. See Cognitive modeling.
MYCIN

A medical consultation system (see Medical advice systems) for diagnosis of blood infections (e.g., meningitis) and the recommendation of drug (antibiotic) treatment, MYCIN was written in 1975 by Shortliffe at the Stanford Heuristic Programming Project (see E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN, American Elsevier/North Holland, New York, 1976).

M. Tnm
SUNY at Buffalo
The goals come from another program, perhaps an expert reasoning system or an ICAI tutor, that is motivated to talk to a human user. The texts that are produced may range from a single phrase given in answer to a question, through multisentence remarks and questions within a dialogue, to full-page explanations. Generation is a different matter from simply having programs use English: Programs have been printing natural-language messages at their users for as long as there have been computers, yet one does not want to think of an error message from a FORTRAN compiler as either constructed or goal directed, however well written it may be. An error message does not "mean" anything to the program that prints it: Any connection between the string of words and the program's situation is strictly within the mind of the programmer who wrote
that preprogrammed, "canned" text. Even the use of parameterized "format" statements, where the canned word string can be augmented by names or simple descriptions by substituting for variables, is not really generation. These "fill-in-the-blank," or "template," techniques depend for their effectiveness on a tacit limitation in the number and complexity of the situations in which the program will need to use them; that they have been adequate up to now for expressing what programs have had to say is more of a comment on the simplicity of today's programs than on the capabilities of template-driven generation.

In contrast with such "engineering treatments," research on natural-language generation, like the other areas of its parent field of computational linguistics (qv), has as its goal not just competent performance by a computer but the development of a computational theory of the human capacity for language and the processes that engage it. For generation this focuses on the explanation of two key matters: versatility and creativity. What do people know about their language, what processes do they employ, that enables them to be versatile, varying their texts in form and emphasis to fit an enormous range of speaking situations, and creative, with the potential to express any object or relation in their mind as a natural-language text? The need to accommodate these capabilities is the prime organizing force behind generation theories and is the basis of the special contributions that the people who work on generation make to the rest of computational linguistics and AI.

This entry describes AI research on natural-language generation with a historical perspective, emphasizing the special character of the problems to be solved. It begins by contrasting generation with language understanding in order to establish basic concepts about the breakdown of the process into components and the flow of information and decisions through it.
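The template ("fill-in-the-blank") technique dismissed above is easy to sketch in a few lines. The situations, template strings, and function names below are invented for illustration and are not taken from any actual compiler or generation system.

```python
# A minimal sketch of "fill-in-the-blank" text production.  The program's
# command of English lives entirely in the hand-written template strings;
# nothing is planned or chosen, so this is not generation in the sense
# discussed above.  All names here are illustrative.

TEMPLATES = {
    "type_error": "Type mismatch in {routine}: expected {expected}, got {actual}.",
    "undefined": "Variable {name} is used before it is defined.",
}

def emit(situation: str, **slots) -> str:
    # One canned string per anticipated situation, filled by substitution.
    return TEMPLATES[situation].format(**slots)

print(emit("type_error", routine="SQRT", expected="REAL", actual="INTEGER"))
# -> Type mismatch in SQRT: expected REAL, got INTEGER.
```

Every new situation requires a new hand-written template, which is exactly the tacit limitation on the number and complexity of situations that the passage above points out.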
A section of excerpts from the output of illustrative generation systems follows, showing what kinds of performance are possible and where the difficulties are. In the remainder of the entry the common approaches to generation are surveyed, including characteristic messages and the nature of a generator's lexicon. A separate section continues the survey with alternative approaches to the representation and uses of a grammar.

Character of the Generation Process. To understand why generation has the organization that it does, it helps to make a brief comparison with its more studied complementary process, natural-language understanding (qv). In contrast with the organization of the understanding process - which to a first approximation can follow the traditional stages of a linguistic analysis: morphology, syntax, semantics, pragmatics/discourse - the generation process has a fundamentally different character. This fact follows directly from the intrinsic differences in the information flow in the two processes. Understanding proceeds from texts to intentions; generation does the opposite. In understanding, the "known" is the wording of the text (and possibly its intonation). From the wording the process constructs and deduces the propositional content conveyed by the text and the probable intentions of the speaker in producing it. Its primary effort is to scan the words of the text in sequence, during which the form of the text gradually unfolds; the scanning requirement forces a process based on the management of multiple hypotheses and predictions that feed a representation that must be expanded dynamically. Major
problems are caused by ambiguity - one form can convey a range of alternative meanings - and by underspecification - the audience receives more information from situationally motivated inferences than is conveyed by the actual text. In addition, mismatches in the speaker's and audience's models of the situation (and especially of each other) lead to unintended inferences.

Generation has the opposite information flow. It proceeds from content to form, from intentions and perspectives to linearly arrayed words and syntactic markers. Its "known" is its awareness of its intentions, its plans, and the text it has already produced. Coupled with its model of the audience, the situation, and the discourse, they provide the basis for making choices among the alternative wordings and constructions that the language provides - the primary effort in constructing a text deliberately. Most generation systems do produce the surface texts sequentially from left to right, but only after having made decisions top-down for the content and form of the text as a whole. Ambiguity in a generator's knowledge is not possible (indeed, one of its problems is to notice that it has inadvertently introduced an ambiguity into the text). Rather than underspecification, a generator's problem is to choose from its oversupplied sources how to adequately signal intended inferences to the audience and what information to omit from explicit mention in the text. With its opposite flow of information, one might assume that a generation process could be organized like an understanding process but with the stages in opposite order.
To a certain extent this is true: Identification of intention (goals) largely precedes any detailing of the conceptual information the audience should be given; the planning of the rhetorical structure that will be imposed on the information largely precedes any construction of syntactic structures to realize it; and the syntactic context of a word must be fixed before the precise morphological and suprasegmental form it should take can be known. But to emphasize this ordering of linguistic representational levels would be to miss generation's special character, namely that generation is above all a planning (qv) process. It entails realizing goals in the presence of constraints and limitations on resources; its efforts consist of making decisions: decisions to use certain words or syntactic constructions and decisions to post constraints on later decisions. It is best organized as a process of progressive refinement.

This perspective on generation as planning permeates the views of the people who work on it. A language's syntax and lexicon become both resources and constraints, defining the elements available for the construction of the text and also the dependencies between them that determine their valid combinations. These dependencies, and the fact that they tacitly govern when the information on which each decision depends can become available, are the fundamental reason why generation programs do largely follow the conventional stages identified by linguists. Goal identification precedes content selection and rhetorical planning, which precedes syntactic construction, only because that is a natural order in which to make decisions; it is simpler to go with the flow of the dependencies than to jump ahead and take the chance that a premature decision will have to be undone because it later turns out to be inconsistent.
Today's research concentrates on understanding how best to represent what decisions are possible and the dependencies among them, as well as on how to represent the constraints and opportunities earlier decisions place on later ones as the process proceeds.
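The idea of "decisions to post constraints on later decisions" can be pictured with a toy refinement loop. The dependency scheme and every name below are invented for illustration and do not reconstruct any particular system.

```python
# Toy illustration: each choice may post a constraint that filters the
# options available to later, dependent choices.  Invented example.

constraints = {}

def decide(options, depends_on=()):
    # Keep only the options compatible with every posted constraint.
    live = [o for o in options if all(constraints[d](o) for d in depends_on)]
    return live[0]            # a real planner would rank the survivors

# Choosing the subject first constrains verb agreement later.
subject = decide(["the blocks", "the block"])
constraints["number"] = lambda verb: (verb == "support") == subject.endswith("s")
verb = decide(["supports", "support"], depends_on=["number"])
print(subject, verb, "the pyramid")
# -> the blocks support the pyramid
```

Working in dependency order lets each decision be made once; reversing the order here would force the verb choice to be revised after the subject was picked, the "premature decision" problem noted above.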
The focus on planning and intention in generation research puts the underlying program in a pivotal position methodologically. Computational theories of processes must be implemented - embodied in a program that actually performs the behaviors under study - before they can be tested for coherence and procedural adequacy. One cannot test a theory of talking without having the underlying program talk about something - planning and realization must be in the service of some actual goals. One is therefore forced to generate "for" some underlying program or else run the risk of basing one's theory on an unrealistic, incoherent foundation. Unfortunately, underlying programs that one can pick up "off the shelf" have inevitably been designed without the concerns of generation in mind. They turn out to be lacking in conceptual support for subtleties of intention and representation that generation researchers need and to have structured their internal expressions in ways that make it difficult for a generation system to use alternative perspectives or groupings. Faced with the potential problems of using underlying programs built to suit independent concerns, generation researchers have adopted various approaches. Some develop their generators as stand-alone facilities and concentrate on studying grammar or planning in isolation (1-3). Others have dedicated a great deal of their own development effort to building a task-based conceptual program on top of their generator to give it something substantive to talk about (4-6). Still others work from an independently developed program but have interposed some kind of independent "planning" system in between to patch over the differences (7,8). None of these approaches will lead soon to a general-purpose generation facility that can be attached freely to any underlying program, though some work has been directed that way (9-11).

Standard Components and Terminology. The natural-language generation component does not stand by itself.
It fits within a man-machine interface, which it shares with a component that does natural-language understanding - the input side of the system. In a good man-machine interface today one would also expect provisions for coordinated graphical input and output, complementing the natural-language I/O. Bridging the two is a representation of the ongoing discourse, which they both add to and use for reference. The interface may end here, or it may extend further back with other shared components such as a discourse controller that directs the actions the generator takes and coordinates the interpretations made by the understander. Behind the interface is the nonlinguistic program, perhaps a reasoning (qv) system or database, that human users employ the interface to talk to. This program will be referred to here uniformly as the underlying program. It can be almost any type of AI system one can imagine: cooperative database, expert diagnostician, ICAI tutor, commentator, apprentice, advisor, mechanical translator. The nature of the underlying program presently has no significant influence on the generator's design. Today most generation researchers work with underlying programs that are expert advisors, e.g., Refs. 7 and 12. With an advisor program the control of where the conversation goes is most likely to rest with the program rather than the person using it. In addition, advisor programs and intelligent machine tutors are likely to have a good understanding of what their human interlocutors are thinking. These features make them able to motivate fairly sophisticated texts, which makes them attractive to those generation researchers who are looking for already developed programs to work with.
The generation process starts within the underlying program when some event leads to a need for the program to speak. In the simplest case this may be the need to answer a question from the user; with a sophisticated discourse controller it may be the perception of a need to interrupt the user's activities in order to point out an impending problem. Once the process is initiated, three kinds of activities must be carried out:

1. identifying the goals the utterance is to achieve,
2. planning how the goals may be achieved, including evaluating the situation and the available communicative resources, and
3. realizing the plans as a text.

Goals are typically to impart certain information to the audience or to prompt them to some action or reasoning. People, of course, talk for social and psychological reasons as well as practical ones; but as these needs are beyond the ken of today's computer programs, AI research on generation is forced to largely ignore them. Planning involves the selection (and deliberate omission) of the information units to appear in the text (e.g., concepts, relations, individuals) and the adoption of a coordinating rhetorical framework or schema for the utterance as a whole (e.g., temporal progression, compare and contrast). Particular perspectives may be imposed on the units to aid in the signaling of intended inferences. Realization is the process of manifesting the planner's directives as actual text. It depends on a sophisticated knowledge of the language's grammar and rules of discourse coherency, and typically constructs a syntactic description of the text as an intermediate representation. The term "realization" is used technically within the field: For example, one speaks about choosing to "realize" a modification relationship as either an adjective or a relative clause. It emphasizes not only attention to linguistic form but also knowledge of the criteria that dictate how those forms are used.
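The three activities just described can be strung together as a toy pipeline. The message format, the audience model, and all function names below are invented for illustration; real systems represent each stage far more richly.

```python
# Toy pipeline for the three activities: goal identification, planning,
# and realization.  All representations here are invented.

def identify_goals(event):
    # 1. Identify what the utterance should achieve.
    return [("inform", event["fact"])]

def plan(goals, audience):
    # 2. Select information units (omitting what the audience already knows)
    #    and adopt a rhetorical frame for the utterance as a whole.
    units = [fact for kind, fact in goals if fact not in audience["knows"]]
    return {"schema": "identification", "units": units}

def realize(message):
    # 3. Manifest the plan as text (trivially here).
    return " ".join(message["units"]) + "."

event = {"fact": "the infection is a bacteremia"}
audience = {"knows": set()}
print(realize(plan(identify_goals(event), audience)))
# -> the infection is a bacteremia.
```

The point of the sketch is only the division of labor: each stage consumes the previous stage's output, and only the last one knows anything about wording.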
In many research projects the process that does grammatical realization is called the linguistic component (10), and in some the planning and goal-identification processes are together called the strategic component (13). Usually it is only the linguistic component that has any direct knowledge of the grammar of the language being produced. What form this grammar takes is one of the points of greatest difference between generation projects, though all projects largely agree on the function a grammar should serve in generation. For the traditional linguist, a grammar is a body of statements in a notation. The content of the statements - the specific facts of a given natural language - is of less interest to the linguist, by and large, than the theoretical properties of the notation. These properties are measured by how expressive the notation is, what primitives it identifies, and what representations and principles it makes use of. The situation is not much different in theories of generation except that the notation - the procedural and representational framework - is designed to serve a very specific function with which the traditional linguist is not concerned, namely, to guide and constrain the process of generating a text with a specific content and goals in the presence of a specific audience. This has an overriding effect on the form grammars take; more importantly, it also strongly influences the information they must include. The grammar is now responsible for defining the choices that a language allows in form and vocabulary, and it
must further include criteria of usage. Generation researchers must ask what circumstances lead to deciding on one alternative rather than another, as well as what functions the various constructions of the language serve that make them appropriate for fulfilling a certain goal. Only by including such information can a grammar serve as a resource defining the options available to the text planner. The other, more obvious, function of a grammar is to ensure that the texts that are produced follow the rules of the language - that they are "grammatical." How exactly this is done is another point where the different schools of generation often part ways, but a common theme is that the grammar functions by defining dependencies and constraining decisions.

The nonlinguistic plan or specification that directs realization is typically called the message; some researchers talk about "realizing the message" and speak of the conceptual and rhetorical representations maintained by the planning and goal-identification processes as being at "the message level" (as opposed to realization activities at "the surface-structure level"). This is a convenient and commonsense terminology, but one must be careful not to presume too much from it. The typical mental image evoked by the term "message" is of written notes passed from one person to another, e.g., as the result of a telephone conversation; however, this image does not fit the situation: Researchers who study both planning and realization continually make the point that there is no clean line between the two activities (see, e.g., Refs. 7, 14, and 15). Planning proceeds in layers of refinement and must appreciate the linguistic consequences of its decisions; the realization of units in early layers creates a grammatical context that imposes constraints on the range of realizations that can be planned for later. Goals may emerge or change in priority opportunistically as planning and even realization proceeds.
State of the Art

There is a firm consensus within the field (16) that versatility and creativity in machine-generated text is possible only if

1. the generator incorporates a comprehensive linguistically principled grammar;
2. the underlying program has a sophisticated, commonsensical, conceptual view; and
3. the text planner can make use of models of the audience and the discourse.

Unfortunately, such generators are still only the subject of research today. When none of these conditions are met, the state of the art in generated text is still about the same as it was in SHRDLU in 1970 (17). SHRDLU was Winograd's program; it produced original sentences, which it constructed dynamically, as replies to the questions it was asked. It took program expressions out of its model of the state of the blocks on its table and the actions it had performed and applied what today would be called a "direct-replacement" procedure to make simple grammatical adjustments to the verbs and linearize the expressions to yield comfortably readable texts such as the one below.

When did you pick up [the green pyramid]? While I was stacking up the red cube, a large red block, and a large green cube.

By the late 1970s generation systems of this simple but effective sort had become quite important in the early rule-based expert systems. They were needed to translate the large numbers of rules in these systems into an easily appreciated format in stylized English. A generator of some kind is required within these systems because the number of rules is large and their internal variation is too high to capture with a set of fixed, fill-in-the-blanks templates. It is a straightforward matter to provide a simple generation capability for any program where the objects in the knowledge base have a consistent structure and there is only one situation - one communicative context - in which the text must appear. Such capabilities are developed quickly, typically on an ad hoc basis as the rule-based system is developed (18,19).

Generation researchers, however, are interested in more complex texts than the context-free presentation an expert system's rules can motivate. Today this almost invariably means that as well as working on their generator they must develop their own underlying programs to provide an adequate conceptual source to work from, but there are numerous technical problems in generation that can be profitably approached with only a minimal base. As an example, here is a simple description from a program by Sigurd (20). Sigurd's point was to study how grouping is signaled through intonational effects; this text is actually spoken by a Votrax speech-production system.

The submarine is to the south of the port. It is approaching the port, but is not close to it. The destroyer is approaching the port too.

Although its content will not win it a place on the New York Times best-seller list, its structure, especially its use of the inference-directing function words "but" and "too," represents an important contribution. The source propositions in the database of an expert system that reasoned about submarines and destroyers would not be "packaged" with the conceptual equivalents of such function words already in place and able to be read out by a simple template. This is because the inferences the words control are specific only to one particular choice of what facts are being mentioned and how they have been grouped - a planning decision that is not part of the reasoning system's job but cannot be omitted in generation.

A similar technical problem that is not yet well enough understood is "subsequent reference" (21). What wording should be chosen when a reference to an object appears more than once in the text? Always using a pronoun may introduce ambiguities; in general, careful reasoning can be needed about how the audience will characterize the actors in a text in order to judge what phrasing to use. Below is an example text from a recent study of this problem by Granville (22). He classifies the relations between a referent and its last point of mention and develops a set of structural rules for making subsequent references based on it.

Pogo cares for Hepzibah. Churchy likes her, too. Pogo gives a rose to her, which pleases her. She does not want Churchy's rose. He is jealous. He punches Pogo. He gives a rose to Hepzibah. The petals drop off. This upsets her. She cries.
The principal problem with that text as a piece of prose is that it is "choppy": No attempt has been made to group its individual propositions into larger units, and the resulting sentences feel too short. Ultimately, such textural decisions require a linguistically sensitive analysis of text style; but they also require a conceptual basis for the grouping and an appreciation of what a grouping will signal to the audience. This information is not easy to come by in today's candidates for underlying programs. It is no wonder, then, that the very best performances by generators have come from systems in which the generation researcher was also the person who developed the underlying program. That way one is sure that there will be a basis in the underlying representation for any rhetorical attitudes or distinctions that the subject matter calls for and a conceptual perspective by which to organize groupings. An important case in point is the program PROTEUS, developed by Davey in 1974 (4). This program produced descriptions of games of tic-tac-toe (also called naughts and crosses) that are still among the most fluent texts ever produced by a machine.
The game started with my taking a corner, and you took an adjacent one. I threatened you by taking the middle of the edge opposite that and adjacent to the one which I had just taken but you blocked it and threatened me. I blocked your diagonal and forked you. If you had blocked mine, you would have forked me, but you took the middle of the edge opposite of the corner which I took first and the one which you had just taken and so I won by completing my diagonal.

The naturalness of PROTEUS's descriptions comes largely from its appreciation of tic-tac-toe as a game: It has a rich model of how specific moves may be seen as threats or counters to threats, and it incorporates the rhetorical principle that one should put in a text only the most salient information of a situation, e.g., missed opportunities or forks, while leaving the other information to be communicated implicitly by inference. PROTEUS has the equivalent of an underlying program in its routines for the analysis of the tic-tac-toe moves. These provide an annotation of the moves in terms of threats, blocks, etc., providing input to a planning facility that selects the best level of description for each move (e.g., "block" vs. "fork"). The planner then groups moves two or three at a time into sentences according to what game-level relationship seems to provide the best description of their motivation (e.g., "threat-but-block" or "although-threat-block-&-counter"). A realization facility then takes the grouped and described moves, works out the details of their form as English sentences, and produces the words of the text.

A rival to Davey's PROTEUS in fluency is Clippinger's 1974 program ERMA (5). ERMA is the only program to date that has attempted to deal with the fact that people speak in real time and continue to think and plan as they do so. People reflect on what they are saying and notice, midsentence, omissions or unintended interpretations that they fix by dynamically replanning and restarting their speech in midutterance. To model this behavior, Clippinger, working with an undergraduate assistant, Brown (63), analyzed 40 hours of transcripts of a patient in psychoanalysis in order to understand that patient's motivations and reasoning patterns sufficiently well to provide a computational account of one of the paragraphs in that transcript (shown below), which the program ERMA was able to reproduce in every detail. (Actually, the original transcribed paragraph included several additional "Uhs" and a "you know"; there was also no attempt to account for the specific time delays that occurred or for some of the sentence-initial perseverations.) The text segments in parentheses are what ERMA was planning to say before it cut itself off and restarted.

You know for some reason I just thought about the bill and payment again. (You shouldn't give me a bill.) (Uh) I was thinking that I (shouldn't be given a bill) of asking you whether it wouldn't be all right for you not to give me a bill. That is, I usually by (the end of the month know the amount of the bill), well, I immediately thought of the objections to this, but my idea was that I would simply count up the number of hours and give you a check at the end of the month.

Clippinger and Brown developed an architecture of five major interlocking components that took a thought from its first appearance as an interpersonal goal, through a fleshing-out and lexicalization, evaluation for social acceptability, interjection of attenuating phrasings, and sometimes a complete reworking to soften harsh impacts, while all the time realizing and uttering whatever text plan was in force at that moment. This required something of a tour-de-force in terms of computer programming for 1974, and the project was not carried further.

Historical Perspectives on the Problem

It is quite striking to realize that two of the most competent generation programs ever developed, Davey's PROTEUS and Clippinger's ERMA, are also among the oldest in the field. There are two reasons for this: First, until the early 1980s comparatively few people had ever worked on the problem of generation; second, the generation problem is very hard - harder in this writer's opinion than language understanding, the area where most of the AI work in natural-language processing has concentrated. These matters are not independent. A good deal of work on generation was in fact going on in the early 1970s, principally in the context of Ph.D. dissertations that built upon the first rush of significant results in language processing that had come a few years before with the work of Winograd on SHRDLU (17) and of Woods with augmented transition networks (ATNs) (23) (see Grammar, augmented-transition-network). In addition to Davey and Clippinger, there was the work of Simmons and Slocum on the adaptation of ATNs to generation (24) and the thesis of Goldman on how to organize word choice when generating from conceptual-dependency networks (qv) (9), as well as other works (25-28,64). It is fair to say, however, that the initial reports of that generation work, principally at the important TINLAP meeting in 1975 (29-31), fell largely on deaf ears, and research on generation went into something of a hiatus for the last half of the decade. This is not to say no work was done during those years; rather, generation was not perceived by the larger community as an important problem to be working on. By contrast, today there are entire sessions on generation at any large conference where natural-language processing is included as a topic. There have also been three international workshops of generation specialists since 1983 with an ever-increasing number of participants.

Until the early 1980s generation was considered by most people in AI (those who did not work on it) to be a relatively simple problem. Indeed, it is a simple matter to take a statement in an internal representation of the sort people used in the middle 1970s, say, (#supports :block 6 :block 3), couple it with separately stored attributes for the individuals, and produce "The big red block supports a green one": Winograd's
SHRDLU could do this in 1970. If this were all the competence one needed, generation would not be an important research problem. However, as soon as one begins to consider the various ways that simple sentence could be rewritten - the versatility the English language invites speakers to make use of - the difficulties begin to emerge. In that text, for example, should one always say "a green one" and not "a green block"; what kind of circumstances call for one but not the other? Suppose one wanted to use the Support assertion as an attribute of the green block, for example, as a way to distinguish it from the other green blocks: ". . . the green block that's supported by the big red one." How does one represent the grammatical knowledge that allows a generator to use its representation of the syntactic structure of the statement form of the text to produce the corresponding relative clause? How does the generator represent to itself in a general way the fact that the relative clause is even available or that such a use for the assertion is possible? Few people worked on generation in the later 1970s (or stayed with the problem for more than a year or two), either because they found the task too simple to be interesting (when working forward from the sorts of texts that reasoning programs needed at that time) or because they found it too difficult to make any headway (when working backward from the complexities of actual human texts).
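The "simple matter" described above - coupling an assertion like (#supports :block 6 :block 3) with stored attributes to read out a sentence - can be sketched as a direct-replacement procedure. The attribute store, entity names, and heuristics below are invented for illustration; this is not the actual SHRDLU code.

```python
# A direct-replacement readout in the spirit described above.  The attribute
# store, entity names, and heuristics are invented for illustration.

ATTRIBUTES = {
    "block6": {"size": "big", "color": "red", "head": "block"},
    "block3": {"size": None, "color": "green", "head": "block"},
}

def describe(entity, elide_head_as=None):
    a = ATTRIBUTES[entity]
    modifiers = [w for w in (a["size"], a["color"]) if w]
    # Say "one" rather than repeating the head noun just used.
    head = "one" if elide_head_as == a["head"] else a["head"]
    return "a " + " ".join(modifiers + [head])

def readout(assertion):
    relation, subj, obj = assertion
    subject_np = ("the " + describe(subj).split(" ", 1)[1]).capitalize()
    object_np = describe(obj, elide_head_as=ATTRIBUTES[subj]["head"])
    return f"{subject_np} {relation} {object_np}."

print(readout(("supports", "block6", "block3")))
# -> The big red block supports a green one.
```

The moment one wants the relative-clause variant discussed above, this readout has nothing to offer: its knowledge of English is frozen into the procedure rather than represented as options a planner could choose among.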
Common Approaches

It is difficult to identify the common elements in the different research projects on generation. By contrast, in language-understanding research one can identify any number of primary approaches to the problem: using ATNs, semantic grammars (qv), demon-based systems grounded in conceptual-dependency representation, procedural semantics, and many more. These schools of thought have names, a body of literature, and a coherent historical development over decades or more. Generation research cannot yet be said to have any schools in this sense. This is partially because historically only a small number of individuals have made this their primary area of research (as just discussed); large research groups focused on generation have formed in only the last few years. A more significant reason is that the nature of generation research has made it difficult to see the commonalities among the different generation systems. The principal problem is the lack of a common starting point: Unlike parsing research, where it is obvious that one must start by identifying and grouping the words of the text, independent research efforts in generation inevitably construct their messages using different internal representation languages, use differing amounts of planning, and focus on orthogonal technical problems. This lack of any immediate basis of comparison has made it hard for people to build on each other's work or even to test their own examples on another researcher's system. Nevertheless, the various generation projects have more in common with each other than not. There are common threads running through the projects: similar approaches, similar representations, similar grammars.

Two organizing questions are of common concern. The first is how to confront the diversity of forms in natural languages to develop functional accounts of them, to answer the question of why a person will use one form rather than another and to do so with a formal, computational account that a machine can use in dealing with people. Put another way, what is a person's model of the differences between syntactically or lexically similar versions of the same text and of the impact they will have on an audience? The second question is control of the generation process. What defines the choices that have to be made in a given speaking situation? What provides the basis for ordering them? How does one organize and represent the intermediate results? What awareness does the system have of the dependencies between choices? How are these dependencies represented and made to influence the control algorithms? Alternative answers to these questions will be described throughout the rest of this entry. This section covers the nature of messages and approaches to the lexicon; the following section considers various treatments of grammar.

Control by Progressive Refinement of Message. All treatments of the diversity of forms have been bound together with accounts of control, making control the proper place to start in looking at the schools of thought as to how generation is actually done. Among generation systems that were built specifically to work from underlying systems, the predominant approach to control is to treat the message from the text planner as a kind of program, i.e., to see it as an expression that one evaluates with a special kind of interpreter. Again a caution is in order. These "messages" are not simply expressions whose content and form are isomorphic to the target text and that happen to be encoded in a non-natural computer language. They cannot just be translated. Of course, in the simplest treatments of generation, translation might be sufficient (as in most existing expert systems), but in treatments that focus on generation, the relations and arguments in a message are best viewed as instructions to achieve a certain effect by linguistic means. The evaluation proceeds by progressive refinement from outermost instructions to inner. This control technique is natural to the developers of the systems since it mimics the style of the programming languages that they use and takes advantage of the almost unconscious preference among practitioners of AI to follow a function-parameter, predicate-argument style of representation.

The most common messages today are not constructed by any planner but are simply data structures extracted from the underlying program and given a special interpretation by the generator. This is common practice in programs that need to explain their reasoning (32,33) (see Explanation), with one of the clearest and earliest examples being to explain the reasoning embodied in simple natural-deduction proofs in the predicate calculus. Below is such a proof, and the text produced is by an early version of McDonald's program MUMBLE (10); the proof itself is taken from earlier generation work of Chester (34).

Input

Line 1: premis
  Exists(x) [barber(x) and Forall(y).. shaves(x,y) iff not.shaves(y,y)]
Line 2: existential instantiation (1)
  barber(g) and Forall(y).. shaves(g,y) iff not.shaves(y,y)
Line 3: conjunction reduction (2)
  Forall(y).. shaves(g,y) iff not.shaves(y,y)
Line 4: universal instantiation (3)
  shaves(g,g) iff not.shaves(g,g)
Line 5: tautology (4)
  shaves(g,g) and not.shaves(g,g)
Line 6: conditionalization (5,1)
  (Exists(x) [barber(x) and Forall(y).. shaves(x,y) iff not.shaves(y,y)]) implies (shaves(g,g) and not.shaves(g,g))
Line 7: reductio-ad-absurdum (6)
  not(Exists(x) [barber(x) and Forall(y).. shaves(x,y) iff not.shaves(y,y)])

Output

Assume that there is some barber who shaves everyone who doesn't shave himself (and no one else). Call him Giuseppe. Now, anyone who doesn't shave himself would be shaved by Giuseppe. This would include Giuseppe himself. That is, he would shave himself, if and only if he did not shave himself, which is a contradiction. This means that the assumption leads to a contradiction. Therefore it is false; there is no such barber.

The fluency of this text derives from an ad hoc model of the communicative force that accompanies a given instance of an inference rule of natural deduction (e.g., "premis" or "universal instantiation"). The model provides an account of the motivations of the proof writer in selecting what rule to apply, e.g., that the point of the right side of the biconditional in the first line is to place a restriction on the variable y (". . . who doesn't shave himself"). These motivations license the decisions to realize the lines of the proof in specific ways. These motivations, however, do not appear anywhere in the proof (which was the sole input to the program). They are only presumed and so are valid only for a few example proofs written with that particular personal style of natural deduction. The paucity of information on motives and perspectives in the messages of the underlying program is a perennial problem of work on generation: Computational linguists are forced to read into the data structures of the underlying programs because they do not already include the kinds of rhetorical instructions the generator needs if it is to employ the syntactic constructions of the language in the way that a person would.
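A drastically simplified sketch of this kind of rule-keyed realization is given below. The phrasing table is invented for illustration and is far cruder than the model behind the MUMBLE output above; it only shows the mechanism of pairing an inference-rule name with a stereotyped communicative phrasing.

```python
# Sketch: pairing natural-deduction rule names with stereotyped phrasings,
# in the spirit of (but far simpler than) the MUMBLE example above.
# The phrasing table is invented for illustration.

PHRASINGS = {
    "premis": "Assume that {0}.",
    "existential instantiation": "Call the assumed individual {0}.",
    "universal instantiation": "This would include {0}.",
    "reductio-ad-absurdum": "Therefore the assumption is false: {0}.",
}

def realize_line(rule, content):
    """Realize one proof line by the communicative force imputed to its rule."""
    template = PHRASINGS.get(rule, "{0}.")  # fall back to a bare statement
    return template.format(content)

proof = [
    ("premis", "there is some barber who shaves everyone who doesn't shave himself"),
    ("existential instantiation", "Giuseppe"),
    ("reductio-ad-absurdum", "there is no such barber"),
]

print(" ".join(realize_line(rule, content) for rule, content in proof))
```

As the surrounding discussion notes, the table encodes motivations that are nowhere in the proof itself, which is exactly why such a model is valid only for proofs written in one personal style.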
Without such "extra" information, the coherency of what is said, especially for texts more than a few sentences in length, will depend on how consistent and how thorough the authors of the underlying programs have been in their representational conventions: A generator has no choice but to treat a symbol like "premis" or the biconditional in the same way each time it sees them in the same context. If consistency is maintained, the imaginative designer can make up for the deficiencies by embellishing the data structures once they are inside the linguistic component.

When a text planner is brought into the process, messages can be built from a combination of data structures from the underlying program and instructions about perspective and rhetorical effect that the planner introduces (7). Below is an example of a complex message, a "generation program," that leads to text of the quality a person would produce (taken from a design study reported in Ref. 35). Specifications of effects to achieve are marked by colons in front of the symbols. The content information to be conveyed is given by reference to
internal frame objects named in angle brackets. This content is to be put in specific perspectives (e.g., main event and particulars), and the effect is to direct reasoning about linguistic alternatives in the presence of given rhetorical, and eventually grammatical, constraints. If the researcher's goal is to approximate the fluency and specificity of texts authored by people, messages will normally be as complex as this.

Specification

(the-day's-events-in-the-Gulf-tanker-war
 :events-require-certification-as-to-source
 (main-event #(same-event-type-varying-patient
               #(hit-by-missiles Thorshavet)
               #(hit-by-missiles Liberian))
  :unusual #(number-of-ships-hit 2)
  :identify-the-ships)
 (particulars
  #(damage-report Thorshavet Oslo-officials)
  #(damage-report Liberian Lloyds)))

Output

Two oil tankers, the Norwegian-owned Thorshavet and a Liberian-registered vessel, were reported to have been hit by missiles Friday in the Gulf. The Thorshavet was ablaze and under tow to Bahrain, officials in Oslo said. Lloyds reported that two crewmen were injured on the Liberian ship.

The goal of fluency and intentional specification of form motivates many of the more elaborate bits of computational machinery that constitute the common threads running through different research projects, particularly the use of phrasal lexicons and an intermediate linguistic representation. Stepping through a simple example will show why these are needed. Consider the logical formula below, given in the prenex notation that a program would typically use internally. (This example follows the treatments of Chester and McDonald described above.) This is the commonest kind of message one will find today: an expression straight from the model of an underlying program (the natural-deduction proof system), now given a special interpretation because it is being used to specify a text.
(exists x (and barber(x)
   (forall y (if-and-only-if shaves(x,y)
                             (not shaves(y,y))))))

In this formula the generator is immediately confronted with choices of realization. Should the quantification be expressed literally ("There exists an X such that . . ."), or should it be folded within the body as determiner information on the realization of the variables ("some barber")? Should the biconditional if-and-only-if be realized literally as a subordinating conjunction or interpreted as a range restriction on the variable (yielding the modifying relative clause "anyone who doesn't shave himself")? A predication like barber(x) should presumably always be decoded and converted to a specification of how the variable is to be described since it reflects the logician's convention of expressing type restriction through initial conjunctions; the alternative of using an extra sentence ("X is a barber") would be too unnatural. The other
choices are substantive, however, and need to be deliberated over. In message-directed progressive refinement treatments, such deliberations are usually managed by grouping the alternatives according to the type of object involved. The objects that populate the "mind" of the underlying program, in this case logical connectives, predicates, and bound variables, are all linked to the words and grammatical constructs that are appropriate for realizing them through "specialist procedures" maintained within the generator. These procedures are the equivalent of the lexicon in an understanding system. The specialists build a realizing phrase by drawing on lexical information associated directly with the individual logical objects. They are able to look at properties of the objects such as when they were last mentioned or what kinds of objects they have as arguments. Each object typically has associated lexical items: A constant may have a name; a predicate may have an adjective or a verb. The specialist does its work by putting these into a phrasal context that will be completed by the recursive application of other specialists, e.g., the two-place predicate "shaves(x,y)" becomes the clause template "x shaves y." In this control regimen the execution of each of the specialists is compartmentalized and taken up in the order dictated by the hierarchical form of the controlling expression, in this case the formula. The quantifier "exists" would be dealt with first, then the "and," the "forall," etc. Consideration of how an element of the formula is to be interpreted is delayed until it is actually reached in the stepwise, incremental refinement process. Relations provide linguistic templates by which to order the realizations of their arguments, and the process proceeds recursively.
This provides the benefits of the principle of least commitment, expediting the generation process as a whole by avoiding the possibility of having to "back up" out of prematurely made realization decisions that turn out to be incompatible with the grammatical context defined by a higher template.
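As a rough sketch of this control regimen (all names invented; real specialists would also consult discourse context and a grammar), each operator of the formula can be handled by its own specialist branch, invoked recursively from the outermost operator inward:

```python
# Sketch of message-directed progressive refinement: each type of logical
# object is handled by a "specialist" branch that supplies a phrasal
# template, and the formula is realized recursively from the outermost
# operator inward. (Illustrative only; names and templates are invented.)

def realize(expr, env):
    if isinstance(expr, str):                 # a bound variable or constant
        return env.get(expr, expr)
    op, *args = expr
    if op == "exists":                        # fold the quantifier into the
        var, noun, body = args                # description of its variable
        return realize(body, {**env, var: noun})
    if op == "iff":
        p, q = args
        return f"{realize(p, env)} if and only if {realize(q, env)}"
    if op == "not":
        return f"it is not the case that {realize(args[0], env)}"
    if op == "shaves":                        # two-place predicate supplies
        x, y = args                           # the clause template "x shaves y"
        subj = realize(x, env)
        obj = "himself" if x == y else realize(y, env)
        return f"{subj} shaves {obj}"
    raise ValueError(op)

formula = ("exists", "x", "some barber",
           ("iff", ("shaves", "x", "x"), ("not", ("shaves", "x", "x"))))
print(realize(formula, {}))
# some barber shaves himself if and only if it is not the case that some barber shaves himself
```

Note how consideration of each element is delayed until the traversal actually reaches it, which is the least-commitment behavior described above.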
Lexical Choice. Some approaches to machine reasoning emphasize the selection of a small set of primitives (qv) and the statement of a program's knowledge as a set of expressions over these primitives plus a set of constant terms for individuals. This has the advantage for reasoning of giving the commonalities among situations a structural prominence. This makes inferences easy to draw because they can be bundled into natural groups by the primitives. However, the reduction of the range of human actions to a set of, e.g., only 13 conceptual primitives means that a great deal of the specificity that the words of the language carry will, in the case of verbs, have been distributed throughout the expressions and will have to be collected and discriminated during generation if specific verbs are to be used. Goldman pioneered this use of discrimination nets to determine the best words for realizing whole expressions in his thesis on generation from conceptual-dependency representation (36). He demonstrated how one would determine word choice by working outward from the core primitives, testing the other parts of an expression for certain properties. For example, from the action primitive "ingest" one might get the verbs "drink," "eat," "inhale," "breathe," "smoke," or "ingest" by testing whether, e.g., the object ingested was a fluid or smoke.

Notice that one of the available words was "ingest," the least marked (most abstract) alternative the discrimination net allowed. It is inevitable in computer programs developed by people that the internal symbols will correspond to natural-language words, and indeed, there is invariably an intended correspondence in at least the back of an AI programmer's mind between the symbol and the word when they use it. Careful representation researchers point out that their conceptual terms have no real meaning in and of themselves: They could perfectly well be replaced with artificial print forms like G007 and the programs would continue to work perfectly well. The fact that one is forced to make deliberate discriminations and word choices when working from expressions over neutral, underspecified primitives means that the problem will receive a good deal of attention. A discrimination net design invites the generation researcher to go beyond the base distinctions by object type and to include contextual factors like the speaker's emotional perspective in the decisions. Consequently, generation work based on underlying programs written using conceptual dependency has involved some of the most creative and interesting work on coordinated word choice of any in the field. Below is a sample from work by Hovy (37). Hovy's aim is to bias the text to emphasize a desired point of view, in this case to report on this February primary in such a way that the results look good for Carter even though he lost.

Kennedy only got a small number of delegates in the election on 20 February. Carter just lost by a small number of votes. He has several delegates more than Kennedy in total.

In contrast, representations based on frames, for whatever historical reason, tend to involve the use of a very large number of "primitive" terms, in principle at least one for every word sense in a natural language, with the commonalities among terms indicated by reference to an abstraction-generalization network. When working from such representations, lexical choice is often a nonissue since each term can be uniquely associated with a natural-language word. This is not to say that choice of wording on the basis of affective perspective or degree of specificity cannot take place; rather, such choices are now seen as conceptual decisions rather than linguistic decisions. As a pragmatic matter, generation research that works off of such fine-grained representations tends to largely ignore the problem of lexical choice and put its energies elsewhere.

Phrasal Lexicons. What word to associate with simple conceptual terms like "barber" or "shaves" is obvious; however, for the objects in complex underlying programs, lexical choice can be more problematic. Representations based on "frame systems" employ structured objects that denote encapsulations of entire conceptual schema, whose "names" will consist of a single, highly hyphenated symbol, e.g., "example-intrinsic-similarities-with-competitive-product." Such conceptually uninterpreted "primitives" have a reasonable place in underlying programs, at least pragmatically, since an expert system can note qualitative properties of a phenomenon without having the common sense to understand it in enough detail to derive the term compositionally the way a person could. Technically these terms can be a considerable problem for generators since they may encode entire sentences at once yet will be used in rhetorical contexts where they may need to be modified with adverbs or adjectives or elaborated by subordinated clauses. The natural recourse in this situation is to use a phrasal
lexicon. This notion was identified in 1975 by Becker (38) and is an important tool of generation systems. Linguistically, a "phrasal" lexicon is a conceptual extension of a standard, word-based lexicon to include entire phrases as unanalyzed wholes on the same semantic basis as words. This provides a means of capturing in a natural way the open-ended idioms and manners of speech that people use every day. Since people appear to use these "fixed phrases" as undigested wholes, programs need to be able to do the same. This means that there need not be any internally represented expressions whose parts and relations are the direct source of the words and syntactic relations of the phrase, precisely what is needed to deal with heavily hyphenated symbols. Such texts can be quite good even though the underlying program understands little of what it is saying. The example below is from work by Kukich (6); another notable effort specifically employing a phrasal lexicon is that of Jacobs (39).
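The idea can be sketched in a few lines before turning to that example. The phrases, conditions, and template here are invented, merely in the general spirit of stock-market report generators: whole stereotyped phrases are stored as unanalyzed units and slotted into a clause template.

```python
# Sketch of a phrasal lexicon: entire stereotyped phrases are stored as
# unanalyzed units, keyed by qualitative conditions, and slotted into a
# clause template. (Phrases, keys, and the template are invented.)

PHRASAL_LEXICON = {
    ("market", "small-rise"): "edged higher",
    ("market", "small-loss"): "meandered downhill",
    ("time", "late"): "late in the day",
    ("time", "early"): "through most of the morning",
}

def clause(subject, action_key, time_key):
    """Fill a (market) (action) (time point) template with fixed phrases."""
    action = PHRASAL_LEXICON[("market", action_key)]
    time = PHRASAL_LEXICON[("time", time_key)]
    return f"{subject} {action} {time}"

print(clause("The stock market", "small-loss", "late"))
# The stock market meandered downhill late in the day
```

No internal expression decomposes "meandered downhill" into parts; the phrase is selected and emitted as an undigested whole, which is the defining property of the approach.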
Wall Street securities markets meandered upward through most of the morning, before being pushed downhill late in the day yesterday. The stock market closed out the day with a small loss and turned in a mixed showing in moderate trading.

This information announcement was computed directly from an analysis of the data for the day's market behavior. Qualitative points in the results were paired directly with the stereotypical phrases of such announcements: "a small loss," "a mixed showing," "in moderate trading." Objects, actions, and time points were mapped directly into the appropriate word strings: "Wall Street securities markets," "meandered upward," "(be) pushed downhill," "late in the day." The compositional template driving the assembly of these phrases into a text was based on clauses built out of the S-V-Advp phrase: (market) (action) (time point). The clauses were then grouped into sentences according to a few heuristics.

Treatments of Grammar

In the study of generation the choice of formalism for representing the language's grammar has always been bound up with the choice of control protocol. Broadly speaking, there are three approaches to this combined design decision that can be identified:

1. stating the grammar as an independent body of statements and filtering against it (with functional unification grammar as the prime example);
2. using the grammar to specify all the valid surface structures that texts of the language can have and then stating the planner's choices and the output of realization in terms of surface structure (message-driven approaches, TAG grammars); and
3. stating the grammar as a traversable graph structure and giving it control of the whole process once a text plan has been constructed (ATNs and most uses of systemic grammars).

There has yet to be any thorough comparative evaluation of these three alternative designs; individuals have adopted one or the other largely because of accidents of their own history: who they studied with, what was available locally, etc. This entry will maintain a studied neutrality. Each approach will be considered in turn from the perspective of the problems that have particularly motivated its use.

Some of the details that make a text "grammatical" arguably do not and should not have any counterparts in a message from an underlying program. Person and number agreement of subject and verb are an obvious case in English; relative pronouns (e.g., "who" vs. "whom"), the infinitive marker "to," and very large numbers of other linguistic phenomena are the same. This is not to say that these have no conceptual counterparts: agreement can be viewed as an expression of the semantic relation of predication, the lack of tense is often an indication of the action being generic, etc. The point is rather that this class of grammatically motivated information is not relevant to the text planner; it is not a natural part of the message and consequently should originate in the linguistics component. The question for the generation researcher is how to state this information and how to ensure that it is brought to bear at the appropriate moment.

Parsimony encourages the computational linguist to attempt to share as much of this information as possible between both generation and understanding systems. Given the radical differences in the intrinsic character of the control and information flow in the two processes, this leads researchers to declarative accounts of language rather than procedural ones, with elaborate derivational paradigms like generative grammar being ruled out of consideration quickly. When the purpose of the generation system is not to provide a communications facility for a mechanical actor, systems based literally on versions of transformational generative grammar have been quite appropriate. Two cases in point are the rule-testing facility developed by Friedman for the use of linguists to check the consistency of large sets of rules (40) and the pedagogical ICAI system of Bates that has been used in the teaching of English as a foreign language (3).

Among the long-standing linguistic traditions, about the most neutral paradigm that survives this criterion of being able to provide a declarative account is a system of rewrite rules. One of the very earliest mechanical generation systems of any sort was developed by Yngve in 1959 using a pushdown automaton and a body of context-free phrase-structure rules with ad-lib lexical insertion (41). Though it was not message-driven and generated text that was semantic nonsense, which consequently would make it uninteresting as a generator today, it did establish the legitimacy of the enterprise of providing explanatory accounts of psycholinguistic phenomena through appeal to the computational properties of a virtual machine operating over representations of linguistic rules, a methodology that is becoming increasingly important to computational and noncomputational linguists alike.

There are, of course, new linguistic paradigms, many of them now put forward by people with computational backgrounds. One of these, functional unification grammar (FUG), developed by Kay (42), has been employed in generators and is deliberately put forward as a "reversible" grammar, i.e., able to serve equally well as a controlling description in generation and understanding.

Functional Unification Grammar in Generation. As presently used, functional unification grammars (FUGs) give a generator a modular, independent way of supplying the purely linguistic information that the process must have and do so without imposing specific demands on its control structure. The
lack of demands to specify a control structure carries the entailment that one must be willing to live with whatever control structure is supplied. For Kay's FUG this is nondeterministic unification. If efficiency of execution is not relevant, this, of course, is no problem; however, there are indications that the generality of the FUG notation gives them undesirable computational complexity properties, i.e., generating a structure from an arbitrary FUG appears to be NP-complete (43). Certainly, a specific individual grammar may not require this complexity to process; however, this result means that implementers of FUG generators must be especially careful in the construction of their algorithms since the formalism itself is not efficient.

The term "functional" in the name of the paradigm speaks to an intention on the part of its practitioners to go beyond description of the structure of linguistic forms to address the reasons why language is used. In contrast with the practice in systemic grammars, however, the functional elements in FUGs are thus far only a minimal extension beyond the standard categorical linguistic vocabulary used traditionally to describe syntactic form (e.g., "clause," "noun phrase," "adjective") and are more in keeping with their paradigmatically close neighbor, "lexical-functional grammar" (44). In the FUGs actually employed in generators, i.e., the Telegram grammar developed by Appelt (2) and the realization component written by Bossie (45) for the generation system of McKeown (8), the extensions are just the addition of terms like "subject," "premodifier," or "head": descriptions of the role a constituent plays within the category that dominates it. Classically functional concerns, such as the distinction between "given" and "new" information in a sentence studied by the Prague School (46) or the similar distinction between "theme" and "rheme" defined by the Firthian tradition (47), have not yet been incorporated into FUGs.

Figure 1 shows an example taken from Appelt (48). It describes the constituent roles that accompany the phrasal category noun phrase. A full definition of the notation may be found in Kay's 1984 paper (42); briefly, the brackets define systems of features and values: square brackets define conjunctive sets, where a description must specify all of the features within them; and curly brackets define disjunctive sets, where only one of the conditions defined by the feature-value pairs must be met.

Figure 1. A functional description of the phrasal category NP: CAT = NP; PAT = (. . . (DET) (PREMODS) (HEAD) (POSTMODS) . . .); AGR = (HEAD AGR); with disjunctive alternatives for common-noun, pronominal, and clausal-complement noun phrases.

FUGs are used to flesh out minimal, conceptually derived functional descriptions, e.g., that the head of some noun phrase is to be the word "screwdriver." Recent work by Patten (49) uses a systemic grammar in very much the same way. Operations at a semantic level of the kind performed in other approaches by planning-level specialists specify a set of output features within the systemic grammar, the equivalent of the initial functional description that drives a FUG. A backward- and then forward-chaining sweep through the systemic grammar then determines what additional linguistic features must be added to the specification for a grammatical text to result.

FUGs are used in a process of successive mergers, constrained by the rules that govern how two descriptions may be unified. The key idea is that the planner first constructs a minimal description of a phrase, which it can do using specialists in the conventional way (e.g., that it wants to produce a clause with a certain verb and two NPs whose heads are certain nouns). To flesh out the description to the point where it would be valid grammatically, it is then unified with the grammar: The description of the phrase and the specification of the grammar are progressively merged, with specified features in one being melded into unspecified or compatibly constrained features in the other. The instantiation of some of the description's previously unspecified features by grammar-supplied constants then brings about a ripple effect throughout the whole system: Decisions that are dependent on a just-instantiated feature force further unifications cyclically until a grammatically complete description of the utterance has been formed. In addition, elements in the planner's description will force selections among the disjunctive specifications in the grammar. For example, specifying a verb will force choice of grammatical subcategorization, which in turn will force a selection among the alternative clause-ordering patterns that the grammar defines since only one of them will have a compatible specification.

The complete description will amount to a rooted tree of subdescriptions (constituents) as defined by the "pat" (pattern) feature, which dictates sequential order at each level. The actual production of the text is performed by scanning this tree and reading out the words in the lexical features of each constituent. Constraint has come about tacitly through the unification process: only compatible partial descriptions survive into the final result. This has the benefit that the planner need not be concerned with grammatical constraints and dependencies but also implies the corresponding potential deficit that the planner cannot make use of knowledge of the grammatical constraints should it want to.

From the point of view of grammar development, FUGs are a satisfying treatment because they allow one to state the facts of the language compactly, i.e., interactions between statements need not be explicitly spelled out in the notation (as they would have to be in unaugmented treatments of phrase-structure grammar) since they will come about automatically through the action of unification.

Surface Structure as an Intermediate Level of Representation. Faced with the difficulties under a message-directed, direct-replacement approach of realizing conceptual relations directly as words, a number of generation researchers have independently chosen to interpose a level of explicitly linguistic representation between the levels of the message and the words of the text [McDonald (27,50), Kempen and Hoenkamp
(51), Jacobs (52), and Swartout (53)]. They believe that a syntactic description of the text under construction is the best means of dealing with the problems of grammatically motivated detail and the implementation of linguistically defined constraints and dependencies. The specifics of their individual treatments differ, but a common thread is clearly identifiable. The linguistic structures are produced as the output of realization, which tends to be organized as choices made by specialists. The representations consist of a phrase structure of one or another sort, i.e., hierarchies of nodes and constituents. They incorporate functional concepts like "subject" and "focus." They are most aptly characterized as a kind of "surface structure" in the generative linguist's sense; i.e., they undergo no derivation and are a proper description of the syntactic properties of the text that is produced.

Loosely speaking, this intermediate level of surface structure is used by the control structure in the same manner in all treatments. It is given as a tree, and its constituency pattern is used directly as the specification of a path (top-down and left to right through the tree) that controls the sequence and environment of realization and the order in which the words appear. The crucial consequence of this "folding together" of the process of realizing the elements of the message and traversing the surface structure is to provide an explicit, examinable representation of the grammatical context in which an element will appear and thus make it available to constrain the choices open to realization and the text planner.

The most elaborated theory of surface structure as an intermediate representation is McDonald's (45). His design incorporates several points beyond the common elements of this approach. Figure 2 shows a surface structure as it would be in the middle of producing the text "Two oil tankers were reported hit by missiles." The traversal path through the structure is indicated by the arrows; the system is just about to select a realization for the underlying program predicate #(hit-by-missiles). The realization is performed in the context of the constraints dictated by its position as a constituent within the sentence, which is represented by the labels in brackets above it. The labeled circle marks an "attachment point" where the surface structure may be extended by splicing in additional phrase structure, in this case the verb phrase and complement structure for the verb "report." This provides the capacity for producing texts whose hierarchical structures are different from that of the message that leads to them, the customary form of texts constructed under a message-driven control structure.

Figure 2. (from Ref. 35).

Direct Control of Realization by Grammar: Systemic Grammar and ATNs. The augmented transition network, or ATN, was adapted for use in generation almost from the moment of its definition. It was used first by Simmons and Slocum in 1970 (54,55), whose system was then used by Goldman (9). It was also independently adapted by Shapiro (24,28), whose generator is the most elaborate of the group. All of the systems have a similar design. They scan a data structure provided by an underlying program, in effect "parsing" it. The networks follow the top-down format found in most ATN parsers, leading naturally to a progressive refinement process as the generator scans its governing data structure from the most important, widest scope relations on down. For the early ATNs this structure was a semantic net (qv) based on the concept of verb-centered case frames (another "functional" linguistic system). A special node in the network, a "modality vector," specified the root-level information such as tense and aspect or whether a sentence was to be active or passive. The primary function of the ATN in the early systems was to linearize a network structure that was for the most part already encoded in a linguistic vocabulary and to supplement the conceptual information in the semantic net with the purely linguistic information that all grammars must provide in generation.

As a linguistic formalism, ATNs are essentially a procedural encoding of a generative grammar (23). The registers that give them their "augmented" power are used as a deep-structure representation of grammatical relations, and the paths through the network encode all of the alternative surface-level constituent sequences. Constraints propagate from higher parts of the surface-structure tree to lower (i.e., to recursive subnets of the ATN) through the values in designated registers, bringing the activity of those subnets under contextual control. Shapiro's ATN design is particularly enlightening, as his controlling data structure is the underlying program's entire computational state. (This state is encoded in a particularly sophisticated intensional network formalism known as SNePS (56).) The "parsing" his ATN performs amounts to the construction of an assessment, in terms appropriate for directing the generation of a text, of the steps that must be taken to satisfy the program's intended communicative goals, in effect an implicit dynamic message.

A further aspect of the ATN design, the fact that the means of actually producing the words of the text is the execution of a side-effect action on the traversal of an arc, brings out the fact that this approach commits the generator to action almost at the very moment that a situation is perceived; e.g., identification of the object that is to serve as the subject is followed directly by its realization and actual production. That this is possible is particularly striking when one appreciates that Shapiro's ATN never backs up (53). This is quite unusual behavior for an ATN, given that they are usually thought of as expressly nondeterministic devices, and it serves to emphasize the fact that generation is in its essence a process of planning. Since modern planning processes are characteristically determinate, proceeding by incremental refinement and the posting of constraints rather than trial and error, the behavior of Shapiro's ATN is to be expected.

Viewed as a planner, the most significant deficit of the ATN designs is the difficulty of decoupling perception from action. Generators based on systemic grammar deal with this problem directly by introducing an intermediary representation in the form of a set of features, abstract symbols that serve as partial specifications of the text. To make a choice is to select a feature, which in turn creates a need to make certain other choices while rendering still others irrelevant. As was the case with surface structure, the use of an intermediary representation allows the specification of a text to be accumulated gradually, giving constraints an opportunity to propagate and influence later decisions. In this instance the abstract linguistic properties doing the constraining are not already bundled and formed as a phrase structure but are distributed as a feature space. The overall specification of the text is determined in recursive layers top-down, as it is in nearly all of the approaches (the prime exception being systems that use phrasal lexicons). Features are accumulated at a given level, e.g., the main clause of a sentence, until all of the aspects in which clauses can vary have been considered and the options settled. During this phase the issue is what functions are appropriate for the clause to carry out given the situation and the speaker's intentions; with those determined, the functional features are realized as a group and specify the clause's form. That form now creates an environment for the constituents of the clause. The determination of what functions each of them should serve is then carried out and, when completed, will lead to the realization of their forms, which in turn will lead to a functional analysis of their own constituents, and so on recursively until the constituents are words, at which point the text is read out as it would be from the description constructed with a FUG.

As a linguistic tradition, systemic grammar owes its form and perspective principally to one person, Halliday (57), who was himself influenced by the London School of functionalism led by Firth (47). The influence of systemic grammar on generation research is considerably wider than just the systems that employ it directly since it is the sole well-known linguistic formalism that has as its very basis the identification of the choices implicit in a language.
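The ATN-style "generation as parsing" design described above can be made concrete with a toy sketch: a verb-centered case frame (a small semantic net) is linearized by reading the root-level "modality vector" into registers and emitting words as the traversal proceeds, with no backup. The frame, the crude grammar, and all names here are invented for illustration; this is not Shapiro's GATN or any of the systems cited.

```python
# A toy "generation as parsing" traversal in the spirit of the early
# ATN generators: a verb-centered case frame is scanned top-down and
# linearized, with registers (here, voice) carrying constraints from
# the modality vector down to the constituents.

frame = {
    "verb": "hit",
    "agent": "a missile",
    "patient": "the tanker",
    "modality": {"tense": "past", "voice": "passive"},
}

def generate_clause(frame):
    registers = dict(frame["modality"])        # root-level "modality vector"
    if registers["voice"] == "passive":
        subject, object_ = frame["patient"], frame["agent"]
        verb = "was " + frame["verb"] + " by"  # crude passive morphology
    else:
        subject, object_ = frame["agent"], frame["patient"]
        verb = frame["verb"]                   # (tense is ignored here)
    # Each constituent would normally be produced by a recursive
    # subnetwork, words being emitted as arcs are traversed, no backup.
    return " ".join([subject, verb, object_])

print(generate_clause(frame))   # the tanker was hit by a missile
```

The point of the sketch is the commitment it embodies: once the voice register is set, the subject is chosen and produced immediately, exactly the "action at the moment of perception" property discussed above.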
Choices form the notational basis of systemic grammars, which, like ATNs, are written as traversable graph structures that define the space of possible control flow for at least the linguistic portion of the generation process. The very small fragment of a grammar shown in Figure 3 illustrates how the graph is formed. Choice systems are given either as AND paths (leading curled brace), where one choice must be made from each of the systems named on the right, or as OR paths (leading square brace), where only one of the alternative features listed may be selected. The selection of a feature opens the system that it names (note: the feature will be the leftward "root node" of the tree on its side that constitutes a system within the network), which means that a choice from that system must now be made. Choices continue as the locus of control moves left to right through the network (usually simultaneously active in several choices at once due to the presence of the AND systems), until a rightmost system is reached that consists of a
bare feature without an accompanying system. These rightmost nodes are the concrete elements from which specifications of form are built up. Leftward-pointing curled braces indicate path mergers in the control flow, where decisions in disjoint systems have a combined influence.

Two important generation systems have been based on systemic grammar, Davey's PROTEUS (4) (discussed earlier) and Mann and Matthiessen's NIGEL (1,58). NIGEL is the largest systemic grammar in the world and very likely one of the largest machine grammars of any sort. Besides the quite important contribution simply of articulating a systemic grammar so thoroughly, Mann and Matthiessen have developed an original technique for formalizing the usage criteria that govern the choices the grammar defines (59). A set of criterial predicates are defined for each choice system in the grammar, which act as functions from the internal state of the planner and underlying program to features. The generation process is carried out by starting at the leftmost entry system of the network and applying successive "chooser" procedures to determine the path through the network (i.e., the feature set) that best captures the speaker's intentions.

Other Research Areas

The field of natural-language generation, even as seen only by researchers in AI, is considerably larger than this entry has been able to accommodate. Two areas must at least be mentioned in passing.

Planning. Pioneering work by Appelt (2,14) supplied a rigorous logical framework by which to encode basic notions such as intention and reference. His planning technique, the progressive elaboration of goals through the use of Sacerdoti's procedural networks formalism (60), builds on a tradition of viewing the articulation of a generator's goals by chaining backward from fundamental communications goals (49,61).
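Returning to the choice systems described earlier (Figure 3), the traversal can be sketched as follows: AND systems open every subsystem named on their right, OR systems invoke a NIGEL-style "chooser" function to pick exactly one alternative, and bare rightmost features accumulate into the feature set that specifies the form. The miniature network and the chooser functions below are invented for illustration and bear no relation to NIGEL's actual grammar.

```python
# A miniature systemic network. AND systems (curled braces) require a
# choice from each subsystem; OR systems (square braces) require one
# alternative, selected here by a chooser keyed on the speaker's
# intentions, in the style attributed to NIGEL above.

NETWORK = {
    "clause": ("AND", ["mood", "theme"]),
    "mood": ("OR", ["declarative", "imperative"]),
    "theme": ("OR", ["unmarked-theme", "marked-theme"]),
}

CHOOSERS = {
    "mood": lambda intent: "imperative" if intent.get("command") else "declarative",
    "theme": lambda intent: "marked-theme" if intent.get("emphasis") else "unmarked-theme",
}

def traverse(system, intent, features):
    kind, entries = NETWORK.get(system, ("TERMINAL", []))
    if kind == "TERMINAL":
        features.append(system)            # a bare rightmost feature
    elif kind == "AND":                    # enter every named subsystem
        for sub in entries:
            traverse(sub, intent, features)
    else:                                  # OR: the chooser picks one
        choice = CHOOSERS[system](intent)
        assert choice in entries
        traverse(choice, intent, features)
    return features

print(traverse("clause", {"command": True}, []))
# ['imperative', 'unmarked-theme']
```

Selecting a feature opens further systems and closes off alternatives, so constraints accumulate gradually rather than being bundled into a single phrase structure.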
From a complementary direction, McKeown has presented a theory of the organization of paragraphs into groups of conversational moves (8), drawing on earlier work by Grimes (62). She employs paragraph schemas as realizations of high-level moves such as "compare and contrast." The schemas act as templates to organize the content selection and rhetorical structuring that the planner does.

Figure 3. (from Ref. 26; a fragment of a systemic network for the clause, with Transitivity, Mood, Theme, and Cohesion systems.)

Psycholinguistic Theory. Once there are generation systems that have a significant capability, it becomes possible to consider deliberately chosen restrictions on the power of the virtual computational engine underlying the system's capacity. Such restrictions gain the possibility of providing an explanatory account of aspects of the human generation process by appealing to intrinsic properties of the machine that make it impossible for its behavior to be otherwise. There has been work toward this end by Kempen and Hoenkamp for restarting phenomena (51) and by McDonald for an account of people's fluency and lack of grammatical error and certain classes of speech errors (50).

Generation is a young research area. It is populated by a vigorous, mutually identifying group of researchers that is growing at an ever-increasing rate. The intellectual climate within the generation community is not unlike that of the language-understanding community of about 1974, with a roughly similar number of players and a similar feeling in the air of significant things happening. There is every reason to believe that the further development and contributions of generation research to AI as a whole in the next 12 years will be every bit as large as the contribution of understanding research in the last 12.
BIBLIOGRAPHY

1. W. Mann and C. Matthiessen, Nigel: A Systemic Grammar for Text Generation, in Freedle (ed.), Systemic Perspectives on Discourse: Selected Theoretical Papers of the Ninth International Systemic Workshop, Ablex, Norwood, NJ, 1985.
2. D. Appelt, Planning English Sentences, Cambridge University Press, Cambridge, U.K., 1985.
16. W. Mann, M. Bates, B. Grosz, D. McDonald, K. McKeown, and W. Swartout, "Text generation: The state of the art and literature," JACL 8(2) (1982).
17. T. Winograd, Understanding Natural Language, Academic Press, New York, 1972.
18. K. Forbus and A. Stevens, "Using qualitative simulation to generate explanations," Proc. of the Third Annual Conf. of the Cog. Sci. Soc., Berkeley, CA, August 1981, pp. 219-221.
19. C. Frank, A Step Towards Automatic Documentation, MIT AI Laboratory WP-213, 1980.
20. B. Sigurd, Computer Simulation of Spontaneous Speech Production, Proceedings of COLING, Stanford, CA, July 1984.
21. D. McDonald, Subsequent Reference: Syntactic and Rhetorical Constraints, in Theoretical Issues in Natural Language Processing II, Association of Computing Machinery, New York, pp. 38-47, 1978.
22. R. Granville, Controlling Lexical Substitution in Computer Text Generation, Proceedings of the COLING, Stanford, CA, pp. 381-384, 1984.
23. W. Woods, "Transition network grammars for natural language analysis," CACM 13(10), 591-606 (1970).
24. S. C. Shapiro, "Generalized augmented transition network grammars for generation from semantic networks," JACL 8(1), 12-25 (1982).
25. G. Brown, Some Problems in German to English Machine Translation, MIT LCS TR 142, 1974.
26. M. A. K. Halliday and J. Martin (eds.), Readings in Systemic Linguistics, Batsford Academic, London, 1981.
27. D. McDonald, A Preliminary Report on a Program for Generating Natural Language, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 401-405, 1975.
28. S. Shapiro, "Generation as parsing from a network into a linear string," JACL Fiche 33, 45-62 (1975).
3. M. Bates and R. Ingria, Controlled Transformational Sentence Generation, Proceedings of the ACL, Stanford, CA, 1980.
4. A. Davey, Discourse Production, Edinburgh University Press, Edinburgh, U.K., 1979.
5. J. Clippinger, Meaning and Discourse: A Computer Model of Psychoanalytic Speech and Cognition, Johns Hopkins Press, Baltimore, MD, 1977.
6. K. Kukich, Knowledge-Based Report Generation: A Knowledge Engineering Approach to Natural Language Report Generation, Ph.D. Thesis, Information Science Department, University of Pittsburgh, 1983.
7. D. McDonald, Description-Directed Natural Language Generation, Proceedings of the Ninth IJCAI, Los Angeles, CA, pp. 799-805, 1985.
8. K. McKeown, Text Generation, Cambridge University Press, Cambridge, U.K., 1985.
9. N. Goldman, Conceptual Generation, in R. Schank (ed.), Conceptual Information Processing, North-Holland/Elsevier, Amsterdam, pp. 289-372, 1975.
29. B. Bruce, Generation as Social Action, Proceedings of TINLAP-1, ACM, pp. 74-78, 1975.
30. J. Clippinger, Speaking with Many Tongues: Some Problems in Modeling Speakers of Actual Discourse, Proceedings of TINLAP-1, ACM, pp. 68-73, 1975.
31. N. Goldman, The Boundaries of Language Generation, Proceedings of TINLAP-1, ACM, pp. 74-78, 1975.
32. W. Swartout, A Digitalis Therapy Advisor with Explanations, MIT LCS Technical Report, Cambridge, MA, 1977.
33. W. Clancey, Tutoring Rules for Guiding a Case Method Dialog, Proceedings of IJMMS 11, pp. 25-49, 1979.
34. D. Chester, "The translation of formal proofs into English," Artif. Intell. 8(3), 261-278 (1976).
35. D. McDonald and J. Pustejovsky, TAGs as a Grammatical Formalism for Generation, Proceedings of the ACL, Chicago, July 1985, pp. 94-103.
10. D. McDonald, Natural Language Generation as a Computational Problem: An Introduction, in M. Brady and R. Berwick (eds.), Computational Models of Discourse, MIT Press, Cambridge, MA, pp. 209-266, 1983.
11. W. Mann and J. Moore, "Computer generation of multi-paragraph English text," JACL 7(1) (1981).
12. R. Wilensky, Y. Arens, and D. Chin, "Talking to UNIX in English: An overview of UC," CACM, 577-593 (June 1984).
13. H. Thompson, Strategy and Tactics: A Model for Language Production, Proceedings of the Chicago Linguistic Society, 1977.
14. D. Appelt, Problem Solving Applied to Language Generation, Proceedings of the ACL, Philadelphia, PA, pp. 59-63, 1980.
15. L. Danlos, Conceptual and Linguistic Decisions in Generation, Proceedings of the COLING, Stanford, CA, pp. 501-504, 1984.
36. M. Kay, Functional Unification Grammar: A Formalism for Machine Translation, Proceedings of COLING, Stanford, CA, July 1984, pp. 75-78.
37. E. Hovy, Integrating Text Planning and Production in Generation, Proceedings of the Ninth IJCAI, Los Angeles, August 1985, pp. 848-851.
38. J. Becker, The Phrasal Lexicon, Proceedings of TINLAP-1, ACM, pp. 60-64. Also as Bolt Beranek and Newman Report 3081, Cambridge, MA.
39. P. Jacobs, PHRED: A Generator for Natural Language Interfaces, Berkeley Computer Science Department, TR 85/198, 1985.
40. J. Friedman, "Directed random generation of sentences," CACM 12(6), 40-46 (1969).
41. V. H. A. Yngve, A Model and a Hypothesis for Language Structure, Proceedings of the American Philosophical Society, pp. 444-466, 1960.
42. M. Kay, Functional Grammar, Proceedings of the Berkeley Linguistic Society, 1979.
43. G. Ritchie, The Computational Complexity of Sentence Generation using Functional Unification Grammar, Proceedings of COLING, Bonn, FRG, August 25-29, 1986.
44. J. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1984.
45. S. Bossie, A Tactical Component for Text Generation: Sentence Generation Using a Functional Grammar, University of Pennsylvania, TR MS-CIS-81-5, 1981.
46. F. Danes, Papers on Functional Sentence Perspective, Academia, Czechoslovakian Academy of Science, 1974.
47. J. R. Firth, Papers in Linguistics 1934-1951, Oxford University Press, Oxford, U.K., 1957.
48. Reference 14, p. 108.
49. R. Power, "The organisation of purposeful dialogues," Linguistics 17, 107-151 (1979).
50. D. McDonald, Description Directed Control: Its Implications for Natural Language Generation, in Cercone (ed.), Computational Linguistics, Plenum, New York, pp. 403-424, 1984.
51. G. Kempen and E. Hoenkamp, Incremental Sentence Generation: Implications for the Structure of a Syntactic Processor, Proceedings of COLING, Prague, August 1982.
52. P. Jacobs, A Knowledge-Based Approach to Language Production, Berkeley Computer Science Department, TR 86/254, 1985.
53. W. Swartout, personal communication, Information Sciences Institute, Los Angeles, July 1984.
54. R. Simmons and J. Slocum, "Generating English discourse from semantic networks," CACM 15(10), 891-905 (1972).
55. J. Slocum, Question Answering via Canonical Verbs and Semantic Models: Generating English from the Model, University of Texas, Department of Computer Science, TR NL-23, 1973.
56. S. C. Shapiro, The SNePS Semantic Network Processing System, in Findler (ed.), Associative Networks, Academic Press, New York, 1979.
57. M. A. K. Halliday, "Notes on transitivity and theme in English," J. Ling. 3(1), 37-81 (1967).
58. W. Mann, The Anatomy of a Systemic Choice, Information Sciences Institute TR/RS-82-104, 1982.
59. W. Mann, Inquiry Semantics: A Functional Semantics of Natural Language, Information Sciences Institute TR/RS-83-8, 1983.
60. E. Sacerdoti, A Structure for Plans and Behavior, Elsevier North-Holland, Amsterdam, 1977.
61. P. Cohen, On Knowing What to Say: Planning Speech Acts, University of Toronto, TR 118, 1978.
62. J. Grimes, The Thread of Discourse, Mouton, The Hague, 1975.
63. R. Brown, Use of Multiple-Body Interrupts in Discourse Generation, Bachelor's Thesis, MIT, Department of Electrical Engineering and Computer Science, 1974.
64. H. K. T. Wong, Generating English Sentences from Semantic Structures, University of Toronto, Department of Computer Science, TR 84, 1985.

D. D. McDONALD
University of Massachusetts

NATURAL-LANGUAGE INTERFACES

The term natural language (NL) is very deceptive. Everyone has an intuitive feel for what it means to communicate in natural language, but it is very difficult to make this notion precise. (The problem is well illustrated by recent fierce debates about whether chimpanzees that have been taught some sign language are really using language.) For most of the history of the human race, the only entities using "natural" language have been human, so it is difficult to separate linguistic capabilities from other human capabilities such as memory, reasoning (qv), problem solving (qv), hypothesis formation, classification, planning (qv), social awareness, and learning (qv). This makes it sometimes difficult to distinguish between a natural-language interface (NLI) and the underlying system to which it is an interface. On one hand, one does not want to require a computer to have all of these capabilities before saying that it can use language; on the other hand, without these capabilities, any computer system will use language differently than human beings do and thus is open to the charge that it is not really using language.

Without defining precisely what NL understanding is, most people would accept Woods's (1) statement that "natural language assumes understanding on the listener's part, rather than mere decoding. It is characterized by the use of such devices as pronominal references, ellipsis, relative-clause modification, natural quantification, adjectival and adverbial modification of concepts, and attention-focusing transformations. It is a vehicle for conveying concepts such as change, location, time, causality, purpose, etc., in natural ways. It also assumes that the system has a certain awareness of discourse rules, enabling details to be omitted that can be easily inferred." This characterization absolutely excludes systems that merely use English words to replace symbols in what would otherwise be an "unnatural" language.

Human conversational partners share a lot of information, can model one another's knowledge and capabilities, can process huge amounts of information (even conflicting information), and can update all of these structures in amazing detail as the conversation progresses. Computers are currently a long way from having this very genial, very powerful, very broad-based language capability. Fortunately, a more limited language capability will suffice for many applications, and humans can easily adapt at least some aspects of their language based on their knowledge of their conversational partner. If too much adaptation is required, however, the communication becomes unnatural even if it is conducted in English. One must carefully distinguish between natural-language communication, natural communication (which may use language or not, and requires no learning by the user), and user-friendly interfaces (which generally do not use language and are easy to learn but are not necessarily natural).

A critical obstacle to the use of many computational resources such as database-management systems (DBMS) and decision-support systems (DSS) is the mismatch between the needs of users and their ability to communicate these needs to the computer. The development of graphical interfaces such as spreadsheet systems, menu systems, and the "electronic desktop" are important steps toward improving the interface for a class of stereotyped, semirepetitive tasks. For many tasks, however, greater flexibility is needed, and NLIs can provide this capability to a wide range of users. In the area of DBMS, such interfaces allow users who are unfamiliar with the technical characteristics of the underlying database-management system to query a database using typed English input. The output is usually plain data, a statistical summary, or a graphical representation of the required data. NL interfaces
are also used to specify the input to decision-support systems and expert systems and to pose questions to them.
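The English-to-query translation just described can be illustrated in miniature. The table, the recognized phrasings, and the whole "grammar" below are invented for this sketch; a real NLI requires far more syntactic and semantic machinery, as the rest of this article makes clear.

```python
import re
import sqlite3

# A deliberately minimal NL front end over a toy sales database:
# two fixed English phrasings are mapped onto a formal query.
# Schema, data, and patterns are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (year INTEGER, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [(1985, 700), (1986, 950)])

def answer(question, current_year=1987):
    m = re.fullmatch(r"What were our sales in (\d{4})\?", question)
    if m:
        year = int(m.group(1))
    elif question == "What were our sales last year?":
        year = current_year - 1        # a crude stab at deixis
    else:
        return None                    # outside the system's tiny domain
    row = db.execute("SELECT amount FROM sales WHERE year = ?",
                     (year,)).fetchone()
    return row[0] if row else None

print(answer("What were our sales last year?"))    # 950
print(answer("Graph last year's sales by month"))  # None: not understood
```

Even this toy makes the article's point visible: anything outside the anticipated phrasings, such as a formatting request, falls silently outside the system's competence.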
Is English the Most Appropriate Interface Language?

In the rush to make computers more accessible, it is easy to be taken in by the following false argument: Not everyone who wants to use a computer can or will take the time to learn a special language for dealing with it; everybody already knows English (or some other natural language); therefore, the only way to get everyone to use computers is to let them use English. This section shows the flaws in this argument and sets the stage for examining NL interfaces in a more realistic way.

Natural language may be useful when the user of a system does not know the capabilities or limitations of the system, when he or she cannot or will not learn a formal interface language, when the underlying interface is not user friendly (and hence would be awkward to use even if the user were prepared to get technical), or when the nature of the task to be performed is not well specified.

Even under the conditions stated in the previous paragraph, natural language is not useful when the content of interactions is so limited that the brevity of an artificial language (such as a menu of choices) is desirable; systems with a sophisticated interactive graphic and menu interface can be operated without English and with very little training. This use of icons is effective only because the users thoroughly understand the conceptual model underlying the domain (opening files, sending messages, etc.) and because only a small amount of detail must be conveyed by the icons and by the user's manipulation of them.

English is not useful when physical controls are appropriate: imagine driving a car or playing most video games using written or spoken English! Thus, in graphics-oriented situations, such as laying out a slide for a presentation, or in computer-aided design (qv), the exact placement of the elements of an image is best done with some form of pointing device. In these cases English may still play a role in initially specifying the images to be placed on the screen if the set of possible stored images is large and not readily broken down in a way that would make single or multiple menu selections appropriate.

English is not useful for object identification when the user can more easily point to something (as with a mouse or a touch-sensitive screen) than describe it. (However, pointing can be as ambiguous as English. For example, does a particular pointing action refer to "that line," "that triangle," "that region of the screen," "the object depicted by that triangle," or something else?)

One intermediate position between formal interfaces and English is the use of treelike menu systems in which each choice of a word or phrase from a menu causes the display of a new menu dependent on that choice (see Menu-based natural language). This is a good alternative to complex interfaces, provided that the speed of the display is comfortable for use, that the amount of data to be presented in any menu is not too large to be visually processed easily, and that the user can identify the branch he or she wants to take at any point.

Even an application as apparently restricted as using a DBMS does not necessarily make the choice of interface easy because of the many different kinds of tasks a user might want to perform. The most obvious DBMS task is to simply retrieve data ("What were our sales last year?"), but one might also want to format that data ("Graph last year's sales by month"), enter new data ("Set my department's projected sales for next month to $87,000"), query the system about its capabilities ("How far back do your sales figures go?"), place standing orders ("Don't show anyone's first name"), or do a myriad of other tasks.

The ambiguity of NL is often an advantage in retrieval tasks but can be a serious problem when updating. A good example of this was given in Kaplan (2): If someone says "Change Brown's manager from Jones to Baker," does this mean that Brown is to be moved from the group managed by Jones to that managed by Baker, or that Jones is being replaced by Baker as the manager of the group that Brown is in?

Even a system that has excellent NL capability within a small domain (such as accessing data about sales figures) may not be useful for users who have no idea of the limitations of that domain or who want to perform tasks outside the scope of the system. Such users may want to ask questions about the system's capability such as "What can you tell me about personnel?" and the system may not be able to give any kind of coherent answer.

State of the Art of Natural-Language Interfaces

In an attempt to jump on the bandwagon of NL interfaces, some software producers simply take their current system interface and modify it slightly so that it uses English words and thus, at first glance, looks like it can understand English. One way to detect such exaggeration is to compare the "English" that is allowed with the underlying interface. If there is a fairly clear correspondence between the two, very little NL processing is going on. Even without access to the underlying interface, it is usually easy to confuse such systems by giving them simple, natural variations of input. If "List male managers" works but "Give me the managers who are men" and "Which managers are male?" do not work, the interface is not very close to English.

Another distinction, and one that is harder to detect by simply observing the system in operation, is what Moore (3) calls special-purpose vs. general-purpose systems. General-purpose systems have the domain-dependent knowledge clearly separated from more general syntactic and/or semantic knowledge; such systems are of great interest to researchers. Special-purpose systems have knowledge about their particular application domain built in at very low levels of processing; they may, for example, be able to recognize units around a keyword like "sales" but may not depend at all on general linguistic entities such as noun phrases. They may have special rules of inference for deducing new information from old, but the rules are formulated only for the particular application domain, not in general terms. Special-purpose systems are sometimes called semantic grammar systems or pragmatic grammars because they combine the semantics and/or pragmatics of the domain directly with syntactic analysis in a single grammar (see Grammar, semantic).

By mixing the domain model, database model, syntax, and semantics of a particular domain, special-purpose systems can achieve high performance for that domain. Their drawback is that it is difficult or impossible for anyone but the original system designer to make significant changes to the system, and it must be almost entirely rewritten if a new domain is
required. Moore (3) has said that it takes between 2 months and 5 years for programmers experienced in building these systems to produce a special-purpose NL front-end for a small but useful domain. On the other hand, general-purpose systems offer the promise of easy transportability from one domain to another by changing the lexicon and the domain-dependent semantics. Their disadvantages are the long development time required to produce the domain-independent components and the fact that for some applications this approach brings more to bear on the problem than is necessary, with a corresponding price tag. A critique of this approach is presented in Ref. 4. Research systems such as TEAM (5-7) and IRUS (8) use this model. It will probably be several years before general-purpose (by this definition) systems begin to be widely available, but when they are, the effort required to adapt them to a particular application is expected to be a few weeks or months.

There have been many publications in the research literature about natural-language interfaces. Most, but not all, focus on general-purpose systems. Some of these papers describe research systems that are being used to investigate various aspects of the NL problem or are offered as "proof by example" that (limited) NL understanding is possible (9-21). Others try to present general issues and problems relating to applied NL interfaces, particularly for database access (1,3,22-24). Several conferences have had panels or sessions devoted to this subject (25-27), and several special issues of journals have also focused on it (20,28). These research successes imply that the technological basis for commercial success has been achieved. Commercial ventures using this technology have begun to appear, and more are sure to follow.

How Should Prospective Users Judge NL Systems?

In this section are presented a number of topics that should be investigated when one examines a system that claims to understand English.
It is important to keep in mind that the right question to ask is not "Does system X have feature Y?" (because the answer will almost never be a clear yes or no) but rather "How much of feature Y does system X handle, and how important is it to the application I have in mind?" In a general-purpose system the system developers should be able to describe the mechanisms used to handle these issues; a demonstration of their use in one domain is fairly good evidence of their applicability to another domain. In the case of special-purpose systems, evaluation is more difficult, since the techniques used may be more ad hoc; a demonstration that is impressive in one domain may not be relevant to the kinds of problems that will arise in a different application.
Coverage and Habitability. These two properties are related but not identical. Coverage is a characterization of the linguistic competence of a system, whereas habitability measures how quickly and comfortably a user can recognize and adapt to the system's limitations. The coverage of an NL system may be categorized in a number of dimensions, some of which are discussed below.

Lexical Coverage. How large a vocabulary does the system have? The overall size of the vocabulary is not as critical as the relevance of the vocabulary for the application domain, though the system should certainly cover all the "closed-class" words in English such as prepositions, conjunctions, articles, etc. In addition, since it is impossible for any system to have complete coverage, it is important to know how easy or difficult it is to extend the vocabulary of the system. What knowledge of linguistics and the internal structure of the dictionary is required? Can an end user add new vocabulary, or does it take an applications programmer with some short training, or must vocabulary always be added by the system developers? It is also important to distinguish between new words that are essentially synonyms for existing words and new words that involve new concepts for the system.

Syntactic Coverage. What is the range of syntactic phenomena the system can deal with? Does the system handle complex verb forms, relative clauses, various question forms, passives, comparatives, subordinate clauses, time and place adverbials, measure expressions, ellipsis, pronominalization, and conjunction? Although this is the most well studied aspect of natural-language understanding, there is not as yet a benchmark against which to test a system, nor even a generally agreed upon list of phenomena. A useful list of phenomena is given by Winograd (29) in his book on syntactic processing.

Semantic Coverage. How much does the system understand about the domain? For a DBMS retrieval system, does the system have a model of the semantics of the applications domain, or does it merely make a direct translation of certain English phrases into specific queries in a formal retrieval language? This is particularly important if the system is to be able to access new databases or to work when old databases are restructured. There is a major difference between having to ask "Is there an employment record for Jones with Acme Co. in the employer field?" and "Did Jones ever work for Acme?" If the system treats the latter question as simply a variant of the first, it will not be able to handle such a query if the database is modified to list the employees for each company (but not the companies for each employee), nor would one expect it to be able to handle "Did Jones ever work for division 5?" or "Did Jones ever work for Smith?"

Although a system with extremely large coverage is likely to be habitable, even systems with very limited coverage can be habitable if properly designed, and systems with wide variations of coverage may be less habitable than ones with uniformly smaller coverage. The critical issues are whether the system has enough coverage to let users meet a reasonable proportion of their needs (i.e., is there at least one way to express everything a user really needs to say), whether the user can quickly find an appropriate way of expressing a request, and whether the user can easily learn to avoid the system's blind spots.

A system's habitability is reduced if the user is led to believe that the system has capabilities that are beyond it, and there is no clear indication of the boundaries. This can happen if the language the system presents to the user is not language that the user can present to the system. In most applications some English is presented to the user, even if it is only canned text. English output from a computer system will either be prestored strings or generated text that comes from a different knowledge base than that used by the language-understanding part of the system (instead of being
integrated as in humans). This means that the language that can be expressed by a computer system may exceed its comprehension, a situation that is precisely opposite that of humans! Human users of a system will, very naturally and unconsciously, be influenced by the computer's language and will assume that the computer can understand the kind of language it produces. Thus, a desirable goal is to ensure that the vocabulary in the output is understood by the interface and that the syntactic constructions used in the output are within its syntactic coverage.

Even if the two capabilities are matched, there is another possible pitfall. In normal conversations people typically use pronouns and other anaphoric expressions like "that purchase order," "those salespeople," and "the average" to refer to entities introduced into the conversation by their dialogue partner. If the system uses canned text for output, or even if it synthesizes English output as needed, it will not be able to understand such anaphoric expressions unless it maintains a model for everything it (as well as the user) has said.

The difficulties in achieving habitability with a semantic grammar are based on the fact that without great care such grammars can give users misleading clues as to coverage. If the system can understand both "list the salespeople who have been under quota for two months" and "what salespeople have been under quota for two months" and the system can understand "list the products that Jones sold to Acme," the user might reasonably expect the system to understand "what products did Jones sell to Acme?" In a special-purpose system, however, the system may have different portions of the grammar for each verb, and it is easy for them to become inconsistent.

Inference. This is the art of drawing logical conclusions based on the data in the database and/or general knowledge of the subject domain (see Inference).
It is often the case that retrieving only data that is explicitly stored in a database is insufficient to meet a normal user's needs. Users will assume that the system has the ability to infer new information from that already in the database. (This is particularly true if the user does not have detailed knowledge of the database.) The "navigation problem" is an example of a simple inference: Suppose a database contains records about employees and records about jobs the company has performed for clients; the employee record has a field for jobs the employee has worked on, and the job record has a field for the client's name. Someone accessing this database might naturally ask "Has Ellen Matthews ever worked for Adams Co.?" Notice that in order to interpret this question correctly, the system must be able to follow the chain of reasoning that Matthews has worked for Adams if she has worked on a job that had Adams as the client, although no job was explicitly mentioned in the query and no relation "work for" exists in the database.

End User Control of Interpretation. Suppose someone asks "What is the largest division in the company?" This could mean largest in terms of number of employees, number of employees of a particular type, gross sales, or some other metric. Either the NL system has some built-in metric or it does not. If it does, it may or may not match what was meant. If it is not what was meant, how does the user find this out (the answer "Division 4" probably will not help), and can the user change it? If the system does not have a default metric, it might have a set of metrics that it can ask about, but the user
will not want to see the question "Do you mean largest number of employees or largest building area or highest sales?" every time he or she uses the term largest. Ideally, the user should be able to set temporary (or permanent) "standing orders" that will be interpreted in context, but this is currently possible only in a limited way.

Use of Pronouns. Any NL system will claim that it can handle pronouns (he, her, it, they, their, himself, etc.) because they are so widely used in English, but every system has limitations in this regard because pronoun use can be extremely complex. For example, pronouns usually refer to objects explicitly mentioned in previous discourse, but sometimes they can refer to objects mentioned later ("After he transferred from Department 22, did John Jones work in Division 6?"). Pronouns can also refer to actions ("Did Smith ever come to work later than 10 am? How often has he done that?"). In NL interfaces, users find it perfectly natural to use pronouns to refer to objects in the computer's previous response, not just objects in their own language (Q: "How many projects are ahead of schedule?" A: "One." Q: "Who is in charge of it?").

Other Kinds of Reference. Pronouns are a specific case of a linguistic phenomenon called anaphoric reference, in which one refers to things without using their full names. Even things that have not been mentioned explicitly can be referred to if it is "obvious" that they should be inferred from the previous context. For example, the multisentence utterance "Seven contracts were concluded last month. Those profits will set a new record." uses the phrase "those profits" to refer to the profits of the contracts just mentioned. A "natural" DBMS interface should also provide some ability to specify items on the basis of previously computed aggregates, e.g., "products whose sales are at least 80% of the average sales of the ten most profitable products."
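One simple (and deliberately naive) way to support pronouns that refer to the system's own answers, as in the projects example above, is to record discourse entities introduced by both sides of the exchange and resolve a pronoun by recency. The sketch below is illustrative only; all names and the bare recency heuristic are invented here, and real systems need far more machinery.

```python
# Hypothetical sketch: resolving "it" against a history that records entities
# introduced by BOTH the user's queries and the system's answers.
history = []  # most recent discourse entities last

def mention(entity, kind):
    """Record a discourse entity of a given kind (e.g., 'project')."""
    history.append({"entity": entity, "kind": kind})

def resolve_pronoun(kind_wanted):
    """Naive recency heuristic: latest entity of a compatible kind."""
    for item in reversed(history):
        if item["kind"] == kind_wanted:
            return item["entity"]
    return None

# Q: "How many projects are ahead of schedule?"  A: "One." (say, project P12)
mention("P12", "project")          # the system's own answer adds an entity
# Q: "Who is in charge of it?" -- "it" must pick up the answer entity
print(resolve_pronoun("project"))  # P12
```

The key design point is that the answer "One." itself introduces a referent; a system that records only the user's words cannot resolve the follow-up question.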
Ellipsis. In conversation, people often leave out large portions of sentences, assuming that the missing parts can be filled in by the listener, who shares the context being discussed. For example, a user might want to make the following series of queries: "How many people did we hire last month?" "The month before?" "How many do we expect next month?" It is easy to be fooled into thinking that because a system handles a few examples, it can handle any kind of ellipsis (qv).

Quantification. The use of words like some, every, all, and any can complicate NL understanding because their interpretation often depends on wide-ranging commonsense knowledge or on detailed knowledge of the particular domain. The queries "Did every person in department 5 submit his/her trip report?" and "Did every person in department 5 consult his/her department manager?" are structurally equivalent, but the first case refers to multiple trip reports and the second case to a single manager.

Negation. Negation is particularly tricky when combined with quantification. Does "All of the projects weren't completed on time" mean that none of the projects were completed on time or that some were and some were not? Negation can also occur in noun phrases as well as verb phrases: "Who sold no widgets last quarter?"

Time and Tense. This is currently an open research issue. There are no general mechanisms for effectively and efficiently representing events and objects that change over time.
Fortunately, many database applications do not have to be concerned with this issue since they often contain only limited historical data that does not contain complex time relations.

Conjunction and Disjunction. And's and or's are extremely common in English. Often they join complete units ("the book and the author"), but sometimes they join discontinuous segments ("I adjusted for and calculated next quarter's overhead"). Handling simple conjunctions is within reach of current systems, but the combination of conjunctions with ellipses and other phenomena is still an open problem in computational linguistics.

Telegraphic Input. Although full English sentences are easy to say, people who have to type a lot frequently want to abbreviate their input by dropping out "unnecessary" words. For example, "Show sales last year midwest by salesman" is easily understood (by most humans) as a paraphrase of "Show (me) (the) sales (from) last year (in the) midwest (graphed with sales) by salesman." Of course, in the appropriate context it might also mean "Show (to the sales department) (the figures from) last year (graphed with the) midwest (sales) by salesman." An important point to remember about this capability is that, although it is desirable, one pays for it with an increased potential for misunderstanding and (usually) a decreased ability to use the finer points of grammatical structure to influence the processing of nontelegraphic input.
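The trade-off just described can be made concrete with a minimal keyword-spotting pass of the kind sometimes used for telegraphic input. Everything in this sketch is invented for illustration (slot names, vocabulary, and the assumption that "year" always means last year); note that by skipping unknown words it accepts arbitrary word orders, which is precisely the loss of grammatical cues the text warns about.

```python
# Hypothetical sketch: keyword spotting for telegraphic input. Unknown
# "noise" words are ignored; known words fill slots in a query frame.
SLOTS = {
    "sales": ("measure", "SALES"),
    "year": ("period", "LAST_YEAR"),   # crude: assumes "year" = last year
    "midwest": ("region", "MIDWEST"),
    "salesman": ("group_by", "SALESMAN"),
}

def telegraphic_parse(utterance):
    frame = {}
    for word in utterance.lower().split():
        if word in SLOTS:
            slot, value = SLOTS[word]
            frame[slot] = value    # words like "show", "by", "last" are skipped
    return frame

print(telegraphic_parse("Show sales last year midwest by salesman"))
# {'measure': 'SALES', 'period': 'LAST_YEAR', 'region': 'MIDWEST', 'group_by': 'SALESMAN'}
```

Because the parse is order-insensitive, "Midwest salesman sales last year show" yields the same frame; robustness and the potential for misunderstanding come from the same mechanism.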
Ungrammatical Input. Closely related to the notion of telegraphic input is that of ungrammatical input. In fact, usually the same techniques are used to handle both kinds of nonstandard language.

All of the issues above represent problems that have been at least partially solved in general-purpose research systems. The following list presents some highly desirable attributes of systems for database retrieval that are not so well understood in general terms but may be available in limited form for particular applications.

What-If Capability. This is nearly essential in decision support systems (DSSs). Simple specifications of conditions are easy to handle, but complex specifications present serious problems, particularly if they are expressed incrementally and modified during a dialogue.

Presentation of Output. This includes formatting reports and tables, interfacing to graphics modules, and generating English output. Simple capabilities are available now and are expanding rapidly.

Tools for Altering Domain and File Structures. Systems that have a very direct correspondence between the input language and the retrieval language can be modified by users (or systems programmers at the users' organizations), but more sophisticated NL capability implies the need for most customization to be done by the developer of the NL system. Software tools that will make it easier to develop, expand, and modify domain-dependent information and DBMS-dependent information will only gradually be developed.

Implementation in Work Stations. For many applications, it is undesirable to use mainframe computer resources to process English queries and commands. Soon some NL systems will fit comfortably in individual work stations or personal computers and will be able to locally translate the user's input into a sequence of commands to be sent to the DBMS on another machine.

There are (at least) two approaches to looking for NL capability in a computer system. One is to look at available systems and say, "If I had it, what could I do with it?" This is likely to be misleading since it is very easy to infer from a few examples that the system can do more than it actually can. A better approach is to determine in advance what kinds of interactions one would like to be able to have with the machine [perhaps by taking protocols of a simulation, as in Bates and Sidner (30), or just by asking potential users of the system to describe a few dozen examples]. Armed with this unbiased language sample, one can then ask, "Will system X be able to handle this input?"

Conclusion: The Future

In the next few years one can expect to find natural-language interfaces to a wide variety of computer systems, including database systems, graphics packages, expert systems, and DSSs. This already large market is certain to grow as personal work stations and network access to data and DSSs become widely available.

Some organizations will choose to develop their own NL interfaces in-house; others will buy that capability elsewhere. Because the development of language systems requires a much different programming approach than, say, accounting or database packages, the in-house systems will tend to be special purpose and difficult to modify as the needs of the people using them grow. Some companies will offer to build special-purpose systems on a contract basis, and fewer will offer general-purpose systems (because of the very limited supply of experts needed to develop them and the lengthy development cycle).

The subject areas for NL applications will be very broad. Some vendors will aim for one or more well-defined user communities and develop specialized packages; others will build a family of more general, tailorable systems. It will not be easy for the purchaser of an NL system to judge whether a particular system is capable of meeting the demands of the proposed application. This problem will continue to require expert advice and consulting.

In summary, the technology for useful, cost-effective, natural-language interfaces is available now and will begin to have a major impact on database retrieval and other areas in the very near future. However, these interfaces will not behave like a human conversational partner, so users must carefully examine such systems to understand their capabilities and limitations.

BIBLIOGRAPHY

1. W. A. Woods, "A personal view of natural language understanding," SIGART Newslett. (61), 17-24 (February 1977).
2. S. J. Kaplan and Davidson, "Interpreting Natural Language Database Updates," Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, Stanford University, Stanford, CA, June 1981.
3. R. C. Moore, Practical Natural-Language Processing by Computer, Technical Note 251, SRI International, Menlo Park, CA, October 1981.
4. S. P. Schwartz, Problems with Domain-Independent Natural Language Database Access Systems, Proceedings of the Twentieth Annual Meeting of the ACL, Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982, pp. 60-62.
5. B. Grosz, TEAM: A Transportable Natural-Language System, Technical Report No. 263R, SRI Artificial Intelligence Center, Menlo Park, CA, November 1982.
6. B. J. Grosz, Transportable Natural-Language Interfaces: Problems and Techniques, Proceedings of the Twentieth Annual Meeting of the Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982, pp. 46-50.
7. B. J. Grosz, TEAM, a Transportable Natural Language Interface System, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 39-45.
8. M. Bates and R. J. Bobrow, A Transportable Natural Language Interface for Information Retrieval, Proceedings of the Sixth Annual International ACM SIGIR Conference, ACM Special Interest Group on Information Retrieval and American Society for Information Science, Washington, DC, June 1983.
9. E. F. Codd, R. S. Arnold, J-M. Cadiou, C. L. Chang, and N. Roussopoulos, RENDEZVOUS Version 1: An Experimental English-Language Query Formulation System for Casual Users of Relational Data Bases, Technical Report RJ2144, IBM Research, San Jose, CA, January 1978.
10. M. Epstein and D. Walker, Natural Language Access to a Melanoma Data Base, Proceedings of the Second Annual Symposium on Computer Applications in Medical Care, 1978. Also SRI Technical Note 171, September 1978.
11. J. M. Ginsparg, A Robust Portable Natural Language Data Base Interface, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 25-29.
12. G. Guida and C. Tasso, IR-NLI: An Expert Natural Language Interface to Online Data Bases, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 31-38.
13. G. G. Hendrix, The LIFER Manual: A Guide to Building Practical Natural Language Interfaces, Technical Note 138, SRI International, Menlo Park, CA, February 1977.
14. G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3(2), 105-147 (June 1978).
15. G. G. Hendrix, "Natural-language interface," Am. J. Computat. Ling. 8(2), 56-61 (April-June 1982).
16. S. C. Shapiro and S. C. Kwasny, "Interactive consulting via natural language," CACM 18(8), 459-462 (1975).
17. I. Spiegler, "Modelling man-machine interface in a data base environment," Int. J. Man-Mach. Stud. 18, 55-70 (1983).
18. B. H. Thompson and F. B. Thompson, Introducing ASK, A Simple Knowledgeable System, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, February 1983, pp. 17-24.
19. D. E. Walker and J. R. Hobbs, Natural Language Access to Medical Text, Technical Note 240, SRI International, March 1981.
20. D. Waltz, "Natural language interfaces," SIGART Newslett. (61), 16-64 (February 1977).
21. D. H. D. Warren and F. C. N. Pereira, An Efficient Easily Adaptable System for Interpreting Natural Language Queries, Technical Report 155, University of Edinburgh, February 1981.
22. G. J. Kaplan and D. Ferris, "Natural language in the DP world," Datamation 28(9), 114-120 (August 1982).
23. R. C. Moore, Natural Language Access to Databases: Theoretical/Technical Issues, Proceedings of the Twentieth Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982, pp. 44-45.
24. M. Templeton and J. Burger, Problems in Natural Language Interface to DBMS with Examples from EUFID, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 3-16.
25. Association for Computational Linguistics, Proceedings of the Nineteenth Annual Meeting of the Association for Computational Linguistics, ACL, Stanford University, Stanford, CA, 1981.
26. Association for Computational Linguistics, Proceedings of the Twentieth Annual Meeting of the Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982.
27. Association for Computational Linguistics and the Naval Research Laboratory, Proceedings of the Conference on Applied Natural Language Processing, ACL, Santa Monica, CA, 1983.
28. S. Kaplan, "Special section: Natural language," SIGART Newslett. (79), 27-109 (January 1982).
29. T. Winograd, Language as a Cognitive Process, Vol. 1, Syntax, Addison-Wesley, Reading, MA, 1982.
30. M. Bates and C. L. Sidner, A Case Study of a Method for Determining the Necessary Characteristics of a Natural Language Interface, Integrated Interactive Computing Systems, North-Holland, Amsterdam, pp. 263-278, 1983.

M. Bates
BBN Laboratories Inc.

An earlier version of this entry appeared in M. Bates and R. Bobrow, Natural Language Interfaces: What's Here, What's Coming, and Who Needs It, in Artificial Intelligence Applications for Business, Ablex, Norwood, NJ, 1984, pp. 179-193.
NATURAL-LANGUAGE PROCESSING. See Natural-language generation; Natural-language interfaces.
NATURAL-LANGUAGE UNDERSTANDING

Natural-language communication with computers has long been a major goal of AI, both for the information it can give about intelligence in general and for its practical utility. Databases, software packages, and AI-based expert systems all require flexible interfaces to a growing community of users who are not able or do not wish to communicate with computers in formal, artificial command languages. Whereas many of the fundamental problems of general natural-language processing (NLP) by machine remain to be solved, the area has matured in recent years to the point where practical natural-language interfaces to software systems can be constructed in many restricted, but nevertheless useful, circumstances. This entry is intended to survey the current state of natural-language processing by presenting computationally effective NLP techniques, by exploring the range of capabilities these techniques provide for NLP systems, and by discussing their current limitations. This presentation is organized in two major sections: the first on language recognition strategies at the single-sentence level and the second on language processing issues that arise during interactive dialogues. In both cases the concentration is on those aspects of the problem appropriate for interactive natural-language interfaces but relates the techniques
and systems discussed to more general work on natural language, independent of application domain.

Nature of Natural-Language Processing. Natural-language processing (NLP) is the formulation and investigation of computationally effective mechanisms for communication through natural language. To take the boldface phrases in reverse order, first, the subject area deals with naturally occurring human languages such as German, French, or English. Second, it is concerned with the use of these languages for communication, both communication between people, the purpose for which these languages evolved, and communication between a person and a computer. Third, NLP does not study natural-language communication in an abstract way but by devising mechanisms for performing such communication that are computationally effective, i.e., can be turned into computer programs that perform or simulate the communication. It is this third characteristic that sets the NLP subarea of AI, itself a subarea of computer science, apart from traditional linguistics and other disciplines that study natural language.

This entry examines the relationship of NLP to two other closely related disciplines: linguistics and cognitive psychology (qv). Linguistics is traditionally concerned with formal, general, structural models of natural language. Linguists, therefore, have tended to favor formal models that allow them to capture as much as possible the regularities of language and to make the most appropriate linguistic generalizations. Little or no attention was paid in the development of these models to their computational effectiveness. That is, linguistic models characterize the language itself, without regard to the mechanisms that produce it or decipher it.
A good example, as shown below, is Chomskian transformational grammar (qv) (1,2), perhaps the best known of all linguistic models, which turns out to be unsuitable as a basis for computationally practical language recognition [although see work by Petrick (3)]. The goal of cognitive psychology (qv), on the other hand, is not to model the structure of language but rather to model the use of language and to do it in a psychologically plausible way, where plausibility here is defined by correlation with experimental results, especially timing studies of language-understanding tasks [see Anderson (4) for a good example of the flavor of this approach]. This is somewhat closer to the spirit of AI-based NLP in its emphasis on the use of language in communication, but again it is not of primary importance to the cognitive psychologist whether his models are computationally effective. Moreover, the models produced are not often targeted at language understanding per se but at more general aspects of human cognition and memory organization, with natural language serving only as the vehicle through which these related phenomena are studied.

In addition to relating NLP to the study of language in other disciplines, we should point out a major division that arises within NLP itself. The distinction is between general and applied NLP. One can think of general NLP as a way of tackling cognitive psychology from a computer science viewpoint. The goal is to make models of human language use and also to make them computationally effective. The vehicles for this kind of work are general story understanding, as in the work of Charniak (5), Schank (6), Cullingford (7), Carbonell (8), and others, and dialogue modeling, as in the work of Cohen and Perrault (9), Allen (10), Grosz (11), Sidner (12), and others. One of the most important lessons learned from this
work is that general NLP requires a tremendous amount of real-world knowledge; most of the work just cited is mainly concerned with the representation of such real-world knowledge and its application to the understanding of natural-language input. Unfortunately, AI has not yet reached the stage where it can routinely handle the amount of knowledge required for these tasks, with the result that systems constructed in this area tend to be "pilot" systems that demonstrate the feasibility of a concept or approach but do not contain a large enough knowledge base to make them work on more than a handful of carefully selected example natural-language passages or dialogues.

Applied NLP, on the other hand, is not typically concerned with cognitive simulation but rather with allowing people to communicate with machines through natural language. The emphasis is pragmatic. It is less important in applied NLP whether the machine "understands" its natural-language input in a cognitively plausible way than whether it responds to the input in a way helpful to the user and in accordance with the desires expressed in the input. Typical applications are database interfaces, as in the work of Hendrix (13), Grosz (14), Kaplan (15), and others, and interfaces to expert systems (qv), as in the work of Brown and Burton (16), Carbonell (J. R.) (17), and Carbonell (J. G.) et al. (18). Because such systems must operate robustly with real users, in addition to actually processing well-formed natural language, they must be concerned with the detection and resolution of errors and misunderstandings by the user.

Basic Problem of NLP. If there is one word to describe why NLP is hard, it is ambiguity. It arises in natural language in many different forms, including the following.

Syntactic (or Structural) Ambiguity

John saw the Grand Canyon flying to New York.
Time flies like an arrow.

Is it John or the Grand Canyon doing the flying?
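The scale of such structural ambiguity can be made concrete with a tiny chart-style parse counter. The grammar and lexicon below are invented here for illustration and are far smaller than any real system's; under them, this sketch finds three distinct analyses of "Time flies like an arrow" (a declarative about time, one about "time flies" the insects, and an imperative), while a richer grammar finds more.

```python
# A toy grammar (invented for illustration) and a span-based counter of
# distinct parse trees, showing how one sentence gets several analyses.
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "arrow": {"N"},
}

GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],              # declarative or imperative
    "NP": [["N"], ["N", "N"], ["Det", "N"]],
    "VP": [["V", "NP"], ["V", "PP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}

def count_parses(words, cat, i, j, memo=None):
    """Count distinct derivations of category cat over words[i:j]."""
    if memo is None:
        memo = {}
    key = (cat, i, j)
    if key not in memo:
        total = 0
        if j - i == 1 and cat in LEXICON.get(words[i], set()):
            total += 1                         # lexical reading
        for rhs in GRAMMAR.get(cat, []):
            total += count_splits(words, rhs, i, j, memo)
        memo[key] = total
    return memo[key]

def count_splits(words, rhs, i, j, memo):
    """Count ways to split words[i:j] among the categories in rhs."""
    if not rhs:
        return 1 if i == j else 0
    total = 0
    for k in range(i + 1, j + 1):
        left = count_parses(words, rhs[0], i, k, memo)
        if left:
            total += left * count_splits(words, rhs[1:], k, j, memo)
    return total

print(count_parses("time flies like an arrow".split(), "S", 0, 5))  # 3
```

Enlarging the lexicon and grammar (for example, allowing like as an adjective-like modifier) raises the count, which is how the six readings mentioned below arise; the point is only that the number of analyses grows with grammar coverage.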
The answer depends on the ambiguous syntactic role of the word flying in this example. Again, is time flying, or are we talking about a species of insect called time flies in the second example? It depends whether flies is a noun or a verb. (Actually, the second example here has at least six different parsings. See if you can find them all.)

Word Sense Ambiguity

The man went to the bank to get some cash
                          and jumped in

Here the word bank refers either to a repository for money or the side of a river, depending on the two different continuations.

Case

He ran the mile in four minutes
                   the Olympics

Linguistically, a "case" refers to the relation between a central organizing concept, here an act of running, and a subsidiary concept, here time or location. In both examples the same preposition, in, indicates the two quite different relationships.

Referential

I took the cake from the table and ate it.

What was eaten, the cake or the table? The answer is "obvious," but, independent of real-world knowledge, it could refer to either one. For instance, it would have a different referent in the example above if one were to replace ate with cleaned.

Literalness

Can you open the door?
I feel cold.

What are the correct interpretations here? There are some circumstances when the first question might be answered quite reasonably yes or no, e.g., before setting off on a long journey to the place where the door is. On the other hand, it is easy to think of circumstances where the speaker might be very unhappy with such a reply. Again, the second example might be a statement of fact or a request to close a window. The ambiguities here lie in whether to interpret the utterance literally or whether to treat it as an indirect speech act (qv) (19), e.g., an implicit request as in the examples above.

Because of these and other kinds of ambiguity, the central problem in NLP, and this is true for both the general and applied variety, is the translation of the potentially ambiguous natural-language input into an unambiguous internal (i.e., internal to the program doing the processing) representation, as suggested by Figure 1.

"Who is the captain of the Kennedy?"

Figure 1. Translation from a natural-language utterance to unambiguous internal representation.

The second layer of Figure 1 shows an example translation of a natural-language database query into an expression in a database query language, the one used by the LADDER (20) system for access to its database of information about U.S. Navy ships. Note how a potentially ambiguous word such as Kennedy is resolved into the internal name, JOHN F. KENNEDY, of a specific ship, or captain is resolved into the name, COMMANDER, of a field of the relational database conceptually underlying the LADDER system. The specific internal representation used here is, of course, highly specialized. In general, there is no commonly agreed standard for internal representations, and different types are useful for different purposes. A partial list includes:

Expressions in a database query language (for DB access).
Parse trees with word sense terminal nodes (for machine translation).
LISP expressions (most often for expert system requests).
Case frame instantiations (for a variety of applications).
Conceptual dependency (for story understanding).

In general NLP, translation of an utterance into an unambiguous internal representation can require inference based on a potentially unbounded set of real-world knowledge. Consider, for instance,

Jack took the bread from the supermarket shelf, paid for it, and left.

Coming up with an unambiguous representation for this requires answers to such questions as

What did Jack pay for? (the referent of it)
What did Jack leave? (the ellipsed object of left)

and possibly even

Did Jack have the bread with him when he left?

To answer these questions, information on supermarkets, buying and selling, and other real-world topics is required. As mentioned above, AI knowledge representation (qv) techniques have not yet developed to the stage where they can handle at an acceptable level of efficiency the large quantities of such knowledge required to do a complete job of understanding a large variety of topics. Moreover, even if the knowledge could be represented, unresolved problems in inference (qv) techniques remain a barrier to applying the correct knowledge to the input in order to produce the desired unambiguous internal representation. The result is that current general NLP systems are demonstration systems that operate with a very small amount of carefully handcrafted knowledge, specifically designed to enable the processing of a small set of example inputs. The main point of such systems is to investigate the feasibility of certain inference or knowledge representation techniques rather than to achieve broad coverage in the NLP they perform.

Applied NLP systems potentially face exactly the same problem, but they finesse it by taking advantage of certain characteristics of the highly limited domains in which they operate. Suppose the input

How many terminals are there in the order?

was addressed to an expert system that acted as a computer salesman's assistant. Such a system need not consider many of the potential ambiguities lurking in this example. The word terminals, for instance, can be assumed to refer to computer terminals, rather than airport terminals, terminally ill patients, or terminal values of a mathematical series. Also, assuming the system processes one sales order at a time, "the order" can be assumed to refer to the current order without considering any others. In general, the technique is to premake as many inferences as possible in a way appropriate to the task at hand. For suitable tasks in many restricted domains, this has been used very successfully to reduce the amount of knowledge that must be represented and the number of inferences that must be made to manageable proportions. By restricting the natural language dealt with by an interface to that required to handle a limited task in a limited domain, it is thus possible to construct performance systems capable of useful natural-langu age communication, and this
NATURAL-LANGUAGE UNDERSTANDING
represents the current state of the art in practical NLP. Clearly, this is far from satisfactory since, in particular, each task and domain that is tackled requires careful preanalysis so that the required inferences can be preencoded in the system, thus making it difficult to transfer successful natural-language interfaces from one task to another. Some research (e.g., Ref. 14) is being conducted to improve the portability of current interfaces, but until the problem of preencoding inferences is solved in a more general way, the portability issue will be the one that most hinders the widespread use of natural-language interfaces. A practical alternative, however, is the Language Craft (Carnegie Group Inc.) approach, where a development environment and grammar interpreter are provided to shorten drastically the development of new domain-specific interfaces.

Natural-Language Analysis Techniques

In this section, several of the more common techniques for natural-language analysis, i.e., for translating natural-language utterances into a unique internal representation, are examined in some detail. Virtually all natural-language analysis systems can be classified into one of the following categories:

Pattern matching (qv) [e.g., ELIZA (qv) (21), PARRY (qv) (22)].
Syntactically driven parsing (qv) [e.g., ATNs (23)].
Semantic grammars (see Grammar, semantic) [e.g., LIFER (13), SOPHIE (16)].
Case frame instantiation [e.g., ELI (24)].
Wait and see [e.g., Marcus (25)].
Word expert [e.g., Small (26)].
Connectionist (see Connectionism) [e.g., Small (27)].
Skimming [e.g., FRUMP (qv) (28), IPP (29)].

The examples provided with each category are the names of language analysis systems following that approach or the names of builders of such systems. Of these categories, the first four represent the bulk of the language analysis systems already constructed and are the only ones covered in detail.
The reader is encouraged to follow up the references provided for further details of the other methods.

Pattern Matching. The essence of the pattern-matching (qv) approach to natural-language analysis is to interpret input utterances as a whole rather than building up their interpretation by combining the structure and meaning of words or other lower level constituents. The approach is thus holistic rather than constructive. With this approach, the interpretations are obtained by matching patterns of words against the input utterance. Associated with each pattern is an interpretation, so that the derived interpretation is the one attached to the pattern that matched. In the simplest case this arrangement is simply a list of correspondences between equivalence classes of utterances (the ones that match a given pattern) and interpretations (the ones associated with each pattern). In more sophisticated variations of the approach, patterns may involve higher level constituents or semantic elements so that some aspects of the interpretation may become constructive, but the basic flavor of the approach still remains to go as
directly as possible from the input utterance to the interpretation, as suggested by Figure 2. One of the simplest and earliest examples of pattern-matching parsing was the ELIZA system of Weizenbaum (21). This system demonstrated how a very simple mechanism in the form of a computer program was able to mimic apparently intelligent behavior. The carefully selected task of ELIZA was to simulate a Rogerian psychologist as he interviewed a patient. The style of this school of psychology allowed ELIZA to understand virtually nothing of what was said and still, in many cases, come up with convincing replies. ELIZA did not construct an internal representation of its input as such but instead went directly from the input to its reply, which could therefore be thought of as the "interpretation" of the input. The input was matched by a small set of single-level patterns, each of which was associated with several replies. The first pattern found to match an input led to a selected reply from the associated set; if a pattern matched again on a subsequent input, the next reply was given. Variables in the patterns that would match any sequence of words whatsoever allowed text from the input to be incorporated into the reply. Example patterns and corresponding replies, with variables represented by X and Y, include:

X always Y
    Can you think of a specific example?
    When?
    Really, always?

X you Y me
    Why do you think I Y you?
    You like to think I Y you, don't you?
    Suppose I did Y you. What would that mean?

These would allow ELIZA (E) to conduct the following dialogue with a patient (P):

P: People always stare at me. [matches first pattern]
E: Can you think of a specific example?
P: For instance you keep staring at me. [matches second pattern with Y matching "keep staring at"]
E: Why do you think I keep staring at you? [first reply for second pattern with appropriate substitution for Y]
P: Because you hate me. [second pattern again with Y matching "hate"]
E: You like to think I hate you, don't you? [next (second) reply for second pattern]
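The pattern-and-rotating-reply mechanism just traced can be sketched in a few lines of Python. This is a hypothetical reconstruction of the two rules above, not Weizenbaum's actual tables; the X and Y variables become regular-expression capture groups, and each pattern cycles through its reply list on repeated matches.

```python
import re
from itertools import cycle

# ELIZA-style rules (invented for illustration): each pattern carries a
# rotating list of replies; "{1}" substitutes the Y variable into the reply.
RULES = [
    (re.compile(r"(.*) always (.*)", re.I),
     cycle(["Can you think of a specific example?", "When?", "Really, always?"])),
    (re.compile(r"(.*) you (.*) me", re.I),
     cycle(["Why do you think I {1} you?",
            "You like to think I {1} you, don't you?",
            "Suppose I did {1} you. What would that mean?"])),
]

def reply(utterance: str) -> str:
    """Return the next reply attached to the first pattern that matches."""
    for pattern, replies in RULES:
        m = pattern.match(utterance)
        if m:
            return next(replies).format(*m.groups())
    return "Please go on."   # content-free fallback, in the ELIZA spirit
```

Because the reply lists are shared cycles, a second match against the same pattern yields the next reply, reproducing the alternation seen in the dialogue above.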
The simplicity of the matching and reply generation meant that most conversations with ELIZA did not go nearly as smoothly as this, but there are several anecdotes about people being fooled into thinking they were talking to a real person. ELIZA could achieve its results with such a low level of analysis only by ignoring most of what was said.

Figure 2. Parsing by pattern matching.

To make more complete analyses of the input using the same techniques would require far too many patterns: in the extreme, one pattern for every possible utterance. Moreover, many of these patterns would contain common subelements because they mentioned the same objects or had the same concepts arranged with slightly different syntax. In order to resolve these problems within the pattern-matching approach, hierarchical pattern-matching methods have been developed in which some patterns match only part of the input and replace that part by some canonical result. Other higher level patterns can then match on these canonical elements in a similar way, until a top-level pattern is able to match the canonicalized input as a whole according to the standard pattern-matching paradigm. In this way similar parts of different utterances can be matched by the same patterns, and the total number of patterns is much reduced and made more manageable.

The best known example of hierarchical pattern matching is the PARRY system of Colby (22,30). Like ELIZA, this program operates in a psychological domain but models a paranoid patient rather than a psychologist. Using the traditional pattern-matching paradigm, PARRY interprets its input utterances as a whole by matching them against a set of about 2000 general patterns. The internal representation into which the input is transformed is a set of updates to a simple model of the paranoid patient's mental state plus a representation of any factual content of the input. Replies are generated from the updated paranoid model plus the factual content. However, before the general patterns are applied, PARRY massages its input through a series of eight canonicalizing steps, most of which are based on localized pattern matching. Examples of these steps include:

Canonicalizing rigid idioms (e.g., "have it in for" → "hate").
Noun phrase bracketing using an ATN (see below).
Canonicalizing flexible idioms (e.g., "lend a hand" → "help").
Clause splitting (e.g., "I think you need help" → "(I think) (you need help)").
Using rules of this form, an input such as

Do you have it in for me? I want to lend you a hand.

can be canonicalized into a form similar to

((YOU HATE ME) INTERROGATIVE)
((I WANT) (I HELP YOU))

and an appropriate reply is generated by matching against PARRY's 2000 general patterns.

As well as matching patterns of words, it is also possible to analyze natural-language input by matching patterns of semantic elements, with potentially very powerful results, as shown by the pilot machine translation (qv) system of Wilks (31). The goal of this system was to translate English input into French output. To do this, it first analyzed its English input into an internal semantic pattern from which it could generate the French. This analysis was performed by matching the input against a very general set of patterns such as

(MAN FORCE MAN)

which matches all events in which a person compels another person to do something. Other general patterns involved people doing things to objects, objects being in certain states, etc. To allow matches against these patterns, Wilks represented word senses as formulas of the same semantic primitives as appeared in the patterns, so, for instance, interrogate was

((MAN SUBJ) ((MAN OBJE) (TELL FORCE)))

i.e., a person forcing another person to tell something, and crook was one of the following possibilities:

((((NOTGOOD ACT) OBJE) DO) SUBJ MAN)
((((((THIS BEAST) OBJE) FORCE) SUBJ MAN) POSS) (LINE THING))

i.e., a person who does bad things or a long thin thing that a person uses to force animals (normally sheep) to do something. As well as providing an interpretation of the input, the process of matching these formulas against the general patterns also allowed word senses to be disambiguated. So

The policeman interrogated the crook.

is analyzed by matching it against the (MAN FORCE MAN) pattern, and this also chooses the bad-person sense of crook because it matches the second MAN of this pattern. There is also a (MAN FORCE THING) pattern, but this does not match as well because the formula for interrogate specifies MAN for its object. Note that the notion of degree of match is present in this system. As shown below, this idea makes parsing by pattern matching considerably more powerful, especially when the input contains grammatical errors.

To summarize this section on parsing by pattern matching, the basic paradigm is to recognize input utterances as a whole by matching them against patterns of words, wildcards, and/or semantic primitives. The result of the match is the interpretation of the utterance. Unless a very shallow level of analysis is acceptable, the number of patterns required is too large, even for restricted domains. This problem can be ameliorated by hierarchical pattern matching in which the input is gradually canonicalized through pattern matching against subphrases. The number of patterns can also be reduced by matching with semantic primitives instead of words.
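The canonicalizing passes described for PARRY can be sketched as a pipeline of localized rewrites, applied before any general pattern is consulted. The two rules below are invented stand-ins for PARRY's idiom tables: one rigid idiom handled by literal substitution and one flexible idiom handled by a small regular expression.

```python
import re

# PARRY-style canonicalization sketch; the rule set is illustrative only.
def canonicalize(utterance: str) -> str:
    text = utterance.lower().rstrip("?.! ")
    # Rigid idiom: a fixed word string replaced outright.
    text = text.replace("have it in for", "hate")
    # Flexible idiom: "lend <someone> a hand" -> "help <someone>".
    text = re.sub(r"lend (\w+) a hand", r"help \1", text)
    return text
```

After canonicalization, the general patterns need only mention the canonical words (hate, help), which is how the hierarchical approach keeps the top-level pattern set manageable.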
Syntactically Driven Parsing. Syntax deals with the ways that words can fit together to form higher level units such as phrases, clauses, and sentences. Syntactically driven parsing (qv) is, therefore, naturally constructive, i.e., the interpretations of larger groups of words are built up out of the interpretations of their syntactic constituent words or phrases. In this sense, it is just the opposite of pattern matching, in which the emphasis is on interpretation of the input as a whole. The most natural way for syntactically driven parsing to operate is to construct a complete syntactic analysis of the input utterance first and only then to construct the internal representation or interpretation. This leads to considerable inefficiency, and more recent syntactically driven approaches have tried to intermix parsing and interpretation.

Parse Trees and Context-Free Grammars. The most common form of syntactic analysis is known as a parse tree. Figure 3 shows a parse tree for the sentence
The rabbit nibbled the carrot.

The tree shows that the sentence is composed of a noun phrase (subject) and a verb phrase (predicate). The noun phrase consists of a determiner (the) followed by a noun (rabbit), whereas the verb phrase consists of a verb (nibbled) followed by another noun phrase (the direct object), whose determiner is the and whose noun is carrot.

Syntactic analyses are obtained by application of a grammar that determines what sentences are legal in the language being parsed. The method of applying the grammar to the input is called a parsing (qv) mechanism or parsing algorithm. A very simple style of grammar is called a context-free grammar, which means that the symbol on the left side of a rewrite rule may be replaced by the symbols on the right side regardless of the context in which the left side symbol appears. A context-free grammar consists of rewrite rules of the following form:

S → NP VP
NP → DET N | DET ADJ N
VP → V NP
DET → the
ADJ → big | green
N → rabbit | rabbits | carrot
V → nibbles | nibbled | nibble

As this example shows, context-free grammars have the advantage of being simple to define. They have been widely used for computer languages, and highly efficient parsing mechanisms (32,33) have been developed to apply them to their input. However, they also suffer from some severe disadvantages. It should be clear that the above context-free grammar accounts for the parse shown in Figure 3; rewrite rules correspond directly to bifurcations in that tree. Although it accounts for that and several other good sentences, the grammar also allows several bad ones, such as

The rabbits nibbles the carrot.

The problem here is that the context-free nature of the grammar does not allow agreements such as the one required in English between subject and verb. To enforce such an agreement, there would have to be two completely parallel grammars, one for singular sentences and the other for plural. Moreover, a grammar that also allowed passive sentences such as

The carrot was nibbled by the rabbit.

would have to have another completely different set of rules, even though the passive and the active forms of the same sentence have a clear syntactic relation, not to mention semantic equivalence. These duplications are multiplicative rather than additive, leading to exponential growth in the number of grammar rules. Thus, in terms of the number of rules involved and in terms of being unable to capture related phenomena by related rules, context-free grammars turn out to be quite unsuitable for natural-language analysis. Recent work by Gazdar (34) and others has shown that these problems of exponential rule growth can be masked using notational shorthand devices such as "metarules" plus relatively minor extensions to the context-free formalism, and in particular without going to the transformational framework discussed below. However, the computational tractability of generalized phrase structure grammar (qv), as the extended formalism is called, has yet to be determined.

There is one more point to be made with this example, one not specific to context-free grammars, but a serious problem for all syntactically driven parsing. The above grammar also allows

The rabbit was nibbled by the carrot.

This is an example of a sentence that is perfectly good syntactically but makes no sense at all. For utterances that are ambiguous syntactically (and for more comprehensive grammars, syntactic ambiguity is very common), such acceptance of nonsensical interpretations can lead to the highly inefficient generation of multiple parses, only one of which has a reasonable translation into the final internal semantic representation.

Transformational Grammar. The problems mentioned above specific to context-free grammars were tackled by linguists, in particular Chomsky (1,2), through transformational grammar (qv). As shown in Figure 4, their answer was to add another type of rule to a context-free grammar. The basic idea was to use the context-free grammar to generate a parse tree just as before but add onto it certain tags, such as one for a plural sentence. The set of transformations on the parse tree would then rearrange things so that the pluralness was transmitted to all parts of the tree concerned and the required agreements could be enforced. The transformations that enforced agreements were called obligatory transformations. A second class of optional transformations was used to capture the relations between, for instance, active and passive sentences; the active and passive versions of the same sentence had the same representation in the base component produced
Figure 3. A parse tree for "the rabbit nibbled the carrot."
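The toy grammar behind this parse tree is small enough to exercise directly. The recursive-descent recognizer below is an illustrative sketch (not one of the efficient parsing mechanisms cited in the text); because the rules carry no number agreement, it accepts the ungrammatical "the rabbits nibbles the carrot" just as readily as the good sentence.

```python
# Minimal recursive-descent recognizer for the toy context-free grammar:
#   S -> NP VP,  NP -> DET N | DET ADJ N,  VP -> V NP
DET, ADJ = {"the"}, {"big", "green"}
N = {"rabbit", "rabbits", "carrot"}
V = {"nibbles", "nibbled", "nibble"}

def parse_np(words, i):
    """NP -> DET N | DET ADJ N; return index after the NP, or None."""
    if i < len(words) and words[i] in DET:
        if i + 1 < len(words) and words[i + 1] in N:
            return i + 2
        if i + 2 < len(words) and words[i + 1] in ADJ and words[i + 2] in N:
            return i + 3
    return None

def parse_s(sentence):
    """S -> NP VP, where VP -> V NP; True iff the whole input is consumed."""
    words = sentence.lower().split()
    i = parse_np(words, 0)
    if i is not None and i < len(words) and words[i] in V:
        j = parse_np(words, i + 1)
        return j == len(words)
    return False
```

Enforcing subject-verb agreement here would require splitting NP, N, and V into singular and plural variants and duplicating every rule that mentions them, which is exactly the multiplicative rule growth the text describes.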
Figure 4. Transformational grammar.
by the context-free grammar, but the passive version was the result of applying an extra optional transformation. Transformations are context-sensitive rules that map a parse tree into a related parse tree. Although transformational grammar did a much better job of accounting for the regularities of natural language than context-free grammar, from the point of view of computational effectiveness, it was much worse. (However, significantly, a complete transformational grammar of English has never been produced.) As the above description implied, it was set up as a generative model, i.e., it told you how to produce a sentence starting from the symbol S. Running the model in reverse to do sentence analysis turned out to be a computational nightmare, largely because transformations operate on trees, not strings of words, and so are highly nondeterministic when run backward. For instance, the "equi-NP deletion" transformation deletes without trace the second occurrence of a coreferential noun phrase in certain structures, and it is impossible to run a deletion backward if there is no clue as to what was deleted. Consequently, although some attempts have been made [e.g., Petrick (3)], parsers based on transformational grammar have not played a major role in NLP.

Augmented Transition Networks. Largely in response to the problems of transformational grammar (qv), Bobrow and Fraser (35) proposed and Woods (23) subsequently developed a method of expressing a syntactic grammar that was computationally tractable and yet still could capture linguistic generalizations in a concise way, in many cases more concisely than transformational grammar itself. The formalism Woods developed was known as an augmented transition network (ATN) (see Grammar, augmented-transition-network).
It consisted of a recursive transition network (formally equivalent in expressive power to a context-free grammar) augmented by a set of tests to be satisfied before an arc was traversed and a set of registers that could be used to save intermediate results or global state. An example ATN is shown in Figure 5. The network recognizes simple sentences with just a subject, verb, and direct object in all combinations of active, passive, declarative, and interrogative. The symbols attached to the arcs show what constituent must be recognized to traverse the arc: AUX is an auxiliary verb (like is or have); NP is a noun phrase, which is defined by another network in the same formalism as this one; V is a verb; and "by" is the word by. The numbers on the arcs serve as indices to the following table, which lists the tests that must be true to traverse the arcs and the actions that must be performed as each arc is traversed.

Arc  Test                          Actions
1    T                             (SETR V *) (SETR TYPE 'QUESTION)
2    T                             (SETR SUBJ *) (SETR TYPE 'DECLARATIVE)
3    (AGREES SUBJ *)               (SETR V *)
4    (AGREES * V)                  (SETR SUBJ *)
5    (AND (GETF PPRT) (= V 'BE))   (SETR OBJ SUBJ) (SETR V *) (SETR AGFLAG T) (SETR SUBJ 'SOMEONE)
6    (TRANS V)                     (SETR OBJ *)
7    AGFLAG                        (SETR AGFLAG FALSE)
8    T                             (SETR SUBJ *)

Figure 5. Example ATN.

In this LISP-like notation, the asterisk refers to the constituent just parsed, and SETR sets a register, whose name is specified by its first argument, to the value of its second argument. A concrete example of the network in operation will make this clearer. Suppose one wanted to parse

The rabbit nibbled the carrot.

One would start at the leftmost node in the graph and at the left of the sentence. Two arcs lead from that node, but only arc 2 is applicable since in the input one is not looking at the auxiliary verb required by arc 1 but at a noun phrase, "the rabbit," as required by arc 2. One can see from the table that arc 2 has no additional test (indicated by T), so we traverse that arc, setting the SUBJ register to the thing just parsed, i.e., "the rabbit," and the TYPE register to DECLARATIVE. One is now at a node with only one arc, arc 3, and that arc requires a verb. Fortunately, one is now looking at "nibbled" in the input, so one can try to traverse it. Arc 3 has an additional test requiring that *, i.e., the present word in the input (the verb), agree with the contents of the subject register; this is the way agreements are enforced in an ATN. In this case the agreement is correct, and one can traverse the arc, setting the V register to the verb. The node one gets to now has a line through it, indicating that this can be the end of the parse provided that there is no input left to consume, so "The rabbit nibbled" would be accepted here. In this example there is another noun phrase, "the carrot," and so one follows arc 6, whose test requires that the verb in the V register be transitive, which "nibbled" is. So one ends up at another terminal node with no further input, and the parse is completed successfully. The result of the parse is the setting of the four registers SUBJ, TYPE, V, and OBJ, and these can be combined into a tree or whatever representation is desired.

A more interesting use of registers can be seen from the example

The carrot was nibbled by the rabbit.
To parse the first three words, we traverse arcs 2 and 3 much as before, with the difference that now "the carrot" is in SUBJ and "was" is in V. One cannot take arc 6 because one is only up to "nibbled" in the input, but one can take arc 5 because nibbled is a verb. The test on arc 5 also requires nibbled to be a past participle, which it is, and the contents of V to be a form of be; since was is a form of the verb to be, the test is satisfied. The action on arc 5 is interesting; it puts the contents of the SUBJ register in the OBJ register, overwrites the verb register with the past participle verb, sets a flag to true, and puts a placeholder "someone" in the SUBJ register. This corresponds to recognizing that the sentence is in passive form and in our case makes the carrot the object and nibbled the verb. One has reached "by" in the input and so can follow arc 7, which just requires the passive flag to be set; its only action is to turn this
flag off, so that the arc cannot be traversed again. Finally, one gets back to the terminal node via arc 8, which puts "the rabbit" in the SUBJ register. Note that the result of this parse is the same as the result of the first example. Now try to follow the parses of
Did the rabbit nibble the carrot?
Was the carrot nibbled by the rabbit?

These brief examples should give some idea of the power of an ATN and of how its tests and registers can be used to capture the regularities of language in a concise and elegant way. Very large ATN grammars of several hundred nodes (36) have been developed that capture large subsets of English. However, ATNs also have several disadvantages:

Complexity and Nonmodularity. As the coverage of an ATN increases, so does its structural complexity. It becomes extremely difficult to modify or augment an existing ATN without causing large numbers of unforeseen side effects. For instance, if another outgoing arc is added to a node with a large number of incoming arcs in order to handle an additional type of phrase that is a valid continuation of the parse represented by one of the incoming arcs, it could lead to spurious and incorrect parses when the node is reached via a different incoming arc. (Fan-out and fan-in factors of 10 or 20 are not uncommon in large realistic grammars.)

Fragility. The current position in the network is a very important piece of state information for the operation of an ATN. If an input should be slightly ungrammatical, even by a single word, it is very hard to find the appropriate state to jump to that would enable the parse to continue, though see the work by Kwasny and Sondheimer (57) and Weischedel and Black (37) on dealing with such extragrammaticality and the work on island-driven ATN parsing for speech input by Bates (38).

Inefficiency through Backtracking Search. Although the above examples are not complex enough to show it, the task of traversing an ATN is in general nondeterministic and requires search. The natural way to search an ATN is through backtracking (qv). Because intermediate failures are not remembered in such a search, major inefficiencies can result through repetition of the same subparses arrived at through different paths through the network.
Chart parsing techniques (39-41) were designed as alternatives to ATNs precisely to avoid these inefficiencies.

Inefficiency through Meaningless Parses. Normally the grammar of an ATN is purely syntactic, and a complete syntactic parse is produced before any semantic interpretation is performed. In that situation many spurious meaningless parses can be produced, especially if the grammar is large and comprehensive. To combat this, recent parsers (42) in the ATN tradition have tried to interpret each constituent as it was produced, thus preventing complete parses based on constituents that could be predicted to be meaningless.
ffiffl3,1x:] See Refs. 43 and 44 for more discussion on the relative advantages and disadvantages of ATNs.
Semantic Grammars. Language analysis based on semantic grammars (qv) is similar to syntactically driven parsing except that in semantic grammars the categories used are defined semantically as well as syntactically. Thus, instead of the category "noun phrase" in a syntactic grammar, a semantic grammar might have the category "description of a ship," which is syntactically always a noun phrase but has additional strong semantic constraints. Semantic grammars were introduced by Burton (45) for use in SOPHIE (16), a computer-aided instruction system for electronic circuit debugging, to deal with the problems of inefficiency due to the generation of syntactically correct, but meaningless, parses mentioned above for ATN-based syntactic grammars. The goal was to eliminate the production of meaningless parses by setting up the grammar so that only meaningful parses could be produced. To do this, it was necessary to categorize all the objects and actions that the SOPHIE system needed to parse to conduct a conversation in its domain of electronic circuitry and then to construct the grammar so that, for instance, only a description of a switch could be the object of a "close" action. This technique, while retaining the fragility of an ATN, worked well to reduce parsing inefficiency. Because the relevant semantic categories were available at parse time, it also allowed semantic interpretation to proceed as the parse unfolded. However, the technique only works properly in restricted domains, like the one mentioned above, in which all objects and their relations can be categorized in advance, allowing a grammar to be built around the possible semantic relations. Semantic grammars are thus a technique useful only for applied natural-language processing, not for general NLP. For an example of how semantic grammars can be used, consider the following grammar definition in the formalism used by LIFER, a system for building semantic grammars developed by Hendrix (13).
S → (present) the (attribute) of (ship)
(present) → what is | [can you] tell me
(attribute) → length | beam | class
(ship) → the (shipname) | (classname) class ship
(shipname) → kennedy | enterprise
(classname) → kitty hawk | lafayette

An expanded version of this grammar was used for access to a database of information about U.S. Navy ships in the LADDER (20) system. Even the above "mini" version is capable of recognizing such inputs as

What is the length of the Kennedy?
Can you tell me the class of the Enterprise?
What is the length of Kitty Hawk class ships?

Since the definitions used by LIFER are similar to those used for context-free grammars, the reader should have no difficulty in seeing how these inputs could be recognized by the above grammar. In addition to defining a grammar, LIFER also allowed an interface builder to specify the interpretations to be produced from the rules that were used in the recognition of an input. In the above case this resulted in database query language statements corresponding to the inputs being produced as a direct result of the recognition. The database query language statements in effect took the place of a parse tree, and so no separate semantic interpretation stage was required.
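As an aside, the way recognition against semantic categories can yield a database query directly can be sketched in a few lines. This is not LIFER's mechanism: the top-down grammar is collapsed here into flat category lookups, the SELECT syntax and the SHIPS table name are invented, and the double role of class in "Kitty Hawk class ships" is beyond this simplified version.

```python
# Category tables mirroring the "mini" grammar; query syntax is invented.
ATTRIBUTES = {"length", "beam", "class"}
SHIPNAMES = {"kennedy": "JOHN F. KENNEDY", "enterprise": "ENTERPRISE"}

def interpret(question):
    """Map a recognized question straight to a (hypothetical) DB query."""
    words = question.lower().rstrip("?").split()
    attribute = next(w for w in words if w in ATTRIBUTES)        # (attribute)
    ship = next(SHIPNAMES[w] for w in words if w in SHIPNAMES)   # (shipname)
    return f"SELECT {attribute.upper()} FROM SHIPS WHERE NAME = '{ship}'"
```

Note how the ambiguous surface word kennedy is resolved into the internal name JOHN F. KENNEDY as a side effect of recognition, with no separate interpretation stage.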
Note in the example above that not all the categories are specializations of pure semantic categories; (present), for instance, will parse several phrases, none of which fits into any standard grammatical category. The phrases may differ from each other in their syntactic structure, including the number of verbs they contain. The ability to construct cross-grammatical categories like this allows a semantic grammar to incorporate some features of pattern matching. Also note how strongly directed the recognition is. The word class, for instance, occurs in two quite different ways in the grammar: once as a ship attribute and once as part of the second type of ship description. Thus, in the (rather silly) question

What is the class of Lafayette class ships?

the appropriate category for class would be used each time it appeared without considering its other role in the grammar. This directedness of recognition is also useful in building spelling correction into the recognition process. In an input like

What is the legnth of the Kennedy?

the spelling of legnth need only be checked against the list of ship attributes rather than the entire system vocabulary because a ship attribute is the only category that can appear at the place where the misspelling occurs. A final advantage of the strong top-down direction available through semantic grammars can be seen in LIFER's ellipsis mechanism, which was intended to deal with input sequences such as

What is the length of the Kennedy? The beam?

Here the fact that beam and length are in the same semantic grammar category allows the second input to be interpreted as "What is the beam of the Kennedy?" rather than, say, "What is the length of the beam?" See below for discussion on ellipsis mechanisms in general.
In addition to their numerous advantages for limited-domain applications, semantic grammars have several disadvantages, chief of which is the requirement that a new grammar be developed for each new domain, since the semantic categories for each domain will be quite different. However, if the applications are similar (e.g., both include database access), there will be many parts of the grammar (e.g., the basic framework for questions) that are the same. A related disadvantage is that semantic grammars tend to get large very quickly, partly because of the repetition of similar constructions in different semantic categories. This makes nontoy semantic grammars quite hard to construct and can result in a very "spotty" kind of coverage of syntactic variation. For instance, adding a rule that allows the possessive to be apostrophized in the description of a ship attribute (i.e., you can say "the Kennedy's length" as well as "the length of the Kennedy") does not also allow possessives to be apostrophized in the description of an attribute of a sailor (i.e., you might not be able to say "officer's rank" even though you can say "rank of an officer") because the two categories are in different parts of the grammar, and their recognition is unrelated. A second rule would be required.

Three approaches have been tried to resolve these problems. One is to go back to recognition by a syntactic grammar before semantic interpretation but to try to intermix the semantic and syntactic components much more closely, so that every syntactic constituent is interpreted as soon as it is constructed. The RUS system (42) is an example of this approach. It provides some improvement over a pure syntax-first approach but is still not as efficient as pure semantic grammars; it is also difficult to incorporate semantic constraints, a process that requires writing different chunks of LISP code, called "rules," for each domain. An alternative approach, as exemplified by the TEAM system (14), is to focus in on a specific class of applications, access to relational databases, and to abstract out the linguistically common aspects of a semantic grammar for such a class. Building a specific interface, then, requires only instantiating a template, as it were, with the vocabulary and morphological variation required for a specific database. This approach has the potential to produce highly efficient natural-language interfaces, but at the cost of some expressive power and inability to go beyond the class of applications without restarting from the ground up. The third approach is to combine the strengths of several parsing strategies, such as semantic grammars, syntactic transformations, and pattern matching, into a single system that maps structures into more canonical forms before attempting to use the full semantic grammar, thus allowing many redundant and unnecessary constructions to be eliminated (46,47). This multistrategy approach has been implemented in the DYPAR system (48) and applied to database query, expert system command, and operating system command interfaces. Although richer in expressive power, this approach demands more sophistication of the grammar writer, requiring knowledge of how to write transformations, context-free rules, and patterns.
Case Frame Instantiation. A major development in computational linguistics (qv) was the inclusion of case-frame instantiation (see Grammar, case) in the repertoire of effective parsing techniques. Case frames were popularized by the linguist Charles Fillmore in his seminal paper "A Case for Case" (49), and their computational import was quickly grasped by several researchers in natural-language processing, including Simmons (50), Schank (6), and Riesbeck (51). Case frame instantiation is one of the major parsing techniques under active research today. Its recursive nature, and its ability to combine bottom-up recognition of key constituents with top-down instantiation of less structured constituents, gives this method very useful computational properties (see also Frame Theory).

What Are Case Frames? A case frame consists of a head concept and a set of roles, or subsidiary concepts, associated in a well-defined manner with the head concept. Initially, only sentential-level case frames were investigated, where the head consists of the main verb, and the cases include the "agent" that carries out the action, the "object" acted upon, the "location" in which the action takes place, etc. For instance, consider the sentence

In Elm Street, John broke a window with a hammer for Billy.

In simplified generic notation, the case frame corresponding to this sentence is
[BREAK
  [caseframe
    agent: JOHN
    object: WINDOW
    instrument: HAMMER
    recipient:
    directive:
    locative: ELM STREET
    benefactive: BILLY
    co-agent: ]
  [modals
    time: PAST]]
In the notation above, cases, such as agent, are written in lowercase, and their fillers are in uppercase. Case frames, as adopted in computational linguistics, differ markedly from simple, purely syntactic, parse trees. The relations between the head of the case frame and the individual cases are defined semantically, not syntactically. Hence, a noun in the subject position can fill the agent case, as in the example above, or it can fill an object case, as in "the window broke" (the window was not the agent that caused the breakage), or it can fill the instrument case, as in "the hammer broke the window." These are different semantic roles played by the same syntactic constituent, "subject." Since the purpose of a natural-language interface is to extract the semantics of the input, it behooves the case frame representation to encode explicitly semantic differences in otherwise similar syntactic parse trees. Thus, parsing into case frames requires semantic knowledge, as well as syntactic information, as shown below.

Consider some other properties of case frames. In the example above, only some of the cases were instantiated. What of the other cases, such as recipient and co-agent? There are examples that illustrate these shortly. First, consider the meaning of each case, as outlined below:

[(HEAD VERB)
  [caseframe
    agent: (the active causal agent instigating the action)
    object: (the object upon which the action is done)
    instrument: (an instrument used to assist in the action)
    recipient: (the receiver of an action, often the indirect object)
    directive: (the target of a (usually physical) action)
    locative: (the location where the action takes place)
    benefactive: (the entity on whose behalf the action is taken)
    co-agent: (a secondary or assistant active agent)]]

If instead of saying "John broke the window with a hammer," one were to say "John broke the window with Mary," Mary would fill the co-agent case. Presumably John did not swing Mary over his head and use her as a battering ram to shatter the window, much as he would use an instrument like a hammer or a tree branch. Since Mary is taking part in causing the action to happen, regardless of whether her action is independent of, or in support of, John's action, she fills the co-agent case.

In order to illustrate the directive case, consider "John kicked the ball toward the goal" and "John flew the airplane to New York." In the former example "the goal" fills the directive case, and in the latter "New York" fills the same case, since both express the direction in which each respective action was performed. In some early formulations of case frames no distinction was made between locative and directive, but the need to encode stative vs. dynamic information explicitly, plus the need to represent sentences such as "In Yankee Stadium, John threw the ball at the catcher" that instantiate both cases, led to the acceptance of two semantically distinct cases, one encoding global location, the other a local change in location.

The recipient case is filled by "Mary" in both of the following: "John gave Mary a ball" and "John gave a ball to Mary." Note that in this instance there are syntactically distinct sentences that map onto a unique semantic case frame representation, to wit:

[GIVE
  [caseframe
    agent: JOHN
    recipient: MARY
    object: BALL]]

Required, Optional, and Forbidden Cases. Each case frame defines some required cases, some optional cases, and some forbidden cases. A required case is one that must be present in order for the verb to make sense. For instance, break requires the object case. A sentence is not complete without it (try constructing one), but no other case is required. "The window broke" is a complete, if not very informative, sentence. An optional case is one that, if present, provides more information to the case frame representation but, if absent, does not harm its integrity. For instance, agent, co-agent, and locative are optional cases of break. Forbidden cases are those that cannot be present with the head verb. The directive and recipient cases are forbidden for the break case frame. (Again, try constructing a sentence with these cases using break as the head verb.)

Conceptual Dependency. It is often useful in natural-language processing to employ a semantic representation that represents information in as canonical a manner as possible. In the ideal canonical representation, different ways of stating the same information would be represented identically, and propositions that encode similar information would map into semantic encodings that highlighted the similarities while retaining the differences in an explicit manner. The best known attempt at a canonical semantic representation is the conceptual dependency (CD) (qv) formalism developed by Schank (6,52,53) as a reductionistic case frame representation for common action verbs. Essentially, it attempts to represent every action as a composition of one or more primitive actions, plus intermediate states and causal relations.

To use Schank's example, suppose one wants to represent, in a case frame notation, "John gave Mary a ball" and "Mary took a ball from John." These sentences differ syntactically, they differ in terms of verb selection, and they differ in how their cases are instantiated (e.g., "John" is the agent of the first sentence and "Mary" of the second sentence). However, both sentences express the proposition that a ball was transferred from John to Mary, and in both cases one can infer that John had the ball before the action took place, that Mary has it after the action, and that John no longer has it after the action. The only significant difference is that in the first sentence John performed the action, and in the latter Mary did so. In CD there is a primitive action called ATRANS (for Abstract TRANSfer of possession, control, or ownership) that encodes the basic semantics of both of these verbs and many more. The CD representation of these sentences is:

[ATRANS
  rel: POSSESSION
  actor: JOHN
  object: BALL
  source: JOHN
  recipient: MARY]
"John gave Mary a ball"
[ATRANS
  rel: POSSESSION
  actor: MARY
  object: BALL
  source: JOHN
  recipient: MARY]
"Mary took a ball from John"
(Some readers may be acquainted with Schank's complex notation of double and triple arrows. The direct simplified notation (shown above) is virtually isomorphic, somewhat clearer, and closer to the data structures used by most of the computer programs that parse into CD and other case frame representations.) These two structures are very simple to match against each other to determine precisely in what aspects the two propositions differ and in what aspects they are identical. Moreover, inference rules associated with ATRANS can be invoked automatically when give and take are parsed into these structures. There are many more verbs that contain the ATRANS primitive (such as bequeath, donate, steal, sell, buy, appropriate, expropriate, etc.). Sometimes ATRANS is used in conjunction with other CD primitives that capture other aspects of the meaning. The verb sell, for instance, involves two ATRANS primitives in mutual causation:
[ATRANS                      [ATRANS
  rel: OWNERSHIP               rel: OWNERSHIP
  actor: JOHN       <-CAUSE->  actor: MARY
  object: APPLE                object: 25 CENTS
  source: JOHN                 source: MARY
  recipient: MARY]             recipient: JOHN]

"John sold an apple to Mary for 25 cents"
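As noted above, CD structures such as the give/take ATRANS frames are simple to match against each other slot by slot. A minimal sketch, using plain dictionaries as a stand-in for CD structures (the encoding is illustrative, not Schank's notation):

```python
# The give/take ATRANS structures from the text, as simple dictionaries.
give = {"primitive": "ATRANS", "rel": "POSSESSION",
        "actor": "JOHN", "object": "BALL",
        "source": "JOHN", "recipient": "MARY"}

take = {"primitive": "ATRANS", "rel": "POSSESSION",
        "actor": "MARY", "object": "BALL",
        "source": "JOHN", "recipient": "MARY"}

def compare_cd(a, b):
    """Report which slots two CD structures share and where they differ."""
    same = {k for k in a if a.get(k) == b.get(k)}
    diff = {k for k in set(a) | set(b) if a.get(k) != b.get(k)}
    return same, diff

same, diff = compare_cd(give, take)
# Only the actor slot differs, mirroring the analysis in the text:
# both propositions describe the same transfer of the ball to Mary.
```

Because both verbs reduce to the same primitive, the comparison needs no knowledge of the surface verbs give and take at all.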
The cases used in CD are similar but not identical to the set used originally in case grammars, although the basic ideas are the same. One refinement in CD was to separate agent into actor and source, as the two can be instantiated by different entities in the underlying semantic primitives. Other CD primitive actions include:

PTRANS  Physical transfer of location
MTRANS  Mental transfer of information
MBUILD  Create a new idea or conclusion from other information
INGEST  Bring any substance into the body
PROPEL  Apply a force to an object
ATTEND  Focus a sense organ (e.g., eyes, ears)
SPEAK   Produce sounds of any sort
Later work (54) has extended this list to include social and other interpersonal actions.
Parsing into Case Frames. The discussion of case frames thus far has focused on their structural properties, including parsimony and clarity of representation. Now the uses of case frames in parsing natural language are discussed, in particular certain parsing techniques available to parsers whose target representation is based on case frames. In essence, parsers built around case grammars help to combine bottom-up recognition of structuring constituents with more focused top-down instantiation of less structured, more complex constituents. This essential property is demonstrated in the example case frame recognition algorithm presented below (see also Parsing).

Thus far case frames have been mentioned that consist of a header and a collection of semantically defined cases. There is a bit more to it than that. Each case consists of a filler and a positional or a lexical marker. There have been examples of case fillers in the above sections. A positional case marker says that the filler of the case occurs in a predefined location in the surface string. A lexical case marker says that the case filler is preceded by one of a small set of marker words (usually prepositions) in the surface string. For instance, consider the following input to a natural-language interface to an operating system:

Copy the fortran files from the system library to my directory.

"Copy" is the case header, and the object case is marked positionally as the noun phrase occupying the simple direct object position (i.e., the first noun phrase to the right of the verb that is not preceded by a preposition). The filler of the object case is constrained semantically to be some information structure in a computer. Hence, the parser knows where in the input to search for the filler of the object case and moreover knows what to expect in that position (a noun phrase denoting an information structure, like a file or directory in a computer).
The source case is marked lexically by the preposition from, and the recipient case is marked by the preposition to. Both case fillers are constrained to be noun phrases denoting information repositories in the computer (directories, tapes, etc.). More explicitly, the case frame information available to the parser is:

[COPY (header-pattern)
  [object:      marker: (POSITIONAL DIRECT-OBJECT)
                filler: (information-structure)]
  [source:      marker: (LEXICAL (from-marker))
                filler: (information-repository) | (input-device)]
  [destination: marker: (LEXICAL (to-marker))
                filler: (information-repository) | (output-device)]]

Where:
(header-pattern) → copy | transfer | move
(from-marker) → from | in
(to-marker) → to | into | onto

plus patterns or NP-level case frames to recognize output devices, input devices, information structures, and information repositories.

A typical case-frame parsing algorithm that operates on this case frame data structure could be summarized as follows:
1. For each case frame in the grammar, attempt an unanchored match of the header pattern against the input string. If none succeeds, the input is unparsable by the grammar. (An unanchored match is the process of searching for a particular pattern anywhere in the input, as opposed to an anchored match, where the match is attempted only starting at a predefined position in the input string.) If one or more matches are found, perform the following steps for each frame whose header matched; the ones that account for the entire input are the possible parses of the input string.

2. Retrieve the case frame indexed by the recognized case header.

3. Attempt to recognize each required case, as follows:
a. If the case is marked lexically, do an unanchored match for the case marker (a very simple one- or two-word pattern), and if that succeeds, perform the more complex recognition of the case filler by anchored match to the right of the case marker or by a more complex parsing strategy (such as recognizing an embedded case frame starting at that location in the input). "Source" and "destination" in the example above are marked lexically.
b. If the case is marked positionally, do an anchored match of the case filler (or again a more complex recognition strategy) starting at the designated point in the input string. "Object" in the example above is marked positionally.
c. If the case can be marked either way, search first for the lexical marker, and, failing that, attempt to recognize it positionally. For instance, the recipient case in GIVE can be marked by the word to (or unto, etc.), or it can appear positionally in the indirect object location ("John gave an apple to Mary" vs. "John gave Mary an apple").

If one or more required cases are not recognized, return an error condition.
This signifies a possible ellipsis, incorrect selection of the case frame, ill-formed user input, or insufficient grammatical coverage. The following sections address issues of robust recovery from ill-formed user input.

4. Attempt to recognize all the optional cases by applying the same method used to parse the required ones. If some are not recognized, however, do not generate error conditions.

5. If, after all the required and optional cases have been processed, there is remaining input, generate a potential error condition denoting spurious input, insufficient coverage, or garbled or ill-formed input that may be recognized by more flexible parsing strategies.

As the case frame is parsed, the input segments recognized as case fillers are processed and stored as the value of the corresponding cases in the case frame. A partially instantiated case frame can serve to guide error-correction processes or to formulate focused queries to the user (46,55,56). The initial case frame selection phase can be speeded up by indexing the case header patterns by the words they contain and recognizing them in a pure bottom-up manner. This bottom-up index-based process is computationally effective if there are very many case frames and each case header consists of a relatively simple pattern. Otherwise, the top-down unanchored pattern match is sufficiently efficient (few case frames), or both processes require substantial computation (large numbers of case frames with complex header patterns).
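Steps 1-4 of the algorithm above can be sketched for the COPY frame as follows. This is a deliberately minimal rendering, not the original implementation: flat word lists stand in for the semantic filler constraints, the positional direct-object rule is approximated by "first semantically acceptable word after the header," and all vocabulary is invented for the example.

```python
import re

# A minimal version of the COPY case frame from the text. Word lists
# stand in for the semantic filler constraints (illustrative only).
COPY = {
    "header": ["copy", "transfer", "move"],
    "cases": {
        "object":      {"marker": None,            # positional: direct object
                        "filler": ["files", "file", "directory"],
                        "required": True},
        "source":      {"marker": ["from", "in"],  # lexical case marker
                        "filler": ["library", "directory", "tape"],
                        "required": False},
        "destination": {"marker": ["to", "into", "onto"],
                        "filler": ["library", "directory", "tape"],
                        "required": False},
    },
}

def find_filler(words, vocab):
    """Anchored scan: first word that satisfies the filler constraint."""
    for w in words:
        if w in vocab:
            return w
    return None

def parse(sentence, frame=COPY):
    words = re.findall(r"[a-z]+", sentence.lower())
    # Step 1: unanchored match of the header pattern.
    try:
        head = next(i for i, w in enumerate(words) if w in frame["header"])
    except StopIteration:
        return None                                # input unparsable
    parsed = {"header": words[head]}
    # Step 3 (and 4): recognize each case by its marker type.
    for case, spec in frame["cases"].items():
        if spec["marker"] is None:                 # positional case
            filler = find_filler(words[head + 1:], spec["filler"])
        else:                                      # lexical case marker
            filler = None
            for i, w in enumerate(words):
                if w in spec["marker"]:
                    filler = find_filler(words[i + 1:], spec["filler"])
                    break
        if filler is None and spec["required"]:
            return None                            # missing required case
        if filler is not None:
            parsed[case] = filler
    return parsed

parse("Copy the fortran files from the system library to my directory.")
```

On the operating-system command above, this yields a frame with "files" as the object, "library" as the source, and "directory" as the destination, illustrating how header and markers are found bottom-up while fillers are recognized top-down.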
Case-frame instantiation can be applied recursively to parse relative clauses or any other linguistic structures that can be expressed as case frames. Noun phrases with postnominal modifiers (i.e., trailing prepositional phrases that modify the main noun phrase), for instance, can be encoded and recognized by an extension of the sentential-level case-frame instantiation algorithm presented above. Moreover, case-frame instantiation works in concert with semantic grammars or patterns used to recognize any subconstituents, such as case markers, represented as nonterminal nodes in a grammar. The advantages of case-frame instantiation over other parsing techniques can be summarized as follows:

Case frames combine bottom-up recognition of simple structuring constituents, such as case headers and case markers, with top-down recognition of semantically more complex, but syntactically less significant, case fillers. The differential treatment of different constituents provides more efficient parsing in general, allows for ellipsis resolution, and makes possible some forms of error recovery, as discussed below.

Case frames combine syntax and semantics. Positional and case marker information is used in concert with semantic recognition of case fillers, thus reducing (though certainly not eliminating) structural and lexical ambiguity.

Case frames are a fairly convenient representation for back-end systems to use. In contrast, parse trees must first be interpreted semantically and subsequently transformed into a representation more convenient for other modules in the system.

Robust Parsing. Any natural-language interface that is used in a practical application with a multitude of users must be able to handle input that is outside its grammar or expectations in various ways. When people use language spontaneously, whether in spoken or written form, they inevitably make mistakes, resulting in extragrammatical utterances that a natural-language interface will receive.
Given the present limited state of NLP, a natural-language interface must also be prepared for input that is, as far as the user is concerned, perfectly correct but that the parser cannot recognize because of its own limited competence. Some types of extragrammatical utterances [see Refs. 43 and 44 for more complete accounts] are listed below with example utterances that might be encountered by an interface to a college course registration system.

Spelling errors: tarnsfer Jim Smith from Econoics 237 too Mathematics 156
Note that some spelling errors can result in different correctly spelled words (e.g., too).

Novel words: transfer Smith out of Economics 237 to Basketwork 100
Here one supposes that "out of" is not listed as a (multiword) preposition corresponding to the source case marker of transfer and that "Basketwork" is not in the interface's dictionary of department names.

Spurious phrases: please enroll Smith if that's possible in I think Economics 237
672
NATURAL-LANGUACE UNDERSTANDING
Ellipsis or other fragmentary utterances: also Physics 514
This might be a follow-up input to the previous one.

Unusual word order: in Economics 237 Jim Smith enroll

Missing words: enroll Smith Economics 237
Here the in is missing, but the meaning is still perfectly clear.
Unless a natural-language interface can deal with problems in these classes easily, it will appear very uncooperative and stupid to its users, who will tend either not to use it if they have that choice or to use it with a high level of frustration. Examined below are techniques available to deal with some of the above deviations from grammaticality in more detail.

Spelling errors are the most common and normally the most easily corrected of all grammatical deviations. The usual approach when a word is found to be outside the vocabulary of a natural-language interface is to compare the word against a set of known words and substitute the word (or words) from that list found to be closest to the unknown word according to some metric and subject to some threshold of closeness. There is not space here to go into the methods of comparison, but clearly the process will be made more efficient and less prone to error by shortening the list of words against which to compare the unknown word. For this reason, methods of language analysis, such as semantic grammars and case-frame instantiation, that are able to apply strong top-down constraints to their recognition are at a significant advantage when it comes to spelling correction. For instance, in

tarnsfer Jim Smith from Econoics 237 too Mathematics 156

a system based on case-frame instantiation such as that examined above need only compare Econoics against its list of department names rather than against its whole vocabulary. This ability is particularly important in the case of too in this example. Too is a real word that might well be in the system's vocabulary, and without the strong prediction that it should be a preposition marking a case of transfer, the system would be unable to correct it (a match against the whole vocabulary would make too the best match) or even notice that it is misspelled.

Whereas spelling correction can be dealt with at the lexical level, other forms of grammatical deviation require modification to an NLP system's grammatical expectations. The way in which this can be accomplished differs markedly by approach. In pattern matching, for instance, the obvious approach is partial pattern matching, as attempted in the FlexP system (43). Patterns are deemed to match partially if most, but not all, of their elements actually do match the input. Clearly, this can be useful for missing or extra words but is not useful in the case of unusual word order. Moreover, in practice, it turns out that some elements of a pattern are more important than others, and unless allowance is made for these differences, it is difficult to decide exactly how much of a pattern needs to match before the pattern as a whole can be declared matched.

Dealing with grammatical deviation in an ATN-based system turns out to be extremely difficult. The current position in the network is a very important piece of state information for the operation of an ATN. If an input should be slightly ungrammatical, even by a single word, it is very hard to find the appropriate state to jump to that would enable the parse to continue. This assumes, moreover, that it is possible to determine exactly where the input has departed from the grammar's expectations. The backtracking search used with most ATNs can make this difficult. Work by Weischedel and Black (37) has dealt with extragrammaticality caused by incorrect agreements that can be resolved by relaxing the predicates on ATN arcs, and Kwasny and Sondheimer (57) have looked into adding extra arcs to ATNs on a dynamic basis to make the grammar fit the input. Earlier work on speech parsing (39) also tried to use ATNs in an island-driven mode.

A more recent development in robust parsing by Carbonell and Hayes (56,58) uses a construction-specific approach that fits in well with semantic grammars and case frame instantiation. The basic idea is to tailor parsing strategies to specific construction types; this not only results in efficient parsing of grammatical input but also permits built-in recovery strategies that exploit the characteristics of the particular construction type. For instance, the following simple recovery strategy works quite well for simple imperative case frames:

Skip over unexpected input until a case marker is found; parse skipped segments against unfilled cases, using only semantic constraints.

If this strategy is applied to

transfer Economics 247 to Physics 317 Smith

"Economics 247" and "Smith" will initially be skipped over, with "to Physics 317" being correctly parsed since "to" is a valid case marker. Then the skipped segments will be correctly parsed against the unfilled cases "source-course" and "student," respectively, leading to a parse identical to that for

transfer Smith from Economics 247 to Physics 317

Such methods of robust parsing are under active investigation at the moment, with the chief outstanding problem being the coordination of multiple, independent, construction-specific parsing strategies on the same input.

Dialogue Phenomena

In addition to recognizing individual sentences, the problem of interactive communication through natural language, be it communication between man and machine or communication between two people, entails discourse phenomena that transcend individual sentences (see Discourse understanding).

Anaphora. Pronouns and other anaphoric references (words like it, that, or one) refer to concepts described previously in a dialogue. Anaphoric resolution entails identifying the referents of these place-holder words. Interactive dialogues invite the use of anaphora much more than simpler database query situations. Therefore, as natural-language interfaces increase in complexity and expand their domain of application, anaphoric resolution becomes an increasingly important problem.

Definite Noun Phrases. Noun phrases often serve another type of anaphoric reference by referring to previously
mentioned concepts, much like the less specific anaphors do. Usually such phrases are flagged by a definite article (e.g., the). As Grosz (11) noted, resolving the referent of definite noun phrases or any other anaphors often requires an understanding of the planning structure underlying cooperative discourse.

Ellipsis. People often use sentence fragments to express a complete proposition. These terse utterances must be filled out in the context of the dialogue. Sentential-level ellipsis (qv) has long been recognized as ubiquitous in discourse. However, semantic ellipsis, where ellipsis occurs through semantically incomplete propositions rather than through syntactically incomplete structures, is also an important phenomenon. The ellipsis resolution method presented below addresses both kinds of ellipsis.

Extragrammatical Utterances. Interjections, dropped articles, false starts, misspellings, and other forms of grammatical deviance abound. Developing robust parsing techniques that tolerate errors has been the focus of much recent work (37,56,58-60), as discussed in the preceding section.

Metalinguistic Utterances. Intrasentential metalanguage has been investigated to some degree (61), but its more common intersentential counterpart has received little attention (62). However, utterances about other utterances (e.g., corrections of previous commands, such as "I meant to type X instead" or "I should have said . . .") are not infrequent, and an initial stab is being made at this problem (68). Note that it is a cognitively less demanding task for a user to correct a previous utterance than to repeat an explicit sequence of commands (or, worse yet, to detect and undo explicitly each and every unwanted consequence of a mistaken command).

Indirect Speech Acts.
Occasionally users of natural-language interfaces will resort to indirect speech acts (qv) (64-66), especially in connection with intersentential metalanguage, or by stating a desired state of affairs and expecting the system to supply the sequence of actions necessary to achieve that state.

Empirical studies suggest that users of natural-language interfaces avail themselves of discourse phenomena whenever such devices help in formulating short, succinct linguistic expressions over lengthier, more explicit ones. This observation is summarized as follows:
Terseness Principle: Users of natural-language interfaces insist on being as terse as possible, independent of task, communication media, typing ability, or instructions to the contrary, without sacrificing the flexibility of expression inherent in natural-language communication. [This principle may be viewed as a surprisingly strong form of Grice's maxim of brevity (67).]

Case-Frame Ellipsis Resolution. In order to illustrate the ubiquity of ellipsis in interactive dialogues through a natural-language interface, consider the XCALIBUR project, whose objective is to provide flexible natural-language access (comprehension and generation) to the XSEL expert system (68). XSEL, the Digital Equipment Corporation's automated salesman's assistant, advises on selection of appropriate VAX components and produces a sales order for automatic configuration by the R1 system (69). Part of the XSEL task is to provide the user with information about DEC components, hence subsuming the database query task. However, unlike a pure database query system, an expert system interface must also interpret commands, understand assertions of new information, and carry out task-oriented dialogues [such as those discussed by Grosz (11)]. XCALIBUR, in particular, deals with commands to modify an order, as well as information requests pertaining to its present task or its database of VAX component parts. In the following example dialogue, user inputs are preceded with a ">" prompt.

>What is the largest 11780 fixed disk under $40,000?
The rp07-aa is a 516-MB fixed pack disk that costs $38,000.
>The largest under $50,000?
The rp07-aa.
>Add two rp07-aa disks to my order.
Line item 1 added: (2 rp07-aa)
>Add a printer with graphics capability.
fixed or changeable font?
>fixed font
lines per minute?
>make it at least 200, upper/lowercase.
OK. Line item 2 added: (1 lxy11-ry)
>Tell me about the lxy11
The lxy11 is a 240-l/m line printer with plotting capabilities.

For details of the XCALIBUR interface, the reader is referred to (70-72). In this entry, only the case-frame ellipsis resolution method is discussed, by way of illustration. The XCALIBUR system handles ellipsis at the case-frame level. Its coverage is a superset of the LIFER/LADDER system (13,20) and the PLANES (qv) ellipsis module (73). Although it handles most of the ellipsed utterances we encountered, it is not meant to be a general linguistic solution to the ellipsis phenomenon. The following examples are illustrative of the kind of sentence fragments the current case-frame method handles. For brevity, assume that each sentence fragment occurs immediately following the initial query below.

INITIAL QUERY: "What is the price of the three largest single-port fixed-media disks?"
"Speed?"
"Two smallest?"
"How about the price of the two smallest?"
"Also the smallest with dual ports"
"Speed with two ports?"
"Disk with two ports."

In the representative examples above, punctuation is of no help, and pure syntax is of very limited utility. For instance, the last three phrases are syntactically similar (indeed, the last two are indistinguishable), but each requires that a different substitution be made on the parse of the preceding query.

Ellipsis is resolved differently in the presence or absence of strong discourse expectations. In the former case the discourse expectation rules are tested first, and if they fail to resolve the sentence fragment, the contextual substitution rules are tried. If there are no strong discourse expectations, the contextual substitution rules are invoked directly.
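The two-stage control strategy just described can be sketched as follows; the function and rule signatures below are illustrative assumptions for this entry, not XCALIBUR's actual implementation.

```python
# A minimal sketch (assumed names, not XCALIBUR's code) of the control
# strategy described above: when strong discourse expectations exist,
# the discourse expectation rules are tried first; the contextual
# substitution rules serve as the fallback, or as the direct route
# when no strong expectations are active.

def resolve_ellipsis(fragment, expectations, last_parse,
                     expectation_rules, substitution_rules):
    """Return a resolved parse for an elliptical fragment, or None."""
    if expectations:
        for rule in expectation_rules:
            resolved = rule(fragment, expectations, last_parse)
            if resolved is not None:
                return resolved
    for rule in substitution_rules:
        resolved = rule(fragment, last_parse)
        if resolved is not None:
            return resolved
    return None

# Toy rules for demonstration only:
fail_rule = lambda *args: None
subst_rule = lambda frag, parse: {"base": parse, "override": frag}

result = resolve_ellipsis("Speed?", [], "PARSE", [fail_rule], [subst_rule])
```

With no active expectations, the fragment "Speed?" falls through to the substitution rules, which merge it against the parse of the previous input.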
NATURAL-LANGUAGE UNDERSTANDING
An exemplary discourse expectation rule follows:
IF:
The system generated a query for confirmation or disconfirmation of a proposed value of a filler of a case in a case frame in focus,
THEN:
EXPECT one or more of the following:
1) A confirmation or disconfirmation pattern.
2) A different but semantically permissible filler of the case frame in question (optionally repeating the attribute or providing the case marker).
3) A comparative or evaluative pattern.
4) A query for possible fillers or constraints on possible fillers of the case in question. [If this expectation is confirmed, a subdialogue is entered, where previously focused entities remain in focus.]
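One way to picture the rule above is as a condition paired with a set of expected response patterns. The encoding below (field names, toy condition test) is invented for illustration and is not drawn from XCALIBUR's internals.

```python
# Illustrative encoding of the IF/THEN discourse expectation rule above.
# All names here (awaiting_confirmation, "pending_confirmation", etc.)
# are assumptions made for this sketch.

def awaiting_confirmation(state):
    # IF-part: the system just asked the user to confirm or disconfirm
    # a proposed filler of a case in the case frame currently in focus.
    return state.get("pending_confirmation") is not None

confirmation_rule = {
    "condition": awaiting_confirmation,
    "expect": [
        "confirmation or disconfirmation pattern",
        "different semantically permissible filler of the case",
        "comparative or evaluative pattern",
        "query for possible fillers or constraints on the case",
    ],
}

def active_expectations(state, rules):
    """Gather the expectations of every rule whose condition holds."""
    return [e for r in rules if r["condition"](state) for e in r["expect"]]

# The system has proposed 150 lines per minute for the printer's speed case:
state = {"pending_confirmation": ("line-printer", "speed", 150)}
expectations = active_expectations(state, [confirmation_rule])
```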
The following dialogue fragment, presented without further commentary, illustrates how these expectations come into play in a focused dialogue:

>Add a line printer with graphics capabilities.
Is 150 lines per minute acceptable?
>No, 320 is better          [Expectations 1, 2, & 3]
(or) other options for the speed?          [Expectation 4]
(or) Too slow, try 300 or faster          [Expectations 2 & 3]

The utterance "try 300 or faster" is syntactically a complete sentence, but semantically it is just as fragmentary as the previous utterances. The strong discourse expectations, however, suggest that it be processed in the same manner as syntactically incomplete utterances since it satisfies the expectations of the interactive task. The terseness principle operates at all levels: syntactic, semantic, and pragmatic.

The contextual substitution rules exploit the case-frame representation of queries and commands discussed in the previous section. The scope of these rules, however, is limited to the last user interaction of appropriate type in the dialogue focus, as illustrated below. The rules search the ellipsed fragment for case fillers (or case marker and filler pairs) to substitute for corresponding cases in the parse of the previous input. Substitution can occur at a top-level (sentential) case frame or in embedded (relative-clause or noun-phrase) case frames.

>What is the size of the 3 largest single-port fixed-media disks?
>And the price and speed?

and

>What is the size of the 3 largest single-port fixed-media disks?
>disks with two ports?

Note that it is impossible to resolve this kind of ellipsis in a general manner if the previous query is stored verbatim or as a semantic-grammar parse tree. "Disks with two ports" would best correspond to some (disk-descriptor) nonterminal and hence, according to the LIFER algorithm (19,20), would replace the entire phrase "single-port fixed-media disks" that corresponded to (disk-descriptor) in the parse of the original
query. However, an informal poll of potential users suggests that the preferred interpretation of the ellipsis retains the previous information in the original query. The ellipsis resolution process, therefore, requires a finer grain substitution method than simply inserting the highest level nonterminals in the ellipsed input in place of the matching nonterminals in the parse tree of the previous utterance. Taking advantage of the fact that a case-frame analysis of a sentence or object description captures the meaningful semantic relations among its constituents in a canonical manner, a partially instantiated nominal case frame can be merged with the previous case frame as follows:

Substitute any cases instantiated in the original query that the ellipsis specifically overrides. For instance, "with two ports" overrides "single port" in our example, as both entail different values of the same case filler regardless of their different syntactic roles. ("Single port" in the original query is an adjectival construction, whereas "with two ports" is a postnominal modifier in the ellipsed fragment.)

Retain any cases in the original parse that are not explicitly contradicted by new information in the ellipsed fragment. For instance, "fixed media" is retained as part of the disk description, as are all the sentential-level cases in the original query, such as the quantity specifier and the projection attribute of the query ("size").

Add cases of a case frame in the query that are not instantiated therein but are specified in the ellipsed fragment. For instance, the "fixed-head" descriptor is added as the media case of the disk nominal case frame in resolving the ellipsed fragment in the following example:

>Which disks are configurable on a VAX 11-780?
>Any configurable fixed-head disks?

In the event that a new case frame is mentioned in the ellipsed fragment, wholesale substitution occurs, much like in the semantic grammar approach.
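The merge rules just listed can be sketched as a small function over case frames. The frame layout and field names below are invented for illustration; XCALIBUR's actual representation is richer than a flat dictionary.

```python
# Illustrative sketch of the case-frame merge described above: cases in
# the fragment override matching cases in the previous query, unmentioned
# cases are retained, new cases are added, and a fragment naming a
# different head noun triggers wholesale substitution. The frames below
# are toy stand-ins, not XCALIBUR data structures.

def merge_case_frames(previous, fragment):
    if fragment.get("head") not in (None, previous.get("head")):
        return dict(fragment)              # wholesale context switch
    merged = dict(previous)                # retain uncontradicted cases
    for case, filler in fragment.items():
        if case != "head":
            merged[case] = filler          # override, or add a new case
    return merged

previous = {"head": "disk", "ports": "single", "media": "fixed",
            "quantity": 3, "projection": "size"}

# "disks with two ports?"  -> overrides only the ports case
q1 = merge_case_frames(previous, {"head": "disk", "ports": "dual"})

# "How about tape drives?" -> wholesale substitution
q2 = merge_case_frames(previous, {"head": "tape drive"})
```

In the first call the quantity, media, and projection cases survive the merge; in the second, the entire disk frame is discarded rather than producing a meaningless "fixed-head tape drive."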
For instance, if after the last example one were to ask "How about tape drives?" the substitution would replace "fixed-head disks" with "tape drives" rather than replacing only "disks" and producing the phrase "fixed-head tape drives," which is meaningless in the current domain. In these instances of wholesale context switch the semantic relations captured in a case-frame representation, and not in a semantic-grammar parse tree, prove immaterial.

The key to case-frame ellipsis resolution is matching corresponding cases rather than surface strings, syntactic structures, or noncanonical representations. Although correctly instantiating a sentential or nominal case frame in the parsing process requires semantic knowledge, some of which can be rather domain specific, once the parse is attained, the resulting canonical representation, encoding appropriate semantic relations, can and should be exploited to provide the system with additional functionality such as the present ellipsis resolution method. For more details and examples of the rules that perform case-frame substitution, see the XCALIBUR report (71).

More Complex Phenomena. In addition to ellipsis and anaphora, there are more complex phenomena that must be addressed if one is to understand and simulate human discourse. This type of deeper understanding has not yet been incorporated into practical natural-language interfaces. However, as natural-language interfaces increase in sophistication (as they surely will), these more complex phenomena require attention, so, as the final topic of this entry, some examples of these more esoteric discourse phenomena are discussed.

Goal Determination Inference. The interpretation of an utterance may depend on the inferred conversational goals of the speaker. Consider the following set of examples, in which the same utterance spoken in somewhat different contexts elicits radically different responses. These responses depend on the interpretation of the initial utterance, in which the attribution of goals to the speaker plays a dominant role.

Passer-by: Do you know how to get to Elm Street?
Person on the street corner: Walk toward that tall building, and Elm Street is the fifth or sixth on your left.

The passer-by's question was quite naturally interpreted as an indirect speech act, since the information sought (and given) was not whether the knowledge of getting to Elm Street was present but rather how actually to get there. Lest the mistaken impression be given that it is a simple matter to identify indirect speech acts computationally, consider the following variant of the example:

Passer-by: Do you know how to get to Elm Street?
Person reading a street map and holding an envelope with an Elm Street address on it: No, I haven't found it; could you help me?

In the second example, the listener infers that the goal of the passer-by is to render assistance, and therefore the initial utterance is interpreted as a direct query of the knowledge state of the listener in order to know whether assistance is required. Hence, the passer-by's question is not an indirect speech act in this example.
Nor is the task of the interpreter of such utterances only to extract a binary decision on the presence or absence of a speech act from goal expectations. The selection of which indirect speech act is meant often rests on contextual attribution of different goals to the speaker. Consider, for instance, the following contextual variant of our previous example:

Passer-by: Do you know how to get to Elm Street?
Waiting cabbie: Sure, hop in. How far up Elm Street are you going?

In this example, the cabbie interpreted the goal of the passer-by as wanting a ride to an Elm Street location. Making sure the cabbie knows the destination is merely instrumental to the inferred goal. The social relation between a cabbie and a (potential) customer is largely responsible for triggering the goal attribution. Thus, the passer-by's utterance in this example is also interpreted as an indirect speech act, but a different one from the first example (i.e., wanting to be driven to the destination vs. wanting to know how to navigate to the destination). In summary, three totally different speech acts (qv) are attributed to identical utterances as a function of different goals inferred from contextual information (for additional discussion of goal determination inferences in discourse comprehension see Refs. 41, 65, and 74):

Example             Speech act
Original example    Indirect information request
Map reader          Direct information request
Cabbie example      Indirect action request

Social Role Constraints. The relative social roles of the discourse participants affect their interpretation of utterances, as illustrated below:

Army general: I want a juicy hamburger.
Aide: Yes sir!

Child: I want a juicy hamburger.
Mother: Not today; perhaps tomorrow for lunch.

Prisoner 1: I want a juicy hamburger.
Prisoner 2: Yeah, me too. All the food here tastes like cardboard.

Clearly, the interpretation of the sentence "I want a juicy hamburger" differs in each example with no context present beyond the differing social roles of the participants and their consequent potential for action. In the first example a direct order is inferred, in the second a request, and in the third only a general assertion of a (presumably unattainable) goal. Therefore, comprehending a dialogue rests critically on knowledge of social roles (74,75). Moreover, social role constraints provide part of the setting essential in making goal attributions and therefore impinge (albeit indirectly) on the goal determination inferences discussed in the previous section. In unconstrained discourse there is strong interaction between goal expectations, social role constraints, indirect speech acts, and metalanguage utterance interpretation.

Conclusion

This entry has presented a brief overview of the current state of the art of NLP: the process of developing computer systems that communicate with their users through natural language. The computational approach to NLP differs from the more general open-ended approach to natural language in linguistics and cognitive psychology. As shown above, practical natural-language interfaces can currently be constructed to perform limited tasks within restricted domains, and the various techniques that have been employed to construct such interfaces have been examined and compared. Further details on any of the systems or techniques described can, of course, be obtained by following the large set of references provided. A reader with a desire for further general information may be particularly interested in Refs. 76-78, and a reader with a desire to see some implementation details of systems illustrative of the cognitive simulation approach may wish to look at Ref. 53, which includes unusually complete descriptions of a small number of NLP systems (see also Ref. 79).

BIBLIOGRAPHY

1. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
2. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
3. S. R. Petrick, A Recognition Procedure for Transformational
Grammars, Ph.D. Thesis, Department of Modern Languages, MIT, Cambridge, MA, 1965.
4. J. R. Anderson, Language, Memory, and Thought, Lawrence Erlbaum, Hillsdale, NJ, 1976.
5. E. C. Charniak, Toward a Model of Children's Story Comprehension, TR-266, MIT AI Lab, Cambridge, MA, 1972.
6. R. C. Schank, Conceptual Information Processing, North-Holland, Amsterdam, 1975.
7. R. Cullingford, Script Application: Computer Understanding of Newspaper Stories, Ph.D. Thesis, Computer Science Department, Yale University, New Haven, CT, 1978.
8. J. G. Carbonell, Subjective Understanding: Computer Models of Belief Systems, Ph.D. Thesis, Yale University, New Haven, CT, 1979.
9. P. R. Cohen and C. R. Perrault, "Elements of a plan-based theory of speech acts," Cog. Sci. 3, 177-212 (1979).
10. J. F. Allen, A Plan Based Approach to Speech Act Recognition, Ph.D. Thesis, University of Toronto, 1979.
11. B. J. Grosz, The Representation and Use of Focus in a System for Understanding Dialogues, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 67-76, 1977.
12. C. L. Sidner, Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse, TR-537, MIT AI Lab, Cambridge, MA, 1979.
13. G. G. Hendrix, Human Engineering for Applied Natural Language Processing, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 189-191, 1977.
14. B. J. Grosz, TEAM: A Transportable Natural Language Interface System, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, February 1983.
15. S. J. Kaplan, Cooperative Responses from a Portable Natural Language Data Base Query System, Ph.D. Thesis, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, 1979.
16. J. S. Brown and R. R. Burton, Multiple Representations of Knowledge for Tutorial Reasoning, in D. G. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 311-349, 1975.
17. J. R. Carbonell, Mixed-Initiative Man-Computer Dialogues, Bolt, Beranek, and Newman, Cambridge, MA, 1971.
18. J. G. Carbonell, W. M. Boggs, M. L. Mauldin, and P. G. Anick, The XCALIBUR Project, A Natural Language Interface to Expert Systems, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 653-656, 1983.
19. J. R. Searle, Speech Acts, Cambridge University Press, Cambridge, UK, 1969.
20. E. D. Sacerdoti, Language Access to Distributed Data with Error Recovery, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 196-202, 1977.
21. J. Weizenbaum, "ELIZA - A computer program for the study of natural language communication between man and machine," CACM 9(1), 36-45 (January 1966).
22. R. C. Parkison, K. M. Colby, and W. S. Faught, "Conversational language comprehension using integrated pattern-matching and parsing," Artif. Intell. 9, 111-134 (1977).
23. W. A. Woods, "Transition network grammars for natural language analysis," CACM 13(10), 591-606 (October 1970).
24. C. R. Riesbeck and R. C. Schank, Comprehension by Computer: Expectation-Based Analysis of Sentences in Context, Report 78, Computer Science Department, Yale University, New Haven, CT, 1976.
25. M. A. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
26. S. L. Small and C. Rieger, Parsing and Comprehending with Word Experts (A Theory and its Realization), in M. Ringle and W. Lehnert (eds.), Strategies for Natural Language Processing, Lawrence Erlbaum, Hillsdale, NJ, pp. 89-147, 1982.
27. S. Small, G. Cotrell, and L. Shastri, Toward Connectionist Parsing, Proceedings of the Second National Meeting of the AAAI, University of Pittsburgh, Pittsburgh, PA, pp. 247-250, August 1982.
28. G. DeJong, Skimming Stories in Real Time, Ph.D. Thesis, Computer Science Department, Yale University, New Haven, CT, 1979.
29. R. C. Schank, M. Lebowitz, and L. Birnbaum, "An integrated understander," Am. J. Computat. Ling. 6(1), 13-30 (1980).
30. K. M. Colby, Simulations of Belief Systems, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, pp. 251-286, 1973.
31. Y. A. Wilks, Preference Semantics, in E. Keenan (ed.), Formal Semantics of Natural Language, Cambridge University Press, Cambridge, UK, 1975.
32. J. Earley, "An efficient context-free parsing algorithm," CACM 13(2), 94-102 (1970).
33. M. Tomita, Efficient Parsing for Natural Language, Kluwer Academic Publishers, Boston, MA, 1985.
34. G. Gazdar, Phrase Structure Grammars and Natural Language, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 556-565, August 1983.
35. D. G. Bobrow and J. B. Fraser, An Augmented State Transition Network Analysis Procedure, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 557-567, 1969.
36. W. A. Woods, R. M. Kaplan, and B. Nash-Webber, The Lunar Sciences Language System, Final Report 2378, Bolt, Beranek, and Newman, Cambridge, MA, 1972.
37. R. M. Weischedel and J. Black, "Responding to potentially unparseable sentences," Am. J. Computat. Ling. 6, 97-109 (1980).
38. W. A. Woods, W. M. Bates, G. Brown, B. Bruce, C. Cook, J. Klovstad, J. Makhoul, B. Nash-Webber, R. Schwartz, J. Wolf, and V. Zue, Speech Understanding Systems, Final Technical Report 3438, Bolt, Beranek, and Newman, Cambridge, MA, 1976.
39. R. M. Kaplan, A General Syntactic Processor, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, pp. 193-241, 1973.
40. M. Kay, The MIND System, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, pp. 155-188, 1973.
41. R. Frederking, A Rule-Based Conversation Participant, Proceedings of the 19th Meeting of the Association for Computational Linguistics, Stanford, CA, ACL-81, 1981.
42. R. J. Bobrow, The RUS System, BBN Report 3878, Bolt, Beranek, and Newman, Cambridge, MA, 1978.
43. P. J. Hayes and G. V. Mouradian, "Flexible parsing," Am. J. Computat. Ling. 7(4), 232-241 (1981).
44. P. J. Hayes and D. R. Reddy, "Steps toward graceful interaction in spoken and written man-machine communication," Int. J. Man-Mach. Stud. 19(3), 211-294 (September 1983).
45. R. R. Burton, Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems, BBN Report 3453, Bolt, Beranek, and Newman, Cambridge, MA, December 1976.
46. J. G. Carbonell and P. J. Hayes, Robust Parsing Using Multiple Construction-Specific Strategies, in L. Bolc (ed.), Natural Language Parsing Systems, Springer-Verlag, New York, 1985.
47. J. G. Carbonell, Towards a Robust, Task-Oriented Natural Language Interface, Workshop/Symposium on Human Computer Interaction, Georgia Technical Information Sciences, March 1981.
48. J. G. Carbonell, Robust Man-Machine Communication, User Modelling and Natural Language Interface Design, in S. Andriole (ed.), Applications in Artificial Intelligence, Petrocelli, Boston, MA, 1985.
49. C. Fillmore, The Case for Case, in E. Bach and R. Harms (eds.), Universals in Linguistic Theory, Holt, Rinehart, and Winston, New York, pp. 1-90, 1968.
50. R. F. Simmons, Semantic Networks: Their Computation and Use for Understanding English Sentences, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, pp. 63-113, 1973.
51. C. Riesbeck, Conceptual Analysis, in R. C. Schank (ed.), Conceptual Information Processing, North-Holland, Amsterdam, pp. 83-156, 1975.
52. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977.
53. R. Schank and C. Riesbeck, Inside Computer Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1981.
54. R. C. Schank and J. G. Carbonell, Re: The Gettysburg Address: Representing Social and Political Acts, in N. V. Findler (ed.), Associative Networks, Academic Press, New York, pp. 327-362, 1979.
55. P. J. Hayes, A Construction Specific Approach to Focused Interaction in Flexible Parsing, Proceedings of the Nineteenth Annual Meeting of the Association for Computational Linguistics, Stanford University, pp. 149-152, June 1981.
56. P. J. Hayes and J. G. Carbonell, Multi-Strategy Construction-Specific Parsing for Flexible Data Base Query and Update, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, University of British Columbia, Vancouver, pp. 432-439, August 1981.
57. S. C. Kwasny and N. K. Sondheimer, "Relaxation techniques for parsing grammatically ill-formed input in natural language understanding systems," Am. J. Computat. Ling. 7(2), 99-108 (May 1981).
58. J. G. Carbonell and P. J. Hayes, Dynamic Strategy Selection in Flexible Parsing, Proceedings of the Nineteenth Annual Meeting of the Association for Computational Linguistics, Stanford University, Stanford, CA, pp. 143-147, June 1981.
59. P. J. Hayes and J. G. Carbonell, Multi-Strategy Parsing and its Role in Robust Man-Machine Communication, CMU-CS-81-118, Carnegie-Mellon University Computer Science Department, Pittsburgh, PA, May 1981.
60. S. C. Kwasny and N. K. Sondheimer, Ungrammaticality and Extragrammaticality in Natural Language Understanding Systems, Proceedings of the Seventeenth Meeting of the Association for Computational Linguistics, San Diego, CA, ACL-79, pp. 19-23, 1979.
61. J. R. Ross, "Metalinguistic Anaphora," Ling. Inq. 1(2), 273 (1970).
62. J. G. Carbonell, Interpreting Meta-Language Utterances, Preprints of the Workshop: L'Analyse du Langage Naturel par l'Ordinateur, Cadarache, France, 1982.
63. P. J. Hayes and J. G. Carbonell, A Framework for Corrections in Task-Oriented Dialogs, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, 1983.
64. J. F. Allen and C. R. Perrault, "Analyzing intention in utterances," Art. Intell. 15(3), 143-178 (1980).
65. C. R. Perrault, J. F. Allen, and P. R. Cohen, Speech Acts as a Basis for Understanding Dialog Coherence, Proceedings of the Second Conference on Theoretical Issues in Natural Language Processing, Cambridge, MA, 1978.
66. J. R. Searle, Indirect Speech Acts, in P. Cole and J. L. Morgan (eds.), Syntax and Semantics, Vol. 3, Speech Acts, Academic Press, New York, 1975.
67. H. P. Grice, Conversational Postulates, in D. A. Norman and D. E. Rumelhart (eds.), Explorations in Cognition, W. H. Freeman, San Francisco, 1975.
68. J. McDermott, XSEL: A Computer Salesperson's Assistant, in J. Hayes, D. Michie, and Y-H. Pao (eds.), Machine Intelligence, Vol. 10, Ellis Horwood and Wiley, Chichester, UK, and New York, pp. 325-337, 1982.
69. J. McDermott, R1: A Rule-Based Configurer of Computer Systems, Carnegie-Mellon University Computer Science Department, Pittsburgh, PA, 1980.
70. J. G. Carbonell, J. H. Larkin, and F. Reif, Towards a General Scientific Reasoning Engine, CIP #445, Carnegie-Mellon University Computer Science Department, Pittsburgh, PA, 1983.
71. J. G. Carbonell, W. M. Boggs, M. L. Mauldin, and P. G. Anick, The XCALIBUR Project, A Natural Language Interface to Expert Systems, in S. Andriole (ed.), Applications in Artificial Intelligence, Boston, MA, 1985.
72. J. G. Carbonell, Discourse Pragmatics in Task-Oriented Natural Language Interfaces, Proceedings of the Twenty-First Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, ACL-83, 1983.
73. D. L. Waltz and A. B. Goodman, Writing a Natural Language Data Base System, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 141-150, 1977.
74. J. Carbonell, Subjective Understanding: Computer Models of Belief Systems, UMI Research Press, Ann Arbor, MI, 1981.
75. B. J. Grosz, Utterance and Objective: Issues in Natural Language Communication, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, pp. 1067-1076, 1979.
76. E. Charniak and Y. Wilks (eds.), Computational Semantics, North-Holland, Amsterdam, 1976.
77. R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, 1973.
78. T. Winograd, Language as a Cognitive Process, Vol. 1, Syntax, Addison-Wesley, Reading, MA, 1983.
79. "Teaching Computers Plain English," High Technology (1986).

JAIME G. CARBONELL and PHILIP J. HAYES
Carnegie-Mellon University and Carnegie Group Inc.

This research was sponsored in part by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract F33615-81-K-1539, and in part by the Air Force Office of Scientific Research under Contract F49620-79-C-0143. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA, the Air Force Office of Scientific Research, or the U.S. government.

NEAR-MISS ANALYSIS. See Concept learning; Learning.

NOAH

A hierarchical planner (qv) developed around 1975 by Earl Sacerdoti at SRI International. NOAH uses procedural nets to represent plans (see E. Sacerdoti, A Structure for Plans and Behavior, Technical Note 109, AI Center, SRI International, 1975).

K. S. ARORA
SUNY at Buffalo

NONMONOTONIC LOGIC. See Reasoning, nonmonotonic.

NONMONOTONIC REASONING. See Belief revision; Theorem proving.
NON-VON
referred to as a single instruction stream, multiple d,atastream (SIMD) mode of execution. The name NON-VON refers to a family of massively parallel The current version of the general NON-VON design, how"new generation" computer architectures (1) develop.i at co- ever' provides for a number of LPEs, each capabl" o? broadlumbia University for use in high-performance AI applica- casting an independent stream of instructions to some subtree tions. The NON-VON machine architecture is basedon a very of the active memory tree, as first described in Ref. (2). The large number (many thousands and, ultimately, millions) of LPEs in the general machine are interconnectedusing a high processing elements implemented using specially designed bandwidth, low latency interconnection network. The in.orpocustom integrated circuit chips, each containing a number of ration of a number of communicating LPEs gives the general processingelements.An initial 63-processorprototype, called NON-VON architecture the capacity for *ilttpt, inslruction NON-VON 1, has been operationalat Columbia sinceJanuary stream, multiple data stream (MIMD) and multipte SIMD exe_ 1985. cution, multitasking applications, and multiuser operation. This entry begins with a brief overview of the NON-VON The general NON-VON architecture also includes a Eecond,ary architecture. Performance projections derived through de- processing subsystembased on a bank of "intelligent" disk tailed analysis and simulation are then summartzed,foiappli- drives capable of high-bandwidth parallel transfers between cations in the areas of rule-basedinferencing, computer vision, primary and secondarystorageand of the parallel execution of and knowledge base management. The results of these pro- certain operatorsat the level of the individual disk heads. 
jections, most of which are basedon benchmarkspropor"d by other researchers, suggest that NON-VON could provide a performance improvement of as much as several orders of Applicationsand PerformanceEvaluation magnitude on such tasks by comparison with a conventional NON-VON's performancehas thus far been evaluated in three sequential machine of comparable hardware cost. The entry AI task areas: concludeswith a conciseexplanation of the basis for NONVON's performanceand cost/performanceadvantagesin these 1. rule-basedinferencing, implemented using the OpSb prosuperficially dissimilar AI task domains. duction system language (seeRule_basedsystems); 2' the performanceof a number of low- and intermediate-level image-understanding(qv) tasks; and NON-VON Architecture 3. the execution of certain "difficult" relational algebraic opCentral to all membersof the NON-VON family is a massively erations having relevance to the manipulation of knowlparallel active memory. The active memory is composedof a edgebases. very large number of simple, area-efficient small frocessing elements (SPEs)that are implemented using custom VLSI cirAn experimental compiler and run time system for the execuits. The most recently fabricated active memory chip con- cution of OPS5on a one-LPE NON-VON has beenwritten and tains eight 8-bit processingelements. Each SPE .o*prlses a tested on an instruction-level simulator (B).In order to predict small local RAM, a modestamount of processinglogic, and an the algorithm's performance when executing real prodrr.tion I lO switch that permits the machine to be dynamically recon- systems, its running time has been calculated bar.a on meafigured to support various forms of interprocessorcommunica- surements obtained by Gupta and Forgy (4) of the static and tion. 
dynamic characteristics of six actual production systems, which had an average of 910 inference rules each. According to these calculations, a NON-VON configuration having approximately the same cost as a VAX 11/780 would execute approximately 903 productions per second. By way of comparison, a LISP-based OPS5 interpreter executing the sequential Rete Match algorithm on a VAX 11/780 typically fires between 1 and 5 rules per second, and a Bliss-based interpreter executes between 5 and 12 productions per second.

In the current version of the general NON-VON machine, the SPEs are configured as a complete binary tree whose leaves are also interconnected to form a two-dimensional orthogonal mesh. Each node of the active-memory tree, with the exception of the leaves and root, is thus connected to three neighboring SPEs, which are called the parent, left child, and right child of the node in question, and each leaf is connected to its parent and to its four mesh-adjacent SPEs, which are called its north, south, east, and west neighbors. In addition, the I/O switches may be dynamically configured in such a way as to support "linear neighbor" communication, in which all SPEs are capable of communicating in parallel with their left or right neighbors in a particular, predefined linear ordering.

NON-VON programs are not stored within the small RAM associated with each SPE but are instead broadcast to the active memory by one or more large processing elements (LPEs), each based on an off-the-shelf 32-bit microprocessor having a significant amount of local RAM. In the simplest NON-VON configuration, which was also the first to be implemented, the entire active memory operates under the control of a single LPE that broadcasts instructions through a high-speed interface called the active memory controller for simultaneous execution by all enabled SPEs. This simple configuration thus restricts NON-VON's operation to what is often called SIMD execution.

In the image-understanding domain, algorithms have been developed, simulated, and in some cases executed on the actual NON-VON 1 machine for image correlation, histogramming, thresholding, union, intersection, set difference, connected component labeling, Euler number, area, perimeter, center of gravity, eccentricity, the Hough transform (qv), and the "moving light display" problem (5). The results of these comparisons suggest that NON-VON should offer an increase in performance of between a factor of 100 and 1000 by comparison with a VAX 11/780 of approximately the same cost and should in a number of cases improve on the best results reported in the literature for special-purpose vision architectures and other highly parallel machines.

Algorithms for a number of database primitives have been developed for the NON-VON machine, including select, project, join, union, intersection, set difference, aggregation, and various statistical operations. To evaluate NON-VON's applicability to the kinds of database operations most relevant to AI applications, a detailed analysis was performed (6) of the machine's projected performance on a set of benchmark queries formulated by Hawthorn and DeWitt (7). This analysis predicted that NON-VON should provide higher performance than any of the five special-purpose database machines evaluated by Hawthorn and DeWitt at approximately the same hardware cost. Although NON-VON's relative cost/performance advantage over specialized database machines was modest in the case of relational selection, major advantages were found in the case of those computationally demanding operations that appear to be most relevant to AI applications.
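The active-memory organization described above — a complete binary tree of SPEs whose leaves also form a square orthogonal mesh, driven by instructions broadcast from a controlling LPE — can be sketched as a toy simulation. The class and method names below are illustrative inventions, not part of any actual NON-VON software: heap-style numbering supplies the parent/child links, leaf indices map onto the mesh, a broadcast step models SIMD execution by the enabled SPEs, and a levelwise combine shows why a commutative, associative operation over the leaves finishes in logarithmically many parallel steps.

```python
import math
import operator

# Toy model of the NON-VON active memory (hypothetical names throughout).
class ActiveMemory:
    def __init__(self, depth):
        # Heap-style numbering: SPE 1 is the root; SPE i has children 2i, 2i+1.
        self.size = 2 ** (depth + 1) - 1
        self.first_leaf = 2 ** depth
        self.value = [0] * (self.size + 1)       # one register per SPE
        self.enabled = [True] * (self.size + 1)
        # Leaves form a side x side mesh (assumes an even depth, so the
        # number of leaves is a perfect square).
        self.side = math.isqrt(2 ** depth)

    def tree_neighbors(self, i):
        """Parent / left child / right child of an interior SPE."""
        return {"parent": i // 2, "left": 2 * i, "right": 2 * i + 1}

    def mesh_neighbors(self, i):
        """North/south/east/west neighbors of a leaf SPE (None at the border)."""
        row, col = divmod(i - self.first_leaf, self.side)
        def at(r, c):
            if 0 <= r < self.side and 0 <= c < self.side:
                return self.first_leaf + r * self.side + c
        return {"north": at(row - 1, col), "south": at(row + 1, col),
                "east": at(row, col + 1), "west": at(row, col - 1)}

    def broadcast(self, fn):
        """SIMD step: the controlling LPE broadcasts one instruction, which
        every *enabled* SPE applies simultaneously to its own register."""
        for i in range(1, self.size + 1):
            if self.enabled[i]:
                self.value[i] = fn(self.value[i])

    def reduce_leaves(self, op):
        """Combine leaf values pairwise, level by level; a commutative,
        associative op (sum, max, ...) needs only O(depth) parallel steps."""
        acc = self.value[self.first_leaf:self.size + 1]
        steps = 0
        while len(acc) > 1:
            acc = [op(acc[j], acc[j + 1]) for j in range(0, len(acc), 2)]
            steps += 1
        return acc[0], steps

# Example: depth 4 gives 31 SPEs with 16 leaves in a 4 x 4 mesh.
am = ActiveMemory(4)
for j in range(16):
    am.value[am.first_leaf + j] = j + 1
total, steps = am.reduce_leaves(operator.add)   # sum 1..16 = 136 in 4 steps
```

The reduction loop mirrors the machine's use of the tree itself as the combining network: each while-iteration corresponds to one parallel step in which every interior node combines the partial results of its two children.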
NON-VON's strong performanceon any given AI task is probably of less interest than the range of diverse AI tasks that would appear to be efficiently executable within a single machine. It must be noted that there is still insufficient evidence to adequately evaluate the extent to which the NON-VON architecture might serve as the basis for a high-performance "general AI machine." The diversity of AI applications for which NON-VON has been shown to offer significant potential performance and cost/performanceadvantages,however, suggests that some of the essential principles underlying this architecture might point the way toward one possible approach to the ultimate development of such machines (seealso Boltzmann machines; Connection machines; LISP machines).
Sources of NON-VON's Advantages

Different aspects of the NON-VON architecture appear to be responsible for the machine's advantages in different problem areas. It is nonetheless possible to identify a relatively small number of features, several of which are typically operative in the case of any single application, to which the machine's advantages may be attributed.
- The effective exploitation of an unusually high degree of parallelism, which is made possible by the very fine granularity of the active memory.
- The extensive use of broadcast communication, high-speed content-addressable matching, and other associative processing techniques.
- The exploitation of other physical and logical interconnection topologies to support a number of problem-specific communication functions.
- The capacity for SIMD, MIMD, and MSIMD execution and for a mixture of synchronous and asynchronous execution within a single algorithm.
- The use of the active memory tree to execute algebraically commutative and associative operations (such as sum and maximum) in logarithmic time.
- The simplicity and cost-effectiveness with which the machine can be implemented using currently available technology.

BIBLIOGRAPHY

1. D. E. Shaw, Organization and Operation of a Massively Parallel Machine, in G. Rabbat (ed.), Computers and Technology, Elsevier North-Holland, Amsterdam, 1985.
2. S. J. Stolfo and D. E. Shaw, DADO: A Tree-Structured Machine Architecture for Production Systems, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, 1982.
3. B. K. Hillyer and D. E. Shaw, "Execution of OPS5 production systems on a massively parallel machine," J. Parall. Distr. Comput. 3(2), 236-268 (June 1986).
4. A. Gupta and C. L. Forgy, Measurements on Production Systems, Technical Report, Carnegie-Mellon Computer Science Department, Pittsburgh, PA, 1983.
5. H. A. H. Ibrahim, Image Understanding Algorithms on Fine-Grained Tree-Structured SIMD Machines, Ph.D. Thesis, Department of Computer Science, Columbia University, New York, October 1984.
6. B. K. Hillyer, D. E. Shaw, and A. Nigam, "NON-VON's performance on certain database benchmarks," IEEE Trans. Software Eng. SE-12(4), 577-583 (April 1986).
7. P. B. Hawthorn and D. J. DeWitt, "Performance analysis of alternative database machine architectures," IEEE Trans. Software Eng. SE-8(1), 61-75 (January 1982).
D. E. Shaw
Columbia University