Worldwide Series in Computer Science
Series Editors

The Worldwide Series in Computer Science has been created to publish textbooks which both address and anticipate the needs of an ever evolving curriculum, thereby shaping its future, for undergraduates majoring in Computer Science and for practitioners. Its philosophy derives from the conviction that the discipline of computing needs to produce technically skilled engineers who will inevitably face, and possibly invent, radically new technologies throughout their future careers. New media will be used innovatively to support high quality texts written by leaders in the field.
JOHN WILEY & SONS, LTD
Chichester - New York - Weinheim - Brisbane - Singapore - Toronto
Copyright (c) 2000 by John Wiley & Sons, Ltd, Baffins Lane, Chichester, West Sussex PO19 1UD, England
National: 01243 779777; International: (+44) 1243 779777
e-mail (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com
Reprinted March 2001
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London, UK W1P 0LP, without the permission in writing of the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system for exclusive use by the purchaser of the publication.

Neither the author nor John Wiley & Sons, Ltd accept any responsibility or liability for loss or damage occasioned to any person or property through using the material, instructions, methods or ideas contained herein, or acting or refraining from acting as a result of such use. The author and publisher expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose.

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Ltd is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Other Wiley Editorial Offices
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA
Weinheim - Brisbane - Singapore - Toronto

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library. ISBN 0-471-97697-0
Printed and bound, from authors' own electronic files, in Great Britain by Biddles Ltd, Guildford and King's Lynn. This book is printed on acid-free paper responsibly manufactured from sustainable forestry, for which at least two trees are planted for each one used for paper production.
Contents

Preface

1 Introduction
  1.1 Why study compiler construction?
    1.1.1 Compiler construction is very successful
    1.1.2 Compiler construction has a wide applicability
    1.1.3 Compilers contain generally useful algorithms
  1.2 A simple traditional modular compiler/interpreter
    1.2.1 The abstract syntax tree
    1.2.2 Structure of the demo compiler
    1.2.3 The language for the demo compiler
    1.2.4 Lexical analysis for the demo compiler
    1.2.5 Syntax analysis for the demo compiler
    1.2.6 Context handling for the demo compiler
    1.2.7 Code generation for the demo compiler
    1.2.8 Interpretation for the demo compiler
  1.3 The structure of a more realistic compiler
    1.3.1 The structure
    1.3.3 Short-cuts
  1.4 Compiler architectures
    1.4.1 The width of the compiler
    1.4.2 ...
  1.5 Properties of a good compiler
  1.6 Portability and retargetability
  1.7 The place and usefulness of optimizations
  1.8 A short history of compiler construction
    1.8.1 1945-1960: code generation
    1.8.2 1960-1975: parsing
    1.8.3 1975-present: code generation and code optimization; ...
  1.9 Grammars
    The production process
  1.10 Closure algorithms
  1.11 The outline code used in this book
  1.12 Conclusion
  Summary
  Further reading
  Exercises
2 From program text to abstract syntax tree
  2.1 From program text to tokens - the lexical analysis
    Reading the program text
    Lexical versus syntactic analysis
    Regular expressions and regular descriptions
    Creating a lexical analyzer by hand
    Creating a lexical analyzer automatically
    Transition table compression
    ...
    Lexical identification of tokens
    ...
    Macro processing and file inclusion
    ...
  2.2 From tokens to syntax tree - the syntax analysis
    Two classes of parsing methods
    Error detection and error recovery
    Creating a top-down parser manually
    Creating a top-down parser automatically
    Creating a bottom-up parser automatically
  2.3 Conclusion
  Summary
  Further reading
  Exercises

3 Annotating the abstract syntax tree - the context
  3.1 Attribute grammars
    3.1.1 Dependency graphs
    3.1.2 Attribute evaluation
    3.1.3 Cycle handling
    3.1.4 Attribute allocation
    3.1.5 Multi-visit attribute grammars
    3.1.6 Summary of the types of attribute grammars
    3.1.7 L-attributed grammars
    3.1.8 S-attributed grammars
    3.1.9 Equivalence of L-attributed and S-attributed grammars
    3.1.10 Extended grammar notations and attribute grammars
    3.1.11 Conclusion
  3.2 Manual methods
    3.2.1 Threading the AST
    3.2.2 Symbolic interpretation
    3.2.3 Data-flow equations
    3.2.4 Interprocedural data-flow analysis
    3.2.5 Carrying the information upstream - live analysis
    3.2.6 Comparing symbolic interpretation and data-flow equations
  3.3 Conclusion
  Summary
  Further reading
  Exercises
4 Processing the intermediate code
  4.1 Interpretation
    4.1.1 Recursive interpretation
    4.1.2 Iterative interpretation
  4.2 Code generation
    4.2.1 Avoiding code generation altogether
    4.2.2 The starting point
    4.2.3 Trivial code generation
    4.2.4 Simple code generation
    4.2.5 Code generation for basic blocks
    4.2.6 BURS code generation and dynamic programming
    4.2.7 Register allocation by graph coloring
    4.2.8 Supercompilation
    4.2.9 Evaluation of code generation techniques
    4.2.10 Debugging of code optimizers
    4.2.11 Preprocessing the intermediate code
    4.2.12 Postprocessing the target code
    4.2.13 Machine code generation
  4.3 Assemblers, linkers, and loaders
    4.3.1 Assembler design issues
    4.3.2 Linker design issues
  4.4 Conclusion
  Summary
  Further reading
  Exercises
5 Memory management
  5.1 Data allocation with explicit deallocation
    5.1.1 Basic memory allocation
    ...
    Extensible arrays
  5.2 Data allocation with implicit deallocation
    5.2.1 Basic garbage collection algorithms
    5.2.2 Preparing the ground
    5.2.3 Reference counting
    5.2.4 Mark and scan
    5.2.5 Two-space copying
    5.2.6 Compaction
    5.2.7 Generational garbage collection
  5.3 Conclusion
  Summary
  Further reading
  Exercises
6 Imperative and object-oriented programs
  6.1 Context handling
    6.1.1 Identification
    6.1.2 Type checking
    6.1.3 Conclusion
  6.2 Source language data representation and handling
    ...
    6.2.5 Union types
    6.2.6 Array types
    6.2.7 Set types
    ...
    6.2.10 Interface types
  6.3 Routines and their activation
    6.3.1 Activation records
    6.3.2 Routines
    6.3.3 Operations on routines
    6.3.4 Non-nested routines
    6.3.5 Nested routines
    6.3.6 Lambda lifting
    6.3.7 Iterators and coroutines
  6.4 Code generation for control flow statements
  6.5 Code generation for modules
    6.5.1 Name generation
    6.5.2 Module initialization
    6.5.3 Code generation for generics
  6.6 Conclusion
  Summary
  Further reading
  Exercises
7 Functional programs
  7.1 A short tour of ...
    7.1.1 Offside rule
    7.1.2 Lists
    7.1.3 List comprehension
  7.2 ...
  7.3 ... function application ...
  7.4 ...
    The translation of lists
    The translation of pattern matching
    The translation of list comprehension
    The translation of nested functions
  ...
  The accumulator transformation
  ...
8 Logic programs
  ...
  8.3 The general implementation model, interpreted
    8.3.1 Unification of structures, lists, and sets
    8.3.2 The implementation of unification
    8.3.3 Unification of two unbound variables
    8.3.4 Conclusion
  8.4 The general implementation model, compiled
    8.4.1 List procedures
    8.4.2 Compiled clause search and unification
    8.4.3 Optimized clause selection in the WAM
    8.4.4 Implementing the 'cut' mechanism
    8.4.5 Implementing the predicates assert and retract
  8.5 Compiled code for unification
    8.5.1 Unification instructions in the WAM
    8.5.2 Deriving a unification instruction by manual partial evaluation
    8.5.3 Unification of structures in the WAM
    8.5.4 An optimization: read/write mode
    8.5.5 Further unification optimizations in the WAM
    8.5.6 Conclusion
  Summary
  Further reading
  Exercises
9 Parallel and distributed programs
  9.1 Parallel programming models
    9.1.1 Shared variables and monitors
    9.1.2 Message passing models
    9.1.3 Object-oriented languages
    9.1.4 The Linda Tuple space
    9.1.5 Data-parallel languages
  9.2 Processes and threads
  9.3 Shared variables
    9.3.1 Locks
    9.3.2 Monitors
  9.4 Message passing
    9.4.1 Locating the receiver
    9.4.2 Marshaling
    9.4.3 Type checking of messages
    9.4.4 Message selection
  9.5 Parallel object-oriented languages
    9.5.1 Object location
    9.5.2 Object migration
    9.5.3 Object replication
  9.6 Tuple space
    9.6.1 Avoiding the overhead of associative addressing
    9.6.2 Distributed implementations of the tuple space
  9.7 Automatic parallelization
    9.7.1 Exploiting parallelism automatically
    9.7.2 Data dependencies
    9.7.3 Loop transformations
    9.7.4 Automatic parallelization for distributed-memory machines
  9.8 Conclusion
  Summary
  Further reading
  Exercises
Appendix A
  A.1 Syntax-determined classes and semantics-determining methods
  A.2 The simple object-oriented compiler
  A.3 Object-oriented parsing
  A.4 Evaluation
  Exercise
In the 1980s and 1990s, while the world was witnessing the rise of the PC and the Internet on the front pages of the daily newspapers, compiler design methods developed with less fanfare, developments seen mainly in the technical journals, and - more importantly - in the compilers that are used to process today's software. These developments were driven partly by the advent of new programming paradigms, partly by a better understanding of code generation techniques, and partly by the introduction of faster machines with large amounts of memory.
The field of programming languages has grown to include, besides the traditional imperative paradigm, the object-oriented, functional, logical, and parallel/distributed paradigms, which inspire novel compilation techniques and which often require more extensive run-time systems than do imperative languages. BURS techniques (Bottom-Up Rewriting Systems) have evolved into very powerful code generation techniques which cope superbly with the complex machine instruction sets of present-day machines. And the speed and memory size of modern machines allow compilation techniques and language features that were unthinkable before. Modern compiler design methods meet these challenges head-on.
Our audience are mature students in one of their final years, who have at least used a compiler occasionally and given some thought to the concept of compilation. When these students leave the university, they will have to be familiar with language processors for each of the modern paradigms, using modern techniques. Although curriculum requirements in many universities may have been lagging behind in this respect, graduates entering the job market cannot afford to ignore these developments.
Experience has shown us that a considerable number of techniques traditionally taught in compiler construction are special cases of more fundamental techniques. Often these fundamental techniques work for the imperative and the other paradigms alike; an example is a uniform representation for activation records. Throughout the book we have therefore tried to distinguish carefully between:
- fundamental techniques (useful to the student);
- special-purpose techniques (which benefit the student only in exceptional cases);
- details and optimizations.
Students who have finished the book can expect to:
- have obtained a thorough understanding of the concepts of modern compiler design and construction, and some familiarity with their practical application;
- be able to start participating in the construction of a language processor for each of the modern paradigms after a minimal training period;
- be able to read the literature.
The first two provide a firm basis; the third provides potential for growth.
The book is divided into two parts. The first, comprising Chapters 1 through 5, is concerned with language processing in general; it includes a chapter on memory management, both in the compiler and in the generated code. The second part, Chapters 6 through 9, covers the specific techniques required by the various programming paradigms.
The correspondence between the parts of the book and the paradigms is outlined in the table below. The left column of the table is organized along the four phases of compiler construction: analysis, context handling, synthesis, and run-time systems. The chapters in this column cover both the manual and the automatic creation of the pertinent software. The other columns refer to the paradigms covered in this book; for each, a construct treated by each of the phases is shown. The corresponding chapters cover manual techniques only, all automatic techniques having been treated in the first part. Some entries are missing. The top right entries would cover special techniques for the text analysis of logic languages, but present text analysis techniques are flexible enough - and languages similar enough - that there is nothing to be said there, for lack of problems. The bottom row would discuss manual and automatic techniques
The structure of the book:

How to do: analysis (Chapter 2)
    (no entries)
How to do: context handling (Chapter 3)
    in imperative and object-oriented programs (Chapter 6): identifier identification
    in functional programs (Chapter 7): polymorphic type checking
    in logic programs (Chapter 8): static rule matching
    in parallel/distributed programs (Chapter 9): Linda static analysis
How to do: synthesis (Chapter 4)
    in imperative and object-oriented programs (Chapter 6): code for while-statement
    in functional programs (Chapter 7): code for list comprehension
    in logic programs (Chapter 8): structure unification
    in parallel/distributed programs (Chapter 9): marshaling
How to do: run-time systems (no chapter)
    in imperative and object-oriented programs (Chapter 6): stack
    in functional programs (Chapter 7): reduction machine
    in logic programs (Chapter 8): Warren Abstract Machine
    in parallel/distributed programs (Chapter 9): replication
for creating run-time systems. Unfortunately there is little or no theory on this subject: run-time systems are still crafted by hand by programmers on an intuitive basis; there is nothing to be said there, for lack of solutions.
Chapter 1 introduces the reader to compiler design by examining a simple traditional modular compiler/interpreter in detail. Several high-level aspects of compiler construction are discussed, followed by a short history of compiler construction and an introduction to formal grammars. Chapter 2 treats the analysis phase of a compiler: the conversion of the program text to an abstract syntax tree. Techniques for lexical analysis, lexical identification of tokens, and syntax analysis are discussed. Chapter 3 covers the second phase of a compiler: context handling. Several methods of context handling are discussed: automated ones using attribute grammars, manual ones using L-attributed and S-attributed grammars, and semi-automated ones using symbolic interpretation and data-flow analysis. Chapter 4 covers the synthesis phase of a compiler, covering both interpretation and code generation. The section on code generation is mainly concerned with machine code generation; the intermediate code required for paradigm-specific constructs is treated in Chapters 6 through 9. Chapter 5 concerns memory management techniques, both for use in the compiler and in the generated program.
Chapters 6 through 9 address the special problems in compiling for the various paradigms - imperative, object-oriented, functional, logic, and parallel/distributed. Compilers for imperative and object-oriented programs are similar enough to be treated together in one chapter, Chapter 6. Appendix A discusses a possible but experimental method of object-oriented compiler construction, in which an attempt is made to exploit object-oriented concepts to simplify compiler design.
Several subjects in this book are treated in a non-traditional way, and some words of justification may be in order.
Lexical analysis is based on the same dotted items that are traditionally reserved for bottom-up syntax analysis, rather than on Thompson's NFA construction. We see the dotted item as the essential tool in bottom-up pattern matching, unifying lexical analysis, LR syntax analysis, and bottom-up code generation. The traditional lexical algorithms are just low-level implementations of item manipulation. We consider the different treatment of lexical and syntax analysis to be a historical artifact. Also, the difference between the lexical and the syntax levels tends to disappear in modern software.
Considerable attention is being paid to attribute grammars, in spite of the fact that their impact on compiler design has been limited. Still, they are the only known way of automating context handling, and we hope that the present treatment will help to lower the threshold of their application.
Functions as first-class data are covered in much greater depth in this book than is usual in compiler design books. After a good start in Algol 60, functions lost much status as manipulatable data in languages like C, Pascal, and Ada, although Ada 95 rehabilitated them somewhat. The implementation of some modern concepts, for example functional and logic languages, iterators, and continuations, however, requires functions to be manipulated as normal data. The fundamental aspects of the implementation are covered in the chapter on imperative and object-oriented languages; specifics are given in the chapters on the various other paradigms.
An attempt at justifying the outline code used in this book to specify algorithms can be found in Section 1.11.
Additional material, including more answers to exercises, and all diagrams and all code from the book, are available through John Wiley's Web page.
The book contains far too much material for a compiler design course of 13 lectures of two hours each, as given at our university, so a selection has to be made. Depending on the maturity of the audience, an introductory, more traditional course can be obtained by including, for example,

Chapter 1; Chapter 2 up to 2.1.7; 2.1.10; 2.1.11; 2.2 up to 2.2.4.5; 2.2.5 up to 2.2.5.7; Chapter 3 up to 3.1.2; 3.1.7 up to 3.1.10; 3.2 up to 3.2.2.2; 3.2.3; Chapter 4 up to 4.1; 4.2 up to 4.2.4.3; 4.2.6 up to 4.2.6.4; 4.2.11; Chapter 5 up to 5.1.1.1; 5.2 up to 5.2.4; Chapter 6 up to 6.2.3.2; 6.2.4 up to 6.2.10; 6.4 up to 6.4.2.3.

A more advanced course would include all of Chapters 1 to 6, excluding Section 3.1. This could be augmented by one of Chapters 7 to 9 and perhaps Appendix A. An advanced course would skip much of the introductory material and concentrate on the parts omitted in the introductory course: Section 3.1 and all of Chapters 5 to 9, plus Appendix A.
We owe many thanks to the following people, who were willing to spend time and effort on reading drafts of our book and to supply us with many useful and sometimes very detailed comments: Mirjam Bakker, Raoul Bhoedjang, Wilfred Dittmer, Thomer M. Gil, Ben N. Hasnai, Bert Huijben, Jaco A. Imthorn, John Romein, Tim Ruhl, and the anonymous reviewers. We thank Ronald Veldema for the Pentium code segments. We are grateful to Simon Plumtree, Gaynor Redvers-Mutton, Dawn Booth, and Jane Kerr of John Wiley & Sons Ltd, for their help and encouragement in writing this book. Lambert Meertens kindly provided information on an older ABC compiler, and Ralph Griswold on an Icon compiler.
We thank the Faculteit Wiskunde en Informatica (now part of the Faculteit der Exacte Wetenschappen) of the Vrije Universiteit for their support and the use of their equipment.
Dick Grune            [email protected], http://www.cs.vu.nl/~dick
Henri E. Bal          [email protected], http://www.cs.vu.nl/~bal
Ceriel J.H. Jacobs    [email protected], http://www.cs.vu.nl/~ceriel
Koen G. Langendoen    [email protected], http://pds.twi.tudelft.nl/~koen

Amsterdam, May 2000
Java (TM) is a trademark of Sun Microsystems
Miranda (TM) is a trademark of Research Software Ltd
MS-DOS (TM) is a trademark of Microsoft Corporation
Pentium (TM) is a trademark of Intel
PostScript (TM) is a trademark of Adobe Systems Inc
Smalltalk (TM) is a trademark of Xerox Corporation
UNIX (TM) is a trademark of AT&T
Introduction
In its most general form, a compiler is a program that accepts as input a program text in a certain language and produces as output a program text in another language, while preserving the meaning of that text. This process is called translation, as it would be if the texts were in natural languages. Almost all compilers translate from one input language, the source language, to one output language, the target language, only. One normally expects the source and target language to differ greatly: the source language could be C and the target language might be machine code for the Pentium processor series. The language the compiler itself is written in is called the implementation language. The main reason why one wants such a translation is that one has hardware on which one can 'run' the translated program, or more precisely: have the hardware perform the actions described by the semantics of the program. After all, hardware is the only real source of computing power. Running a translated program often involves feeding it input data in some format, and will probably result in some output data in some other format. The input data can derive from a variety of sources; examples are files, keystrokes, and network packages. Likewise, the output can go to a variety of places; examples are files, monitor screens, and printers. To obtain the translated program, we run a compiler, which is just another program whose input data is a file with the format of a program source text and whose output data is a file with the format of executable code. A subtle point here is that the file containing the executable code is (almost) tacitly converted to a runnable program; on some operating systems this requires some action, for example setting the 'execute' attribute. To obtain the compiler, we run another compiler whose input consists of compiler source text and which will produce executable code for it, as it would for any program source text. This process of compiling and running a compiler is depicted in Figure 1.1; that compilers can and do compile compilers sounds more confusing than it is. When the source language is also the implementation language and the source text to be compiled is
actually a new version of the compiler itself, the process is called bootstrapping. The term 'bootstrapping' is traditionally attributed to a story of Baron von Munchhausen (1720-1797), although in the original story the baron pulled himself from a swamp by his hair plait, rather than by his bootstraps (Anonymous, 1840).
Figure 1.1 Compiling and running a compiler.
Compilation does not differ fundamentally from file conversion but it does differ in degree. One clear aspect of compilation is that the input has a property called semantics - its 'meaning' - which must be preserved by the process and which is often less clearly identifiable in a traditional file conversion program, for example one that converts EBCDIC to ASCII. On the other hand, a GIF to JPEG converter has to preserve the visual impression of the picture, which might with some justification be called its semantics. In the final analysis, a compiler is just a giant file conversion program.
The compiler can work its magic because of two factors:
- the input is in a language and consequently has a structure, which is described in the language reference manual;
- the semantics of the input is described in terms of and is attached to that structure.

These factors enable the compiler to 'understand' the program and to collect its semantics in a semantic representation. The same two factors exist with respect to the target language. This allows the compiler to reformulate the collected semantics in terms of the target language. How all this is done in detail is the subject of this book.
Figure 1.2 Conceptual structure of a compiler.
The part of a compiler that performs the analysis of the source language text is called the front-end, and the part that does the target language synthesis is the back-end; see Figure 1.2. If the compiler has a very clean design, the front-end is totally unaware of the target language and the back-end is totally unaware of the source language: the only thing they have in common is knowledge of the semantic representation. There are technical reasons why such a strict separation is inefficient, and in practice even the best-structured compilers compromise.
The above description immediately suggests another mode of operation for a compiler: if all required input data are available, the compiler could perform the actions specified by the semantic representation rather than re-express them in a different form. The code-generating back-end is then replaced by an interpreting back-end, and the whole program is called an interpreter. There are several reasons for doing this, some fundamental and some more opportunistic. One fundamental reason is that an interpreter is normally written in a high-level language and will therefore run on most machine types, whereas generated object code will only run on machines of the target type: in other words, portability is increased. Another is that writing an interpreter is much less work than writing a back-end. A third reason for using an interpreter rather than a compiler is that performing the actions straight from the semantic representation allows better error checking and reporting to be done. This is not fundamentally so, but is a consequence of the fact that compilers (front-end/back-end combinations) are expected to generate efficient code. As a result, most back-ends throw away any information that is not essential to the program execution in order to gain speed; this includes much information that could have been useful in giving good diagnostics, for example source code line numbers. A fourth reason is the increased security that can be achieved by interpreters; this effect has played an important role in Java's rise to fame. Again, this increased security is not fundamental since there is no reason why compiled code could not do the same checks an interpreter can. Yet it is considerably easier to convince oneself that an interpreter does not play dirty tricks than that there are no booby traps hidden in binary executable code.
It should be pointed out that there is no fundamental difference between using a compiler and using an interpreter. In both cases the program text is processed into an intermediate form, which is then interpreted by some interpreting mechanism.

The original meaning of 'to compile' is 'to select representative material and add it to a collection'; present-day makers of compilation compact discs use the term in its proper meaning. In its early days programming language translation was viewed in the same way: when the input contained for example 'a + b', a prefabricated code fragment 'load a in register; add b to register' was selected and added to the output. A compiler compiled a list of code fragments to be added to the translated program. Today's compilers, especially those for the non-imperative programming paradigms, often perform much more radical transformations on the input program.

In compilation,
- the program processing is considerable;
- the resulting intermediate form, machine-specific binary executable code, is low-level;
- the interpreting mechanism is the hardware CPU; and
- program execution is relatively fast.

In interpretation,
- the program processing is minimal to moderate;
- the resulting intermediate form, some system-specific data structure, is high- to medium-level;
- the interpreting mechanism is a (software) program; and
- program execution is relatively slow.
These relationships are summarized graphically in Figure 1.3. Section 4.2.3 shows how a fairly smooth shift from interpreter to compiler can be made.
After considering the question of why one should study compiler construction (Section 1.1) we will address some general issues in compiler construction, including compiler architecture (Sections 1.2 to 1.5), retargetability (Section 1.6), and the place of optimizations (Section 1.7). This is followed by a short history of compiler construction (Section 1.8). Next are two more theoretical subjects: an introduction to context-free grammars (Section 1.9), and a general closure algorithm (Section 1.10). A brief explanation of the pseudo-code used in the book (Section 1.11) concludes this introductory chapter. Occasionally, the structure of the text will be summarized in a 'roadmap' like the one below.
There are a number of objective reasons why studying compiler construction is a good idea:
Figure 1.3 Comparison of a compiler and an interpreter.
Roadmap
1 Introduction
1.1 Why study compiler construction?
1.2 A simple traditional modular compiler/interpreter
1.3 The structure of a more realistic compiler
1.4 Compiler architectures
1.5-1.7 Properties of a good compiler
1.8 A short history of compiler construction
1.9 Grammars
1.10 Closure algorithms
1.11 The outline code used in this book
-compiler construction is a very successful branch of computer science, and one of the earliest to earn that predicate;
- given its close relation to file conversion, it has wider application than just compilers;
- it contains many generally useful algorithms in a realistic setting.
We will have a closer look at each of these below. The main subjective reason to study compiler construction is of course plain curiosity: it is fascinating to see how compilers manage to do what they do.
Compiler construction is a very successful branch of computer science. Some of the reasons for this are the proper structuring of the problem, the judicious use of formalisms, and the use of tools wherever possible.
Compilers analyze their input, construct a semantic representation, and synthesize their output from it. This analysis-synthesis paradigm is very powerful and widely applicable. A program for tallying word lengths in a text could for example consist of a front-end which analyzes the text and constructs internally a table of (length, frequency) pairs, and a back-end which then prints this table. Extending this program, one could replace the text-analyzing front-end by a module that collects file sizes in a file system; alternatively, or additionally, one could replace the back-end by a module that produces a bar graph rather than a printed table; we use the word 'module' here to emphasize the exchangeability of the parts. In total, four programs have already resulted, all centered around the semantic representation and each reusing lots of code from the others.
Likewise, without the strict separation of analysis and synthesis phases, programming languages and compiler construction would not be where they are today. Without it, each new language would require a completely new set of compilers for all interesting machines - or die for lack of support. With it, a new front-end for that language suffices, to be combined with the existing back-ends for the current machines: for L languages and M machines, L front-ends and M back-ends are needed, requiring L+M modules, rather than LxM programs. See Figure 1.4.
It should be noted immediately, however, that this strict separation is not completely free of charge. If, for example, a front-end knows it is analyzing for a machine with special machine instructions for multi-way jumps, it can probably analyze case/switch statements so that they can benefit from these machine instructions. Similarly, if a back-end knows it is generating code for a language which has no nested routine declarations, it can generate simpler code for routine calls. Many professional compilers are integrated compilers, for one programming language and one machine architecture, using a semantic representation which derives from the source language and which may already contain elements of the target machine. Still, the structuring has played and still plays a large role in the rapid introduction of new languages and new machines.
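The word-length tally example can be made concrete with a small sketch. The program below is written for this text (it is not from the book); the names analyze and print_table are chosen for illustration only. The two modules communicate exclusively through the (length, frequency) table, so either one can be replaced independently, for instance by a file-size collector or a bar-graph printer.

#include <stdio.h>
#include <ctype.h>

#define MAX_LEN 64

/* semantic representation: a (length, frequency) table */
static long freq[MAX_LEN + 1];

/* front-end: analyze the text and fill the table */
static void analyze(FILE *in) {
    int ch, len = 0;
    while ((ch = getc(in)) != EOF) {
        if (isalpha(ch)) {
            len++;
        } else if (len > 0) {
            freq[len <= MAX_LEN ? len : MAX_LEN]++;
            len = 0;
        }
    }
    if (len > 0) freq[len <= MAX_LEN ? len : MAX_LEN]++;
}

/* back-end: synthesize output from the table; replacing this module
   by a bar-graph printer leaves the front-end untouched */
static void print_table(void) {
    int len;
    for (len = 1; len <= MAX_LEN; len++) {
        if (freq[len] > 0) printf("%2d: %ld\n", len, freq[len]);
    }
}

int main(void) {
    analyze(stdin);
    print_table();
    return 0;
}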
For some parts of compiler construction excellent standardized formalisms have been developed, which greatly reduce the effort to produce these parts. The best examples are regular expressions and context-free grammars, used in lexical and syntactic analysis. Enough theory about these has been developed from the 1960s onwards to fill an entire course, but the practical aspects can be taught and understood without going too deeply into the theory. We will consider these formalisms and their applications in Chapter 2.
Attribute grammars are a formalism that can be used for handling the context, the long-distance relations in a program that link, for example, the use of a variable to its declaration. Since attribute grammars are capable of describing the full semantics of a
Figure 1.4 Creating compilers for L languages and M machines.
language, their use can be extended to interpretation or code generation, although other techniques are perhaps more usual. There is much theory about them, but they are less well standardized than regular expressions and context-free grammars. Attribute grammars are covered in Section 3.1.
Object code generation for a given machine involves a lot of nitty-gritty programming when done manually, but the process can be automated, for example by using pattern matching and dynamic programming techniques. Quite a number of formalisms have been designed for the description of machine code, both at the assembly level and at the binary level, but none has gained wide acceptance to date and each compiler writing system has its own version. Automated code generation is treated in Section 4.2.6.
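As a small, purely illustrative example - the notation below is generic and not that of any particular attribute grammar system - a grammar for fully parenthesized expressions (like the one used for the demo compiler later in this chapter) can be decorated with a synthesized attribute value that is computed bottom-up by rules attached to the productions:

expression(SYN value) ->
      digit(value)
    | '(' expression(value1) operator(op) expression(value2) ')'
          ATTRIBUTE RULE: value := if op = '+' then value1 + value2 else value1 * value2

operator(SYN op) ->
      '+'   ATTRIBUTE RULE: op := '+'
    | '*'   ATTRIBUTE RULE: op := '*'

digit(SYN value) ->
      '0' | '1' | ... | '9'
            ATTRIBUTE RULE: value := the numeric value of the digit

An attribute evaluator generated from such a description visits the syntax tree and applies the rules until all attributes have values; inherited attributes, which flow down the tree (for example an environment linking identifier uses to declarations), are handled by the same machinery.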
Once one has the proper formalism in which to describe what a program should do, one can generate a program from it, using a program generator. Examples are lexical analyzers generated from regular descriptions of the input, parsers generated from grammars (syntax descriptions), and code generators generated from machine descriptions. All these are generally more reliable and easier to debug than their handwritten counterparts; they are often more efficient too.
Generating programs rather than writing them by hand has several advantages:
- The input to a program generator is of a much higher level of abstraction than the handwritten program would be. The programmer needs to specify less, and the tools take responsibility for much error-prone housekeeping. This increases the chances that the program will be correct. For example, it would be cumbersome to write parse tables by hand.
- The use of program-generating tools allows increased flexibility and modifiability. For example, if during the design phase of a language a small change in the syntax is considered, a handwritten parser would be a major stumbling block to any such change. With a generated parser, one would just change the syntax description and generate a new parser.
- Pre-canned or tailored code can be added to the generated program, enhancing its power at hardly any cost. For example, input error handling is usually a difficult affair in handwritten parsers; a generated parser can include tailored error correction code with no effort on the part of the programmer.
- A formal description can sometimes be used to generate more than one type of program. For example, once we have written a grammar for a language with the purpose of generating a parser from it, we may use it to generate a syntax-directed editor, a special-purpose program text editor that guides and supports the user in editing programs in that language.

In summary, generated programs may be slightly more or slightly less efficient than handwritten ones, but generating them is so much more efficient than writing them by hand that whenever the possibility exists, generating a program is almost always to be preferred. The technique of creating compilers by program-generating tools was pioneered by Brooker et al. (1963), and its importance has continually risen since. Programs that generate parts of a compiler are sometimes called compiler compilers, although this is clearly a misnomer. Yet, the term lingers on.
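To make the idea of a generated parser concrete, the fully parenthesized expression language of the demo compiler below could be described for the well-known parser generator yacc (or its GNU counterpart bison) roughly as follows. This specification is a sketch written for this text, not an example from the book; it evaluates the expression while parsing it.

%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
int yyparse(void);
void yyerror(const char *msg);
%}
%token DIGIT
%%
program    : expression                { printf("%d\n", $1); }
           ;
expression : DIGIT
           | '(' expression operator expression ')'
                 { $$ = ($3 == '+') ? $2 + $4 : $2 * $4; }
           ;
operator   : '+'                       { $$ = '+'; }
           | '*'                       { $$ = '*'; }
           ;
%%
int yylex(void) {
    int ch;
    do { ch = getchar(); } while (ch == ' ' || ch == '\t' || ch == '\n');
    if (ch < 0) return 0;               /* end of input */
    if (isdigit(ch)) { yylval = ch - '0'; return DIGIT; }
    return ch;                          /* single-character token */
}
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
int main(void) { return yyparse(); }

A change to the language, say adding a '-' operator, would require only one extra alternative in this description, after which a new parser is generated; the handwritten parser shown later in this chapter would need code changes in several places.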
Compiler construction techniques can be and are applied outside compiler construction in its strictest sense. Alternatively, more programming can be considered compiler construction than one would traditionally assume. Examples are reading structured data, rapid introduction of new formats, and general file conversion problems.
If data has a clear structure it is generally possible to write a grammar for it. Using a parser generator, a parser can then be generated automatically. Such techniques can, for example, be applied to rapidly create 'read' routines for HTML files, PostScript files, etc. This also facilitates the rapid introduction of new formats. Examples of file conversion systems that have profited considerably from compiler construction techniques are text formatters, which convert TeX text to dvi format, and PostScript interpreters, which convert PostScript text to instructions for a specific printer.
A third reason to study compiler construction lies in the generally useful data structures and algorithms compilers contain. Examples are hashing, precomputed tables, the stack mechanism, garbage collection, dynamic programming, and graph algorithms. Although each of these can be studied in isolation, it is educationally more valuable and satisfying to do so in a meaningful context.
In this section we will show and discuss a simple demo compiler and interpreter, to introduce the concepts involved and to set the framework for the rest of the book. Turning to Figure 1.2, we see that the heart of a compiler is the semantic representation of the program being compiled. This semantic representation takes the form of a data structure, called the 'intermediate code' of the compiler. There are many possibilities for the form of the intermediate code; two usual choices are linked lists of pseudo-instructions and annotated abstract syntax trees. We will concentrate here on the latter, since the semantics is primarily attached to the syntax tree.
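For concreteness, the two kinds of intermediate code might be declared roughly as follows in C. This is a sketch written for this text - the demo compiler's own AST type, Expression, appears later in this section - and the names are illustrative only.

/* (a) a linked list of pseudo-instructions, e.g. for a simple stack machine */
typedef struct pseudo_instruction {
    enum {PUSH_CONST, ADD, MULT} opcode;
    int operand;                          /* used by PUSH_CONST only */
    struct pseudo_instruction *next;
} Pseudo_instruction;

/* (b) an annotated abstract syntax tree: the structure mirrors the grammar,
   and the annotations carry the semantic information */
typedef struct sketch_ast_node {
    char type;                            /* 'D' for digit, 'P' for (expr op expr) */
    int value;                            /* annotation, for 'D' nodes */
    char oper;                            /* annotation, for 'P' nodes */
    struct sketch_ast_node *left, *right; /* subtrees, for 'P' nodes */
} Sketch_ast_node;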
The syntax tree of a program text is a data structure which shows precisely how the various segments of the program text are to be viewed in terms of the grammar. The syntax tree can be obtained through a process called 'parsing'; in other words, parsing* is the process of structuring a text according to a given grammar. For this reason, syntax trees are also called parse trees; we will use the terms interchangeably, with a slight preference for 'parse tree' when the emphasis is on the actual parsing. Conversely, parsing is also called syntax analysis, but this has the problem that there is no corresponding verb 'to syntax-analyze'. The parser can be written by hand if the grammar is very small and simple; for larger and/or more complicated grammars it can be generated by a parser generator. Parser generators are discussed in Chapter 2.
The exact form of the parse tree as required by the grammar is often not the most convenient one for further processing, so usually a modified form of it is used, called an abstract syntax tree, or AST. Detailed information about the semantics can be attached to the nodes in this tree through annotations, which are stored in additional data fields in the nodes; hence the term annotated abstract syntax tree. Since unannotated ASTs are of limited use, ASTs are always more or less annotated in practice, and the abbreviation 'AST' is used also for annotated ASTs. Examples of annotations are type information ('this assignment node concerns a Boolean array assignment') and optimization information ('this expression does not contain a function call'). The first kind is related to the semantics as described in the manual, and is used, among other things, for context error checking. The second kind is not related to anything in the manual but may be important for the code generation phase. The annotations in a node are also called the attributes of that node, and since a node represents a grammar symbol, one also says that the grammar symbol has the corresponding attributes. It is the task of the context handling module to determine and place the annotations or attributes.
Figure 1.5 shows the expression b*b - 4*a*c as a parse tree; the grammar used for expression is similar to those found in the Pascal, Modula-2, or C manuals:
* In linguistic and educational contexts, the verb 'to parse' is also used for the determination of word classes: determining that in 'to go by' the word 'by' is an adverb and in 'by the way' it is a preposition. In computer science the word is used exclusively to refer to syntax analysis.
Figure 1.5 The expression b*b - 4*a*c as a parse tree.
expression "+ expression ' + l term I expression term term "+ term ' * l factor I term ",l factor I factor factor "+ identifier I constant ' (' expression '1'
I
term
Figure 1.6 shows the same expression as an AST and Figure 1.7 shows it as an annotated AST in which possible type and location information has been added. The location information in Figure 1.7 suggests offsets from a stack pointer and allocation in a machine register, but the precise nature of the information is not important at this point. What is important is that we see a shift in emphasis from syntactic structure to semantic contents.
Usually the grammar of a programming language is not specified in terms of input characters but of input 'tokens'. Examples of input tokens are identifiers (for example length or a5), strings ("Hello!", "!@#"), numbers (0, 123e-5), keywords (begin, real), compound operators (++, :=), separators (;, [), etc. Input tokens may be and sometimes must be separated by white space, which is otherwise ignored. So before feeding the input program text to the parser, it must be divided into tokens. Doing so is the task of the lexical analyzer; the activity itself is sometimes called 'to tokenize', but the literary value of this word is doubtful.
Figure 1.6 The expression b*b - 4*a*c as an AST.

Figure 1.7 The expression b*b - 4*a*c as an annotated AST.
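The kind of annotation suggested by Figure 1.7 - a type and a location attached to every node - could be represented along the following lines. This struct is a sketch written for this text, assuming a simple stack-offset/register location scheme; it is not the book's own definition.

/* an expression AST node, extended with the annotations of Figure 1.7 */
typedef struct annotated_node {
    char oper;                            /* '-', '*', or a leaf marker */
    struct annotated_node *left, *right;  /* subtrees, absent for leaves */
    /* annotations added by the context handling module: */
    const char *type;                     /* e.g. "real" */
    enum {IN_REGISTER, ON_STACK, CONSTANT} loc_kind;
    int loc;                              /* register number or stack offset */
} Annotated_node;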
We see that the front-end in Figure 1.2 must at least contain a lexical analyzer, a syntax analyzer (parser), and a context handler, in that order. This leads us to the structure of the demo compiler/interpreter shown in Figure 1.8. The back-end allows two intuitively different implementations: a code generator and an interpreter. Both use the AST, the first for generating machine code, the second for performing the implied actions immediately.
Figure 1.8 Structure of the demo compiler/interpreter.
To avoid the host of detailed problems that mark much of compiler writing, we will base our demonstration compiler on fully parenthesized expressions with operands of one digit. An arithmetic expression is 'fully parenthesized' if each operator plus its operands is enclosed in a set of parentheses and no other parentheses occur. This makes parsing almost trivial, since each open parenthesis signals the start of a lower level in the parse tree and each close parenthesis signals the return to the previous, higher level: a fully parenthesized expression can be seen as a linear notation of a parse tree.

expression → digit | '(' expression operator expression ')'
operator   → '+' | '*'
digit      → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Figure 1.9 Grammar for simple fully parenthesized expressions.
To simplify things even further, we will have only two operators, + and *. On the other hand, we will allow white space, including tabs and newlines, in the input. The grammar in Figure 1.9 produces such forms as (5+8) and (2*((3*4)+9)). Even this almost trivial language allows us to demonstrate the basic principles of both compiler and interpreter construction, with the exception of context handling: the language just has no context to handle.
#include "parser.h"      /* for type AST_node */
#include "backend.h"     /* for Process() */
#include "error.h"       /* for Error() */

int main(void) {
    AST_node *icode;

    if (!Parse_program(&icode)) Error("No top-level expression");
    Process(icode);
    return 0;
}

Figure 1.10 Driver for the demo compiler.
Figure 1.10 shows the driver of the compiler/interpreter, in C. It starts by including the definition of the syntax analyzer, to obtain the definitions of type AST_node and of the routine Parse_program(), which reads the program and constructs the AST. Next it includes the definition of the back-end, to obtain the definition of the routine Process(), for which either a code generator or an interpreter can be linked in. It then calls the front-end and, if it succeeds, the back-end. (It should be pointed out that the condensed layout used for the program texts in the following sections is not really favored by any of the authors but is solely intended to keep each program text on a single page. Also, the #include commands for various system routines have been omitted.)
The tokens in our language are (, ), +, *, and digit. Intuitively, these are five different tokens, but actually digit consists of ten tokens, for a total of 14. Our intuition is based on the fact that the parser does not care exactly which digit it sees, so as far as the parser is concerned, all digits are one and the same token: they form a token class. On the other hand, the back-end is interested in exactly which digit is present in the input, so we have to preserve the digit after all. We therefore split the information about a token into two parts, the class of the token and its representation. This is reflected in the definition of the type Token_type in Figure 1.11, which has two fields, one for the class of the token and one for its representation.
For token classes that contain only one token which is also an ASCII character (for example +), the class is the ASCII value of the character itself. The class of digits is DIGIT, which is defined in lex.h as 257, and the repr field is set to the representation of the digit. The class of the pseudo-token end-of-file is EOF, which is defined as 256; it is useful to treat the end of the file as a genuine token. These numbers over 255 are chosen to avoid collisions with any ASCII values of single characters.
The representation of a token has at least two important uses. First, it is processed in
/* Define class constants */
/* Values 0-255 are reserved for ASCII characters */
#define EOF     256
#define DIGIT   257

typedef struct {int class; char repr;} Token_type;

extern Token_type Token;
extern void get_next_token(void);

Figure 1.11 Header file lex.h for the demo lexical analyzer.
one or more phases after the parser to produce semantic information; examples are a numeric value produced from an integer token, and an identification in some form from an identifier token. Second, it is used in error messages, to display the exact form of the token. In this role the representation is useful for all tokens, not just for those that carry semantic information, since it enables any part of the compiler to produce directly the correct printable version of any token.
The representation of a token is usually a string, implemented as a pointer, but in our demo compiler all tokens are single characters, so a field of type char suffices.
The implementation of the demo lexical analyzer, as shown in Figure 1.12, defines a global variable Token and a procedure get_next_token(). A call to get_next_token() skips possible layout characters (white space) and stores the next single character as a (class, repr) pair in Token. A global variable is appropriate here, since the corresponding input file is also global. In summary, a stream of tokens can be obtained by calling get_next_token() repeatedly.
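For example - an illustration added here, not a figure from the book - repeatedly calling get_next_token() on the input ( 2 * 8 ) delivers the following (class, repr) pairs, given the definitions of Figure 1.11 and the code of Figure 1.12:

    class = '('      repr = '('
    class = DIGIT    repr = '2'
    class = '*'      repr = '*'
    class = DIGIT    repr = '8'
    class = ')'      repr = ')'
    class = EOF      repr = '#'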
It is the task of syntax analysis to structure the input into an AST. The grammar in Figure 1.9 is so simple that this can be done by two simple Boolean read routines, Parse_operator() for the non-terminal operator and Parse_expression() for the non-terminal expression. Both routines are shown in Figure 1.13 and the driver of the parser, which contains the initial call to Parse_expression(), is in Figure 1.14.
Each of the routines tries to read the syntactic construct it is named after, using the following strategy. The routine for the non-terminal N tries to read the alternatives of N in order. For each alternative A it tries to read its first member A1. If A1 is found present, the routine assumes that A is the correct alternative and it then requires the presence of the other members of A. This assumption is not always warranted, which is why this parsing method is quite weak. But for the grammar of Figure 1.9 the assumption holds.
If the routine succeeds in reading the syntactic construct in this way, it yields a pointer to the corresponding AST as an output parameter, and returns a 1 for success; the output parameter is implemented as a pointer to the location where the output value must be
#include "lex.h"                /* for self check */

/* PRIVATE */
static int Layout_char(int ch) {
    switch (ch) {
    case ' ': case '\t': case '\n': return 1;
    default:                        return 0;
    }
}

/* PUBLIC */
Token_type Token;

void get_next_token(void) {
    int ch;

    /* get a non-layout character: */
    do {
        ch = getchar();
        if (ch < 0) {
            Token.class = EOF; Token.repr = '#';
            return;
        }
    } while (Layout_char(ch));

    /* classify it: */
    if ('0' <= ch && ch <= '9') {Token.class = DIGIT;}
    else                        {Token.class = ch;}

    Token.repr = ch;
}

Figure 1.12 Lexical analyzer for the demo compiler.
stored, a usual technique in C. If the routine fails to find the first member of any alternative of N, it does not consume any input, does not set its output parameter, and returns a 0 for failure. And if it gets stuck in the middle it stops with a syntax error message.
The C template used for a rule

P → A1 A2 ... An | B1 B2 ... | ...
is presented in Figure 1.15. More detailed code is required if any of the Ai, Bi, ..., is a terminal symbol; see the examples in Figure 1.13. An error in the input is detected when we require a certain syntactic construct and find it is not there. We then give an error message by calling Error() with an appropriate message; this routine does not return and terminates the program, after displaying the message to the user.
This approach to parsing is called recursive descent parsing, because a set of routines descend recursively to construct the parse tree. It is a rather weak parsing method and makes for inferior error diagnostics, but is, if applicable at all, very simple to implement. Much stronger parsing methods are discussed in Chapter 2, but recursive descent is suffi-
static int Parse_operator(Operator *oper) {
    if (Token.class == '+') {
        *oper = '+'; get_next_token(); return 1;
    }
    if (Token.class == '*') {
        *oper = '*'; get_next_token(); return 1;
    }
    return 0;
}

static int Parse_expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();

    /* try to parse a digit: */
    if (Token.class == DIGIT) {
        expr->type = 'D'; expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }

    /* try to parse a parenthesized expression: */
    if (Token.class == '(') {
        expr->type = 'P';
        get_next_token();
        if (!Parse_expression(&expr->left)) {
            Error("Missing expression");
        }
        if (!Parse_operator(&expr->oper)) {
            Error("Missing operator");
        }
        if (!Parse_expression(&expr->right)) {
            Error("Missing expression");
        }
        if (Token.class != ')') {
            Error("Missing )");
        }
        get_next_token();
        return 1;
    }

    /* failed on both attempts */
    free_expression(expr); return 0;
}

Figure 1.13 Parsing routines for the demo compiler.
#include "lex.h"
#include "error.h"       /* for Error() */
#include "parser.h"      /* for self check */

/* PRIVATE */
static Expression *new_expression(void) {
    return (Expression *)malloc(sizeof(Expression));
}

static void free_expression(Expression *expr) {free((void *)expr);}

static int Parse_operator(Operator *oper_p);
static int Parse_expression(Expression **expr_p);

/* PUBLIC */
int Parse_program(AST_node **icode_p) {
    Expression *expr;

    get_next_token();    /* start the lexical analyzer */
    if (Parse_expression(&expr)) {
        if (Token.class != EOF) {
            Error("Garbage after end of program");
        }
        *icode_p = expr;
        return 1;
    }
    return 0;
}

Figure 1.14 Parser environment for the demo compiler.
cient for our present needs. The recursive descent parsing presented here is not to be confused with the much stronger predictive recursive descent parsing, which is discussed in Chapter 2. The latter is an implementation of LL(1) parsing, and includes look-ahead sets to base decisions on.
Although in theory we should have different node types for the ASTs of different syntactic constructs, it is more convenient to group them in broad classes and to have one node type for each of these classes. This is one of the differences between the parse tree, which follows the grammar faithfully, and the AST, which serves the convenience of the compiler writer. More in particular, in our example all nodes in an expression are of type Expression, and, since we have only expressions, that is the only possibility for the type of AST_node. To differentiate the nodes of type Expression, each such node contains a type attribute, set with a characteristic value: 'D' for a digit and 'P' for a parenthesized expression. The type attribute tells us how to interpret the fields in the rest of the node. Such interpretation is needed in the code generator and the interpreter. The header file with the definition of the node type Expression is shown in Figure 1.16.
The syntax analysis module shown in Figure 1.14 defines a single routine Parse_program() which tries to read the program as an expression by calling
int P(...) {
    /* try to parse the alternative A1 A2 ... An */
    if (A1(...)) {
        if (!A2(...)) Error("Missing A2");
        ...
        if (!An(...)) Error("Missing An");
        return 1;
    }
    /* try to parse the alternative B1 B2 ... */
    if (B1(...)) {
        if (!B2(...)) Error("Missing B2");
        ...
        return 1;
    }
    ...
    /* failed to find any alternative of P */
    return 0;
}

Figure 1.15 A C template for a grammar rule.
typedef int Operator;

typedef struct _expression {
    char type;                           /* 'D' or 'P' */
    int value;                           /* for 'D' */
    struct _expression *left, *right;    /* for 'P' */
    Operator oper;                       /* for 'P' */
} Expression;

typedef Expression AST_node;    /* the top node is an Expression */

extern int Parse_program(AST_node **);

Figure 1.16 Parser header file for the demo compiler.
Parse_expression() and, if it succeeds, converts the pointer to the expression to a pointer to AST_node, which it subsequently yields as its output parameter. It also checks that the input is indeed finished after the expression.
Figure 1.17 shows the AST that results from parsing the expression (2*((3*4)+9)). Depending on the value of the type attribute, a node contains either a value attribute or three attributes left, oper, and right. In the diagram, the non-applicable attributes have been crossed out in each node.
Figure 1.17 An AST for the expression (2*((3*4)+9)).
As mentioned before, there is no context to handle in our simple language. We could have introduced the need for some context handling in the form of a context check by allowing the logical values t and f as additional operands (for true and false) and defining + as logical or and * as logical and. The context check would then be that the operands must be either both numeric or both logical. Alternatively, we could have collected optimization information, for example by doing all arithmetic that can be done at compile time. Both would have required code that is very similar to that shown in the code generation and interpretation sections below. (Also, the optimization proposed above would have made the code generation and interpretation trivial!)
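A sketch of what such a compile-time arithmetic pass might look like is shown below. It is written for this text against the Expression type of Figure 1.16 and assumes access to free_expression() from Figure 1.14; it is not part of the demo compiler itself. It folds every 'P' node whose operands reduce to digits into a single 'D' node (whose value may, of course, no longer be a single digit).

static void Fold_constants(Expression *expr) {
    if (expr->type != 'P') return;

    /* fold the operands first, bottom-up */
    Fold_constants(expr->left);
    Fold_constants(expr->right);

    if (expr->left->type == 'D' && expr->right->type == 'D') {
        int result = (expr->oper == '+')
            ? expr->left->value + expr->right->value
            : expr->left->value * expr->right->value;
        free_expression(expr->left);
        free_expression(expr->right);
        expr->type = 'D';        /* the node now represents a known value */
        expr->value = result;
    }
}

Running this routine on the AST of Figure 1.17 would collapse the whole tree into a single 'D' node with value 42, after which code generation and interpretation become trivial indeed.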
The code generator receives the AST (actually a pointer to it) and generates code from it for a simple stack machine. This machine has four instructions, which work on integers:

PUSH n    pushes the integer n onto the stack
ADD       replaces the topmost two elements by their sum
MULT      replaces the topmost two elements by their product
PRINT     pops the top element and prints its value
The module, which is shown in Figure 1.18, defines one routine Process() with one parameter, a pointer to the AST. Its purpose is to emit - to add to the object file - code with the same semantics as the AST. It first generates code for the expression by calling Code_gen_expression() and then emits a PRINT instruction. When run, the code for the expression will leave its value on the top of the stack where PRINT will find it; at the end of the program run the stack will again be empty (provided the machine started with an empty stack).

#include "parser.h"      /* for types AST_node and Expression */
#include "backend.h"     /* for self check */

/* PRIVATE */
static void Code_gen_expression(Expression *expr) {
    switch (expr->type) {
    case 'D':
        printf("PUSH %d\n", expr->value);
        break;
    case 'P':
        Code_gen_expression(expr->left);
        Code_gen_expression(expr->right);
        switch (expr->oper) {
        case '+': printf("ADD\n"); break;
        case '*': printf("MULT\n"); break;
        }
        break;
    }
}

/* PUBLIC */
void Process(AST_node *icode) {
    Code_gen_expression(icode); printf("PRINT\n");
}

Figure 1.18 Code generation back-end for the demo compiler.
The routine Code_gen_expression() checks the type attribute of its parameter to see if it is a digit node or a parenthesized expression node. In both cases it has to generate code to put the eventual value on the top of the stack. If the input node is a digit node, the routine obtains the value directly from the node and generates code to push it onto the stack: it emits a PUSH instruction. Otherwise the input node is a parenthesized expression node; the routine first has to generate code for the left and right operands recursively, and then emit an ADD or MULT instruction.

When run with the expression (2*((3*4)+9)) as input, the compiler that results from combining the above modules produces the following code:

    PUSH 2
    PUSH 3
    PUSH 4
    MULT
    PUSH 9
    ADD
    MULT
    PRINT
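To see that this sequence computes the right result, it helps to follow the stack contents instruction by instruction:

    PUSH 2     stack: 2
    PUSH 3     stack: 2 3
    PUSH 4     stack: 2 3 4
    MULT       stack: 2 12
    PUSH 9     stack: 2 12 9
    ADD        stack: 2 21
    MULT       stack: 42
    PRINT      prints 42; the stack is empty again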
The interpreter (see Figure 1.19) is very similar to the code generator. Both perform a depth-first scan of the AST, but where the code generator emits code to have the actions performed by a machine at a later time, the interpreter performs the actions right away. The extra set of braces ({ ... }) after case 'P': is needed because we need two local variables and the C language does not allow declarations in the case parts of a switch statement.

#include "parser.h"      /* for types AST_node and Expression */
#include "backend.h"     /* for self check */

/* PRIVATE */
static int Interpret_expression(Expression *expr) {
    switch (expr->type) {
    case 'D':
        return expr->value;
        break;
    case 'P': {
        int e_left = Interpret_expression(expr->left);
        int e_right = Interpret_expression(expr->right);
        switch (expr->oper) {
        case '+': return e_left + e_right;
        case '*': return e_left * e_right;
        }}
        break;
    }
}

/* PUBLIC */
void Process(AST_node *icode) {
    printf("%d\n", Interpret_expression(icode));
}

Figure 1.19 Interpreter back-end for the demo compiler.
Note that the code generator code (Figure 1.18) and the interpreter code (Figure 1.19) use the same module definition file (called a 'header file' in C), backend.h, shown in Figure 1.20. This is possible because they both implement the same interface: a single routine Process(AST_node *). In Chapter 4 we will see an example of a different type of interpreter (Section 4.1.2) and two other code generators (Sections 4.2.3 and 4.2.3.2), all implementing the same interface. Another module that implements the back-end interface could be a module that displays the AST graphically. Each of these can be combined with the lexical and syntax modules, to produce a program processor.

    extern void Process(AST_node *);

Figure 1.20 Common back-end header for code generator and interpreter.
three modules and that the back-end could stay as a single module. It will be
these modules will be called upon by almost all other modules.

A short description of each of the modules follows, together with an indication of where the corresponding material is discussed in detail.

The program text input module finds the program text file, reads it efficiently, and turns it into a stream of characters, allowing for different kinds of newlines, escape codes, and possibly include files, when these are to be included. This function may require cooperation with the operating system on the one hand and with the lexical analyzer on the other.

The lexical analysis module isolates tokens in the input stream and determines their class and representation. It can be written by hand or generated from a description of the tokens. Additionally, it may do some limited interpretation on some of the tokens, for example to see if an identifier is perhaps actually a macro or a keyword (reserved word).

The syntax analysis module converts the stream of tokens into an abstract syntax tree (AST). Some syntax analyzers consist of two modules. The first one reads the token stream and calls a function from the second module for each syntax construct it recognizes; the functions in the second module then construct the nodes of the AST and link them. This has the advantage that one can replace the AST generation module to obtain a different AST from the same syntax analyzer, or, alternatively, one can replace the syntax
Figure 1.21 Structure of a compiler (diagram). The front-end (lexical analysis, syntax analysis, context handling, and intermediate code generation) turns the program text file into tokens, an AST, and intermediate code (IC); the back-end (IC optimization, code generation, target code optimization, machine code generation, and executable code generation) turns the IC into symbolic instructions, bit patterns, and finally an executable code file.
analyzer and obtain the same type of AST from a (slightly) different language. The above modules are the subject of Chapter 2.

The context handling module collects context information from various places in the program, and annotates nodes with the results. Examples are: relating type information from declarations to expressions; connecting goto statements to their labels, in imperative languages; deciding which routine calls are local and which are remote, in distributed languages. These annotations are then used for performing context checks or are passed on to subsequent modules, for example to aid in code generation. This module is discussed in Chapter 3.

The intermediate code generation module translates language-specific constructs into more general constructs; these general constructs then constitute the intermediate code, sometimes abbreviated IC. Deciding what is a language-specific and what a more general construct is up to the compiler designer, but usually the choice is not very difficult. One criterion for the level of the intermediate code is that it should be reasonably straightforward to generate machine code from it for various machines, as suggested by
Figure 1.4. Usually the intermediate code consists almost exclusively of expressions and flow-of-control instructions.

Examples of the translations done by the intermediate code generation module are: replacing a while statement by tests, labels, and jumps in imperative languages; inserting code for determining which method to call for an object in languages with dynamic binding; replacing a Prolog rule by a routine that does the appropriate backtracking search. In each of these cases an alternative translation would be a call to a routine in the run-time system, with the appropriate parameters: the Prolog rule could stay in symbolic form and be interpreted by a run-time routine, a run-time routine could dynamically find the method to be called, and even the while statement could be performed by a run-time routine if the test and the body were converted to anonymous subroutines. The intermediate code generation module is the place where the division of labor between in-line code and the run-time system is decided. This module is treated in Chapters 6 through 9, for the imperative, object-oriented, functional, logic, and parallel and distributed programming paradigms.

The intermediate code optimization module performs preprocessing on the intermediate code, with the intention of improving the effectiveness of the code generation module. An example of straightforward preprocessing is constant folding, in which all operations in expressions with known simple operands are performed. A more sophisticated example is in-lining, in which carefully chosen calls to some routines are replaced by the bodies of those routines, while at the same time substituting the parameters.

The code generation module rewrites the AST into a linear list of target machine instructions, in more or less symbolic form. To this end, it selects instructions for segments of the AST, allocates registers to hold data, and arranges the instructions in the proper order.

The target code optimization module considers the list of symbolic machine instructions and tries to optimize it by replacing sequences of machine instructions by faster or shorter sequences. It uses target-machine-specific properties.

The precise boundaries between intermediate code optimization, code generation, and target code optimization are floating: if the code generation is particularly good, little target code optimization may be needed or even possible, and constant folding can be done during code generation or even on the target code. Still, some optimizations fit better in one module than in another, and it is useful to distinguish the above three levels.

The machine code generation module converts the symbolic machine instructions into the corresponding bit patterns. It determines machine addresses of program code and data and produces tables of constants and relocation tables.

The executable code generation module combines the encoded machine instructions, the constant tables, the relocation tables, and the headers, trailers, and other material required by the operating system into an executable code file.

The back-end modules are discussed in Chapter 4.
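As a small illustration of the first translation mentioned above (replacing a while statement by tests, labels, and jumps), a sketch of what the intermediate code might look like follows; the chapter does not define a concrete intermediate code, so the instruction forms used here are invented for the occasion.

    source:                           possible intermediate code:

    while (i < n) {                   L_test:  t1 := i < n
        sum := sum + i;                        IF NOT t1 GOTO L_end
        i := i + 1;                            sum := sum + i
    }                                          i := i + 1
                                               GOTO L_test
                                      L_end:

Only expressions, labels, and (conditional) jumps remain, which is exactly the level from which machine code can be generated with reasonable ease for many different machines.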
There is one important component of a compiler that is traditionally left outside compiler structure pictures: the run-time system of the compiled programs. Some of the actions required by a running program will be of a general, language-dependent and/or machine-dependent housekeeping nature; examples are code for allocating arrays, manipulating stack frames, and finding the proper method during method invocation in an object-oriented language. Although it is quite possible to generate code fragments for these actions wherever they are needed, these fragments are usually very repetitive and it is often more convenient to compile them once and store the result in library modules. These library modules together form the run-time system. Some imperative languages need only a minimal run-time system; others, notably the logic and distributed languages, require run-time systems of considerable size, containing code for parameter unification, remote procedure call, task scheduling, etc. The parts of the run-time system needed by a specific program can be linked in by the linker when the complete object program is constructed, or even be linked in dynamically when the compiled program is called; object programs and linkers are explained in Section 4.3. If the back-end is an interpreter, the run-time system must be incorporated in it.

As an aside it should be pointed out that run-time systems are not only traditionally left out of compiler overview pictures like those in Figure 1.8 and Figure 1.21, they are also sometimes overlooked or underestimated in compiler construction planning. Given the fact that they may contain such beauties as printf(), malloc(), and concurrent task management, overlooking them is definitely inadvisable.
It is by no means always necessary to implement all modules of the back-end:
- Writing the modules for generating machine code and executable code can be avoided by using the local assembler, which is almost always available.
- Writing the entire back-end can often be avoided by generating C code from the intermediate code. This exploits the fact that good C compilers are available on virtually any platform, which is why C is sometimes called, half jokingly, 'The Machine-Independent Assembler'. This is the usual approach taken by compilers for the more advanced paradigms, but it is certainly also recommendable for first implementations of compilers for new imperative and object-oriented languages.

The object code produced by the above 'short-cuts' is often of good to excellent quality, but the increased compilation time may be a disadvantage. Most C compilers are quite substantial programs and calling them may well cost noticeable time; their availability may, however, make them worth it.
Compilers can differ considerably with regard to their architecture; unfortunately, terminology to describe the different types is lacking or confusing. Two architectural questions dominate the scene. One is concerned with the granularity of the data that is passed between the compiler modules: is it bits and pieces or is it the entire program? In other words, how wide is the compiler? The second concerns the flow of control between the compiler modules: which of the modules is the boss?
A compiler consists of a series of modules that transform, refine, and pass on information about the program. Information passes mainly from the front end to the back end. Each consecutive pair of modules defines an interface, and although in the end all information has to pass through all these interfaces, the size of the chunks of information makes a considerable difference to the structure of the compiler. The obvious choices for the size of the chunks of information are the smallest unit that is meaningful between the two modules, and the entire program. This leads to two types of compilers, neither of which seems to have a standard name; we will call them 'narrow' and 'broad' compilers, respectively.

A narrow compiler reads a small part of the program, typically a few tokens, processes the information obtained, produces a few bytes of object code if appropriate, discards most of the information about these tokens, and repeats this process until the end of the program text is reached.

A broad compiler reads the entire program and applies a series of transformations to it (lexical, syntactic, contextual, optimizing, code generating, etc.), which eventually result in the desired object code. This object code is then generally written to a file.

It will be clear that a broad compiler needs an amount of memory that is proportional to the size of the source program, which is the reason why this type has always been rather unpopular. Until the 1980s, a broad compiler was unthinkable, even in academia. A narrow compiler needs much less memory; its memory requirements are still linear in the length of the source program, but the proportionality constant is much lower, since it gathers permanent information (for example about global variables) at a much slower rate.

From a theoretical, educational, and design point of view, broad compilers are preferable, since they represent a simpler model, more in line with the functional programming paradigm. A broad compiler consists of a series of function calls (Figure 1.22) whereas a narrow compiler consists of a typically imperative loop (Figure 1.23). In practice, 'real' compilers are often implemented as narrow compilers. Still, a narrow compiler may have a broad component: it is quite natural for a C compiler to read each routine in its entirety, process it, and then discard all but the global information about it.

In the future we expect to see more broad compilers and fewer narrow ones. Most compilers for the new programming paradigms are already broad. Since scarcity of memory will be less of a problem in the future, more and more imperative compilers will be broad as well. On the other hand, almost all compiler construction tools have been developed for the narrow model and thus favor it. Also, the narrow model is probably better for the task of writing a simple compiler by hand, since it requires much less dynamic memory allocation.

Since the 'field of vision' of a narrow compiler is, well, narrow, it is possible that it cannot manage all its transformations on the fly. Such compilers then write a partially transformed version of the program to disk and, often using a different program, continue processing it; occasionally even more passes are used. Not surprisingly, such a compiler is called a 2-pass (or N-pass) compiler, or a 2-scan (N-scan) compiler. If a distinction between these terms is made, '2-scan' often indicates that the second pass actually re-reads the original program text, the difference being that it is now armed with
SET Object code TO
    Assembly (Code generation (Context check (Parse (Tokenize (Source code)))));

Figure 1.22 Flow-of-control structure of a broad compiler.
WHILE NOT Finished:
    Read some data D from the source code;
    Process D and produce the corresponding object code, if any;

Figure 1.23 Flow-of-control structure of a narrow compiler.
information extracted during the first scan. The major transformations performed by a compiler and shown in Figure 1.21 are called phases, giving rise to the term N-phase compiler, which is of course not the same as an N-pass compiler. Since on a very small machine each phase could very well correspond to one pass, these notions are sometimes confused. With larger machines, better syntax analysis techniques and simpler programming language grammars, N-pass compilers with N>1 are going out of fashion. It turns out that not only compilers but also people like to read their programs in one scan. This observation has led to syntactically stronger programming languages, which are correspondingly easier to process.

Many algorithms in a compiler use only local information; for these it makes little difference whether the compiler is broad or narrow. Where it does make a difference, we will show the broad method first and then explain the narrow method as an optimization, if appropriate.
In a broad compiler, control is not a problem: the modules run in sequence and each module has full control when it runs, both over the processor and over the data. A simple driver can activate the modules in the right order, as already shown in Figure 1.22. In a narrow compiler, things are more complicated. While pieces of data are moving forward from module to module, control has to shuttle forward and backward, to activate the proper module at the proper time. We will now examine the flow of control in narrow compilers in more detail.

The modules in a compiler are essentially 'filters', reading chunks of information, processing them, and writing the result. Such filters are most easily programmed as loops which execute function calls to obtain chunks of information from the previous module and routine calls to write chunks of information to the next module. An example of a filter as a main loop is shown in Figure 1.24.
WHILE Obtained input character Ch from previous module:
    IF Ch = 'a':
        // See if there is another 'a':
        IF Obtained input character Ch1 from previous module:
            IF Ch1 = 'a':
                // We have 'aa':
                Output character 'b' to next module;
            ELSE Ch1 /= 'a':
                Output character 'a' to next module;
                Output character Ch1 to next module;
        ELSE Ch1 not obtained:
            Output character 'a' to next module;
            EXIT WHILE;
    ELSE Ch /= 'a':
        Output character Ch to next module;

Figure 1.24 The filter aa -> b as a main loop.
It describes a simple filter which copies input characters to the output while replacing the sequence aa by b; the filter is representative of, but of course much simpler than, the kind of transformations performed by an actual compiler module. The reader may nevertheless be surprised at the complexity of the code, which is due to the requirements of proper termination of the previous, the present, and the next module. The need for proper handling of end of input is, however, very much a fact of life in compiler construction; we cannot afford to sweep its complexities under the rug.

The filter obtains its input characters by calling upon its predecessor in the module sequence; such a call may succeed and yield a character, or it may fail. The transformed characters are passed on to the next module. Except for routine calls to the previous and the next module, control remains inside the while loop all the time, and no global variables are needed.

Although main loops are efficient, easy to program and easy to understand, they have one serious flaw which prevents them from being used as the universal programming model for compiler modules: a main loop does not interface well with another main loop in traditional programming languages. Linking the output of one main loop to the input of another involves a transfer of control that leaves both the environment of the callee and that of the caller intact, regardless of whether the consumer calls the producer to obtain new information or the producer calls the consumer to pass on processed information. The traditional function call creates a new environment for the callee and the traditional function return destroys the environment of the callee; so they cannot serve to link two loops. A transfer of control that does possess the desired properties is the coroutine call, which involves having separate stacks for the two loops to preserve both environments. The coroutine linkage also takes care of the end-of-input handling: an attempt to obtain information from a module whose loop has terminated fails. A well-known implementation of the coroutine linkage is the UNIX pipe, in which the two separate stacks reside in different processes and therefore in different address spaces. Implementation of coroutines in imperative languages is discussed in Section 6.3.7.

Although coroutine linkage was proposed by Conway (1963) early in the history of compiler construction, no programming language except perhaps Icon (Griswold and Griswold, 1983) has ever featured a usable implementation of it. In the absence of coroutines we have to choose one of our modules as the main loop in a narrow compiler and implement the other loops through trickery. That this implies major surgery to these loops is shown by Figure 1.25, which shows our filter as a loop-less module preceding the main loop, and Figure Answers.1, which shows it as a loop-less module following the main loop. We see that global variables are needed to record information that must remain available between two successive calls of the function. The variable Input exhausted records whether the previous call of the function returned from the position before the EXIT WHILE in Figure 1.24, and the variable There is a stored character records whether it returned from before outputting Ch1. Some additional code is required for proper end-of-input handling. Note that the code is 29 lines long as opposed to 15 for the main loop. Similar considerations apply to the post-main variant, which is given as an exercise. An additional complication is that proper end-of-input handling requires that the filter be flushed by the using module when it has supplied its final chunk of information. See Exercise 1.11.

Looking at Figures 1.25 and Answers.1 in the answers to the exercises, we see that the complication comes from having to save program state that resides on the stack. It is therefore convenient to choose for the main loop the module that has the most state on the stack. That module will almost always be the parser; the code generator may gather more state, but it is usually stored in a global data structure rather than on the stack. This explains why we almost universally find the parser as the main module in a narrow compiler: in very simple-minded wording, the parser pulls the program text in through the lexical analyzer, and pushes the code out through the code generator.
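For comparison with the pseudo-code in Figures 1.24 and 1.25, the main-loop form is easy to render in C when the previous and next modules are simply standard input and standard output. The following sketch is illustrative only; a real compiler module would of course call its neighbor modules rather than getchar() and putchar().

    #include <stdio.h>

    /* The filter aa -> b as a main loop, as in Figure 1.24;
       the previous module is stdin, the next module is stdout. */
    int main(void) {
        int ch;
        while ((ch = getchar()) != EOF) {
            if (ch == 'a') {
                int ch1 = getchar();   /* see if there is another 'a' */
                if (ch1 == 'a') {
                    putchar('b');      /* we have "aa" */
                } else if (ch1 != EOF) {
                    putchar('a');
                    putchar(ch1);
                } else {               /* ch1 not obtained */
                    putchar('a');
                    break;
                }
            } else {
                putchar(ch);
            }
        }
        return 0;
    }

Here the operating system supplies the coroutine-like linkage for free: connected through a UNIX pipe, two such programs each keep their own main loop and their own stack.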
The foremost property of a good compiler is of course that it generates correct code. A compiler that occasionally generates incorrect code is useless; a compiler that generates incorrect code once a year may seem useful but is dangerous. It is also important that a compiler conform completely to the language specification. It may be tempting to implement a subset of the language, a superset, or even what is sometimes
SET the flag Input exhausted TO False;
SET the flag There is a stored character TO False;
SET Stored character TO Undefined;    // can never be an 'a'

FUNCTION Filtered character RETURNING a Boolean, a character:
    IF Input exhausted:
        RETURN False, No character;
    ELSE IF There is a stored character:
        // It cannot be an 'a':
        SET There is a stored character TO False;
        RETURN True, Stored character;
    ELSE Input not exhausted AND There is no stored character:
        IF Obtained input character Ch from previous module:
            IF Ch = 'a':
                // See if there is another 'a':
                IF Obtained input character Ch1 from previous module:
                    IF Ch1 = 'a':
                        // We have 'aa':
                        RETURN True, 'b';
                    ELSE Ch1 /= 'a':
                        SET Stored character TO Ch1;
                        SET There is a stored character TO True;
                        RETURN True, 'a';
                ELSE Ch1 not obtained:
                    SET Input exhausted TO True;
                    RETURN True, 'a';
            ELSE Ch /= 'a':
                RETURN True, Ch;
        ELSE Ch not obtained:
            SET Input exhausted TO True;
            RETURN False, No character;

Figure 1.25 The filter aa -> b as a pre-main module.
called an 'extended subset', and users may even be grateful, but those users will soon find that programs developed with such a compiler are much less portable than those written using a fully conforming compiler. (For more about the notion of 'extended subset', see Exercise 1.13.)

Another property of a good compiler, one that is often overlooked, is that it should be able to handle programs of essentially arbitrary size, as far as available memory permits. It seems very reasonable to say that no sane programmer uses more than 32 parameters in a routine or more than 128 declarations in a block and that one may therefore allocate a fixed amount of space for each in the compiler. One should, however, keep in mind that programmers are not the only ones who write programs. More and more software is generated by other programs, and such generated software may easily contain more than 128 declarations in one block, although more than 32 parameters to a routine seems excessive, even for a generated program; famous last words ... Especially any assumptions about limits on the number of cases in a case/switch statement are unwarranted: very large case statements are often used in the implementation of automatically generated parsers and code generators. Section 5.1.3 shows how the flexible memory allocation needed for handling programs of essentially arbitrary size can be achieved at an almost negligible increase in cost.

Compilation speed is an issue but not a major one: almost all programs can be expected to compile in under a second on modern machines. Larger programming projects are usually organized in many relatively small subprograms, modules, library routines, etc., so that these compilation units can be compiled separately, and program modification is usually restricted to the modified compilation units only. Also, compiler writers have traditionally been careful to keep their compilers 'linear in the input', which means that the compilation time is a linear function of the length of the input file. This is even more important when generated programs of considerable length are being compiled.

There are several possible sources of non-linearity in compilers. First, the linear-time parsing techniques are rather inconvenient to use, and the worry-free parsing techniques can be cubic in the size of the input in the worst case. Second, many code optimizations are potentially exponential in the size of the input, since often the best code can only be found by considering all possible combinations of machine instructions. Third, naive memory management can result in quadratic time consumption. Fortunately, good linear-time solutions or heuristics are available for all these problems.

Compiler size is hardly an issue anymore, with most computers having many megabytes of primary memory nowadays.

The user-friendliness of a compiler shows mainly in the quality of its error reporting. At the least, the user should be presented with a clear error message which includes the perceived cause of the error, the name of the input file, and the position in it. Giving a good error cause description is often hard or impossible, due to the limited insight compilers have into incorrect programs. Pinpointing the error is aided by recording the file name and line number with every token and every node in the AST. More fancy reporting mechanisms, including showing parts of the syntax tree, may not have the beneficial effect the compiler writer may expect from them, but it may be useful to provide some visual display mechanism, for example opening a text editor at the point of the error.

The importance of the speed and the size of the generated code depends totally on the purpose of the compiler. Normally one can expect that the user is more interested in high speed than in small size (code for embedded applications, such as washing machines, portable telephones, etc., is an exception). Moderately advanced optimization techniques will perhaps provide a factor of three improvement over very naive code generation; implementing such optimizations may take about the same amount of time as the entire compiler project. Gaining another factor of two or even three over this may be possible through extensive and aggressive optimization; one can expect to spend many times the original effort on an optimization phase of this nature.
A program is considered portable if it takes a limited and reasonable effort to make it run on different machine types. What a 'reasonable effort' is, is of course a matter of opinion, but today one would expect it to be limited to, say, adapting a 'makefile' to reflect the local situation. With compilers, machine dependence shows up not only in the compiler itself but also, perhaps even mainly, in the code it generates. We therefore distinguish two forms of portability: the ease with which the compiler itself can be made to run on another machine, and the ease with which it can be made to generate code for another machine. The first is called the portability of the compiler, the second its retargetability. If the compiler is written in a reasonably high-level language, good portability comes more or less for free. Retargeting requires replacing part of the back-end, or in the worst case the entire back-end. In this context it is important to realize that replacing a back-end does not mean writing one from scratch: much of a back-end is machine-dependent, but much of it is not, and parts can be reused from other back-ends and other parts can be generated from formal machine descriptions. This approach can be used to create a new back-end with reasonable effort. Given the proper tools, the effort may lie between one and four programmer-months; machine descriptions range in size between a few hundred and a few thousand lines.
Optimizations are attractive: much research in compiler construction is concerned with them, and compiler writers regularly see all kinds of opportunities for them. It should, however, be kept in mind that implementing optimizations comes only third in the priorities of compiler construction: unlike correctness, optimization is not essential. As in all programming, it is easier to make a correct program fast than to make a fast program correct, and likewise it is easier to make correct generated object code fast than to make fast generated object code correct. There is another reason besides correctness why we emphasize the basic algorithm in this book: some traditional algorithms are really optimized versions of more basic algorithms. Sometimes the basic algorithm is preferable to the optimized version, and in any case the basic version allows more freedom of design than the optimized version.

A good example is the stack in implementations of imperative languages. At any moment the stack holds the pertinent data, administration and local variables, for each active routine, a routine that has been called and has not yet terminated; this set of data is called the 'activation record' of this activation of the routine. In this view, activation records are found only on the stack, and each represents a running routine; we consider the stack as the primary item, of which the activation records are just parts. It is, however, also possible to consider the activation
record as the primary item: it arises naturally when a routine is called ('activated'), since it is obvious that its pertinent data has to be stored somewhere. Its allocation on a stack is just an optimization that happens to be possible in many, but not all, imperative and object-oriented languages. From this point of view it is easier to understand the implementation of those languages for which stack allocation is not a good optimization: imperative languages with coroutines or Ada-like tasks, object-oriented languages with active Smalltalk-like objects, functional languages, Icon, etc.

Probably the best attitude towards optimization is to first understand and implement the basic structure and algorithm, then see which optimizations the actual situation allows, and only implement them if they are considered worthwhile (considering their cost). In situations in which the need for optimization is obvious from the start, as for example in code generators, the basic structure would include a framework for these optimizations. This framework can then be filled in as the project progresses.

This concludes our introductory part on actually constructing a compiler. In the remainder of this chapter we consider three further issues: the history of compiler construction, formal grammars, and closure algorithms.
Three periods can be distinguished in the history of compiler construction: 1945-1960, 1960-1975, and 1975-present. Of course, the years are approximate.
During this period languages developed relatively slowly and machines were idiosyncratic. The primary problem was how to generate code for a given machine. The problem was exacerbated by the fact that assembly programming was held in high esteem, and high(er)-level languages and compilers were looked at with a mixture of suspicion and awe: using a compiler was often called 'automatic programming'. Proponents of high-level languages feared, not without reason, that the idea of high-level programming would never catch on if compilers produced code that was less efficient than what assembly programmers produced by hand. The first FORTRAN compiler (Sheridan, 1959) optimized heavily and was far ahead of its time in that respect.
The 1960s and 1970s saw a proliferation of new programming languages, and language designers began to believe that having a compiler for a new language quickly was more important than having one that generated very efficient code. This shifted the emphasis in compiler construction from back-ends to front-ends. At the same time, studies in formal languages revealed a number of powerful techniques that could be applied profitably in front-end construction, notably in parser generation.
From 1975 to the present, both the number of new languages proposed and the number of different machine types in regular use decreased, which reduced the need for quick-and-simple/quick-and-dirty compilers for new languages and/or machines. The greatest turmoil in language and machine design being over, people began to demand professional compilers that were reliable, efficient, both in use and in generated code, and preferably with pleasant user interfaces. This called for more attention to the quality of the generated code, which was easier now, since with the slower change in machines the expected lifetime of a code generator increased.

Also, at the same time new paradigms in programming were developed, with functional, logic, and distributed programming as the most prominent examples. Almost invariably, the run-time requirements of the corresponding languages far exceeded those of the imperative languages: automatic data allocation and deallocation, list comprehensions, unification, remote procedure call, and many others are features which require much run-time effort that corresponds to hardly any code in the program text. More and more, the emphasis shifts from 'how to compile' to 'what to compile to'.
Grammars, or more precisely context-free grammars, are the essential formalism for describing the structure of programs in a programming language. In principle, the grammar of a language describes the syntactic structure only, but since the semantics of a language is defined in terms of the syntax, the grammar is also instrumental in the definition of the semantics. There are other grammar types besides context-free grammars, but we will be mainly concerned with context-free grammars. We will also meet regular grammars, which more often go by the name of 'regular expressions' and which result from a severe restriction on the context-free grammars; and attribute grammars, which are context-free grammars extended with parameters and code. Other types of grammars play no more than a marginal role in compiler construction. The term 'context-free' is often abbreviated to CF. We will give here a brief summary of the features of CF grammars.

A grammar is a recipe for constructing elements of a set of strings of symbols. When applied to programming languages, the symbols are the tokens in the language, the strings of symbols are program texts, and the set of strings of symbols is the programming language. The string

    BEGIN print ( "Hi!" ) END

consists of 6 symbols (tokens) and could be an element of the set of strings of symbols generated by a programming language grammar, or in more normal words, be a program in some programming language. This cut-and-dried view of a programming language would be useless but for the fact that the strings are constructed in a structured fashion; and to this structure semantics can be attached.
A production rule defines a named syntactic construct; it consists of a left-hand side, the name of the construct, and a right-hand side, which shows a possible form of the construct. An example of a production rule is

    expression -> '(' expression operator expression ')'

The right-hand side of a production rule can contain two kinds of symbols, terminal symbols and non-terminal symbols. As the word says, a terminal symbol (or terminal for short) is an end point of the production process; it can be part of the strings produced by the grammar. A non-terminal symbol must occur as the left-hand side (the name) of one or more production rules; it cannot be part of the strings produced by the grammar. Terminals are also called tokens, especially when they are part of an input to be analyzed. Non-terminals and terminals together are called grammar symbols. The grammar symbols in the right-hand side of a rule are collectively called its members; when they occur as nodes in a syntax tree they are more often called its 'children'.

In discussing grammars, it is customary to use some conventions that allow the class of a symbol to be deduced from its typographical form. Non-terminals are denoted by capital letters, mostly A, B, C, and N. Terminals are denoted by lower-case letters near the end of the alphabet, mostly x, y, and z. Sequences of grammar symbols are denoted by Greek letters near the beginning of the alphabet, mostly α (alpha), β (beta), and γ (gamma). Lower-case letters near the beginning of the alphabet (a, b, c, etc.) stand for themselves, as terminals. The empty sequence is denoted by ε (epsilon).
The central data structure in the production process is the sentential form. It can be described as a string of grammar symbols, and can then be thought of as a partially produced program text. For our purposes, however, we want to represent the syntactic structure of the program too. The syntactic structure can be added to the flat interpretation of a sentential form as a tree positioned above the sentential form, so that the leaves of the tree are the grammar symbols. This combination is also called a production tree.

A string of terminals can be produced from a grammar by applying production steps to a sentential form, as follows. The sentential form is initialized to a copy of the start symbol. Each production step finds a non-terminal N in the leaves of the sentential form, finds a production rule N->α with N as its left-hand side, and replaces the N in the sentential form with a tree having N as the root and the right-hand side of the production rule, α, as the leaf or leaves. When no more non-terminals can be found in the leaves of the sentential form, the production process is finished, and the leaves form a string of terminals in accordance with the grammar.

Using the conventions described above, we can write that a production step replaces the sentential form βNγ by βαγ. The process leading from the start symbol to a string of terminals is called a derivation of that string. Suppose our numbered production rules are

    [1]  expression -> '(' expression operator expression ')'
    [2]  expression -> '1'
    [3]  operator   -> '+'
    [4]  operator   -> '*'

in which the terminal symbols are surrounded by apostrophes, and suppose the start symbol is expression. The sentential forms shown in Figure 1.26 then constitute a leftmost derivation of the string (1*(1+1)): in each step it is the leftmost non-terminal in the sentential form that is rewritten. The notation R@P in Figure 1.26 indicates that grammar rule R is used to rewrite the non-terminal at position P. The resulting parse tree (in which the derivation order is no longer visible) is shown in Figure 1.27.
          expression
    1@1   '(' expression operator expression ')'
    2@2   '(' '1' operator expression ')'
    4@3   '(' '1' '*' expression ')'
    1@4   '(' '1' '*' '(' expression operator expression ')' ')'
    2@5   '(' '1' '*' '(' '1' operator expression ')' ')'
    3@6   '(' '1' '*' '(' '1' '+' expression ')' ')'
    2@7   '(' '1' '*' '(' '1' '+' '1' ')' ')'

Figure 1.26 Leftmost derivation of the string (1*(1+1)).
We see that recursion, the ability of a production rule to refer directly or indirectly to itself, is essential to the production process; without recursion, a grammar can produce only a finite set of strings. The production process is kind enough to produce the production tree for us, but then the program text is committed to a linear medium (paper, a computer file) and the production tree gets stripped off in the process. Since we need the structure to find out the semantics of the program, we use a syntax analyzer to retrieve it. The systematic construction of parsers is treated in Chapter 2.
Figure 1.27 Parse tree of the derivation in Figure 1.26.
The single grammar rule format

    non-terminal -> zero or more grammar symbols

used above is sufficient in principle to specify any grammar, but in practice a richer notation is used. For one thing, it is usual to combine all rules with the same left-hand side into one rule: for example, the rules

    N -> α        N -> β        N -> γ

are combined into one rule

    N -> α | β | γ

in which the original right-hand sides are separated by vertical bars. In this form α, β, and γ are called alternatives, and the combined rule may be considered an abbreviation of the original rules. The format described so far is known as BNF. It is very suitable for expressing nesting and recursion, but less convenient for expressing repetition and optionality, although it can of course express repetition through recursion. To remedy this, three additional notations are introduced, each in the form of a postfix operator:

- R+ indicates the occurrence of one or more Rs, to express repetition;
- R? indicates the occurrence of zero or one Rs, to express optionality; and
- R* indicates the occurrence of zero or more Rs, to express optional repetition.

Parentheses may be needed if these postfix operators are to operate on more than one grammar symbol. The grammar notation that allows the above forms is called EBNF, for Extended BNF. An example is the grammar rule

    parameter_list -> ('IN' | 'OUT')? identifier (',' identifier)*

which produces program fragments like

    a, b
    IN year, month, day
    OUT left, right
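As a side note (this small rewriting is ours, not one of the book's numbered rules), the remark above that repetition can be expressed through recursion is easy to make concrete: the EBNF fragment (',' identifier)* of the rule just shown corresponds in plain BNF to

    identifier_tail -> ε | ',' identifier identifier_tail

at the cost of introducing an extra non-terminal.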
There are a number of properties of grammars and their components that are used in discussing grammars. A non-terminal N is left-recursive if, starting with a sentential form N, we can produce another sentential form starting with N. An example of direct left-recursion is

    expression -> expression '+' factor | factor

but we will meet other forms of left-recursion in Section 2.2.4.3. A grammar that contains one or more left-recursive rules is itself called left-recursive. Right-recursion also exists, but is less important.

A non-terminal N is nullable if, starting with a sentential form N, we can produce an empty sentential form.

A non-terminal N is useless if it can never produce a string of terminal symbols: any attempt to do so inevitably leads to a sentential form that again contains N. A simple example is

    expression -> '+' expression | '-' expression

but less obvious examples can easily be constructed. Theoretically, useless non-terminals can just be ignored, but in real-world specifications they almost certainly signal a mistake on the part of the user; in the above example, it is likely that a third alternative, perhaps | factor, has been omitted. Grammar-processing software should check for useless non-terminals, and reject the grammar if they are present.

A grammar is ambiguous if it can produce two different production trees with the same leaves in the same order. That means that when we lose the production tree due to the linearization of the program text we cannot recover it unambiguously; and since the semantics derives from the production tree, we lose the semantics as well. So ambiguous grammars are to be avoided in the specification of programming languages, where attached semantics plays an important role.
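To make the notion concrete, here is a small example of our own (not one of the book's numbered grammars): the grammar

    expression -> expression '+' expression | '1'

is ambiguous, because the string 1+1+1 has two different production trees, one in which the left '+' is applied at the top and one in which the right '+' is. With '+' the two readings happen to have the same value, but if '+' is replaced by '-' they correspond to (1-1)-1 = -1 and 1-(1-1) = 1, so the choice of tree clearly matters for the semantics.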
Thoughts, ideas, definitions, and theorems about grammars are often expressed in a mathematical formalism. Some familiarity with this formalism is indispensable in reading books and articles about compiler construction, which is why we will briefly introduce it here. Much, much more can be found in any book on formal languages, for which see the Further Reading section of this chapter.

The basic unit in formal grammars is the symbol. The only property of these symbols is that we can take two of them and compare them to see if they are the same. In this they are comparable to the values of an enumeration type. Like these, symbols are written as identifiers, or, in mathematical texts, as single letters, possibly with subscripts. Examples of symbols are N, x, procedure_body, assignment_symbol, +.

The next building unit of formal grammars is the production rule. Given two sets of symbols V1 and V2, a production rule is a pair

    (N, α)    such that N ∈ V1, α ∈ V2*

in which X* means a sequence of zero or more elements of the set X. This means that a production rule is a pair consisting of an N which is an element of V1 and a sequence α of elements of V2. We call N the left-hand side and α the right-hand side. We do not normally write this as a pair (N, α) but rather as N->α; but technically it is a pair. The V in V1 and V2 stands for vocabulary.

Now we have the building units needed to define a grammar. A context-free grammar G is a 4-tuple

    G = (VN, VT, S, P)

in which VN and VT are sets of symbols, S is a symbol, and P is a set of production rules. The elements of VN are called the non-terminals, those of VT the terminals, and S is called the start symbol. In programmer's terminology this means that a grammar is a record with four fields: the non-terminals, the terminals, the start symbol, and the production rules.

The previous paragraph defines only the form of a context-free grammar. To make it a real, acceptable grammar, it has to fulfill three requirements:

    VN ∩ VT = ∅

in which ∅ denotes the empty set and which means that VN and VT are not allowed to have symbols in common: we must be able to tell terminals and non-terminals apart;

    S ∈ VN

which means that the start symbol must be a non-terminal; and

    P ⊆ VN × (VN ∪ VT)*

which means that the left-hand side of each production rule must be a non-terminal and that the right-hand side may consist of both terminals and non-terminals but is not allowed to include any other symbols.
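As a worked example (written out by us in this notation), the four numbered production rules used in the derivation example earlier in this section form the grammar

    G = (VN, VT, S, P)

with

    VN = { expression, operator }
    VT = { '(', ')', '1', '+', '*' }
    S  = expression
    P  = { expression -> '(' expression operator expression ')',
           expression -> '1',
           operator -> '+',
           operator -> '*' }

All three requirements are fulfilled: VN and VT have no symbols in common, the start symbol expression is a non-terminal, and every rule has a single non-terminal as its left-hand side and a string over VN ∪ VT as its right-hand side.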
Sequences of symbols are called strings. A string can be directly derivable from another string in a grammar; more in particular, a string β is directly derivable from a string α, written α ⇒ β, if and only if there exist strings δ1, δ2, γ, and a non-terminal N ∈ VN, such that

    α = δ1 N δ2,    β = δ1 γ δ2,    and    N->γ ∈ P

This means that if we have a string and we replace a non-terminal N in it by the right-hand side of one of its production rules, we obtain a string that is directly derivable from the original one. Of course, 'replacement' is an imperative notion; the formula above expresses the same relationship declaratively, without referring to any action.

A string β is derivable from a string α, written α ⇒* β, if and only if α = β or there is a string that is derivable from α and from which β is directly derivable. This means that a string is derivable from another string if we can reach the second string from the first through zero or more direct derivation steps.

A sentential form of a grammar G is defined as

    any string σ such that S ⇒* σ

which is any string that is derivable from the start symbol S of G. A terminal production of a grammar G is defined as a sentential form that does not contain non-terminals:

    any string σ such that S ⇒* σ and σ ∈ VT*

which denotes a string derivable from S which is in VT*, the set of all strings of terminal symbols only. Finally, the language generated by G is

    L(G) = { σ | S ⇒* σ, σ ∈ VT* }

which is the set of all terminal productions of G. These terminal productions are called the sentences in the language L(G). Terminal productions are what interest us most in grammars: if G is the grammar of a programming language, then L(G) is the set of all programs in that language that are correct in a context-free sense.

Terminal symbols have another property in addition to their identity: a representation, the way they are typed, printed, etc. For example, the representation of an assignment symbol may be := or =, and that of an integer-type symbol differs from language to language. When we replace the terminal symbols in a sentence by their representations, and add layout and comments, we obtain a program. It is usually considered acceptable to have a terminal symbol that has an empty representation; it is only slightly unusual to have different terminal symbols that share the same representation. Since we are, in this book, more concerned with concepts than with formal proofs, we will use this formalism sparingly or not at all.
A number of algorithms in compiler construction start off with some basic information items and then apply a set of rules to extend the information and draw conclusions from it. These 'information-improving' algorithms share a common structure, which does not show up well when the algorithms are treated in isolation: treated one by one, they look more different than they really are. We will therefore treat here a single representative of this class of algorithms and refer back to it from the following chapters.

The example algorithm in question is the construction of the calling graph of a program. The calling graph of a program is a directed graph which has a node for each routine (procedure or function) in the program and an arrow from node A to node B if routine A calls routine B directly or indirectly. Such a graph is useful to find out, for example, which routines are recursive and which routines can be expanded in-line inside other routines. Figure 1.28 shows the sample program in C for which we will construct the calling graph; the diagram shows the procedure headings and the procedure calls only.
void P() { ... Q(); ... S(); ... }
void Q() { ... R(); ... T(); ... }
void R() { ... P(); ... }
void S() { ... }
void T() { ... }

Figure 1.28 Sample C program used in the construction of a calling graph.
When the calling graph is first constructed from the program text, it contains only the arrows for the direct calls, the calls to routine B that occur directly in the body of routine A; these are our basic information items. (We do not consider here calls of anonymous routines, routines passed as parameters, etc.; such calls can be handled too, but their problems have nothing to do with the algorithm being discussed here.) The initial calling graph of the code in Figure 1.28 is given in Figure 1.29, and derives directly from that code.
Figure 1.29 Initial (direct) calling graph of the code in Figure 1.28 (diagram: arrows from P to Q and S, from Q to R and T, and from R to P).
The initial calling graph is, however, of little immediate use since we are mainly interested in which routine calls which other routine directly or indirectly. For example, recursion may involve call chains from A to B to C back to A. To find these additional information items, we apply the following rule to the graph:

If there is an arrow from node A to node B and one from B to C, make sure there is an arrow from A to C.

If we consider this rule as an algorithm (which it is not yet), this set-up computes the transitive closure of the relation 'calls directly or indirectly'. The transitivity axiom of the relation can be written as

    A -> B  and  B -> C  implies  A -> C

in which the operator -> should be read as 'calls directly or indirectly'. Now the statements 'routine A is recursive' and 'A -> A' are equivalent. The resulting calling graph of the code in Figure 1.28 is shown in Figure 1.30. We see that the recursion of the routines P, Q, and R has been brought into the open.
Figure 1.30 Calling graph of the code in Figure 1.28.
The recursion detection scheme above is an example of a closure algorithm. A closure algorithm is characterized by three components:

- Data definitions - definitions and semantics of the information items; these derive from the nature of the problem.
- Initializations - one or more rules for the initialization of the information items; these convert information from the specific problem into information items.
- Inference rules - one or more rules of the form: 'If information items I1, I2, ... are present then information item J must also be present'. These rules may again refer to specific information from the problem at hand.

The rules are called inference rules because they tell us to infer the presence of information item J from the presence of information items I1, I2, .... When all inferences have been drawn and all inferred information items have been added, we have obtained the closure of the initial item set. If we have specified our closure algorithm correctly, the final set contains the answers we are looking for. For example, if there is an arrow from node A to node A, routine A is recursive, and otherwise it is not. Depending on circumstances, we can also check for special, exceptional, or erroneous situations.

Figure 1.31 shows recursion detection by calling graph analysis written in this format.

Data definitions:
1. Let G be a directed graph with one node for each routine. The information items are arrows in G.
2. An arrow from a node A to a node B means that routine A calls routine B directly or indirectly.

Initializations:
If a routine A contains a call to routine B, an arrow from A to B must be present.

Inference rules:
If there is an arrow from node A to node B and one from B to C, an arrow from A to C must be present.

Figure 1.31 Recursion detection as a closure algorithm.

Two things must be noted about this format. The first is that it does specify which information items must be present but it does not specify which information items must not be present; nothing in the above prevents us from adding arbitrary information items. To remedy this, we add the requirement that we do not want any information items that are not required by any of the rules: we want the smallest set of information items that fulfills the rules of the closure algorithm, and that smallest set is the closure we are after.

The second is that the closure algorithm as introduced above is not really an algorithm, in that it does not specify when and how to apply the inference rules and when to stop; it is rather a declarative, Prolog-like specification of our wishes, and 'closure specification' would be a more appropriate term. Actually, it does not even correspond to an acceptable Prolog program: the program in Figure 1.32 gets into an infinite loop immediately.
calls(A, C) :- calls(A, B), calls(B, C).
calls(a, b).
calls(b, a).
?- calls(a, a).

Figure 1.32 A Prolog program corresponding to the closure algorithm of Figure 1.31.
What we need is an implementation that will not miss any inferred information items, will not add any unnecessary items, and will not get into an infinite loop. The most convenient implementation uses an iterative bottom-up algorithm and is treated below.
If the inference rules are restricted to the form 'if one or more information items are present then another information item must also be present', as they are above, information items are only ever added and never removed, and there are no possibilities for the removal of information items to be considered; this allows the handling of the items to be specified very efficiently.
The usual way of implementing a closure algorithm is the iterative bottom-up sweep. In this approach, the information items are visited in some systematic order, to find combinations of items that fulfill the condition of an inference rule. For each such combination the corresponding inferred item is added, if it was not already present. Added items may fulfill other conditions in their turn, so we have to continue sweeping until no more items can be added. The exact order of investigation of items and conditions depends on the data structures and the inference rules. There is no generic closure algorithm into which the inference rules can simply be plugged to obtain a specific closure algorithm; some ingenuity is still required. Figure 1.33 shows an outline of the code of the iterative bottom-up transitive closure algorithm.
SET the flag Something changed TO True;
WHILE Something changed:
    SET Something changed TO False;
    FOR EACH Node 1 IN Graph:
        FOR EACH Node 2 IN Descendants of Node 1:
            FOR EACH Node 3 IN Descendants of Node 2:
                IF there is no arrow from Node 1 to Node 3:
                    Add an arrow from Node 1 to Node 3;
                    SET Something changed TO True;

Figure 1.33 Outline of a bottom-up algorithm for transitive closure.
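A direct transcription of this outline into C might look as follows; this is a sketch under the assumption that the graph is small enough for an adjacency matrix, a data structure the outline itself does not prescribe.

    #include <stdbool.h>

    #define MAX_NODES 100

    /* arrow[a][b] is true if there is an arrow from node a to node b */
    static bool arrow[MAX_NODES][MAX_NODES];

    /* Bottom-up transitive closure, following the outline of Figure 1.33 */
    void transitive_closure(int n_nodes) {
        bool something_changed = true;

        while (something_changed) {
            something_changed = false;
            for (int n1 = 0; n1 < n_nodes; n1++)
                for (int n2 = 0; n2 < n_nodes; n2++)
                    if (arrow[n1][n2])
                        for (int n3 = 0; n3 < n_nodes; n3++)
                            if (arrow[n2][n3] && !arrow[n1][n3]) {
                                arrow[n1][n3] = true;   /* add the inferred arrow */
                                something_changed = true;
                            }
        }
    }

With an adjacency matrix the descendants of a node are simply the true entries in its row; for the sparse graphs discussed below, successor lists would be the more natural representation.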
A sweep consists of visiting the nodes of the graph in succession and adding, for each node, an arrow from it to all its descendants' descendants, as known at that moment. It is important to recognize the restriction 'as known at that moment', since this is what forces us to repeat the sweeps until no
more arrows are added. We are then sure that the descendants we know are all the descendants there are.

The algorithm seems quite inefficient. If the graph contains n nodes, the body of the outermost for-loop is repeated n times; each node can have at most n descendants, so the body of the second for-loop can be repeated n times, and the same applies to the third for-loop. Together this is O(n³) in the worst case. Each run of the while-loop adds at least one arc (except the last run), and since there are at most n² arcs to be added, it could in principle be repeated n² times in the worst case. So the total time complexity would seem to be O(n⁵), which is much too high to be used in a compiler.

There are, however, two effects that save the iterative bottom-up closure algorithm. The first is that the above worst cases cannot materialize all at the same time. For example, if all nodes have all other nodes for descendants, all arcs are already present and the algorithm finishes in one round. There is a well-known algorithm by Warshall (1962) which does transitive closure in O(n³) time and O(n²) space, with very low multiplication constants for both time and space, but it has the disadvantage that it always uses this O(n³) time and O(n²) space. Also, O(n³) time is still rather stiff in a compiler.

The second effect is that the graphs to which the closure algorithm is applied are usually sparse, which means that almost all nodes have only a few outgoing arcs. Also, long chains of arcs are usually rare. This changes the picture of the complexity of the algorithm completely. Let us say for example that the average fan-out of a routine is f, which means that a routine calls on the average f other routines; and that the average calling depth is d, which means that on the average after d calls within calls we reach either a routine that does not call other routines or we get involved in recursion. Under these assumptions, the while-loop will be repeated on the average d times, since after d turns all required arcs will have been added. The outermost for-loop will still be repeated n times, but the second and third loops will each be repeated f times during the first turn of the while-loop, f² times during the second turn, f³ times during the third turn, and so on, until the last turn, which takes f^d times. So the if-statement will be executed

    n × (f² + f⁴ + f⁶ + ... + f^(2d)) = ((f^(2(d+1)) − f²) / (f² − 1)) × n
times on the average. Although the constant factor can be considerable (for f = 4 and d = 4 it is almost 70 000), the main point is that the time complexity is now linear in the number of nodes, which suggests that the algorithm may be practical after all. This is borne out by experience, and by many measurements (Syslo and Dzikiewicz, 1975). For non-sparse graphs, however, the time complexity of the bottom-up transitive closure algorithm is still O(n³).
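As a quick check of that constant, for f = 4 and d = 4 the bracketed sum is

    f² + f⁴ + f⁶ + f⁸ = 16 + 256 + 4096 + 65536 = 69904

which is indeed just short of 70 000.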
In summary, although transitive closure has non-linear complexity in the general case, for sparse graphs the bottom-up algorithm is almost linear.
In addition to being explained in words and by examples, the outline of an algorithm is sometimes sketched in pseudo-code; we have already seen examples in Figures 1.24, 1.25, and 1.33. The outline code ignores much of the problematic detail that besets many real-world programming languages, including memory allocation and deallocation, type conversion, and declaration before use. It is an informal, reasonably high-level language, based loosely on the programming language ABC (Geurts et al., 1989). We have chosen not to use an existing programming language, for several reasons:

- We emphasize the ideas behind the algorithms rather than their specific implementation, since we believe the ideas will serve for a longer period and will allow the compiler designer to make modifications more readily than a specific implementation would. This is not a cookbook for compiler construction, and supplying specific code might suggest that compilers can be constructed by copying code fragments from books.
- We do not want to be drawn into a C versus C++ versus Java discussion. We emphasize ideas and principles, and we find each of these languages pretty unsuitable for high-level idea expression.
- Real-world code is much less intuitively readable, mainly due to historical syntax and memory allocation problems.

The rules of the outline code are not very fixed, but the following notes may help in reading the code. Lines can end in a semicolon (;), which signals a command, or in a colon (:), which signals a control structure heading. The body of a control structure is indented by 4 spaces with respect to its heading. The end of a control structure is evident from a return to a previous indentation level or from the end of the code segment, so there is no explicit end-of-control-structure line. Keywords are all-capital; they are self-evident, we hope. Identifiers can contain spaces in addition to letters, digits, and underscores; this is unusual in programming languages, but is helpful in constructing meaningful identifiers that print well both in program code and in running text. Type identifiers and field selectors of records start with a lower-case letter, because they denote a class ('a dog'); the other identifiers start with an upper-case letter, because they denote individual items ('Fido'). Field selectors are marked by a dot: Node .left. The selector is separated from the identifier by a space, to avoid constructs like Top node left being misread as Top (node .left). It also emphasizes the fact that the selector is actually a postfix operator acting on the record.

A command can, among other things, be an English-language command starting with a verb, with individual items embedded as arguments. Such commands can be viewed as procedure calls. An example from Figure 1.33 is

    Add an arrow from Node 1 to Node 3;

which can be viewed as a call to a procedure with two formal parameters, Add an arrow from ... to ..., in addition to being a command in English. Long lines may be broken for reasons of page width; the continuation line or lines are indented by another 4 spaces. Broken lines can be recognized by the fact that they do not end in a colon or semicolon. Comments start at // and run to the end of the line.
This concludes our introduction to compiler writing. We have seen a toy interpreter and compiler that already show many of the features of a real compiler. A discussion of the general properties of compilers was followed by an introduction to context-free g r a ~ m a r s and closure algorithms. Finally, the outline code used in this book was introduced. As in the other chapters, a summary, suggestionsfor further reading, and exercises follow.
-A compiler is
a big file conversionprogram, The input format is calledthesource language, the output format is called the target language, and the languageit is written in is the implementation language. -One wants this file conversion because the result is in some sense more useful, like in any other conversion. Usually the targetcode can be run efficiently, on hardware. -Target code need not be low-level, as in assembly code. Many compilers for high- and very high-level languages generate targetcode in C or C++. -Target code need not be run on hardware, it can also be interpreted by an interpreter; in that case the conversion from source to target can be much simpler. -Compilers can compile newer versionsof themselves; this is called bootstrapping. -Compiling works by first analyzing the source text to construct a semantic representation? andthensynthesizingtarget code fromthissemanticrepresentation.This analysis/synthesis paradigm is very powerful, and is also useful outside compiler construction. -The usual form of the semantic representation is the AST, abstract syntax tree, which is the syntax treeof the input, with useful context and semantic annotations at the nodes. Large parts of compilers are generatedautomatically?using program generators based on formalisms, which is another word for special-purpose programming languages, Important formalisms are regular andcontext-free grammars (for program text analysis), attribute g r a ~ a r s(for context handling), and bottom-up tree rewriting (for code generation). -The source code input consists of characters. Lexical analysis constructs tokens from the characters. Syntax analysis constructs a syntax tree from the tokens. Context handling checks and annotates the syntax tree. Code generation constructs target code from the annotated syntax tree. Usually the target code needs the support of a run-time system. -Broad compilers have the entire AST at their disposal all the time; narrow compilers make do with the path from the node under consideration upwards the to top of the AST, plus information collected about the branches on the left of that path. -The driving loop of a narrow compiler is usually inside the parser: it pulls tokens out of the lexical analyzer and pushes parse tree nodes to code the generator. -A goodcompilergeneratescorrect,truthfulcode,conformsexactlytothesource language standard, is able to handle programs of virtually arbitrary size, and contains no quadratic or worse algo~thms.
-
r that can easily be run on different p l a t f o ~ sis portable; a compiler that can et code for ~ifferent p~atforms is izations are attractive and useful, compiler con~truct~on has shift^^ fr st of the how-to p r o ~ l e ~have s been solved by automatic allow us to recover the structure of the source pros text was linearized in the process of c o ~ ~ i t t iti n ~ iler con~tructio~ are closure a ct more i ~ ~ o ~ a t i o ~ he a ~ ~ o r idiffer t ~ ~ins W ntion is collected and
and Warshall (1 962). Schnorr presents a sophisticated but still reasonably simple version of the iterative bottorn-up algorithm shown in Section 1.lo. l and proves that its expected time requireme~tis linear in the sum of the number of nodes and the final number of edges. Warshall's algorithm is very famous and is treated in any text book on algorith~s, for example Sedgewick (1988).
Compilersareoftenwritteninthelanguagetheyimplement.Identifyadvantages and disadvantages of this technique. Referring to Section 1.1.1.1, give additional examples of why a language front-end would need information about the target machine and why a back-end would need information about the source language. Redo the demo compiler from Section 1.2,in your favorite program~inglanguage. Compare it to the versionin this book. Given the following incomplete grammarfor a very simple segmentof English: Sentence "3 Subject Verb Object Subject "+ Noun-Phrase Object -+ Noun-Phrase Noun-Phrase -+ Noun-Compound Personal-Name 1 Personal-Pronoun Noun-Compound -+ Article? Adjective-Sequence? Noun S
.
.
(a) What is the parse tree for the sentence 1 see you,in which 1 and you are terminal productions of Personal-Pronoun and see is a terminal production of Verb? (b) What would bea sensible AST for this parse tree? Consider the demo compiler from Section 1.2. One property of a good compiler is that it is able to give good error messages, and good error messages require,at least, knowledge of the name of the input file and the line number in this file where an error occurred. Adapt the lexical analyzer from Section 1.2.4 to record these data in the nodes and use them to improve the quality of the error reporting. Implement the constant folding optimization discussed in Section 1.2.6: do all arithmetic at compile time. One module that is missing from Figure 1.21 is the error reporting module. Which of the modules shown would use the error reporting module and why? Modify the code generator of Figure 1.18 to generateC code rather thanPUSH,ADD, MULT and PRINT instruction, and compile and run that code,
ere is the context that must be remembered betweeneach cycle of the while loop ure 1.23 and the next? s the ~ompiler i~plemented in Section 1.2a narrow or a broad compiler?
onstr~ctthe post-main versionof the main-loop module in Figure 1.24. or those who already h o w what a finite-state automaton (FSA) is: rewrite the preill and ~ost-mainversions of the aa -+ b filter using an FSA. You will notice W the code is simpler: an FSA is a more efficient but less structured device for age of state than a set of global variables. at is an ‘extendedsubset’ of a language? Why is the termusuallyused
in a
expression in Section 1.2.1 has: expression
-+expressionterm
I
expressionterm
I
term
f we replaced this by
expression expression
I
expression
I-’
expression
I
term
Id still produce the same language, but the re~lacementis not
arameter-list
”+
(‘IN’ I ‘OUT’) ? identifier (‘I
identifier)*
ection 1.9.3 to BNF.
C - + B C S
me the non-terminals that are left-recursive, right-recursive, nullable, or useguage does the grammarproduce?
y could one want two or more terminal symbols with the same ad design to have
re~resentation?
a terminal symbol with an empty
Refer to Section 1.9.5.1 on the definition of a ~ r a m m condition ~, (3). ?Vhy do we have to be able to tell t e ~ n a l and s non-te~inalsapart? Arguethatthere is onlyone‘smallestset r e q u i r e ~ e ~oft sa closure specification.
of i n f o ~ a t i o nitems’that fulf~llsthe
ist to^ of c o ~ ~ i constr~ction: ~er Study Conway’s paper (1963) on the coroutinebased mo~ula~zation of compilers, and write asummary of it.
e front-end of a compiler starts with a stream of characters which constitute the program text, and is expected to create from it intermediate code that allows context h a n ~ ~ i nand g ~ ~syntactic ~ g struct~reof the et code. It does this by first r e c ~ v the ccording to the g r ~ l ~ mof a rthe lan n terms of its syntactic str~cture e the c o ~ e s p o n d i n ~ i n t e ~ e dcode. iate as constant defi~itionsof the form
CONST pi = 3.14159265; CONST pi-squared = pi * pi;
r a ~ m afor r such constantd e f i ~ i t i ois: ~s ~ o ~ ~ t ~ n t - ~ e ~"+i 'CONST' ~ i t i oidentifier ~
' = I
expression ';'
t appear as is in the program, a l s refer to ~ r a ~ m rules ar and expression are n o ~ - t e ~ ~ i nwhich
o c c ~ ~ r e of ~ cthe e conv a l ~ a t to e ~give a st of the block.' t results in a syntax tree, which contains antics is defined based tree as the intermediate
code.
In practice, the actual syntax tree contains too many dead or uninteresting branches andacleanedupversion of it,theabstractsyntaxtreeorAST, is more efficient. The difference between the twois pragmatic rather than fu~damental,and the details depend on of thecompilerwriter.Considerthe(oversimplified) thegoodtasteanddesignskills g r a m m rule ~ for expressionin Figure 2.1. Then the actual syntax treefor CONST pi-squared = pi * pi;
is
I\'\\\
constant-definition
CONST
identifier
product
pi-squared
=
expression
;
*
factor
expression
identifier
factor
identifier
Pi
Pi
as specified by the g r a m ~ a rand , a possible abstract syntax tree could be: constant-definition
/\ /l\
pi-squared expression
Pi
*
Pi
The simplifications are possible because 1. the tokens CONST, =, and ;serve only to alert the reader and the parser to the presence of the constant definition, and need not be retained for further processing; 2. the semantics of identifier (in two different cases), expression,and factor are trivial (just passingon the value) and need not be recorded.
This means that nodes for constant-de f init ion can be imple~entedin the com-
expression "-+ product I factor factor product "+ expression factor "+ number 1 identifier Figure 2.1 A very simple ~r~~~
for express
ion.
iler as records with twofields: struct constant-definition { Identifier *CD-idf; Expression *CD-expr;
1 (in ition ion to some s t a n d ~ dfields recording in ~ h i c hfile and at what line the constant in it ion was found). Another example of a useful difference between parse tree and AST is the combinanode types for if-then-else and if-then into one node type if-then-else. An ifis represented by an if-then-else node, in which the else part has been supplean empty statement, as shown in Figure2.2.
if-statement
IF
condition
THEN
if-statement
statement condition statement statement
Syntax tree (a)and abstract syntax tree (b)of an if-then statement.
ntext handling module gathers information about the nodes and combines it other nodes. This information serves to perform contextual checking and to eneration. The abstract syntax tree adorned with these bitsof information is tated ~ ~ s t r asc yt ~ t atree. ~ Actually the abstract syntax txee passes through 'an~otatedness'during compilation. The degree of annotatedness starts out straight from parsing, and continues to grow even throughcode generation, in which, for example, actual memory addresses may be attached as annotationsnodes, to At the endof the context handling phase our AST might have the form
constant-definition
expression TYPE: real.
pi-squared TYPE: real
* Pi TYPE: real VAL: 3 . 1 4 1 5 9 2 6 5
Pi TYPE: real VAL: 3 . 1 4 1 5 9 2 6 5
and after constant folding -the process of evaluating constant expressions in the compiler -it might be rather than at run time constant-definition
pi-squared TYPE: real
expression TYPE: real VAL: 9 . 8 6 9 6 0 4 3 7
Havingestablished the annotatedabstractsyntaxtree as theultimategoal of the front-end, we can now work our way back through the design. To get an abstract syntax tree we need a parse tree; to get a parse tree we need a parser, which needs a stream of tokens; to get the tokens we need a lexical analyzer, which needs a stream of characters, and to get these characters we needto read them. See Figure 2.3.
l I
c " " "
1
" " " r
Input text
' " " " & ure 2.3 Pipeline from input to annotated syntax tree.
Some compiler systems come witha so-called structure editor anda program management system which stores the programs in parsed forrn. It would seem that such systems can do without much of the machinery described in this chapter, but if they allow unstructured program test to be imported or allow such modifications to the existing text that parts of it have to be reanalyzed from the character level on, they still need full the apparatus. Noonan (1985) gives aset of heuristic rules for deriving a good AST structure from a grammar. For an even more compact internal representation of the program than ASTs see Waddle (1990).
ty, and it is not u n u s ~ afor ~ 30% of the time spent in the front-end to be
piler. oadmap 2.1 From p r o g r text ~ ~ to tokens -the lexical s~ructure
eading the program text
2.1.2 Lexical versus sy~tacticanalysis 2. l .3Regular expressions and regulardescri~tions 2.3.4 Lexical analysis
.I .5-2.1.9 Creating lexical analyzers 2.1.10-2.1 .12.3 Symbol handling and its applications
text consists of ch~acters,but the use of the s t a ~ ~ch~acter-read ar~ in^ routines sincetheseroutines are
necess~y,and on some py of the c ~ ~ a c tthe e ~file s ~ o ~ t a iOlder ~s. eed up read in^ of the program file and to n ~ a c h i ~ the e s r e c o ~ ~ e ~ method d e d is to ly the ~ a s t ~input s t methodand es problem: ode^ ~ a c ~ i nhave enerated ~ r o files g are ~ seldom ~ ~ that er to obtain the size of a file, so memory e entire input file in ples of such tokens are any of these need to be stored for
further use by the compiler and allocating space for them is much easier if their sizes are known in advance. Suppose, for example, that a string is read using a routine that yields characters one by one. In this set-up, the incoming characters have to be stored in some temporary buffer until the endof the string is found; the size of this buffer is not known in advance. Only after the end of the string has been readcan the final allocation of space For the string take place; andonce we have the final destination we still have to copy the characters there. This may lead to complicated allocation techniques, or alternatively the cornpiler writer is tempted to impose arbitrary limits on the largest allowable string length; it also costs processing timefor the copying operation. With the entire file in memory, however, one can just note the position of the first string character, find the end, calculate the size, allocate space, and copy it.Or, if the input file stays in memory throughout the entire compilation,one could represent the string by a pointer to the first character andits length, thus avoiding all allocation and copying. ing the entire program textin memory has the additional advantage that error messages can easily show the precise code around the place of the problem.
There is some disagreement as to whether ‘newline’ is a character or not, and if it is, what it looks like. Trivial as the question may seem, it can be a continuous source of background bother in writing and using the compiler. Several facts add to the confusion. First, each operating system implements its own convention. In UNIX, the newline is a character, with the valueof octal 12. In MS-DOSthe newline is a combination of two characters, with values octal 15 and 12, in that order; the meaning of the reverse order and that of the characters in isolation is undefined. And in OS-370 the newline is not a character at all: a text file consists of lines called ‘logical records’ and reading it produces a series of data structures, each containingasingleline.Second,inthosesystemsthatseemtohave a newline character, it is actually rather an end-of-line character, in that it does not occur at the beginning of the first line, but does occur at the end of the last line. Again, what happens when the last line is not terminated properly by a ‘newline character’ is undefine And last but not least, some people have strong opinionson the question, not allof them in agreement with the actual or the desired situation. Probably the sanest attitude to this confusionis to convert the input to a fixed internal format as soon as possible. This keeps the operating-system-dependent part of the compiler to a minimum; some imple~entationlanguages already provide library routines for doing this. The internal format must allow easy lexical analysis, for noma1 processin and easy reproduction of the original program test, for error reporting. A convenient format is a single character array in which the lines are stored consecutively, each t e ~ i n a t e by a newline character. But when the text file format supplied by the operating system differs too much from this, such an array may not be cheap to construct.
Havingbothalexicalandasyntaxanalysisrequires one todecidewheretheborder betweenthetwolies.Lexicalanalysisproducestokensandsyntaxanalysis cons~~mes them, but what exactly is a token? Part of the answer comes from the language definition
it is design. A good guideline is ‘If it can be separated from its left and right white space without changing the meaning, it’s a token; otherwise it isn’t.’ If lowed between the colon and the equals sign in :=, it is two tokens, and ppear as a separate token in the grammar. If they have to stand next to each thing intervening, it is one token, and only one token occurs in theg r a m m ~ . ean that tokens cannot include white space: strings can, and they are tokens e, since adding white space in a string changes its meaning. Note that the ate the string are not tokens, since they cannot be separated from their acters by white space without changing the meaning. ~ ~ m e nand t s white space are not tokens in that the syntax analyzer does not cony are generally discarded by the lexical analyzer, but it is often useful to e them, to be able to show some program text surrounding an error. om a pure need-to-know point of view, all the lexical analyzer has to supply in the struct Token.are the c l a s S and repr fields as shown in Figure1.1l , but in practice it is very much wort~whileto also record the name of the file, line number, and character posiit started).Such i n f o ~ a t i o nis tion in whichthetokenwasfound(oractuallywhere i ~ v ~ l u ~for b l egiving user-friendly error messages, which maysurface much later on in the c o ~ ~ i l ewhen r , the actual program text may be long discardedfrom memory.
he shapes of thetokens of a languagemay be describedinformallyinthelanguage anual, for example: ‘An identifier is a sequence of letters, digits, and underscores that starts with a letter; no two consecutive underscores are ailowed init, nor can it have a trailscore.’ Such a description is quite satisfactory for the user of the language, but ler c o n ~ t ~ c t i opurposes n the shapes of the tokens are more usefully expressedin what are called ‘regular expressions’. Regular expressions are well known from their use as search expressio~sin text editors, where for example the search expression ab* is used t that consists of an a followed by zero or morebs. essio~is a f o ~ u l athat describes a possibly infinite set of strings, be viewed both as a recipe for generating these strings and asa patngs. The above regular expression a b * , for example, generates the b ... }. When we have a string that can be generated by a ay that the regular expression~ a t c h e sthe string. e most basic regular expressionis a pattern that matchesjust one character, and the ne that specifies that characterexplicitly; an example is the pattern which matches the character a. There are two more basic patterns, one for matching a set of c~aractersand one for matching all characters (usually with the exception of the n ~ - o f - l i ~character, e if it exists).Thesethreebasicpatternsappear at thetop of Figy , z, ...stand for any character and R,R R2,...stand for any regonally be followedby a re~etitiono ~ e r ~ t oexamples r; are b? r a possiblyemptysequence of bs; and b+ for a non-empty
on o ~ e r a t o r ~One . is the invisible operator, which indicates
x
Basic ~ a t t ~ r n s :
[VZ...l
epetition operators: R? R* R+
~atc~in~: The character x Any character, usually except a newline Any of the characters x, y, z, ... An R or nothing (=optionally anR ) Zero or more occurrencesof R One or more occurrencesof R
Composition operators:
R IIR2
An R l followed by an R2 Either anR l or anR 2
~rouping: (R)
R itself
RIA2
Figure 2.4 C o ~ p o n e n t sof regular expressions.
I
concatenation; it occurs for example between the a and the b in ab*. The second is the operator which separates alternatives; for example, ab* I cd? matches anything that is matched by ab* or alternatively by cd?. The repetition operators have the highest precedence (bind most tightly); next comes theconcatenationoperator;andthealternativesoperator 1 hasthelowestprecedence. Parentheses can be used for grouping. For example, the regular expression ab* I cd? is equivalent to ( a (b*)) 1 (c (d?) 1. A more extensive set of operators might for example include a repetition operator of the form Rm-n, which stands for m to n repetitions of R, but such forrns have limited usefulness and complicate theimple~entationof the lexical analyzerco~sidera~ly.
A comparison with the right-hand sidesof production rules inCF g r a ~ m a r ssuggests itself. We see that only the basic patterns are characteristic of regular expressions. They share with the BNF notation the invisible concatenation operator and the alternatives operator, and with EBNF the repetition operators and parentheses. atorsand [,1, (,and ) as separators prevents them from being used to match themselves as actual characters. There is usually some trickery involving esca e characters to force these characters to stand for themselves rather than being taken as operators. One example of such an escape character is the backslash, \, which is used as a prefix: \* denotes the asterisk, \\ the backslash
etc.Anotheristhequote,which is used to s u ~ o u n dtheescapedpart: +? denotes a plusfollowed by a ~uestionmark, denotes ter itself, etc. As we can see, additional tric ery is needed to represent the re esthetically satisfyi~ aracters used been had *,+,?,etc., with a special ~ e a n i rather n~ endow the n o r ~ a characters l t this is not the path that history has taken and the present situation pre
roble^^.
can substitute the ri~ht-handside of the first rule ~ i n a l s in ) the second and further rules, adding
tion of the previous rules and its r i g ~ t - h a nside ~ corresponds to the desired re~ularexpression. ular description for the identi~ierdefined at thebe i n n i of ~ ~this section is: digit -+ [O-91 underscore "+ __ letter-or-digit -+ letter 1 digit u~der~cored-tail"+ underscore letter-or-digit+ ntifier -+ letter letter-or-digit* underscored-tail* t is relativelyeasy to seethatthis i ~ ~ l e n l e n t s estr~ctionsabouttheuse of theunderand no trailing u~derscore. score: no two cons~cutive u~derscores
The s~bstitutionprocess descri~edabove c identifier -+ [a-zA-Zl ([a-zA-Z] I [O-91) * (-
([a-zA-Z] I
[ 0 - 9 ] )+) *
~ h i cafter ~ , some si~plificatio~, redL~cesto: identifier
" 3
[a-zA-Z]
[a-zA-ZO-9] * (-[a-zA-ZO-9]+)*
he ri~ht-handside is the re ulas e~pressionfor i enti f i e r . This is a clear case of ~o~ciseness versus re~~dability.
e is spe~ifiedby a re shape and co~espond words also fall in this iscussed in Section 2.1 .lo. Other
tokens can occur in many shapes and correspond to more complicated regular expressions; examplesareidentifiersandnumbers,Stringsandcommentsalso fall inthis class, but again they often require special treatment, The combination of token class name and regular expression is called a to assignment-symbol
-+
:=
The basic task of a lexical analyzer is, given a set S of token desc~ptionsan tion P in the input stream, to determine which of the regular expressions in S will match a segment of the input starting atP and what that segmentis. If there is more than one such segment, the lexical analyzer must have a disa~biguating rule; normally the longest segmentis the one we want. This is reasonable: if S contai~s the regular expressions =, = /, and =/=, and the input is =/=, we want the full = /= matched.Ifthelongestsegment is matchedbymorethanoneregularexpressionin S, again tie-breaking is needed and we must assign priorities to the token desc~ptionsin S. Since S is a set, this is somewhat awkward, and it is usual to rely on the textual order in which the token descriptions are supplied: the token that has been defined textually first in the token description file wins. To use this facility, the compiler writer has to specify the more specific token descriptions before the less specific ones: if any letter sequence is an identifier except xyzz y , then the following willdo the job: magic-symbol -+ xyzzy identifier -+ [ a - z l IRoadmap Creating lexical analyzers: 2. l.5 Creating a lexical analyzerby hand 2.1.6 Creating a lexical analyzera ~ t o ~ a t i c a l l y 2.1 .?Transition table compression 2. l.8 Error andl ling in lexical awalyzers 2.1.9 A traditional lexical analyzer generator-lex
Lexical analyzers can be written by hand or generated automatically, in both cases based on the specification of the tokens through regular expressions; the required techniques are treated in this and the following section, respectively. Generated lexical analyzers in particular require large tables and it is profitable to consider methods to compress these tables (Section 2.1.7). Next, we discuss input error handling in lexical analyzers. An example of the use of a t~aditionallexical analyzer generator concludes the sections on the creation of lexical analyzers. It is relatively easy to write a lexical analyzer by hand. Probably the best way is to start it with a case statement over thefirst character of the input. The first characters of the different tokens are often different, and such a case statement will split the analysis problem into many smaller problems, each of which can be solved with a few lines of ad hoc code. Such lexical analyzers can be quite efficient, but still require a lot of work, and m a y be difficult to modify.
ures2.5 though 2.12 containtheelements of asimplebutnon-triviallexical nizes five classes of tokens: identifiers as defined above, integers, onetokens, and the token classes ERRONEOU~and EOF. As one-character tokens t the operators -I-, -,*,and /,and the separators ; ,,(comma), (,) ,{,and },as ht be used in an actual programming language. We skip layout andcomment;commentstartswithasharpcharacter # andendseitherat another # or at end of line. Single characters in the input not covered by any of the above r ~ c o ~ n i z as e dtokens of class ERRONEOUS. An alternative action would be to discard such char~cterswith a warning or error message, but since it is likely that they represent error for an actual token, it is probably better to pass them on to the parser to ere was so~ethingthere. Finally, since most parsers want to see an explicit ken, the pseudo-character end-of-input yields the real token of class EOF for en~-o~-~ile. /* Define class constants;0-255 reserved for ASCII characters: */ #define IDE~TIFIER #define ~ N ~ E ~ #define ERRONEOUS 259
256 257 E258 ~
typedef struct { char *file-name; int line-number; int char-number; } Position-in-File; int class; char *repr; Position-in-File p o s ; , } Token-Type; extern Token-Type Token; xtern tern
void void
start-lex (void) ; get-next-token (void) ; igure 2.5 Header file l e x . h of the ~ ~ n d ~ r i t tlexical e n analyzer
shows that the Token-Type has been extended with a fieldfor recording e ~ositionin the input at which the token starts; it also includes the definitions of the class co~stants.The lexical analyzer driver, shown in Figure 2.6, consists of declarations of localdatatomanagetheinput,aglobaldeclaration of Token, andtheroutines start l e x ( ) ,which starts the machine, and get-next-token ( ) ,which scans the tain the next token and put its data Token. in After skipping layout and comment, the routine get-next-token ( ) (Figure 12.7)
#include #include
!!input.h" "lex.h"
/* for get-input0 */
/* PRIVATE */ static char *input; static int dot; static int input-char; #define
next-char()
/* dot position in input*/
/* character at dot position */
(input-char = input
[++dot1 1
/* PUBLIC */ Token-Type Token; start-lex (void) { input = get-input 0 ; dot = 0; input-char = input Cdotl ;
void
Figure 2.6 Data and start-up of the handwritten lexical analyzer.
recordstheposition of thetoken to be identifiedinthefield Token. pos by calling note-token-posi t ion ( ) ; the code for this routine is notshownhere.Next, get-next-token () takes a five-way split based on the present input character, a copy of which is stored in input-char. Three cases are treated on the spot; two more cornplicated cases are referred to routines. Finally, get-next-token ( ) converts the chunk string by calling of the input which forms the token into a zero-terminated input-to-zst ring ( ) (not shown) and stores the result as the representation of the token. Creating a representation for the EOF token is slightly different since there is no corresponding chunkof input. Figures 2.8 through 2.10 show the routines for skipping layout and recognizing identifiers and integers. Their main task is to move the variable dot just past the end of the form they recognize. addition, In recognize-ident if ier () and recognize-integer ( ) set Token. class. The routine get-next-token ( ) and its subroutines frequently test the present input charactertoseewhetheritbelongsto a certain class; examplesare calls of is-letter (input-char) and is-digit (input-char) . The routines used for this are definedas macros and are shown in Figure 2.1 1. As anexample of its use,Figure2.12shows a simplemainprogramthat calls get-next-token ( ) repeatedly in a loop and prints the i n f o ~ a t i o nfound in Token. The loop terminates when a token with class EOF has been encountered and processed. Given the input # *# 8 ; ##abc-dd-8 ;z #/ z it prints the results shown in Figure 2.13.
void get-next-token (void) { int start-dot; skip-layout-and-comrnent();
/* now we are at the start of a token orat end-of-file, so: */ note-token-position ( ) ; /* split on first character of the token */ start-dot = dot; ) { if (is-end-of-input (input-char) Token.class = EOF; Token.repr = llsEOF>ll; return;
1
if (is-letter (input-char) ) {recognize-identif ier () ;} else if (is-digit (input-char) ) {recognize-integer () ; } else if (is-operator (input-char)I I is-separator (input-char) ) { Token.class = input-char; next-char();
3
else {Token.class = ERRONEOUS; next-char();}
1
Token.repr = input-to-zstring(start-dot, dot-start-dot);
Figure 2.7 Main reading routineof the handwritten lexical analyzer.
void skip-layout-and-comment(void) { while (is-layout (input-char) ) {next-char () ;} while (is-comment-starter (input-char) ) { next-char ( ) ; while (!is-comment-stopper(input-char)) { if (is-end-of-input (input-char)) return; next-char () ;
1
1
next-char ( ) ; while (is-layout
(input-char) ) {next-char ( ) ;}
Figure 2.8 Skipping layout and comment in the handwritten lexical analyzer.
We see that often questions of the type i s-let ter (ch) are asked. These questions have the property that their input parameters are from a finite set and their result depends
void recognize-identifier(v0id) { Token.class = IDENTIFIER; next-char(); while (is-letter-or-digit
(input-char) ) {next-char (1 ;}
while (is-underscore(input-char) && is-letter-or-digit (input[dot+ll)
) {
1
1
next-char () ; while (is-letter_or-digit(input-char)) {next-char0 ;}
Figure 2.9 Recognizing an identifier in the handwritten lexical analyzer.
void
1
recognize-integer (void) { Token.class = INTEGER; next-char(); ) {next-char (1 ;} while (is-digit (input-char)
Figure 2.10 Recognizing an integer in the handwritten lexical analyzer.
#def ine is-end-of-input (ch) ((Ch) == ' \ O r ) #define is-layout (ch) (!is-end-of-input (ch) && (ch) <= #define is-comment-starter (eh)((Ch) == '#') #define is-comment-stopper (ch) ((ch) == r # r I I (eh) == r\n') #define #define #define #define #define #define
is-uc-letter (ch) is-lc-letter (ch) is-letter (ch) is-digit (eh) is-letter-or-digit (ch) is-underscore (ch)
#define is-operator #define is-separator
(ch) (eh)
)
('24' <= (ch) && (eh) <= 'Z') ('a' <= (ch) & & (ch) C = '2')
(is-uc-letter (ch) I I is-lc-letter ( ' 0 ' <= (ch) && (ch) <= ' 9 ' ) (is-letter (ch) I I is-digit (ch)) ( (Ch) == 0
(ch) )
(strchr( l l + - * / t l r (ch)) != 0) (strchr(";,() {}", (ch)) != 0)
Figure 2.11 Character classification in the handwritten lexical analyzer.
on the parameters only, This means that for given input parameters the answer will be the same every time. Ifthe finite set defined by the parameters is small enough, we can com-
66
~
~
#include
~ 2.IT From / o~ ~ r
lflex.hIf
o text ~ tort o ~~ e ~~ the s lexic~l str~cture
/* forstart-lex (1 , get-next-token (1 */
int main(void) { start-lex () ; do { get-next-token () ; switch (Token. class) { case IDENTIFIER: printf (tlIdentifiertl) ;break; caseINTEGER:printf (ttIntegerct) ;break; case ERRO~EOUS: printf ("Erroneous token");break; case EOF: printf ("End-of-filepseudo-token");break; default: printf ("Operator or separatorll) ; break;
1
1
printf ( : %s\n", Token. repr) ; } while (Token. class != EOF) ; return 0;
Figure 2.12 Driver for theh a n d ~ ~ t t elexical n analyzer.
Integer: 8 Operator or separator: ; Identifier: abc Erroneous token: Erroneous token: __ Identifier: dd-8 Operator or separator: ; Identifier: zz Erroneous token: __ End-of-file pseudo-token: cEOF> I
Figure 2.13 Sample resultsof the hand-written lexical analyzer.
pute all the answers in advance, store them in an array and replace the routine and macro and the gains in calls by simple array indexing. This technique is called pr~comp~~ation speed achieved with it can be considerable. Often a special tool (program) is used which perfoms theprecomputationandcreates a newprogramcontainingthearrayand the replacements for the routine calls. Precomputation is closely linked to the use of program generation tools. Precomputation can be applied not only in handwritten lexical analyzers but everywhere the conditions for its use are fulfilled. We will see several other examples in this book. It is especially appropriate here, in one of the places in a compiler where speed
matters: roughly estimated, a program line contains perhaps 30 to 50 characters, and each of them has to be classifiedby the lexical analyzer. Precomputation for character classification is almost trivial; most programers do not even think of it as precomputation. Yet it exhibits some properties that are representative of precomputationusedelsewhereincompilers.One of themoreseriousapplications ch~acteristicis thatnaiveprecomputationyieldsvery large tables,whichcanthen be compressed either by exploiting their structure or by more general means. We will see examples of both.
~ a i v ~
~ r ~ c o ~ ~ ~ t a t i ~ n
The input parameter to each of the macros of Figure 2.11 is an 8-bit character, which can have at most 256 values, and the outcomeof the macro is one bit. This suggests representing the table of answers as an arrayA of 256 l-bit elements, in which elementA [ch] contains the result for parameter ch. However, few languages offer l-bit arrays, and if they do, accessing the elements on a byte-oriented machineis slow. So we decide to sacrifice 7 X 256bitsandallocateanarray of 256bytes for theanswers.Figure2.14showsthe relevant part of a naive table implementation of is-operator () ,assuming that the ASCII character set is used. The answers are collected table the in is-operator-bit [3 ;the first 42 positions contain zeroes, then we get some ones in the proper ASCII positions and 208 more zeroes fill up the array to the full 256 positions. We could have relied on the C compiler to fill out the rest of the array, but it is neater to have them there explicitly, in case the language designer decides that (position 126) or 2 (position 242 in some character codes)is an operator too. Similar arrays exist for the other 11 character classifying macros.
#defineis-operator(ch)(is-operator-bit
[(ch)&0377] )
static const char is_operator__bit[256] = { 0, /* position 0 */ 0 , 0 , ... /* another 41 zeroes */ 1, /* position 42 */ 1, 0, 1, 0‘ 1,
1;
/* /*
0, 0, ...
‘ * I ,
1-1
*/
*/
/* position 47 */ /* 208 more zeroes */ l/’,
Figure 2.14 A naive tablei ~ p l e ~ e n t a t i of o nis-operator () .
Another small complication arises from the fact that the ANSI C standard leaves it undefined whether the range of a char is 0 to 255 (unsigned char) or -128 to 127 (signed char). Since we want to use the input characters as indexes into arrays, we
have to make sure the range is 0 to 255. Forcibly extracting the rightmost 8 bits by ANDing with the octal number 0377 which reads 111I 1111 in binary -solves the problem, at the expense of one more operation,as shown in Figure 2.14. This technique is usually called Ita ,which is somewhat misleading since the term seems to suggest a process of looking through a table that may cost an amountof time linear in the size of the table. But since the table lookup is implemented by array indexing, its cost is constant, like thatof the latter, In C, the cope package provides similar functions for the most usual subsets of the characters,butonecannotexpectittoprovidetests for setslike { I -+ I -I * / }. One will have to create one’s own, to match the requirements of the source language. There are 12 character classifying macros in Figure 2.1 1, each occupying 256 bytes, totaling 3072 bytes. Now 3 kilobytes is not a problem in a compiler, butin other compiler construction applications naive tables are closer to 3 megabytes, 3 gigabytes or even 3 terabytes (DuJardin et al., 1998), and table compression is usually essential. We will show that even in this simple casewe can easily reduce the sizeof the tables by more than a factor of ten.
-
I
I
~ ~ ~ ~ ~ the e stables s i n g We notice that the leftmost 7 bits of each byte in the arrays are always zero, and the idea suggests itself to use these bits to store outcomes of some of the other functions. The proper bit for a function can then be extracted by ANDing with a mask in which one bit is set to 1 at the proper bit position. Since there are 12 functions, we need l 2 bit positions, or, rounded upwards, 2 bytes for each parameter value. This reduces the memo~yrequirements to 5 12 bytes, a gainof a factor of 6, at the expense of one bitwise AND instruction. If we go through the macros in Figure 2.14, however, we also notice that three macros test for one character only: is-end-of-input () ,is-comment-starter ( ) ,and i s-underscore ( ) . Replacing the simple comparison performed by these macros by a table lookup would not bring in any gain, so these three macros are better left unchanged. This means they do not need a bit position in the table entries. Two macros define their classes as combinations of existing character classes: i S-let t( )erand is-l et t er-or-dig ( ) . These i cant be implemented by combining the masks for these existing classes, so we do not need separate bits for them either. In total we need only7bitsperentry,whichfitscomfortably in one byte. A representativepart of the a single array implementation is shown in Figure 2.35. The memory reqL~irements are now of 256 bytes, charbi s [t ]. This technique exploits the particular structure of the arrays and their use; in Section 2.1.7 we will see a general compression technique. They both reduce the memory requirements e n o ~ o u s l yat the expense of a small loss in speed.
The previoussectionsdiscussedtechniques for writingalexicalanalyzerbyhand.An alternative method to obtain a lexical analyzer is to have it generated auto~aticallyfrom regular descriptions of the tokens. This approach creates lexical analyzers that are fast and easy to modify. We will consider the pertinent techniques in detail, first becauseauto~ati-
(1<<1) /* a1 bit, shifted left 1 pos. (1<<2) /* a 1 bit, shifted left 2 pos. (1~~5)
#define UC-LETTER-MASK #define LC-LETTER-MASK #define OPERATOR-MASK
I
#define LETTER-MASK (UC-LETTER-MASK
LC-LETTER-MASK)
#define bits-of (ch)(charbits
E (eh)~ ~ 0 3 7 )7 1
#define is-end-of-input (ch) ( (ch) ==
,\O'
#define is-uc-letter (ch) #define is-lc-letter (eh) #define is-letter (ch)
(bits-of (ch) & UC-LETTER-MASK) (bits-of (ch) & LC-LETTER-MASK) (bits-of (eh) & LETTER-MASK)
#define is-operator
(bits-of
(ch)
& OPERATOR-MASK) (ch)
static const char charbitsi2561= { 0000, /* position 0 */
...
0040, 0040,
...
0000,
0002, 0002, 0000, 0004,
0004, S
.
/* /*
r * r l
position 42 */
*/
/* position 64 */ /* 'A' */ /* 'B' */ /* position 96 */ /* la' */ /* 'b' */
.
0000
/* position 255
*/
Figure 2.15 Efficient classification of characters (excerpt).
cally generated lexical analyzers are interesting and important in themselves and second because the techniques involved will be used again in syntax. analysiscode and generation. Roadmap 2.1.6 Creating a lexical analyzer automatically 2.1.6.1 Dotted items 2.1 6.2 Concurrent search 2.1.6.3 Precomputing the item sets 2. l .6.4 The final lexical analyzer 2. l .6.5 Complexity of generating a lexical analyzer 2.1.6.6 Transitions toS, 2.1.6.7 Complexity of using a lexical analyzer
*/ */
A naive way to determine the longest matching token in the input is to try the regular expressions one by one, in textual order; when a regular expression matches the input, we note the token class and the length of the match, replacing shorter matches by longer ones as they are found. This gives us the textually first token among those that havethe longest match. An outline of the code for n token descriptions is given in Figure 2.16; it is similar to that for a handwritten lexical analyzer. This process has two disadvantages:it is linearly dependent on the number of token classes, and it requires restarting the search process for each regular expression. SET the global token (Token,class, Token .length) TO (0, 0) ;
// Try to match token description T1+Rl: FOR EACH Length SUCH THAT the input matches Tl+Rl over Length: IF Length > Token .length: SET (Token .class, Token .length) TO (TI, Length) ; // Try to match token description TZ+R2: FOR EACH Length SUCH THAT the input matches T2-+R2 over Length: IF Length > Token .length: SET (Token .class, Token .length) TO (T2, Length) ; *.. FOR EACH Length SUCH THAT the input matches Tn+Rn over Length: IF Length > Token .length: SET (Token .class, Token .length) TO (Tn,Length) ; IF Token .length = 0: Handle non-matching
character;
Figure 2.16 Outline of a naive generated lexical analyzer.
We will now develop an algorithm which does not require restarting and the speed of which does not depend on the number of token classes. To do this, we first describe a peculiar implementation of the naive search, which still requires restarting. Then we show how to perform this search in parallel for all token classes while stepping through the input; the time required willstill be proportional to the numberof token classes, but restarting will no longer be required: each character is viewed only once. Finally we will show that the results of the steps can be precomputedfor every possible input character (but not for unbounded sequences of them!) so that the computations that depended on the number of token classes can be replaced by a table lookup. This eliminates the dependency on the number of token classes and improves the efficiencye n o ~ o u s l y .
S
Imagine we stop the attempt to match the input to a given token description before it has either succeeded or failed. When we then study it, we see that we are dealing with four components: the part of the input that has already been matched, the part of the regular expression that has matched it, the part of the regular expression that must still find a match, and the rest of the input which will hopefully provide that match. A schematic view is shown in Figure 2.17.
Still tobe matched
m r Already matched
regularexpression
I I I
/ /
/ /
input
.
1-1
I I I
.
I
'17 Components of a token description and components of the input,
The traditional and very convenient way to use these components is as follows. The two parts of the regular expression are recombinedinto the original token description, with the gap markedby a dot 0 . Such a dotted token description has the form
r+a.p otted item, or an item for short. The dotted item is then viewed as positioned between the matched part of the input and the rest of the input, as shown schematically in Figure2.18.
input + [ Stillmatched Already by a
~/
Dotted itern c n+l
I I
to be matched by P
Figure 2.18 The relation between a dotted item and the input.
When attempting to match a given token description, the lexical analyzer const~cts dotted items between each consecutive pair of input characters. The presence of a dotted
between two input characters c, and c~, means that at this position the part been matched by the characters between the start of the token and c I Zand , that if part p is matched by a segment of the input starting withc,+l, a token of class T will have been recognized. The dotted item at a given position represents a ‘hypothesis’ about the presence of a token Tin the input. An item with the dot infront of a basic pattern at the end a reduce item; together they are called dot in front of regular a subexpression that correspo repetition operator or a parenthesized subexpression. What makes the dotted items extremely useful is that the item between c, and c,+1 can be computed from the one between c,-1 and c,, on the basis of c,. The result of this computation can be zero, one or more than one item -in short, a set of items. So lexical analyzers record setsof items between the input characters. Starting with a known item setat the beginning of the input and repeating the computation for each next character in the input, we obtain successive sets of items to be positioned between the charactersof the input. When during this process wec o n s t ~ cat reduce item, an item with the dot at the end, the corresponding token has been recognized in the input. This does not mean that the correct token has been found, since a longer token may still be ahead. So the recognition process must continue until all hypotheses have been refuted and there are no more items left in the item set. Then the token most recently recognized is the longest token. If there is more than one longest token, a tie-breaking rule is invoked; as we have seen, the code in Figure 2.16 implements the rule that the first among the longest tokens prevails. This algorithm requires us to have a way of creating the initial item set and to compute a new item set from the previous one and an input character. Creating the initial item set is easy: since nothing has been recognized yet, it consists of the token descriptionof the token we are hunting for, with the dot placed before the regular expression: R+ will now turn to the rules for computing a new itern from a previous one and an input character. Since the old item is conceptually stored on the left of the input character and the new item on the right (as shown in Figure 2.18), the computation is usually called ‘moving the item over a character’. Note that the regular expression in the item does not change in this process, only the dotin it moves.
~ ~ u r umoves ~ ~ e r For shift items, the rules €or moving the dot are simple.If the dot is in front of a character c and if the input has c at the next position, the itemis transported to the other side and the dot is moved accordingly:
And if the character after the dot and the character in the input are not equal, the isitem not transported over the character at all: the hypothesis it contained is rejected. The character set pattern [abc...l is treated similarly, except thatit can match any oneof the characters in the pattern. If the dot in an item is in front of the basic pattern .,the item is always moved over
the next character and the dot is moved accordingly, since the pattern .matches any character. Since these rules involve moving items over characters, they are called ch~r~cter moves. Note that for example in the item T+ the dot is not in front of a basic pattern. It seems to be in front of the a, but that is an on: the a is enclosed in the scope of the repetition operator * and the item is actually 9
E moves
A non-basic item cannot be moved directly over a character since there is no character set to test the input character against. The item must first be processed ('developed') until only basic items remain. The rules for this processing require us to indicate very precisely where the dot is located, and it becomes necessary to put parentheses around each part of the regular expression thatis controlled by an operator. Aniteminwhichthedot is in front of anoperator-controlledpatternhastobe replaced by one or more other items that express the meaning of the operator. The rules for this replacement are easy to determine. Suppose the dot is in front of an expression R followed by a star:
So the item actually The star means that R may occur zero or more times in the input. represents two items, onein which R is not present in theinput, and one in which thereis at least one R. The first has the form and the second one:
Note that the parentheses are essential to express the difference between item(1) and item (3). Note also that the regular expression itselfis not changed, only the position of the dot in it is. When the dot in item (3) has finally moved to the endof R, there are again two possibilities: either this was the last occurrence of R or there is another one coming; therefore, the item
must be replaced by two items, (2) and (3). When the dot has been moved to another place, it may of course end up in front of another non-basic pattern, in which case the process has tobe repeated until there are only basic items left. Figure 2.19 shows the rules for the operators from Figure 2.4. In analogy to the character moves which move items over characters, these rules can be viewed as moving items over the empty string. Since the empty stringis represented as E (epsilon), they are calledE moves.
.
T+a ( S )*p T+a(S
0)
*p
... Figure 2.19
E move rules for the regular operators.
A S~~~~~ run To demonstrate the technique, we need a simpler example than the identifier used above. assume We that there two token are classes, integral-number and f ixed-point-number. They are described by the regular expressions shown in Figure 2.20. If regular descriptions are provided as input to the lexical analyzer, these must first be converted to regular expressions. Note that the decimal point has been put between apostrophes,topreventits inte~retationasthebasicpattern for 'any character'. The second definition says that fixed-point numbers need notstart with a digit, but that at least one digit must follow the decimal point. integral-number -+ [O-9]+ f ixed-point-number "+ [0-91 * . [0 -91 + Figure 2.20 A simple set of regular expressions.
We now try to recognize the input 3 ,l;using the regular expression fixed-point-number "+ ( [O-91 ) * ' .
I
( TO-91
+
We then observe the following chainof events, The initial item set is fixedsoint-number -+
0
( 10-91 ) * ' .' (LO-91 )+
Since this is a non-basic pattern, it has to be developed using items:
E moves; this yields two
.' (LO-91 ) + fixed-point-number "3 ( 0 [O-91 )* ' fixed-point-number -+ ([O-91 ) * C+ ' .' ( EO-91 ) +
The first item can be moved over the 3 ,resulting in f ixed-point-number "+ ( [O-91
0 )
* ' .' ( LO-91 ) +
but the second itemis discarded. The new item develops into fixed-point-number -+ ( 0 [O-91 )* ' .' (LO-91 )+ fixed-point-number "+ ( [O-91 * C+ ' .' ( LO-91 )+
Moving this set over the character
.
leaves only one item:
fixed-point-number -+ ([O-91) * ' .'
0
(LO-91 ) +
which develops into fixed-point-number
"+ ( [O-91 ) *
'.'
(0
[O-91 )+
This item canbe moved over the 1, which results in fixed-point-number
" 3 ([O-91
)*
.'
([O-91
.'
( 0
e)+
This in turn developsinto fixed-point-number -+ ([O-91 ) * fixed-point-number -+ ( [ 0 - 9 1 ) *
'
.
I
10-91 ) + ([0-91)+ 0
<
recognized
We note that the last item is a reduce item, so we have recognized a token; the token class ~ brecord ~ r ,the token class and the end point, and continue is ~ i ~ e ~ - p o i n t - n ~We the algorithm, to look for a longer matching sequence. We find, however, that neither of the items can be moved over the semicolon that follows the 3 .l in the input, so the process stops. Whenatoken is recognized, its classand its endpointarerecorded,andwhena longertoken is recognized later, thisrecord is updated.Then,whentheitemset is exhausted and the process stops, this record is used to isolate and return the token found, So we and the input position is moved to the first character after the recognized token. return a token with token class f ixed-point-number and representation 3 1, and the input position is moved to pointat the semicolon.
The above algorithm searches for one token class only, but it is trivial to modify it to search for all the token classes in the language simultaneously: just put all initial items for
them in the initial item set. The input 3 .l ;will now be processed as follows. The initial item set integral-number -+ fixed-point-number -+
0
( [ 0 - 9 1 )*
’.‘
(EO-91) +
.’ ’.’
([0-g])+ ( EO-91 ) +
develops into integral-number -+ ( 0 fixed-point-number -+ fixed-point-number -+
+
EO-91 ) ( 0 [ 0 - 9 1 )* (CO-91 ) * 0
Processing the 3 results in integral-number -+ ( CO-91 0 ) + fixed-point-number “3 ( [ O - 9 1 e ) * ’.’ ( [ O - g l 1+
which develops into integral-number -+ ( 0 [ 0 - 9 1 ) + integral-number -+ ( CO-91 ) + fixed-point-number -+ ( 0 [ 0 - 9 1 ) * ’.’ fixed-point-number -+ ( [ O - 9 1 ) * 0 ’ . I
([O-91 ) + (CO-91 ) +
cc<<<
recognized
<<
recognized
Processing the .results in f
ixed-point-number-+ ( 10-91 ) *
.‘
which develops into fixed-point-number -+
(EO-91)
*
.’
( 0
CO-91 ) +
Processing the 1 results in fixed-point-number -+ ( [ 0 - 9 1 ) *
I,’
( LO-91
which develops into fixed-point-number -+ ( [ 0 - 9 1 ) * .’ ( 0 1 0 - 9 1 ) fixed-point-number “3 ( 10-91 ) * ‘. ( [ 0 - 9 1 ) +
Processingthesemicolonresultsintheemptyset,andtheprocessstops.Notethatno integral-number items survive after the decimal point has been processed. The need to record the latest recognized token is illustrated by the input 1 .g,which may for exampleoccurlegallyin FORTRAN, where .ge . is a possible form of the greater-than~or-equaloperator. The scenario is then as follows. The initial item set integral-number ”+ 0 ( CO-91 ) + fixed-point-number -+ 0 ( 1 0 - 9 1 ) * ’.’ ( LO-91 ) +
develops into integral-number -+ ( 0 [ 0 - 9 1 ) + fixed-point-number -+ ( 0 [ 0 - 9 1 ) * ’.’ fixed-point-number -+ ( CO-91 )* 0 ’.
([O-91 )+ (CO-91 1-1-
S U ~ S ~2.~1.6~ ~I rOe a ~
t ai lexical ~ ~ a ~ a l y ~ e r a ~ t o ~ a t i c a l77 ly
Processing the i results in integral-number -+ ( [0-91 0 ) + f ixed-point-number-+ ( [0-91 0 ) *
.
( [O-91 ) -t
./
([0-9] ) + ([0-9] ) c
which develops into
integral-number -+ ( integral-number -+ ( [ 0 - 9 ] ) + 0 fixed-point-number "+ ( 0 [0-91) * fixed-point-number "+ ([0-9]) *
c<<<
.'
@
recognized
Processing the .results in
f ixed-point-number-+ ( [0-91 ) *
.
@
( [0-91 ) +
which develops into fixed-point-number
-+ ([0-g])*
I
.
(0
[0-91 ) +
Processing the letter g results in the emptyset, and the process stops. In this run, two charSo the acters have already been processed after the most recent token was recognized. read pointer has tobe reset to the positionof the point character, which turned out not to be a decimal point after all. In principle, the lexical analyzer must be able to reset the input over an arbitrarily long distance, but in practice it only has to back up over a few characters. Note that this backtracking is much easier if the entire input is in a single array in memory. We now have a lexical analysis algorithm that processes each character once, except for those that the analyzer backed up over. An outline of the algorithm is given in Figure 2.21. The function Get next token () uses three functions that derive from the token descriptio~sof the language:
-Initial -Next it
item (set ) (Figure 2.22), which supplies the initial item set; em (Itset em set, Ch) (Figure 2.23), which yields the item set resulting from moving 1 t over em Ch; set -Class of: token recognized in (Item which set), checksto see ifany t emis a set reduce item, and if so, returns its token class. If there are item in 1 severalsuch items, itappliestheappropriatetie-breakingrules.Ifthere is none, it returns the valueNo l as c S.
The functions Initial item set (1 and Next item set (1 aresimilarinstructure. Both start by d e t e ~ n i n gwhich items are to be part of the new item set for external t ial it ( ) does emthis by set deriving them from the language specifireasons. Ini cation, Next item set ( ) by moving the previous items over the character Ch. Next, both functions determine which other items must be present due to the rules from Figure 2.19, by calling the function E closure ( ) . This function, which is shown in Figure 2.24, starts by applying a closure algorithm from Figure 2.25 to the Itern set being processed. The inference rule of the closure algorithm adds items reachable from other items by E moves, until all such items have been found. For example, from the input item set
    IMPORT Input char [1..];        // as from the previous module
    SET Read index TO 1;            // the read index into Input char []

    PROCEDURE Get next token:
        SET Start of token TO Read index;
        SET End of last token TO Uninitialized;
        SET Class of last token TO Uninitialized;
        SET Item set TO Initial item set ();
        WHILE Item set /= Empty:
            SET Ch TO Input char [Read index];
            SET Item set TO Next item set (Item set, Ch);
            SET Class TO Class of token recognized in (Item set);
            IF Class /= No class:
                SET Class of last token TO Class;
                SET End of last token TO Read index;
            SET Read index TO Read index + 1;
        SET Token .class TO Class of last token;
        SET Token .repr TO Input char [Start of token .. End of last token];
        SET Read index TO End of last token + 1;

Figure 2.21 Outline of a linear-time lexical analyzer.
    FUNCTION Initial item set RETURNING an item set:
        SET New item set TO Empty;
        // Initial contents - obtain from the language specification:
        FOR EACH token description T -> R IN the language specification:
            SET New item set TO New item set + item T -> •R;
        RETURN E closure (New item set);

Figure 2.22 The function Initial item set () for a lexical analyzer.

For example, from the input item set

integral-number -> ( • [0-9] )+
fixed-point-number -> ( • [0-9] )* '.' ( [0-9] )+

it produces the item set

integral-number -> ( • [0-9] )+
fixed-point-number -> ( • [0-9] )* '.' ( [0-9] )+
fixed-point-number -> ( [0-9] )* • '.' ( [0-9] )+
    FUNCTION Next item set (Item set, Ch) RETURNING an item set:
        SET New item set TO Empty;
        // Initial contents - obtain from character moves:
        FOR EACH item T -> α•Bβ IN Item set:
            IF B is a basic pattern AND B matches Ch:
                SET New item set TO New item set + item T -> αB•β;
        RETURN E closure (New item set);

Figure 2.23 The function Next item set () for a lexical analyzer.
    FUNCTION E closure (Item set) RETURNING an item set:
        SET Closure set TO the Closure set produced by the closure
            algorithm of Figure 2.25, passing the Item set to it;
        // Filter out the interesting items:
        SET New item set TO Empty set;
        FOR EACH item I IN Closure set:
            IF I is a basic item:
                Insert I in New item set;
        RETURN New item set;

Figure 2.24 The function E closure () for a lexical analyzer.

    Let Closure set be a set of dotted items.

    Put each item in Item set in Closure set.

    If an item in Closure set matches the left-hand side of one of the ε moves
    in Figure 2.19, the corresponding right-hand side must be present in
    Closure set.

Figure 2.25 Closure algorithm for dotted items.
We recognize the item sets from the example at the beginning of this section. The function E closure () then removes all non-basic items from the result and returns the cleaned-up E closure.
We have now constructed a lexical analyzer that will work in linear time, but considerable work is still being done for each character. In Section 2.1.5.1 we saw the beneficial effect of precomputing the values yielded by functions, and the question arises whether we can do the same here. Intuitively, the answer seems to be negative; although the characters form a finite domain, we seem to know nothing about the domain of Item set. (The value of Initial item set () can obviously be precomputed, since it depends on the token descriptions only, but it is called only once for every token, and the gain would be very limited.) We know, however, that the domain is finite: there is a finite number of token descriptions in the language specification, and there is a finite number of places where a dot can be put in a regular expression, so there is a finite number of dotted items. Consequently, there is a finite number of sets of items, which means that, at least in principle, we can precompute and tabulate the values of the functions Next item set (Item set, Ch) and Class of token recognized in (Item set).

There is a problem here, however: the domain not only needs to be finite, it has to be reasonably small too. Suppose there are 50 regular expressions (a reasonable number), with 4 places for the dot to go in each. So we have 200 different items, which can be combined into 2^200 or about 1.6 × 10^60 different sets. This seriously darkens the prospect of tabulating them all. We are, however, concerned only with item sets that can be reached by repeated applications of Next item set () to the initial item set: no other sets will occur in the lexical analyzer. Fortunately, most items cannot coexist with most other items in such an item set. The reason is that for two items to coexist in the same item set, their portions before the dots must be able to match the same string, the input recognized until that point. As an example, the items
some-token-1 -> a • x
some-token-2 -> b • x
cannot coexist in the same item set, since the first item claims that the analyzer has just seen an a and the second claims that it has just seen a b, which is contradictory. Also, the item sets can contain basic items only. Both restrictions limit the number of items so severely that, for the above situation of 50 regular expressions, one can expect perhaps a few hundreds to a few thousands of item sets to be reachable, and experience has shown that tabulation is quite feasible.

The item set considered by the lexical analyzer at a given moment is called its state. The function Initial item set () provides its initial state, and the function Next item set (Item set, Ch) describes its state transitions; the function Next item set () is called a transition function. The whole algorithm is called a finite-state automaton, or FSA. Since there are only a finite number of states, it is customary to number them, starting from S0 for the initial state.

The question remains how to determine the set of reachable item sets. The answer is very simple: by just constructing them, starting from the initial item set; that item set is certainly reachable. For each character Ch in the character set we then compute the item set Next item set (Item set, Ch). This process yields a number of new reachable item sets (and perhaps some old ones we have already met). We repeat the process for each of the new item sets, until no new item sets are generated anymore. Since the set of item sets is finite, this will eventually happen. This procedure is called the subset algorithm; it finds the reachable subsets of the set of all possible items, plus the transitions between them. It is depicted as a closure algorithm in Figure 2.26.
    Data definitions:
        1a. A 'state' is a set of items.
        1b. Let States be a set of states.
        2a. A 'state transition' is a triple (start state, character, end state).
        2b. Let Transitions be a set of state transitions.
    Initializations:
        1. Set States to contain a single state, Initial item set ().
        2. Set Transitions to the empty set.
    Inference rules:
        If States contains a state S, States must contain the state E and
        Transitions must contain the state transition (S, Ch, E) for each
        character Ch in the input character set, where E = Next item set (S, Ch).

Figure 2.26 The subset algorithm for lexical analyzers.
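To make the worklist form of this algorithm concrete, the following C sketch (an illustration, not taken from the text; the item numbering, the bit-mask encoding of item sets, and the hard-coded move table are assumptions made for this example) applies it to the two token descriptions used in the hand computation below, and arrives at the same five reachable sets.

    #include <stdio.h>

    /* Character classes for this example: 0 = digit, 1 = point, 2 = other. */
    #define NCLASSES 3
    #define NITEMS   6

    /* move_closure[i][c]: bit mask of the ε-closed set of items reached by
       moving the dot of item i over a character of class c.
       Item numbering (I = integral-number, F = fixed-point-number, D = [0-9]):
         0: I -> (•D)+            1: I -> (D)+•
         2: F -> (•D)* '.' (D)+   3: F -> (D)* •'.' (D)+
         4: F -> (D)* '.' (•D)+   5: F -> (D)* '.' (D)+•                      */
    static const unsigned move_closure[NITEMS][NCLASSES] = {
        /* digit        point   other */
        { 1u<<0|1u<<1,  0,      0 },    /* item 0 */
        { 0,            0,      0 },    /* item 1: reduce item, no moves */
        { 1u<<2|1u<<3,  0,      0 },    /* item 2 */
        { 0,            1u<<4,  0 },    /* item 3 */
        { 1u<<4|1u<<5,  0,      0 },    /* item 4 */
        { 0,            0,      0 },    /* item 5: reduce item, no moves */
    };

    static unsigned next_item_set(unsigned state, int cls) {
        unsigned result = 0;
        for (int i = 0; i < NITEMS; i++)
            if (state & (1u << i)) result |= move_closure[i][cls];
        return result;
    }

    int main(void) {
        unsigned states[64];              /* worklist of reachable states     */
        int nstates = 0, done = 0;

        states[nstates++] = 1u<<0 | 1u<<2 | 1u<<3;   /* Initial item set (S0) */

        while (done < nstates) {                     /* the subset algorithm  */
            unsigned s = states[done++];
            for (int cls = 0; cls < NCLASSES; cls++) {
                unsigned e = next_item_set(s, cls);
                int known = 0;
                for (int i = 0; i < nstates; i++)
                    if (states[i] == e) { known = 1; break; }
                if (!known) states[nstates++] = e;   /* new reachable state   */
            }
        }
        printf("%d reachable item sets (including the empty one)\n", nstates);
        return 0;
    }

The program reports 5 reachable item sets, the same number found by hand below; the numbering of the states follows the order of discovery.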
For the two token descriptions above, we find the initial state Initial item set ():

integral-number -> ( • [0-9] )+
fixed-point-number -> ( • [0-9] )* '.' ( [0-9] )+
fixed-point-number -> ( [0-9] )* • '.' ( [0-9] )+
We call this state S0. For this example we consider only three character classes: digits, the decimal point, and others - semicolons, parentheses, etc. We first compute Next item set (S0, digit), which yields

integral-number -> ( • [0-9] )+
integral-number -> ( [0-9] )+ •                              <- recognized!
fixed-point-number -> ( • [0-9] )* '.' ( [0-9] )+
fixed-point-number -> ( [0-9] )* • '.' ( [0-9] )+
and which we call state S1; the corresponding transition is (S0, digit, S1). Next we compute Next item set (S0, '.'), which yields state S2:

fixed-point-number -> ( [0-9] )* '.' ( • [0-9] )+
with the transition (S0, '.', S2). The third possibility, Next item set (S0, other), yields the empty set, which we call S∅; this supplies the transition (S0, other, S∅). We have thus introduced three new sets, S1, S2, and S∅, and we now have to apply the inference rule to each of them. Next item set (S1, digit) yields
integral-number -> ( • [0-9] )+
integral-number -> ( [0-9] )+ •                              <- recognized!
fixed-point-number -> ( • [0-9] )* '.' ( [0-9] )+
fixed-point-number -> ( [0-9] )* • '.' ( [0-9] )+
which we recognize as the state S1 we have already met. Next item set (S1, '.') yields

fixed-point-number -> ( [0-9] )* '.' ( • [0-9] )+

which is the familiar state S2. Next item set (S1, other) yields the empty set S∅, which supplies the transition over the character class other. We now turn to state S2. Next item set (S2, digit) yields
fixed-point-number -> ( [0-9] )* '.' ( • [0-9] )+
fixed-point-number -> ( [0-9] )* '.' ( [0-9] )+ •            <- recognized!
which we call S3. And Next item set (S2, '.') yields S∅. Further application of the inference rule shows that state S3 allows a non-empty transition only on the digits, and that this transition yields S3 again. No new states are generated, and our closure algorithm terminates, having generated five sets, out of a possible 64 (see Exercise 2.38).

The resulting transition table Next state [State, Ch] is given in Figure 2.27; we write Next state now rather than Next item set, since we are now dealing with numbered states rather than with item sets. The empty item set S∅ is shown as a dash. As is usual, the states index the rows and the characters the columns. The figure also shows the token recognition table Class of token recognized in [State], which indicates which token is recognized in each state. It can be computed easily by examining the items in each state; it also applies the tie-breaking rules if more than one token is recognized in a state.
                     Next state[]                   Class of token
    state      digit      point      other          recognized in[]
     S0         S1         S2          -
     S1         S1         S2          -            integral-number
     S2         S3          -          -
     S3         S3          -          -            fixed-point-number

Figure 2.27 Transition table and recognition table for the regular expressions from Figure 2.20.
It is customary to depict the states with their contents and their transitions in a transition diagram, as shown in Figure 2.28. Each bubble represents a state and shows the items in it; the transitions are shown as arrows labeled with the character that causes the transition. Recognized regular expressions are marked with an exclamation mark. To fit the item sets into the bubbles, some abbreviations have been used: D for [0-9], I for integral-number, and F for fixed-point-number.

Figure 2.28 Transition diagram of the states and transitions for Figure 2.20.
Precomputing the item sets results in a lexical analyzer whose speed is independent of the number of regular expressions to be recognized. The code it uses is almost identical to that of the linear-time lexical analyzer of Figure 2.21. The only difference is that in the final lexical analyzer Initial item set is a constant and Next item set [] and Class of token recognized in [] are constant arrays. For reference, the code for the routine Get next token () is shown in Figure 2.29. We have now reached our goal of generating a very efficient lexical analyzer that needs only a few instructions for each input character and whose operation is independent of the number of token classes it has to recognize. The code shown in Figure 2.29 is the basic code that is generated by most modern lexical analyzer generators. An example of such a generator is lex, which is discussed briefly in Section 2.1.9. It is interesting and in some sense satisfying to note that the same technique is used in virus scanners. Each computer virus is identified by a specific regular expression, and using a precomputed transition table allows the virus scanner to hunt for an arbitrary number of different viruses in the same time it would need to hunt for one virus.
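As an illustration of this table-driven scheme (a sketch, not generated code; the token class names, the character classification, and the input handling are assumptions of this example), the following C program hard-codes the tables of Figure 2.27 and performs maximal-munch recognition with backup, exactly as in Figure 2.21.

    #include <stdio.h>

    enum { DIGIT, POINT, OTHER };               /* character classes          */
    enum { NO_CLASS, INTEGRAL, FIXED_POINT };   /* token classes (assumed)    */
    #define NO_STATE (-1)                       /* the empty item set         */

    /* Precomputed tables, transcribed from Figure 2.27. */
    static const int next_state[4][3] = {
        /* digit point other */
        {  1,    2,    NO_STATE },              /* S0 */
        {  1,    2,    NO_STATE },              /* S1 */
        {  3,    NO_STATE, NO_STATE },          /* S2 */
        {  3,    NO_STATE, NO_STATE },          /* S3 */
    };
    static const int class_recognized_in[4] =
        { NO_CLASS, INTEGRAL, NO_CLASS, FIXED_POINT };

    static int char_class(char ch) {
        if (ch >= '0' && ch <= '9') return DIGIT;
        if (ch == '.') return POINT;
        return OTHER;
    }

    /* Isolate the longest token starting at input[*read_index]; return its
       class and advance *read_index just past it (backing up if needed).    */
    static int get_next_token(const char *input, int *read_index,
                              int *start, int *end) {
        int state = 0;                          /* the initial state S0       */
        int last_class = NO_CLASS, last_end = -1;
        int pos = *read_index;
        *start = pos;
        while (input[pos] != '\0' && state != NO_STATE) {
            state = next_state[state][char_class(input[pos])];
            if (state != NO_STATE && class_recognized_in[state] != NO_CLASS) {
                last_class = class_recognized_in[state];  /* longest match    */
                last_end = pos;
            }
            pos++;
        }
        *end = last_end;
        *read_index = last_end + 1;             /* back up over the overshoot */
        return last_class;
    }

    int main(void) {
        const char *input = "3.14";
        int read_index = 0, start, end;
        int cls = get_next_token(input, &read_index, &start, &end);
        printf("class %d, text \"%.*s\"\n", cls, end - start + 1, input + start);
        return 0;
    }

On the input 3.14 the sketch reports the class FIXED_POINT with the full text of the number, having touched each character once.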
The main component in the amount of work done by the lexical analyzer generator is proportional to the number of states of the FSA; if there are N_FSA states, N_FSA actions have to be performed to find them, and a table of size N_FSA times the number of characters has to be compressed.
    PROCEDURE Get next token:
        SET Start of token TO Read index;
        SET End of last token TO Uninitialized;
        SET Class of last token TO Uninitialized;
        SET Item set TO Initial item set;
        WHILE Item set /= Empty:
            SET Ch TO Input char [Read index];
            SET Item set TO Next item set [Item set, Ch];
            SET Class TO Class of token recognized in [Item set];
            IF Class /= No class:
                SET Class of last token TO Class;
                SET End of last token TO Read index;
            SET Read index TO Read index + 1;
        SET Token .class TO Class of last token;
        SET Token .repr TO Input char [Start of token .. End of last token];
        SET Read index TO End of last token + 1;

Figure 2.29 Outline of an efficient linear-time routine Get next token ().
All other tasks - reading and parsing the regular descriptions, writing the driver - are negligible in comparison. In principle, it is possible to construct a regular expression that requires a number of states exponential in the length of the regular expression. An example is:

a-and-b-6-apart -> .* a . . . . . . b
which describes the longest token that ends in an a and a b, 6 places apart. To check this condition, the automaton will have to remember the positions of all a's in the last 7 positions. There are 2^7 = 128 different combinations of these positions. Since an FSA can distinguish different situations only by having a different state for each of them, it will have to have at least 128 different states. Increasing the distance between the a and the b by 1 doubles the number of states, which leads to exponential growth. Fortunately, such regular expressions hardly ever occur in practical applications, and five to ten states per regular pattern are usual. As a result, almost all lexical analyzer generation is linear in the number of regular patterns.
Our attitude towards transitions to the empty state S∅ is ambivalent. On the one hand, transitions to S∅ are essential to the functioning of the lexical analyzer. They signal that the game is over and that the time has come to take stock of the results and isolate the token found. Also, proper understanding of some algorithms, theorems, and proofs in finite-state automata requires us to accept them as real transitions. On the other hand, it is customary and convenient to act, write, and speak as if these transitions do not exist. Traditionally, S∅ and transitions leading to it are left out of a transition diagram (see Figure 2.28), the corresponding entries in a transition table are left empty (see Figure 2.27), and we use phrases like 'the state S has no transition on the character C' when actually S does have such a transition (of course it does) but it leads to S∅. We will conform to this convention, but in order to show the 'real' situation, we show the transition diagram again in Figure 2.30, now with the omitted parts added.
Figure 2.30 Transition diagram of all states and transitions for Figure 2.20.
The time required to divide a program text into tokens seems linear in the length of that text, since the automaton constructed above seems to touch each character in the text only once. But in principle this is not true: since the recognition process may overshoot the end of the token while looking for a possible longer token, some characters will be touched more than once. Worse, the entire recognition process can be quadratic in the size of the input. Suppose we want to recognize just two tokens:

single-a -> 'a'
a-string-plus-b -> 'a'* 'b'

Suppose further that the input is a sequence of n a's, with no b anywhere. Then the input must be divided into n tokens single-a, but before recognizing each single-a, the lexical analyzer must hunt down the entire remaining input to convince itself that there is no b. When it finds none, it recognizes a single-a and resets the Read index back to End of last token + 1, which is actually Start of token + 1 in this case. So recognizing the first single-a touches n characters, the second hunt touches n-1 characters, the third n-2 characters, etc., resulting in quadratic behavior of the lexical analyzer. Fortunately, as in the previous section, such cases do not occur in programming languages: if the lexical analyzer has to scan right to the end of the text to find out what token it should recognize, then so will the human reader, and a programming language with two tokens as defined above would definitely have a less than average usability. Reps (1998) describes a more complicated lexical analyzer that recognizes tokens in linear time.
The transition tables constructed above are not arbitrary matrices; they exhibit a lot of structure. For one thing, once part of a token has been recognized, only very few characters will at any point continue that token, and most entries in the table are empty. Densities (fill ratios) of 5% or less are not unusual. Also, the entries in a column that results from a move over a character Ch all indicate that a Ch has just been recognized, and there are not too many different ones of those; so columns tend to contain only a few different values which, in addition, differ from those in other columns. The idea suggests itself to exploit this redundancy to compress the table. Now, with a few hundred states, perhaps a hundred different characters, and two or four bytes per entry, the average uncompressed lexical transition table occupies perhaps a hundred kilobytes. On modern computers this is hardly a problem, but code generation tables may be ten or a hundred times larger, for which compression is still essential, so we will explain the techniques here.

The first idea that may occur to the reader is to apply compression algorithms of the Lempel-Ziv variety to the transition table, in the same way they are used in file compression programs. No doubt they would do an excellent job on the table, but the compressed table must still allow cheap access to Next state [State, Ch], and digging up that value from a Lempel-Ziv compressed table is most uncomfortable!

There is a rich collection of algorithms for compressing tables while leaving the entries accessible; none of them is optimal and each strikes a different compromise. The field is a happy hunting ground for the inventive mind. Most of the algorithms exist in very many variants, and almost every one of them can be improved with some ingenuity. We will discuss simple versions of the two most commonly used algorithms, row displacement and graph coloring. Both exploit the fact that a large percentage of the entries are empty, by putting non-empty entries in those locations. How they do this differs from algorithm to algorithm. Note, however, that the so-called empty locations are not really empty but contain the empty set S∅. So we end up with locations containing both a
non-empty state and S∅ (no location contains more than one non-empty state). When we access such a location we must be able to find out which of the two is our answer. Two solutions exist: mark the entries with enough information so we can know which is our answer, or make sure we never access the empty entries. The implementation of the first solution depends on the details of the algorithm and will be covered below. The second solution is implemented by having a bit map with a single bit for each table entry, telling whether the entry is the empty set. Before accessing the compressed table we check the bit, and if we find the entry is empty we have got our answer; if not, we access the table after all, but now we know that what we find there is our answer. The bit map takes 1/16 or 1/32 of the size of the original uncompressed table, depending on the entry size; this is not good for our compression ratio. Also, extracting the correct bit from the bit map requires code that slows down the access. The advantage is that the subsequent table compression and its access are simplified. And surprisingly, having a bit map often requires less space than marking the entries.
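The bit extraction itself takes only a few operations; a possible C shape of the test (names and the LSB-first packing are assumptions of this sketch) for a 4-state, 3-class table like the running example is:

    #include <stdio.h>
    #include <stdint.h>

    #define N_CHARS 3     /* character classes: 0 = digit, 1 = other, 2 = point */

    /* One bit per (state, character) entry, row by row, least significant bit
       first; bit == 1 means 'this entry is the empty set'.  The values below
       encode the 4-state example table of this section.                        */
    static const uint8_t empty_bits[] = { 0x92, 0x0D };

    static int entry_is_empty(int state, int ch) {
        int index = state * N_CHARS + ch;            /* row-major bit number    */
        return (empty_bits[index >> 3] >> (index & 7)) & 1;
    }

    int main(void) {
        printf("S0 on a digit: %s\n", entry_is_empty(0, 0) ? "empty" : "non-empty");
        printf("S2 on a point: %s\n", entry_is_empty(2, 2) ? "empty" : "non-empty");
        return 0;
    }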
Row displacement cuts the transition matrix into horizontal strips: each row becomes a strip. For the moment we assume we use a bit map Empty state [] to weed out all access to empty states, so we can consider the empty entries to be really empty. Now the strips are packed in a one-dimensional array Entry [] of minimal length, according to the rule that two entries can share the same location if either one of them is empty or both are the same. We also keep an array Displacement [], indexed by row number (state), to record the position at which we have packed the corresponding row in Entry []. Figure 2.31 shows the transition matrix from Figure 2.27 in reduced form; the first column contains the row (state) numbers, and is not part of the matrix. Slicing it yields four strips, (1, -, 2), (1, -, 2), (3, -, -) and (3, -, -), which can be fitted at displacements 0, 0, 1, 1 in an array of length 3, as shown in Figure 2.32. Ways of finding these displacements will be discussed in the following subsection.
    state      digit=1    other=2    point=3
      0           1          -          2
      1           1          -          2
      2           3          -          -
      3           3          -          -

Figure 2.31 The transition matrix from Figure 2.27 in reduced form.
The resulting data structures, including the bit map, are shown in Figure 2.33. Note that we do not need to allocate room for the fourth, empty element in Figure 2.32, since it will never be accessed. The code for retrieving the value of Next state [State, Ch] is given in Figure 2.34.
[Figure 2.32 shows the four strips placed at displacements 0, 0, 1, and 1 over an array of length 3; their non-empty entries combine into Entry [] = (1, 3, 2), with the final, empty position never used.]

Figure 2.32 Fitting the strips into one array.
    Empty state [0..3] [1..3] = ((0, 1, 0), (0, 1, 0), (0, 1, 1), (0, 1, 1));
    Displacement [0..3] = (0, 0, 1, 1);
    Entry [1..3] = (1, 3, 2);

Figure 2.33 The transition matrix from Figure 2.27 in compressed form.
    IF Empty state [State] [Ch]:
        SET New state TO No state;
    ELSE entry in Entry [] is valid:
        SET New state TO Entry [Displacement [State] + Ch];

Figure 2.34 Code for SET New state TO Next state [State, Ch].
Assuming two-byte entries, the uncompressed table occupied 12 × 2 = 24 bytes. In the compressed table, the bit map occupies 12 bits = 2 bytes, the array Displacement [] 4 × 2 = 8 bytes, and Entry [] 3 × 2 = 6 bytes, totaling 16 bytes. In this example the gain is less than spectacular, but on larger tables, especially on very large tables, the algorithm performs much better, and compression ratios of 90-95% can be expected. This reduces a table of a hundred kilobytes to ten kilobytes or less.

Replacing the bit map with markings in the entries turns out to be a bad idea in our example, but we will show the technique anyway, since it performs much better on large tables and is often used in practice. The idea of marking is to extend an entry with index [State, Ch] with a field containing either the State or the Ch, and to check this field when we retrieve the entry. Marking with the state is easy to understand: the only entries marked with a state S in the compressed array are those that originate from the strip with the values for S, so if we find that the entry we retrieved is indeed marked with S, we know it is from the correct state.
The same reasoning cannot be applied to marking with the character, since the character does not identify the strip. However, when we index the position found from Displacement [State] by a character C and we find there an entry marked C, we know that it originates from a strip starting at Displacement [State]. And if we make sure that no two strips have the same displacement, this identifies the strip. So we can also mark with the character, provided no two strips get the same displacement. Since the state requires two bytes of storage and the character only one, we will choose marking by character (see Exercise 2.20 for the other choice). The strips now become ((1,1), -, (2,3)), ((1,1), -, (2,3)), ((3,1), -, -) and ((3,1), -, -), which can be fitted as shown in Figure 2.35. We see that we are severely hindered by the requirement that no two strips should get the same displacement. The complete data structures are shown in Figure 2.36. Since the sizes of the markings and the entries differ, we implement them in different arrays. The corresponding code is given in Figure 2.37. The array Displacement [] still occupies 4 × 2 = 8 bytes, Mark [] occupies 8 × 1 = 8 bytes, and Entry [] 6 × 2 = 12 bytes, totaling 28 bytes. We see that our gain has turned into a loss.
Figure 2.35 Fitting the strips with entries marked by character.

Figure 2.36 The transition matrix compressed with marking by character.
As mentioned before, even the best compression algorithms do not work well on small-size data; there is just not enough redundancy there. Try compressing a 10-byte file with any of the well-known file compression programs!
Finding the best displacements

Finding those displacements that result in the shortest entry array is an NP-complete problem; see below for a short introduction to what 'NP-complete' means. So we have to resort to heuristics to find sub-optimal solutions. One good heuristic is to sort the strips according to
    IF Mark [Displacement [State] + Ch] /= Ch:
        SET New state TO No state;
    ELSE entry in Entry [] is valid:
        SET New state TO Entry [Displacement [State] + Ch];

Figure 2.37 Code for SET New state TO Next state [State, Ch] for marking by character.
density, with the most dense (the one with the most non-empty entries) first. We then use an extensible array (see Section 5.1.3) of entries, in which we store the strips first-fit: this means that we take the strips in decreasing order as sorted, and store each one in the first position from the left in which it will fit without conflict. A conflict arises if both the array and the strip have non-empty entries at a certain position and these entries differ.
Figure 2.38 A damaged comb finding room for its teeth.
It is helpful to picture the non-empty entries as the remaining teeth on a damaged comb, and the first-fit algorithm as finding the first place where we can stick in the comb without its teeth going into holes already occupied by the other combs; see Figure 2.38. This is why the algorithm is sometimes called the comb algorithm. It works because it does the difficult cases (the densely filled strips) first, while these can still find room in the holes left by their big brothers; the many sparse strips come later and can find room almost anywhere. The same philosophy underlies many fitting heuristics: fit the large objects first and fit the small objects in the holes left over; this applies equally to packing vacation gear into a car trunk. More advanced table compression algorithms using row displacement have been published in the literature.
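A minimal C sketch of this first-fit decreasing heuristic (the strip representation, the value 0 for 'empty', and the fixed array bound are assumptions of the sketch) applied to the strips of Figure 2.31:

    #include <stdio.h>

    #define NSTRIPS 4
    #define WIDTH   3
    #define MAXLEN  64          /* crude bound on the packed array            */

    /* The strips of Figure 2.31 (0 = empty entry). */
    static int strip[NSTRIPS][WIDTH] = {
        {1, 0, 2}, {1, 0, 2}, {3, 0, 0}, {3, 0, 0}
    };

    static int density(const int *s) {          /* number of non-empty entries */
        int n = 0;
        for (int i = 0; i < WIDTH; i++) n += (s[i] != 0);
        return n;
    }

    /* Does strip s fit at displacement d without conflict? */
    static int fits(const int *entry, const int *s, int d) {
        for (int i = 0; i < WIDTH; i++)
            if (s[i] != 0 && entry[d + i] != 0 && entry[d + i] != s[i])
                return 0;
        return 1;
    }

    int main(void) {
        int entry[MAXLEN] = {0}, displacement[NSTRIPS];
        int order[NSTRIPS] = {0, 1, 2, 3};

        /* Sort strip numbers by decreasing density: the combs with the most
           teeth are placed first.                                            */
        for (int i = 0; i < NSTRIPS; i++)
            for (int j = i + 1; j < NSTRIPS; j++)
                if (density(strip[order[j]]) > density(strip[order[i]])) {
                    int t = order[i]; order[i] = order[j]; order[j] = t;
                }

        /* First fit: slide each comb to the leftmost conflict-free position. */
        for (int i = 0; i < NSTRIPS; i++) {
            int s = order[i], d = 0;
            while (!fits(entry, strip[s], d)) d++;
            displacement[s] = d;
            for (int k = 0; k < WIDTH; k++)
                if (strip[s][k] != 0) entry[d + k] = strip[s][k];
        }

        for (int s = 0; s < NSTRIPS; s++)
            printf("strip %d at displacement %d\n", s, displacement[s]);
        return 0;
    }

For these four strips the sketch finds the displacements 0, 0, 1, and 1, and the array (1, 3, 2), the same packing shown in Figures 2.32 and 2.33.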
As a rule, solving a problem is more work than verifying a solution, once it has been given. For example, sorting an array of n elements costs at least O(n ln n) operations, but verifying that an array is sorted can be done with n-1 operations.

There is a large class of problems which nobody knows how to solve in less than exponential time, but for which verifying a given solution can be done in less than exponential time (in so-called polynomial time). Remarkably, all these problems are equivalent, in the sense that each of them can be converted to any of the others without introducing exponential time dependency; why this is so, again nobody knows. These problems are called NP-complete problems, for 'nondeterministic-polynomial'. By extension, any problem that can be solved by using an NP-complete problem without introducing an extra exponential time dependency is also called NP-complete, even if its solutions cannot be verified in polynomial time. Finding the displacements for the shortest array is such a problem. The real NP-complete problem is 'Give a set of displacements that result in an array of length k', since for this problem verifying the answer is easy. Once we have an algorithm A(k) for this problem, we can use it to find the shortest array by doing binary search, until we find a value k_min such that A(k_min) yields a set of displacements and A(k_min - 1) yields 'There isn't one'. For the problems with verifying the negative answer, see any book on algorithmic complexity, for example Lewis and Papadimitriou (1998). It is unlikely that an algorithm will be found that can solve these NP-complete problems in less than exponential time, but fortunately this need not worry us too much, since for almost all of these problems good heuristic algorithms have been found, which will yield answers that are good enough to work with. The first-fit decreasing heuristic for row displacement is an example.
There is another, less intuitive, technique to compress transition tables, which works better for large tables when used in combination with a bit map to check for empty entries. In this approach, we select a subset S from the total set of strips, such that we can combine all strips in S without displacement and without conflict: they can just be positioned all at the same location. This means that the non-empty positions in each strip in S avoid the non-empty positions in all the other strips or have identical values in those positions. It turns out that if the original table is large enough we can find many such subsets that result in packings in which no empty entries remain. The non-empty entries in the strips just fill all the space, and the packing is optimal. The sets are determined by first constructing and then coloring a so-called interference graph, a graph in which each strip is a node and in which there is an edge between each pair of strips that cannot coexist in a subset because of conflicts. Figure 2.39a shows a fictitious but reasonably realistic transition table, and its interference graph is given in Figure 2.40.
[Figure 2.39 shows a fictitious transition table with 8 states (0-7) and 4 character columns (w, x, y, z) in part (a), and its compressed form, in which the strips have been combined by graph coloring, in part (b).]

Figure 2.39 A transition table (a) and its compressed form packed by graph coloring (b).
[Figure 2.40 shows the interference graph of the strips of Figure 2.39a; it happens to be a tree.]

Figure 2.40 Interference graph for the automaton of Figure 2.39a.
This seemingly arbitrary technique hinges on the possibility of coloring a graph (almost) optimally. A graph is colored when colors have been assigned to its nodes, such that no two nodes that are connected by an edge have the same color; usually one wants to color the graph with the minimal number of different colors. The important point is that there are very good heuristic algorithms to almost always find the minimal number of colors; the problem of always finding the exact minimal number of colors is again NP-complete. We will discuss some of these algorithms in Section 4.2.7, where they are used for register allocation.

The relation of graph coloring to our subset selection problem is obvious: the strips correspond to nodes, the colors correspond to the subsets, and the edges prevent conflicting strips from ending up in the same subset. Without resorting to the more sophisticated heuristics explained in Section 4.2.7, we can easily see that the interference graph in Figure 2.40 can be colored with two colors. It happens to be a tree, and any tree can be colored with two colors, one for the even levels and one for the odd levels. This yields the packing shown in Figure 2.39b. The cost is 8 × 2 = 16 bytes for the entries, plus 32 bits = 4 bytes for the bit map, plus 8 × 2 = 16 bytes for the mapping from state to strip, totaling 36 bytes, against 32 × 2 = 64 bytes for the uncompressed matrix.
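To give the flavor of the coloring step, here is a small greedy-coloring sketch in C; the adjacency data is a stand-in rather than the interference graph of Figure 2.40, and the heuristic is simpler than those of Section 4.2.7. Strips that receive the same color can then be packed at the same position.

    #include <stdio.h>

    #define NSTRIPS 6

    /* Stand-in interference data: adj[i][j] == 1 when strips i and j conflict
       (have differing non-empty entries in some column) and therefore cannot
       share a subset (color).                                                 */
    static const int adj[NSTRIPS][NSTRIPS] = {
        {0,1,1,0,0,0},
        {1,0,0,1,0,0},
        {1,0,0,0,1,0},
        {0,1,0,0,0,1},
        {0,0,1,0,0,0},
        {0,0,0,1,0,0},
    };

    int main(void) {
        int color[NSTRIPS];
        for (int n = 0; n < NSTRIPS; n++) {
            int used[NSTRIPS] = {0};
            /* Collect the colors already taken by colored neighbors ...      */
            for (int m = 0; m < n; m++)
                if (adj[n][m]) used[color[m]] = 1;
            /* ... and give node n the lowest color not among them.           */
            int c = 0;
            while (used[c]) c++;
            color[n] = c;
        }
        for (int n = 0; n < NSTRIPS; n++)
            printf("strip %d -> subset (color) %d\n", n, color[n]);
        return 0;
    }

Since this stand-in graph happens to be a tree, the greedy pass finds a 2-coloring, just as the text observes for the tree of Figure 2.40.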
The only error that can occur in the scheme described in Section 2.1.6.4 is that no regular expression matches the current input. This is easily remedied by specifying at the very end of the list of regular expressions a regular expression '.', which matches any single character, and having it return a token Unknown character. If no further action is taken, this token is then passed to the parser, which will reject it and enter its error recovery. Depending on the quality of the error recovery of the parser this may or may not be a good idea, but it is likely that the resulting error message will not be very informative. Since the lexical analyzer usually includes an identification layer (see Section 2.1.10), the same layer can be used to catch and remove the Unknown character token and give a more sensible error message.

If one wants to be more charitable towards the compiler user, one can add special regular expressions that match erroneous tokens that are likely to occur in the input. An example is a regular expression for a fixed-point number along the above lines that has no digits after the point; this is explicitly forbidden by the regular expressions in Figure 2.20, but it is the kind of error people make. If the grammar of the language does not allow an integral-number to be followed by a point in any position, we can adopt the specification

integral-number -> [0-9]+
fixed-point-number -> [0-9]* '.' [0-9]+
bad-fixed-point-number -> [0-9]* '.'
This specification will produce the token bad-fixed-point-number on such erroneous input. The lexical identification layer can then give a warning or error message, append a character 0 to the end of Token .repr to turn it into a correct representation, and change Token .class to fixed-point-number. Correcting the representation by appending a 0 is important, since it allows routines further on in the compiler to blindly accept token representations knowing that they are correct. This avoids inefficient checks in semantic routines, or alternatively obscure crashes on incorrect compiler input. It is in general imperative that phases that check incoming data for certain properties do not pass on any data that does not conform to those properties, even if that means patching the data and even if that patching is algorithmically inconvenient. The only alternative
is to stop processing altogether. Experience has shown that if the phases of a compiler adhere strictly to this rule, compiler crashes on incorrect programs are largely avoided. Following this rule does not prevent all compiler crashes, but at least it guarantees that for each incorrect program that causes a compiler crash, there is also a correct program that causes the same compiler crash.
We close this discussion of lexical analyzer generation with an overview of the use of the UNIX program lex. There are several freely available implementations and variants of UNIX lex; some of these differ somewhat from the original, and some are very efficient. Figure 2.41 shows a lex input for the same token set as before. Such an input consists of three sections: one for regular definitions, one for the regular expressions of the tokens and their associated actions, and one for auxiliary C code. The processing by lex results in the generation of a single routine, yylex(). This is not surprising, since the generated code essentially isolates the next token in the input file according to the regular expressions and then executes the associated code segment. When that code executes a return statement with some value, the routine yylex() returns with that value; otherwise, yylex() proceeds to isolate the next token. This set-up is convenient for both retrieving and skipping tokens.

The three sections in the lexical analyzer description are separated by lines that contain the characters %%, which correspond to the lex format. The first section contains the regular definitions; note that lex requires braces around the names of regular expressions when they are applied rather than defined. The section also includes C code to introduce definitions for the token classes; the presence of the C code is signaled by the markers %{ and %}. The second section contains the regular expressions for the token classes to be recognized, together with their associated C code; again the regular expression names are enclosed in braces. We see that the code segments for integer, identifier, operator, and separator leave yylex() by returning the token class; for operators and separators this class is the first character of the token, found in yytext[]. Comment and layout are skipped automatically, by associating code segments with them that do not return. The regular expression for comment means: a #, followed by zero or more characters that are not a # or an end of line, followed by a # if there is one; so a comment stops at a # and includes that # as well. The auxiliary C code resides in the third section.
    %{
    #include        "lex.h"
    Token_Type Token;
    int line_number = 1;
    %}

    whitespace          [ \t]
    letter              [a-zA-Z]
    digit               [0-9]
    underscore          "_"
    letter_or_digit     ({letter}|{digit})
    underscored_tail    ({underscore}{letter_or_digit}+)
    identifier          ({letter}{letter_or_digit}*{underscored_tail}*)
    operator            [-+*/]
    separator           [;,(){}]

    %%

    {digit}+                    {return INTEGER;}
    {identifier}                {return IDENTIFIER;}
    {operator}|{separator}      {return yytext[0];}
    #[^#\n]*#?                  {/* ignore comment */}
    {whitespace}                {/* ignore whitespace */}
    \n                          {line_number++;}
    .                           {return ERRONEOUS;}

    %%

    void start_lex(void) {}

    void get_next_token(void) {
        Token.class = yylex();
        if (Token.class == 0) {
            Token.class = EOF;
            Token.repr = "<EOF>";
            return;
        }
        Token.pos.line_number = line_number;
        strcpy(Token.repr = (char*)malloc(strlen(yytext)+1), yytext);
    }

    int yywrap(void) {return 1;}

Figure 2.41 Lex input for the token set from Figure 2.20.
The third section contains the interface routines start_lex() and get_next_token(). The routine get_next_token() calls yylex() to obtain the class of the next token; yylex() returns 0 upon reaching the end of the input, and Token.class is set to EOF in that case. Finally, since the representation of the token in the array yytext[] will be overwritten by that of the next token, it is secured in Token.repr by the fairly horrible C-ism in the last statement in the routine. The declaration of yywrap() assists in proper end-of-file handling; further details can be found in any lex manual, for example that by Levine, Mason and Brown.
The handwritten lexical analyzer of Section 2.1.5 recorded the position in the input file of the token delivered, by tracking that position inside the routine next_char(). Unfortunately, we cannot do this in a reliable way in lex, for two reasons. First, some variants of lex read ahead arbitrary amounts of input before producing the first token; and second, some use the UNIX input routine fread() rather than getc() to obtain input. In both cases, the relation between the characters read and the token recognized is lost. We solve half the problem by explicitly counting lines in the lex code. To solve the entire problem and record also the character positions inside a line, we would need to add code to measure and tally the lengths of all patterns recognized. We have not shown this in our code, to avoid clutter.

This concludes our discussion of lexical analyzers proper. The basic purpose of the stream of tokens generated by a lexical analyzer in a compiler is to be passed on to a syntax analyzer. For purely practical reasons it is, however, convenient to introduce additional layers between lexical and syntax analysis. These layers may assist in further identification of tokens (Section 2.1.10), macro processing and file inclusion (Section 2.1.12.1), conditional text inclusion (Section 2.1.12.2), and possibly generics (Section 2.1.12.3). We will now first turn to these intermediate layers.
In a clean design, the only task of a lexical analyzer is isolating the text of the token and identifying its token class. The lexical analyzer then yields a stream of (token class, token representation) pairs. The token representation is carried through the syntax analyzer to the rest of the compiler, where it can be inspected to yield the appropriate semantic information. An example is the conversion of the representation 377#8 (octal 377 in Ada) to the integer value 255. In a broad compiler, a good place for this conversion would be in the initialization phase of the annotation of the syntax tree, where the annotations that derive from the tokens form the basis of further attributes. In a narrow compiler, however, the best place to do computations on the token text is in the lexical analyzer.

Such computations include simple conversions, as shown above, but also more elaborate actions, for example identifier identification. Traditionally, almost all compilers were narrow for lack of memory and did considerable semantic processing in the lexical analyzer: the integer value 255 stored in two bytes takes less space than the string representation "377#8". With modern machines the memory considerations have for the most part gone away, but language properties can force even a modern lexical analyzer to do some semantic processing. Three such properties concern identifiers that influence subsequent parsing, macro processing, and keywords.

In C and C++, typedef and class declarations introduce identifiers that influence the parsing of the subsequent text. In particular, in the scope of the declaration

typedef int T;
the code fragment
(T *)
is a cast which converts the subsequent expression to 'pointer to T', and in the scope of the variable declaration

int T;
it is an incorrect expression with a missing right operand to the multiplication sign. In C and C++ parsing can only continue when all previous identifiers have been identified sufficiently to decide if they are type identifiers or not. We said 'identified sufficiently' since in many languages we cannot do full identifier identification at this stage. Given the Ada declarations

type Planet is (Mercury, Venus, Earth, Mars);
type Goddess is (Juno, Venus, Minerva, Diana);
then in the code fragment

for P in Mercury .. Venus loop
the identifier Venus denotes a planet, and in for G in Juno .. Venus loop
it denotes a goddess. This requires overloading resolution, and the algorithm for this belongs in the context handling module rather than in the lexical analyzer. (Identification and overloading resolution are covered in Section 6.1.1.)

A second reason to have at least some identifier identification done by the lexical analyzer is related to macro processing. Many languages, including C, have a macro facility, which allows chunks of program text to be represented in the program by identifiers. Examples of parameterless macros are

#define EOF    256
#define DIGIT  257

from the lexical analyzer in Figure 1.11; a macro with parameters occurred in

#define is_digit(c)    ('0' <= (c) && (c) <= '9')
The straightforward approach is to do the macro processing as a separate phase between reading the program and lexical analysis, but that means that each and every character in the program will be processed several times; also the intermediate result may be very large. See Exercise 2.26 for additional considerations. Section 2.1.12 shows that macro processing can be conveniently integrated into the reading module of the lexical analyzer, provided the lexical analyzer checks each identifier to see if it has been defined as a macro.

A third reason to do some identifier identification in the lexical analyzer stems from the existence of keywords. Most languages have a special set of tokens that look like identifiers but serve syntactic purposes: the keywords or reserved words. Examples are if, switch, case, etc. from Java and C, and begin, end, task, etc. from Ada. There is again a straightforward approach to deal with the problems that are caused by this, which is specifying each keyword as a separate regular expression to the lexical analyzer, textually before the regular expression for identifier. Doing so increases the size of the transition table considerably, however, which may not be acceptable.
These three problems can be solved by doing a limited amount of identifier identification in the lexical analyzer, just enough to serve the needs of the lexical analyzer and the parser. Since identifier identification has many more links with the rest of the compiler than the lexical analyzer itself has, the process is best delegated to a separate module, the symbol table module. In practical terms this means that the routine Get next token (), which is our version of the routine get_next_token() described extensively above, is renamed to something like Get next simple token (), and that the real Get next token () takes on the structure shown in Figure 2.42. The procedure Switch to macro () does the fancy footwork needed to redirect further input to the macro body; see Section 2.1.12.1 for details.
    FUNCTION Get next token () RETURNING a token:
        SET Simple token TO Get next simple token ();
        IF Class of Simple token = Identifier:
            SET Simple token TO Identify in symbol table (Simple token);
            // See if this has reset Class of Simple token:
            IF Class of Simple token = Macro:
                Switch to macro (Simple token);
                RETURN Get next token ();
            ELSE Class of Simple token /= Macro:
                // Identifier or Type identifier or Keyword:
                RETURN Simple token;
        ELSE Class of Simple token /= Identifier:
            RETURN Simple token;

Figure 2.42 A Get next token () that does lexical identification.
Effectively this introduces a separate phase between the lexical analyzer proper and the parser, the lexical identification phase, as shown in Figure 2.43. Lexical identification is also called screening (DeRemer, 1974). Once we have this mechanism in place, it can also render services in the implementation of generic declarations; this aspect is covered in Section 2.1.12.3. We will first consider implementation techniques for symbol tables, and then see how to do macro processing and file inclusion; the section on lexical analysis closes by examining the use of macro processing in implementing generic declarations.
In its basic form, a symbol table (sometimes called a name list) can be thought of as an extensible array of records that can be indexed by a string rather than by an integer. The string is the identifier and the associated record contains collected information about the identifier. The basic interface of a symbol table module consists of one single function:
Figure 2.43 Pipeline from input to lexical identification.
    FUNCTION Identify (Identifier name) RETURNING a pointer to Identifier info;
When called with an arbitrary string Identifier name it returns a pointer to a record of type Identifier info; when it is later called with that same string, it returns the same pointer, regardless of how often this is done and how many other calls of Identify () intervene. The compiler writer chooses the record type Identifier info so that all pertinent information that will ever need to be collected for an identifier can be stored in it. It is important that the function Identify () returns a pointer to the record rather than a copy of the record, since we want to be able to update the record to collect information in it. In this respect Identify () acts just like an array of records in C. If C allowed arrays to be indexed by strings, we could declare an array

    struct Identifier_info Sym_table[];

and use &Sym_table[Identifier_name] instead of Identify (Identifier name). When used in a symbol table module for a C compiler, Identifier info could, for example, contain pointers to the following pieces of information:

1. the actual string (for error messages and for verification in hashing; see below);
2. a macro definition (see Section 2.1.12);
3. a keyword definition;
4. a list of type, variable and function definitions (see Section 6.1.1);
5. a list of struct and union name definitions (see Section 6.1.1);
6. a list of struct and union field selector definitions (see Section 6.1.1).
In practice, many of these pointers would be null for most of the identifiers. This approach splits the problem of building a symbol table module into two problems: how to obtain an access mechanism which emulates an extensible string-indexable array, and how to design and maintain the data attached to the string (which in our case is the identifier). The latter problem is actually a set of problems, since many pieces of information about identifiers have to be collected and maintained, for a variety of reasons and often stemming from different parts of the compiler. We will treat these where they occur; here we will turn to the implementation of the access mechanism.
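In C, the record behind each entry might be declared along the following lines; this is a sketch only, and the pointed-to types are placeholders for structures that are introduced in later sections.

    /* A sketch of the per-identifier record; the member types are hypothetical
       stand-ins for the structures described in Sections 2.1.12 and 6.1.1.    */
    struct Identifier_info {
        char *name;                           /* the actual string              */
        struct Macro_def *macro;              /* macro definition, if any       */
        struct Keyword_def *keyword;          /* keyword definition, if any     */
        struct Def_list *defs;                /* type, variable, function defs  */
        struct Def_list *struct_union_names;  /* struct and union name defs     */
        struct Def_list *field_selectors;     /* struct/union field selectors   */
        struct Identifier_info *next;         /* next record in the same bucket */
    };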
It would be nice if implementation languages provided extensible string-indexable arrays as a built-in type constructor, but with the exception of awk, Perl, and a few others, they don't. There are several ways to implement such a mechanism. The most obvious one is simple to implement but very inefficient: use a linked list of (string, data) pairs. The access time is linear in the number of entries, which makes the cost of entering n identifiers into the symbol table a quadratic function of n: the cost is O(n^2). To get to the desired string more easily, a tree structure - binary or more advanced - might be used. The access time is now logarithmic in the number of entries, which makes the total cost O(n ln n), which is better. The best approach is to use a hash table, which has almost constant, or, with some care, totally constant access time. This makes the total cost linear.

There are many variants in hash algorithm design; we will present here a commonly used one. The basic building block of hashing is the hash function. The hash function accepts a string as a parameter and maps it onto an integer in the range 0..N-1, for some N. Reasonable values of N range from a few hundreds to a few tens of thousands, but in principle any number will work. These integers are to be used as indexes into an array of N elements, the hash table. Each element in the hash table is a container for a set of (string, data) pairs; these containers are often unceremoniously called buckets. The idea is that a string S, which is mapped onto the integer M by the hash function, will be stored in bucket number M, together with its data. Since the buckets are elements of an array, finding the right bucket can be done in constant time, by indexing the hash table.

A call of Identify (Identifier name) starts by computing the hash value of Identifier name, to find the number of the bucket that is the proper place for Identifier name and its associated data. It finds the bucket B by indexing, and examines it to see if it contains the record (Identifier name, data). If it is indeed found, Identify () returns the address of the record. If it is not found, Identifier name has never been seen before: it certainly cannot be in any of the other buckets. The call to Identify () then creates an empty record for it in the bucket B, stores the data of Identifier name in it, and returns the address of the record. An outline of the code for the function Identify is given in Figure 2.44.

Inevitably, sometimes two different strings will hash to the same integer and end up in the same bucket, in what is called a collision. If we choose N much larger than the largest expected number of different identifiers in a program (a not too well-defined concept!), most buckets will be empty, many will contain one string, and a few will contain more than one string. We can therefore use the simplest data structure to implement the buckets: linked lists. Since the Identifier info records in the linked lists still contain their strings, we can easily find the corresponding record by stepping through the linked list. If we do not find a corresponding record, we have to add an empty one. So we arrive at the data layout shown in Figure 2.45, in which three identifiers, length, width, and height, have been introduced into a hash table of size 5 (actual sizes are more likely to be between 1000 and 10 000). The hash function used was
    FUNCTION Identify (Identifier name) RETURNING a pointer to Identifier info:
        SET Bucket number TO Hash value (Identifier name);
        IF there is no Identifier info record for Identifier name
                in Bucket [Bucket number]:
            Insert an empty Identifier info record for Identifier name
                in Bucket [Bucket number];
        RETURN Pointer to Identifier info record for Identifier name
            in Bucket [Bucket number];

Figure 2.44 Outline of the function Identify.
    (sum of the ASCII values of the characters in the name
        + length of the name) MOD table size

and it so happens that both width and height map onto the same integer, 4.
[Figure 2.45 shows the hash table with buckets 0 to 4: bucket 3 points to the record for length, and bucket 4 points to a linked list containing the records for width and height; the other buckets are empty.]

Figure 2.45 A hash table of size 5 containing three identifiers.
We can now see what the function Identify in symbol table (Token) in Figure 2.42 must do. It calls the function Identify (Token) from Figure 2.44, examines the record obtained, and resets the class of the token accordingly, if needed. A possible structure for Identify in symbol table (Token), based on the data proposed above for C, is shown in Figure 2.46. To accommodate the new information in the token we have extended it with a field Identifier info pointer; this is one of the extensions discussed in Section 2.1.5. Since the extensions are usually very class-specific, they can be allocated in a union, so their space requirements are not a problem.
    FUNCTION Identify in symbol table (Token) RETURNING a token:
        SET Identifier info pointer TO Identify (Representation of Token);
        IF Identifier info indicates macro:
            SET Class of Token TO Macro;
        ELSE IF Identifier info indicates keyword:
            SET Class of Token TO Keyword;
        ELSE IF Identifier info indicates type identifier:
            SET Class of Token TO Type identifier;
        ELSE Identifier info indicates variable, struct, union, etc. identifier:
            // Do not change Class of Token;
        Append Identifier info pointer to Token;
        RETURN Token;

Figure 2.46 A possible Identify in symbol table (Token) for C.
Note that if the token represents an identifier, it now has two references to the identifier in it: one as the representation that any token has, and one in the identifier info. And what is more, the two do not have to be the same. For example, in a language that allows but does not differentiate lower-case and upper-case letters in identifiers (as does Ada), the representation will be the actual representation of that token, for example PostScript in one place and Post_Script in another, and the string in the identifier info will always be the same canonical representation, for example POSTSCRIPT.
Stepping through the linked lists in the buckets introduces a linear time component in the access mechanism and a quadratic one in the entire algorithm. This can easily be seen by considering what happens when we use a hash table of size 1: all strings will be mapped onto the same integer, 0, and a single linked list will result. In fact, the hash table does nothing fundamental to the quadratic nature of the algorithm; it just replaces one list of length n by N lists of average length n/N. A simple but effective way to improve efficiency is by making the lists self-organizing. The easiest way to effect this is by moving the record that is accessed to the head of the linked list. The rationale is that when an identifier is used at a certain position in the program, it is likely that it will again be referenced soon. So the next access will find the most recently used identifier at the beginning of the list. This simple trick avoids almost all list searching.
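Making a bucket list self-organizing in this way takes only a few lines of C; the sketch below (using the same record layout as the sketch above, repeated here to keep the fragment self-contained) moves a found record to the front of its list.

    #include <string.h>

    struct Identifier_info {
        char *name;
        struct Identifier_info *next;
    };

    /* Look up name in the bucket list *head; if found, unlink the record and
       re-insert it at the head, so the next lookup of a recently used
       identifier succeeds immediately.                                        */
    struct Identifier_info *lookup_move_to_front(struct Identifier_info **head,
                                                 const char *name) {
        struct Identifier_info **link = head;
        for (struct Identifier_info *p = *link; p != NULL; link = &p->next, p = *link) {
            if (strcmp(p->name, name) == 0) {
                *link = p->next;             /* unlink from its current place   */
                p->next = *head;             /* and move it to the front        */
                *head = p;
                return p;
            }
        }
        return NULL;                         /* caller creates a new record     */
    }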
An important determinant of the quality of a hash table mechanism is the hash function. A good hash function should be fast and should map the identifiers evenly over the integers, with no obvious relationship between the identifiers that map onto the same integer. Related identifiers like a, b, ab, ba, AB, BA, Ab, etc. should map onto different integers; the hash value should depend on the value of each character in the string and also on their order. It is surprisingly easy to spend considerable effort on designing a hash function and still come up with a bad one. McKenzie, Harries and Bell (1990) have analyzed the performance of several widely used hash functions and have concluded that the following simple iterative algorithm performs very well, provided that 'good' combinations of the hash table size N and the multiplication constant k are chosen.

    h_0 = 0
    h_i = k × h_{i-1} + c_i        for 1 ≤ i ≤ n
    hash value = BITS(h_n, 30) MOD N

Here, n indicates the length of the identifier, c_i is the ASCII value of the i-th character in the identifier, and BITS(h, m) yields the least significant (rightmost) m bits of the integer h. Excellent values for k and N appear to be 613 and 1008. A faster algorithm can be obtained for k=4 and N=1403, since the multiplication by 4 can be implemented by a very efficient left shift, but the small value of k causes sets of similar identifiers to cluster. Obviously, larger values for N result in a better performance, because there will be fewer collisions, but raising N will also cost more memory. With k=613 and N=1008, we obtain the integers 610, 812, and 3005 for the hash values of length, width, and height, respectively.
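In C this hash function is only a few lines; the sketch below uses 32-bit unsigned arithmetic, whose wrap-around does not affect the low 30 bits that the formula keeps, and the identifier names shown are just the ones from the example above.

    #include <stdio.h>
    #include <stdint.h>

    #define K 613
    #define N 1008

    /* The iterative hash of McKenzie, Harries and Bell: h = K*h + c over all
       characters, then keep the rightmost 30 bits and reduce modulo N.        */
    static uint32_t hash_identifier(const char *name) {
        uint32_t h = 0;
        for (const char *p = name; *p != '\0'; p++)
            h = K * h + (unsigned char)*p;
        return (h & ((UINT32_C(1) << 30) - 1)) % N;
    }

    int main(void) {
        printf("length -> %lu\n", (unsigned long)hash_identifier("length"));
        printf("width  -> %lu\n", (unsigned long)hash_identifier("width"));
        printf("height -> %lu\n", (unsigned long)hash_identifier("height"));
        return 0;
    }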
In its simplest form, a macro is an identifier defined as having a certain string as a value; wherever the identifier occurs further on, the string is to be substituted in its place. A macro definition can also specify formal parameters, which have to be substituted by the actual parameters. An example in C is

#define is_capital(ch)    ('A' <= (ch) && (ch) <= 'Z')
which states that is_capital(ch) must be substituted by ('A' <= (ch) && (ch) <= 'Z'), with the proper substitution for ch. The parentheses around the expression and the parameters serve to avoid precedence conflicts with operators outside the expression or inside the parameters. A call (also called a macro call) of this macro,

is_capital(txt[i])

which supplies the actual parameter txt[i], is to be replaced by

('A' <= (txt[i]) && (txt[i]) <= 'Z')
The string value of the macro is kept in the macro field of the record associated with the identifier. We assume here that there is only one level of macro definition, in that each macro definition of an identifier I overwrites a previous definition of I, regardless of scopes. If macro definitions are governed by scope in the source language, the macro field will have to point to a stack (linked list) of definitions.

Many macro processors, including that of C, define a third substitution mechanism in addition to macro substitution and parameter substitution: file inclusion. A file inclusion directive contains a file name, and possibly formal parameters; the corresponding file is retrieved from the file system and its contents are substituted for the file inclusion directive, possibly after parameter substitution. In C, file inclusions can nest to arbitrary depth.

Another text manipulation feature, related to the ones mentioned above, is conditional text inclusion. Actually, conditional text exclusion would be a better name, but the feature is traditionally called conditional compilation. The text inclusion is controlled by some form of if-statement recognizable to the macro processor, and the condition in it must be such that the macro processor can evaluate it. It may, for example, test if a certain macro has been defined or compare two constants. If the condition evaluates to true, the text up to the following macro processor ELSE or ENDIF is included; nesting macro processor IF statements should be honored as they are met in this process. And if the condition evaluates to false, the text up to the following macro ELSE or ENDIF is skipped, but if an ELSE is present, the text between it and the matching ENDIF is included instead. An example from C is

#ifdef UNIX
char *file_name_separator = "/";
#else
#ifdef MSDOS
char *file_name_separator = "\\";
#endif
#endif
Here the #ifdef UNIX tests if the macro UNIX has been defined. If so, the line char *file_name_separator = "/"; is processed as program text, otherwise a test for the presence of a macro MSDOS is done. If both tests fail, no program code results from the above example. The conditional compilation in C is line-oriented; only complete lines can be included or skipped, and each syntax fragment involved in conditional compilation occupies a line of its own. All conditional compilation markers start with a # character at the beginning of a line, which makes them easy to spot.

Some macro processors allow even more elaborate text manipulation. The PL/I preprocessor features for-statements and procedures that will select and produce program text, in addition to if-statements. For example, the PL/I code
%DECLARE I FIXED;
%DO I := 1 TO 4;
A(I) := I * (I - 1);
%END;
%DEACTIVATE I;
in which the % sign marks macro keywords, produces the code

A(1) := 1 * (1 - 1);
A(2) := 2 * (2 - 1);
A(3) := 3 * (3 - 1);
A(4) := 4 * (4 - 1);

In fact, the PL/I preprocessor acts on segments of the parse tree rather than on sequences of characters, as the C preprocessor does. Similar techniques are used to generate structured document text, for example in SGML or XML, from templates.
All these features can be implemented conveniently by a single mechanism: a stack of input buffers. Each stack element consists at least of a read pointer and an end-of-text pointer. If the text has been read in from a file, these pointers point into the corresponding buffer; this is the case for the initial input file and for included files. If the text is already present in memory, the pointers point there; this is the case for macros and parameters. The initial input file is at the bottom of the stack, and subsequent file inclusions, macro calls, and parameter substitutions are stacked on top of it. The actual input for the lexical analyzer is taken from the top input buffer, until it becomes exhausted; we know this has happened when the read pointer becomes equal to the end pointer. Then the input buffer is unstacked and reading continues on what is now the top buffer.
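A bare-bones C sketch of such a stack of input buffers follows; the names, the fixed stack depth, and the little demonstration in main() are assumptions of this illustration, not part of the text.

    #include <stdio.h>
    #include <string.h>

    #define MAX_NESTING 32

    struct Input_buffer {
        const char *read_ptr;     /* next character to deliver                  */
        const char *end_ptr;      /* one past the last character of this text   */
    };

    static struct Input_buffer stack[MAX_NESTING];
    static int top = -1;          /* empty stack                                */

    /* Push a piece of text: the initial file, an included file, a macro body,
       or an actual parameter.                                                  */
    static void push_input(const char *text, size_t length) {
        stack[++top].read_ptr = text;
        stack[top].end_ptr = text + length;
    }

    /* Deliver the next character, unstacking exhausted buffers first.          */
    static int next_char(void) {
        while (top >= 0 && stack[top].read_ptr == stack[top].end_ptr)
            top--;                              /* buffer exhausted: unstack it  */
        if (top < 0) return EOF;                /* bottom buffer exhausted too   */
        return (unsigned char)*stack[top].read_ptr++;
    }

    int main(void) {
        const char *rest = ";x = 1;";           /* input remaining after 'mac'   */
        const char *macro_body = "donald";
        push_input(rest, strlen(rest));          /* what follows the macro call   */
        push_input(macro_body, strlen(macro_body)); /* macro body stacked on top  */
        for (int c; (c = next_char()) != EOF; ) putchar(c);
        putchar('\n');                            /* prints: donald;x = 1;        */
        return 0;
    }

The macro body is read to exhaustion first, after which reading silently resumes in the buffer below it, which is exactly the behavior described above.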
Back-calls

The input buffer stack is incorporated in the module for reading the input. It is controlled by information obtained in the lexical identification module, which is at least two steps further on in the pipeline. So, unfortunately, we need up-calls, or rather back-calls, to signal macro substitution, which is recognized in the lexical identification module, back to the input module. See Figure 2.47.
Figure 2.47 Pipeline from input to lexical identification, with feedback (back-calls).
It is easy to see that in a clean modularized system these back-calls cannot be written. We have seen that a lexical analyzer can overshoot the end of the token by some characters, and these characters have already been obtained from the input module when the signal to do macro expansion arrives. This signal in fact requests the input module to insert text before characters it has already delivered. More in particular, if a macro mac has been defined as donald and the input reads mac;, the lexical analyzer requires to see the characters m, a, c, and ; before it can recognize the identifier mac and pass it on. The lexical identification module then identifies mac as a macro and signals to the input module to insert the text donald right after the end of the characters m, a, and c. The input module cannot do this since it has already sent off the semicolon following these characters. Fighting fire with fire, the problem is solved by introducing yet another back-call, one from the lexical analyzer to the input module, signaling that the lexical analyzer has backtracked over the semicolon. This is something the input module can implement, by just resetting a read pointer, since the characters are in a buffer in memory. This is another advantage of maintaining the entire program text in a single buffer. If a more complicated buffering scheme is used, caution must be exercised if the semicolon is the last character in an input buffer: exhausted buffers cannot be released until it is certain that no more backtracking back-calls for their contents will be issued. Depending on the nature of the tokens and the lexical analyzer, this may be difficult to ascertain. All in all, the three modules have to be aware of each other's problems and internal functions; actually they form one integrated module. Still, the structure shown in Figure 2.47 is helpful in programming the module(s).
Handling the parameters requires some special care, on two counts. The first one is that one has to be careful to determine the extent of an actual parameter before any substitution has been applied to it. Otherwise the sequence

#define A a,b
#define B(p) p
B(A)
would cause B(A) to be replaced by B(a,b), which gives B two parameters instead of the required one.
The second concerns the substitution itself. It requires the formal parameters to be replaced by the actual parameters, which can in principle be done by using the normal macro substitution mechanism. In doing so, one has to take into account, however, that the scope of the formal parameter is just the macro itself, unlike the scopes of real macros, which are global. So, when we try to implement the macro call is_capital(txt[i]) by simply defining its formal parameter and substituting its body:

#define ch txt[i]
('A' <= (ch) && (ch) <= 'Z')

we may find that we have just redefined an existing macro ch. Also, the call is_capital(ch + 1) would produce

#define ch ch + 1
('A' <= (ch) && (ch) <= 'Z')
with disastrous results. One simple way to implement this is to generate a new name for each actual (not formal!) parameter. So the macro call is_capital(txt[i]) may be implemented as

#define arg_00393 txt[i]
('A' <= (arg_00393) && (arg_00393) <= 'Z')

assuming that txt[i] happens to be the 393rd actual parameter in this run of the macro processor. Normal processing then turns this into

('A' <= (txt[i]) && (txt[i]) <= 'Z')
which is correct. A more efficient implementation that causes less clutter in the symbol table keeps a set of 'local' macros with each buffer in the input buffer stack. These local macros apply to that buffer only; their values are set from the actual parameters. Figure 2.48 shows the situation in which the above macro call occurs in an included file mac.h; the lexical analyzer has just read the [ in the first substitution of the parameter.
Depending on the language definition, it may or may not be an error for a macro to be recursive or for a file to include itself; if the macro system also features conditional text inclusion, such recursion may be meaningful. A check for recursion can be made simply by going down the input buffer stack and comparing identifiers.
Conditional text inclusion is usually simple to implement; the difficult part is fitting it into the pipeline of Figure 2.47, or the input buffer stack of Figure 2.48. The answer varies considerably with the details of the mechanism. Conditional text inclusion as described in the language manual is controlled by certain items in the text and acts on certain items in the text. The C preprocessor is controlled by tokens that are matched by the regular expression \n#[\ \t]*[a-z]+ (which describes tokens like #ifdef starting right after a newline). These tokens must be recognized by the tokenizing process, to prevent them from being recognized inside other tokens, for example inside comments. Also, the C preprocessor works on entire lines. The PL/I preprocessor is controlled by tokens of the form %[A-Z]* and works on tokens recognized in the usual way by the lexical analyzer.
The main point is that the place in the input pipeline where the control originates may differ from the place where the control is exerted, as was also the case in macro substitution. To make the interaction possible, interfaces must be present in both places. For example, in the C preprocessor, a layer must be inserted between the input module and the
Figure 2.48 An input buffer stack of include files, macro calls, and macro parameters.
lexical analyzer. This layer must act on input lines, and must be able to perform functions like 'skip lines up to and including a preprocessor #else line'. It is controlled from the lexical identification module, as shown in Figure 2.49.
Figure 2.49 Input pipeline, with line layer and feedback (back-calls).
A PL/I-like preprocessor is simpler in this respect: it is controlled by tokens supplied by the lexical identification layer and works on those same tokens. This means that all tokens from the lexical identification module can be stored and the preprocessing actions can be performed on the resulting list of tokens. No back-calls are required, and even the more advanced preprocessing features, which include repetition, can be performed conveniently on the list of tokens.
A generic unit X is a template from which an X can be created by instantiation; X can be a type, a routine, a module, an object definition, etc., depending on the language definition. Generally, parameters have to be supplied in an instantiation; these parameters are often of a kind that cannot normally be passed as parameters: types, modules, etc. For example, the code

GENERIC TYPE List_link(Type):
    a Type FIELD Value;
    a Pointer to List_link(Type) FIELD Next;

declares a generic type for the links in linked lists of values; the type of the values is given by the generic parameter Type. The generic type declaration can be used in an actual type declaration to produce the desired type. A type for links to be used in linked lists of integers could be instantiated from this generic declaration by code like

TYPE Integer_list_link: INSTANTIATED List_link(Integer);

which supplies the generic parameter Integer to List_link. This instantiation would then act as if the programmer had written

TYPE Integer_list_link:
    an Integer FIELD Value;
    a Pointer to Integer_list_link FIELD Next;

Generic instantiation looks very much like parameterized text substitution, and treating a generic unit as some kind of parameterized macro is often the simplest way to implement generics. Usually generic substitution differs in detail from macro substitution. In our example we have to replace the text INSTANTIATED List_link(Integer) by the fields themselves, but List_link(Integer) by the name Integer_list_link. The obvious disadvantage is that code is duplicated, which costs compilation time and run-time space. With nested generics, the cost can be exponential in the number of generic units. This can be a problem, especially if libraries use generics liberally. For another way to handle generics that does not result in code duplication, see Section 6.5.3.2.
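The parallel with parameterized macros can be made concrete in C, whose preprocessor can act as a poor man's generic instantiator. The sketch below is only an illustration of the idea and uses names invented for the occasion; the macro substitutes the 'generic parameter' Type textually, just as instantiating List_link(Integer) above substitutes Integer.

#include <stdio.h>

/* 'Generic' list link: the macro plays the role of the GENERIC TYPE above. */
#define DECLARE_LIST_LINK(Type)                                      \
    struct Type##_list_link {                                        \
        Type value;                    /* a Type FIELD Value */       \
        struct Type##_list_link *next; /* pointer to the next link */ \
    };

DECLARE_LIST_LINK(int)      /* instantiation: acts like TYPE Integer_list_link */
DECLARE_LIST_LINK(double)   /* a second instantiation duplicates the code */

int main(void) {
    struct int_list_link last  = { 3, NULL };
    struct int_list_link first = { 1, &last };
    printf("%d %d\n", first.value, first.next->value);
    return 0;
}

Note that each instantiation duplicates the type definition, which is exactly the code-duplication cost mentioned above.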
We have now come to the end of our discussion of lexical analysis. We have seen that lexical analyzers can conveniently be generated from the regular expressions that describe the tokens in the source language. Such generated lexical analyzers record their progress in sets of 'items', regular expressions in which a dot separates the part already matched from the part still to be matched. It turned out that the results of all manipulations of these item sets can be precomputed during lexical analyzer generation, leading to finite-state automata or FSAs. Their implementation results in very efficient lexical analyzers, both in space, provided the transition tables are compressed, and in time. Traditionally, lexical analysis and lexical analyzers are explained and implemented directly from the FSA or transition diagrams of the regular expressions (Thompson, 1968), without introducing dotted items (Earley, 1970). Dotted items, however, unify lexical and syntactic analysis and play an important role in tree-rewriting code generation, so we have based our explanation of lexical analysis on them.
We have seen that the output of a lexical analyzer is a sequence of tokens, (token class, token representation) pairs. The identifiers in this sequence often need some identification and further processing for the benefit of macro processing and subsequent syntax analysis. This processing is conveniently done in a lexical identification phase. We will now proceed to consider syntax analysis, also known as parsing.

There are two ways of doing parsing: top-down and bottom-up. For top-down parsers, one has the choice of writing them by hand or having them generated automatically, but bottom-up parsers can only be generated. In all three cases, the syntax structure to be recognized is specified using a context-free grammar; grammars were discussed in Section 1.9. Section 2.2.2 details considerations concerning error detection and error recovery in syntax analysis.

Roadmap
2.2 From tokens to syntax tree - the syntax analysis
2.2.1 Two classes of parsing methods
2.2.2 Error detection and error recovery
2.2.3 Creating a top-down parser manually
2.2.4 Creating a top-down parser automatically
2.2.5 Creating a bottom-up parser automatically
Grammars are an essential tool in language specification; they have several important aspects. First, a grammar serves to impose a structure on the linear sequence of tokens which is the program. This structure is all-important since the semantics of the program is specified in terms of the nodes in this structure. The process of finding the structure in the flat stream of tokens is called parsing, and a module that performs this task is a parser. Second, using techniques from the field of formal languages, a grammar can be employed to construct a parser for it automatically. This is a great help in compiler construction. Third, grammars aid programmers to write syntactically correct programs and provide answers to detailed questions about the syntax. They do the same for compiler writers.
There are two well-known and well-researched ways to do parsing: deterministic left-to-right top-down (the LL method) and deterministic left-to-right bottom-up (the LR and LALR methods). Left-to-right means that the program text, or more precisely the sequence of tokens, is processed from left to right, one token at a time. Intuitively speaking, deterministic means that no searching is involved: each token brings the parser one step closer to the goal of constructing the syntax tree. The theory of formal languages provides a more rigorous definition. The terms top-down and bottom-up will be explained below.
The deterministic parsing methods require an amount of time that is a linear function of the length of the input: they are linear-time methods. Unfortunately, they do not solve all parsing problems: they work for restricted classes of grammars only. A grammar copied 'as is' from a language manual has a very small chance of leading to a deterministic parsing method, unless of course the language designer has taken pains to make the grammar match such a method. Dropping the property of being deterministic, thus allowing some searching to take place, results in algorithms that can handle all grammars, but these algorithms are no longer linear-time and their time and space requirements vary, sometimes erratically. Since these methods do not (yet) play a significant role in compiler construction, we will not discuss them here.
Besides parsing speed there is another reason to require determinacy: a grammar for which a deterministic parser can be generated is guaranteed to be non-ambiguous, which is of course a very important property of a programming-language grammar. Being non-ambiguous and allowing deterministic parsing are not exactly the same (the second implies the first but not vice versa), but requiring determinacy is technically the best non-ambiguity test we have.
Since an arbitrary grammar will often fail to match one of the standard parsing methods, it is important to have techniques to transform the grammar so that it will. Methods to do so are explained in Sections 2.2.4.3 and 2.2.5.7. The transformed grammar will assign syntax trees to at least some programs that differ from those assigned by the original; this unavoidably causes some problems, since the semantics is described in terms of the original syntax trees. So grammar transformation methods must also create transformed semantic rules.
We will assume that the grammar of the programming language is non-ambiguous. This implies that to each input program there belongs either one syntax tree, and then the program is syntactically correct, or no syntax tree, and then the program contains one or more syntax errors.
Constructing the syntax tree for the given sequence of tokens means that a tree of nodes must be created and that these nodes must be labeled with grammar symbols, in such a way that:

- leaf nodes are labeled with terminals and inner nodes are labeled with non-terminals;
- the top node is labeled with the start symbol of the grammar;
- the children of an inner node labeled N correspond to the members of an alternative of N, in the same order as they occur in that alternative;
- the terminals labeling the leaf nodes correspond to the sequence of tokens, in the same order as they occur in the input.

Left-to-right parsing starts with the first few tokens of the input and a syntax tree, which initially consists of the top node only. The top node is labeled with the start symbol.
The parsing methods can be distinguished by the order in which they construct the nodes in the syntax tree: the top-down method constructs them in pre-order, the bottom-up methods in post-order. A short introduction to the terms 'pre-order' and 'post-order' can be found below. The top-down method starts at the top and constructs the tree downwards to match the tokens in the input; the bottom-up methods combine the tokens in the input into parts of the tree to finally construct the top node. The two methods do quite different things when they construct a node. We will first explain both methods in outline to show the similarities and then in enough detail to design a parser generator.
Tree traversal is the process of traversing trees and visiting the nodes of those trees. Such traversals are performed as part of some algorithms, for example to draw a picture of the tree. When a process visits a node in a tree it performs a specific action on it: it can for example print information about the node. When a process traverses a node, it does two things: it traverses the subtrees of the node and it visits the node itself; the order in which it performs these actions is crucial and determines the nature of the traversal. A tree is traversed by traversing its top node.
The traversal process starts at the top of the tree in both cases and eventually visits all nodes in the tree; the order in which the nodes are visited differs, though. When traversing a node N in pre-order, the process first visits the node N and then traverses N's subtrees in left-to-right order. When traversing a node N in post-order, the process first traverses N's subtrees in left-to-right order and then visits the node N. Other variants (multiple visits, mixing the visits inside the left-to-right traversal, deviating from the left-to-right traversal) are possible but less usual.
Although the difference between pre-order and post-order seems small when written down in two sentences, the effect is enormous. For example, the first node visited in pre-order is the top of the tree, in post-order it is its leftmost bottom-most leaf. Figure 2.50 shows the same tree, once with the nodes numbered in pre-order and once in post-order. Pre-order is generally used to distribute information over the tree, post-order to collect information from the tree.
Note that there are three different notions involved here: visiting a node, which means doing something with the node that is significant to the algorithm in whose service the traversal is performed; traversing a node, which means visiting that node and traversing its subtrees in some order; and traversing a tree, which means traversing its top node, which
Figure 2.50 A tree with its nodes numbered (a) in pre-order and (b) in post-order.
will then recursively traverse the entire tree. 'Visiting' belongs to the algorithm; 'traversing' in both meanings belongs to the control mechanism. This separates two concerns and is the source of the usefulness of the tree traversal concept. In everyday speech these terms are often confused, though.
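The difference can be made concrete with a small routine; the node type and the printing action below are invented for the illustration and stand in for whatever the algorithm needs.

#include <stdio.h>

struct Node {
    int info;                     /* whatever the algorithm needs */
    struct Node *left, *right;
};

/* Visit = the algorithm-specific action; here it just prints the node. */
static void visit(struct Node *n) { printf("%d ", n->info); }

void traverse_pre_order(struct Node *n) {
    if (n == NULL) return;
    visit(n);                     /* first visit the node itself ... */
    traverse_pre_order(n->left);  /* ... then traverse its subtrees */
    traverse_pre_order(n->right);
}

void traverse_post_order(struct Node *n) {
    if (n == NULL) return;
    traverse_post_order(n->left); /* first traverse the subtrees ... */
    traverse_post_order(n->right);
    visit(n);                     /* ... then visit the node itself */
}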
The top-down parser begins by constructing the top node of the tree, which it knows to be labeled with the start symbol. It then constructs the nodes in the syntax tree in pre-order, which means that the top of a subtree is constructed before any of its lower nodes are. When the top-down parser constructs a node, the label of the node itself is already known, say N; this is true for the top node and we will see that it is true for all other nodes as well. Using information from the input, the parser then determines the correct alternative for N; how it can do this is explained in Section 2.2.4.1. Knowing which alternative applies, it knows the labels of all the children of this node labeled N. The parser then proceeds to construct the first child of N; note that it already knows its label. The process of determining the correct alternative for the leftmost child is repeated on the further levels, until a leftmost child is constructed that is a terminal symbol. The terminal then 'matches' the first token t1 in the program. This does not happen by accident: the top-down parser chooses the alternatives of the higher nodes precisely so that this will happen. We now know 'why the first token is there': which syntax tree segment produced the first token. The parser then leaves the terminal behind and continues by constructing the next node in pre-order; this could for example be the second child of the parent of the first token. See Figure 2.51, in which the large dot is the node that is being constructed, the smaller dots represent nodes that have already been constructed, and the hollow dots indicate nodes whose labels are already known but which have not yet been constructed. Nothing is known about the rest of the parse tree yet, so that part is not shown. In summary, the main task of a top-down parser is to choose the correct alternatives for known non-terminals. Top-down parsing is treated in Sections 2.2.3 and 2.2.4.
Figure 2.51 A top-down parser recognizing the first token in the input.
The bottom-up parser constructs the nodes in the syntax tree in post-order: the top of a subtree is constructed after all of its lower nodes have been constructed. When a bottom-up parser constructs a node, all its children have already been constructed, and are present and known; the label of the node itself is also known. The parser then creates the node, labels it, and connects it to its children. A bottom-up parser always constructs the node that is the top of the first complete subtree it meets when it proceeds from left to right through the input; a complete subtree is a tree all of whose children have already been constructed. Tokens are considered as subtrees of height 1 and are constructed as they are met. The new subtree must of course be chosen so as to be a subtree of the parse tree, but an obvious problem is that we do not have the parse tree yet; Section 2.2.5 explains how to deal with this. The children of the first subtree to be constructed are leaf nodes only, labeled with terminals; the node's correct alternative is chosen to match them. Next, the second subtree in the input is found all of whose children have already been constructed; the children of this subtree can involve non-leaf nodes now, created by earlier constructing of nodes. A node is constructed for it, with label and appropriate alternative. This process is repeated until finally all children of the top node have been constructed, after which the top node itself is constructed and the parsing is complete.
Figure 2.52 shows the parser after it has constructed (recognized) its first, second, and third nodes. The large dot indicates again the node being constructed, the smaller ones those that have already been constructed. The first node spans tokens t3, t4, and t5; the second spans t7 and t8; and the third node spans the first node, token t6, and the second node. Nothing is known yet about the existence of other nodes, but branches have been drawn upward from tokens t1 and t2, since we know that they cannot be part of a smaller subtree than the one spanning tokens t3 through t5; otherwise that subtree would have been the first to be constructed. In summary, the main task of a bottom-up parser is to repeatedly find the first node all of whose children have already been constructed. Bottom-up parsing is treated in Section 2.2.5.
Figure 2.52 A bottom-up parser constructing its first, second, and third nodes.
An error is detected when the construction of the syntax tree fails; since both top-down and bottom-up methods read the tokens from left to right, this occurs while processing a particular token. Then two questions arise: what error message to give to the user, and whether and how to proceed after the error. The position at which the error is detected may be unrelated to the position of the actual error the user made. In the C fragment

x = a(p+q( - b(r-s);

the error is most probably the opening parenthesis after the q, which should have been a closing parenthesis, but almost all parsers will report two missing closing parentheses before the semicolon. It will be clear that it is next to impossible to spot this error at the right moment, since the segment x = a(p+q( - b(r-s) is correct with q a function and - a monadic minus. Some advanced error handling methods consider the entire program when producing error messages, but after 25 years these are still experimental, and are hardly ever found in compilers. The best one can expect from the efficient methods in use today is that they do not derail the parser any further. Sections 2.2.4.5 and 2.2.5.8 discuss such methods.
It has been suggested that with today's fast interactive systems, there is no point in continuing program processing after the first error has been detected, since the user can easily correct the error and then recompile in less time than it would take to read the next error message. However, users like to have some idea of how many syntax errors there are left in the program; recompiling several times, each time expecting it to be the last time, is demoralizing. We therefore like to continue the parsing and give as many error messages as there are syntax errors. This means that we have to do error recovery.
There are two strategies for error recovery. One, called error correction, modifies the input token stream and/or the parser's internal state so that parsing can continue; we will discuss below the question of whether the resulting parse tree will still be consistent. There is an almost infinite number of techniques to do this; some are simple to implement, others complicated, but all of them have a significant chance of derailing the parser and producing an avalanche of spurious error messages. The other, called non-correcting error recovery, does not modify the input stream, but rather discards all parser information and continues parsing the rest of the program with a grammar for 'rest of program' (Richter, 1985). If the parse succeeds, there were no more errors; if it fails it has certainly found another error. It may miss errors, though. It does not produce a parse tree for syntactically incorrect programs. The grammar for 'rest of program' for a language L is called the suffix grammar of L, since it generates all suffixes (tail ends) of all programs in L. Although the suffix grammar of a language L can be derived easily from the original grammar of L, suffix grammars can generally not be handled by any of the deterministic parsing techniques. They need stronger but slower parsing methods, which requires the presence of two parsers in the compiler. Non-correcting error recovery yields very reliable error detection and recovery, but is relatively difficult to implement. It is not often found in parser generators.
It is important that the parser never allows an inconsistent parse tree to be constructed, even when given syntactically incorrect input. All error recovery should be either error-correcting and always produce parse trees that conform to the syntax, or be non-correcting and produce no parse trees for incorrect input. As already explained in Section 2.1.8, where we were concerned with token representations, allowing inconsistent data to find their way into later phases in a compiler is asking for trouble, the more so when this data is the parse tree. Any subsequent phase working on an inconsistent parse tree may easily access absent nodes, apply algorithms to the wrong data structures, follow non-existent pointers, and get itself into all kinds of trouble, all of which happens far away from the place where the error occurred. Any error recovery technique should be designed and implemented so that it will under no circumstances produce an inconsistent parse tree; if it cannot avoid doing so for technical reasons, the implementation should stop further processing after the parsing phase. Non-correcting error recovery has to do this anyway, since it does not produce a parse tree at all for an incorrect program.
Most parser generators come with a built-in error detection and recovery mechanism, so the compiler writer has little say in the matter. Knowing how the error handling works may allow the compiler writer to make it behave in a more user-friendly way, however.
Given a non-terminal N and a token t at position p in the input, a top-down parser must decide which alternative of N must be applied so that the subtree headed by the node labeled N will be the correct subtree at position p. We do not know, however, how to tell that a tree is correct, but we do know when a tree is incorrect: when it has a different token than t as its leftmost leaf at position p. This provides us with a reasonable approximation to what a correct tree looks like: a tree that starts with t or is empty. The most obvious way to decide on the right alternative for N is to have a (recursive) Boolean function which tests N's alternatives in succession and which succeeds when it finds an alternative that can produce a possible tree. To make the method deterministic, we decide not to do any backtracking: the first alternative that can produce a possible tree is assumed to be the correct alternative; needless to say, this assumption gets us into trouble occasionally. This approach results in a recursive descent parser; recursive descent parsers have for many years been popular with compiler writers and writing one may still be the simplest way to get a simple parser. The technique does have its limitations, though, as we will see.
Figure 2.54 shows a recursive descent parser for the grammar of Figure 2.53; the driver is shown in Figure 2.55. Since it lacks code for the construction of the parse tree, it is actually a recognizer. The grammar describes a very simple-minded kind of arithmetic expression, one in which the + operator is right-associative. It produces token strings like IDENTIFIER + (IDENTIFIER + IDENTIFIER) EOF, where EOF stands for end-of-file. The parser text shows an astonishingly direct relationship to the grammar for which it was written. This similarity is one of the great attractions of recursive descent parsing; the lazy Boolean operators && and || in C are especially suitable for expressing it.
input "3 expression EOF expression -+ term rest-expression term -+ I ~ E ~ T I F I EIR parenthesized-expression parenthesized-expression -+ ( / expression /)' rest-expression -+ expression I E ~ i ~ 2.53 u r A~ simple grammar for demonstrating top-down parsing.
Each rule N corresponds to an integer routine that returns 1 (true) if a terminal production of N was found in the present position in the input stream, in which case the part of the input stream corresponding to this terminal production of N has been consumed. Otherwise, no such terminal production of N was found, the routine returns 0 (false), and no input was consumed. To this end, the routine tries each of the alternatives of N in turn, to see if
#include "tokennumbers.h"

/* PARSER */
int input(void) {
    return expression() && require(token(EOF));
}

int expression(void) {
    return term() && require(rest_expression());
}

int term(void) {
    return token(IDENTIFIER) || parenthesized_expression();
}

int parenthesized_expression(void) {
    return token('(') && require(expression()) && require(token(')'));
}

int rest_expression(void) {
    return token('+') && require(expression()) || 1;
}

int token(int tk) {
    if (tk != Token.class) return 0;
    get_next_token();
    return 1;
}

int require(int found) {
    if (!found) error();
    return 1;
}

Figure 2.54 A recursive descent parser/recognizer for the grammar of Figure 2.53.
one of them is present. To see if an alternative is present, the presence of its first member is tested, recursively. If it is there, the alternative is considered the correct one, and the other members are required to be present. If the first member is not there, no input has been consumed, and the routine is free to test the next alternative. If none of the alternatives succeeds, N is not there, the routine for N returns 0, and no input has been consumed, since no successful call to a routine has been made. If a member is required to be present and it is not found, there is a syntax error, which is reported, and the parser stops.
The routines for expression, term, and parenthesized_expression in Figure 2.54 are the direct result of this approach, and so is the routine token(). The rule for rest_expression contains an empty alternative; since this can always be assumed to be present, it can be represented simply by a 1 in the routine for rest_expression. Notice that the precedence and the semantics of the lazy Boolean operators && and || give us exactly what we need.
Figure 2.55 Driver for the recursive descent parser/recognizer.
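The driver itself is not reproduced legibly here; a minimal driver consistent with the routines of Figure 2.54 might look as follows. The routines get_next_token(), input(), and error() are those used in the parser text; start_lex() is an assumed initialization routine of the lexical analyzer, not necessarily the name used in the original figure.

#include "tokennumbers.h"

extern void start_lex(void);     /* assumed: initialize the lexical analyzer */
extern void get_next_token(void);
extern int input(void);
extern void error(void);

int main(void) {
    start_lex();
    get_next_token();            /* read the first token into Token */
    if (!input()) error();       /* parse the whole input */
    return 0;
}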
In spite of their initial good looks, recursive descent parsers have a number of drawbacks. First, there is still some searching through the alternatives; the repeated testing of the global variable Token.class effectively implements repeated backtracking over one token. Second, the method often fails to produce a correct parser. Third, error handling leaves much to be desired. The second problem in particular is bothersome, as the following three examples will show.

1. Suppose we want to add an array element as a term:
term → IDENTIFIER | indexed_element | parenthesized_expression
indexed_element → IDENTIFIER '[' expression ']'

and create a recursive descent parser for the new grammar. We then find that the routine for indexed_element will never be tried: when the sequence IDENTIFIER [ occurs in the input, the first alternative of term will succeed, consume the identifier, and leave the indigestible part [ expression ] in the input.

2. A similar but slightly different phenomenon occurs in the grammar of Figure 2.56, which produces ab and aab. A recursive recognizer for it contains the routines shown in Figure 2.57. This recognizer will not recognize ab, since A() will consume the a and require(token('a')) will fail. And when the order of the alternatives in A() is inverted, aab will not be recognized.

S → A 'a' 'b'
A → 'a' | ε

Figure 2.56 A simple grammar with a FIRST/FOLLOW conflict.
int S(void) {
    return A() && require(token('a')) && require(token('b'));
}

int A(void) {
    return token('a') || 1;
}

Figure 2.57 A faulty recursive recognizer for the grammar of Figure 2.56.
3. Suppose we want to replace the + for addition by a - for subtraction. Then the right-associativity expressed in the grammar of Figure 2.53 is no longer acceptable. This means that the rule for expression will now have to read:

expression → expression '-' term | ...

If we construct the recursive descent routine for this, we get

int expression(void) {
    return expression() && require(token('-')) && require(term()) || ...;
}

but a call to this routine is guaranteed to loop. Recursive descent parsers cannot handle left-recursive grammars, which is a serious disadvantage, since most programming-language grammars are left-recursive in places.
The techniques for constructing a top-down parser automatically derive from those of writing one by hand, but replace the searching through the alternatives by precomputation. Grammars that allow this construction of a parser to be performed are called LL(1) grammars; they are the grammars that do not exhibit LL(1) conflicts. The LL(1) parsing mechanism represents a push-down automaton, as described in Section 2.2.4.4. An important aspect of a parser is its error recovery capability: manual and automatic techniques are discussed in Section 2.2.4.5. An example of the use of a traditional top-down parser generator concludes this section on the creation of top-down parsers.
In previous sections we have obtained considerable gain by using precomputation, and we can do the same here. When we look at the recursive descent parsing process in more detail, we see that each time a routine for N is called with the same token t as first token of the input, the same sequence of routines gets called and the same alternative of N is chosen. So we can precompute for each rule N the alternative that applies for each token t in the input. Once we have this information, we can use it in the routine for N to decide right away which alternative applies, on the basis of the input token. One advantage is that this way we will no longer need to call other routines to find the answer, thus avoiding the search overhead. Another advantage is that, unexpectedly, it also provides a solution
Roadmap
2.2.4 Creating a top-down parser automatically
2.2.4.1 LL(1) parsing
2.2.4.2 LL(1) conflicts as an asset
2.2.4.3 LL(1) conflicts as a liability
2.2.4.4 The LL(1) push-down automaton
2.2.4.5 Error handling in LL parsers
2.2.4.6 A traditional top-down parser generator - LLgen
of sorts to the problems with the three examples above.
When we examine the routines in Figure 2.54 closely, we observe that the final decision on the success or failure of, for example, the routine term() is made by comparing the input token to the first token produced by the alternatives of term(): IDENTIFIER and parenthesized_expression(). So we have to precompute the sets of first tokens produced by all alternatives in the grammar, their so-called FIRST sets. It is easy to see that in order to do so, we will also have to precompute the FIRST sets of all non-terminals; the FIRST sets of the terminals are obvious. The FIRST set of an alternative α, FIRST(α), contains all terminals α can start with; if α can produce the empty string ε, this ε is included in the set FIRST(α). Finding FIRST(α) is trivial when α starts with a terminal, as it does for example in

parenthesized_expression → '(' expression ')'

but when α starts with a non-terminal, say N, we have to find FIRST(N). FIRST(N), however, is the union of the FIRST sets of its alternatives. So we have to determine the FIRST sets of the rules and the alternatives simultaneously in one algorithm.
The FIRST sets can be computed by the closure algorithm shown in Figure 2.58. The initializations set the FIRST sets of the terminals to contain the terminals as singletons, and set the FIRST set of the empty alternative to ε; all other FIRST sets start off empty. Notice the difference between the empty set { } and the singleton containing ε: { ε }. The first inference rule says that if α is an alternative of N, N can start with any token α can start with. The second inference rule says that an alternative α can start with any token its first member can start with, except ε. The case that the first member of α is nullable (in which case its FIRST set contains ε) is covered by the third rule: it says that if the first member of α is nullable, α can start with any token the rest of the alternative after the first member (β) can start with. If α contains only one member, the rest of the alternative is the empty alternative and FIRST(α) contains ε, as per initialization 4. The closure algorithm terminates, since the FIRST sets can only grow in each application of an inference rule, and their largest possible contents is the set of all terminals and ε. In practice it terminates very quickly. The initial and final FIRST sets for our simple grammar are shown in Figures 2.59 and 2.60, respectively.
The FIRST sets can now be used in the construction of a predictive recursive descent parser, shown in Figure 2.61. It is called a predictive recursive descent parser (predictive parser for short) because it predicts the presence of a given alternative without trying to find out
Data definitions:
1. A token set called FIRST for all terminals, non-terminals and alternatives of non-terminals in G.
2. A token set called FIRST for each alternative tail in G; an alternative tail is a sequence α of zero or more grammar symbols if Aα is an alternative or alternative tail in G.

Initializations:
1. For all terminals T, set FIRST(T) to {T}.
2. For all non-terminals N, set FIRST(N) to the empty set.
3. For all non-empty alternatives and alternative tails α, set FIRST(α) to the empty set.
4. Set the FIRST set of all empty alternatives and alternative tails to {ε}.

Inference rules:
1. For each rule N→α in G, FIRST(N) must contain all tokens in FIRST(α), including ε if FIRST(α) contains it.
2. For each alternative or alternative tail α of the form Aβ, FIRST(α) must contain all tokens in FIRST(A), excluding ε, should FIRST(A) contain it.
3. For each alternative or alternative tail α of the form Aβ where FIRST(A) contains ε, FIRST(α) must contain all tokens in FIRST(β), including ε if FIRST(β) contains it.

Figure 2.58 Closure algorithm for computing the FIRST sets in a grammar G.
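The closure computation of Figure 2.58 can be sketched compactly in C. The representation below is invented here for brevity and is not the book's: each alternative is a small array of symbol numbers, FIRST sets are 32-bit sets, and one bit position (EPS) stands for ε.

#include <stdint.h>

enum { MAX_SYM = 32, EPS = 31 };

typedef struct { int lhs; int len; int rhs[8]; } Rule;  /* one alternative per Rule */

uint32_t first[MAX_SYM];       /* FIRST set per grammar symbol */
int is_terminal[MAX_SYM];

void compute_first(Rule *rules, int n_rules) {
    for (int s = 0; s < MAX_SYM; s++)
        first[s] = is_terminal[s] ? (1u << s) : 0;       /* init: FIRST(T) = {T} */

    int changed = 1;
    while (changed) {                                    /* repeat until nothing grows */
        changed = 0;
        for (int r = 0; r < n_rules; r++) {
            uint32_t f = 0;
            int i, nullable = 1;
            for (i = 0; i < rules[r].len && nullable; i++) {
                int m = rules[r].rhs[i];
                f |= first[m] & ~(1u << EPS);            /* rule 2: FIRST(member) minus eps */
                nullable = (first[m] >> EPS) & 1;        /* rule 3: go on only if nullable */
            }
            if (nullable) f |= 1u << EPS;                /* all members nullable: add eps */
            uint32_t updated = first[rules[r].lhs] | f;  /* rule 1: FIRST(N) includes FIRST(alt) */
            if (updated != first[rules[r].lhs]) { first[rules[r].lhs] = updated; changed = 1; }
        }
    }
}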
explicitly if it is there. Actually the term 'predictive' is somewhat misleading: the parser does not predict, it knows for sure. Its 'prediction' can only be wrong when there is a syntax error in the input.
We see that the code for each alternative is preceded by a case label based on its FIRST set: all testing is done on tokens only, using switch statements in C. The routine for a grammar rule will now only be called when it is certain that a terminal production of that rule starts at this point in the input (barring syntactically incorrect input), so it will always succeed and is represented by a procedure rather than by a Boolean function. This also applies to the routine token(), which now only has to match the input token or give an error message; the routine require() has disappeared.
A complication arises with the case label for the empty alternative in rest_expression. Since it does not itself start with any token, how can we decide whether it is the correct alternative? We base our decision on the following consideration: when a non-terminal N produces a non-empty string we see a token that N can start with; when N produces an empty string we see a token that can follow N. So we choose the nullable alternative of N when we find ourselves looking at a token that can follow N. This requires us to determine the set of tokens that can immediately follow a given
Rule/alternative (tail)                            FIRST set
input → expression EOF                             { }
expression EOF                                     { }
EOF                                                { }
expression → term rest_expression                  { }
term rest_expression                               { }
rest_expression                                    { }
term → IDENTIFIER | parenthesized_expression       { }
IDENTIFIER                                         { }
parenthesized_expression                           { }
parenthesized_expression → '(' expression ')'      { }
'(' expression ')'                                 { }
expression ')'                                     { }
')'                                                { }
rest_expression → '+' expression | ε               { }
'+' expression                                     { }
expression                                         { }
ε                                                  { ε }

Figure 2.59 The initial FIRST sets.
non-terminal N; this set is called the FOLLOW set of N, and it can be computed using another closure algorithm. In this case we do not need FOLLOW sets of the alternatives. The closure algorithm for computing the FOLLOW sets is given in Figure 2.62. It initializes the FOLLOW sets of all non-terminals to the empty set, and uses the FIRST sets as obtained before. The first inference rule says that if a non-terminal N is followed by some alternative tail β, N can be followed by any token that β can start with. The second rule is more subtle: if β can produce the empty string, any token that can follow M can also follow N. Figure 2.63 shows the result of this algorithm on the grammar of Figure 2.53. We see that FOLLOW(rest_expression) = { EOF ')' }, which supplies the case labels for the nullable alternative in the routine for rest_expression in Figure 2.61.
The parser construction procedure described here is called the LL(1) method: the first L because the parser works from Left to right over the input, the second L because it identifies the productions in Leftmost (pre-visit) order, and '(1)' because all choices are based on a one-token look-ahead. A grammar that can be handled by this process is called an LL(1) grammar (but see the remark at the end of this section).
The above process describes only the bare bones of LL(1) parser generation: real-world LL(1) parser generators also have to worry about such things as
Rule/alternative (tail)                            FIRST set
input → expression EOF                             { IDENTIFIER '(' }
expression EOF                                     { IDENTIFIER '(' }
EOF                                                { EOF }
expression → term rest_expression                  { IDENTIFIER '(' }
term rest_expression                               { IDENTIFIER '(' }
rest_expression                                    { '+' ε }
term → IDENTIFIER | parenthesized_expression       { IDENTIFIER '(' }
IDENTIFIER                                         { IDENTIFIER }
parenthesized_expression                           { '(' }
parenthesized_expression → '(' expression ')'      { '(' }
'(' expression ')'                                 { '(' }
expression ')'                                     { IDENTIFIER '(' }
')'                                                { ')' }
rest_expression → '+' expression | ε               { '+' ε }
'+' expression                                     { '+' }
expression                                         { IDENTIFIER '(' }
ε                                                  { ε }

Figure 2.60 The final FIRST sets.
- repetition operators in the grammar; these allow, for example, expression and rest_expression to be combined into

  expression → term ( '+' term )*

  and complicate the algorithms for the computation of the FIRST and FOLLOW sets;
- detecting and reporting parsing conflicts (see below);
- including code for the creation of the syntax tree;
- including code and tables for syntax error recovery;
- optimizations; for example, the routine parenthesized_expression() in Figure 2.61 is only called when it has already been established that Token.class is '(', so the test in the routine itself is superfluous.
Actually, technically speaking, the process discussed here yields strong-LL(1) parser generation. Full-LL(1) parser generation is more powerful in theory, but it turns out that there are no full-LL(1) grammars that are not also strong-LL(1), so the difference has no direct practical consequences, and everybody calls 'strong-LL(1) parsers' 'LL(1) parsers'. There is an indirect difference, though: since the full-LL(1) parser generation process collects more information, it allows better error recovery. But even this
void input(void) {
    switch (Token.class) {
    case IDENTIFIER:
    case '(':    expression(); token(EOF); break;
    default:     error();
    }
}

void expression(void) {
    switch (Token.class) {
    case IDENTIFIER:
    case '(':    term(); rest_expression(); break;
    default:     error();
    }
}

void term(void) {
    switch (Token.class) {
    case IDENTIFIER: token(IDENTIFIER); break;
    case '(':    parenthesized_expression(); break;
    default:     error();
    }
}

void parenthesized_expression(void) {
    switch (Token.class) {
    case '(':    token('('); expression(); token(')'); break;
    default:     error();
    }
}

void rest_expression(void) {
    switch (Token.class) {
    case '+':    token('+'); expression(); break;
    case EOF:
    case ')':    break;
    default:     error();
    }
}

void token(int tk) {
    if (tk != Token.class) error();
    get_next_token();
}
Figure 2.61 A predictive parser for the grammar of Figure 2.53.
property is not usually exploited in compilers. Further details are given in Exercise 2.42.
Data definitions:
1. A token set called FOLLOW for all non-terminals in G.
2. The token sets called FIRST for all alternatives and alternative tails in G.

Initializations:
1. For all non-terminals N, set FOLLOW(N) to the empty set.
2. Set the FIRST sets to the values determined by the algorithm for FIRST sets.

Inference rules:
1. For each rule of the form M→αNβ in G, FOLLOW(N) must contain all tokens in FIRST(β), excluding ε, should FIRST(β) contain it.
2. For each rule of the form M→αNβ in G where FIRST(β) contains ε, FOLLOW(N) must contain all tokens in FOLLOW(M).

Figure 2.62 Closure algorithm for computing the FOLLOW sets in a grammar G.
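Using the same bit-set representation as in the FIRST sketch given after Figure 2.58, the FOLLOW closure can be coded along the following lines; this is again an illustrative sketch, not the book's parser generator.

/* Assumes the declarations of the FIRST sketch: Rule, first[], is_terminal[],
   MAX_SYM and the reserved bit EPS for epsilon. */
uint32_t follow[MAX_SYM];

void compute_follow(Rule *rules, int n_rules) {
    for (int s = 0; s < MAX_SYM; s++) follow[s] = 0;    /* init: all FOLLOW sets empty */

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int r = 0; r < n_rules; r++) {
            for (int i = 0; i < rules[r].len; i++) {
                int n = rules[r].rhs[i];
                if (is_terminal[n]) continue;           /* FOLLOW only for non-terminals */
                uint32_t f = follow[n];
                int j, nullable_tail = 1;
                for (j = i + 1; j < rules[r].len && nullable_tail; j++) {
                    int m = rules[r].rhs[j];
                    f |= first[m] & ~(1u << EPS);       /* rule 1: FIRST of what follows N */
                    nullable_tail = (first[m] >> EPS) & 1;
                }
                if (nullable_tail)
                    f |= follow[rules[r].lhs];          /* rule 2: tail can vanish */
                if (f != follow[n]) { follow[n] = f; changed = 1; }
            }
        }
    }
}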
Rule                         FIRST set             FOLLOW set
input                        { IDENTIFIER '(' }    { }
expression                   { IDENTIFIER '(' }    { EOF ')' }
term                         { IDENTIFIER '(' }    { '+' EOF ')' }
parenthesized_expression     { '(' }               { '+' EOF ')' }
rest_expression              { '+' ε }             { EOF ')' }

Figure 2.63 The FIRST and FOLLOW sets for the grammar of Figure 2.53.
We now return to the first of our three problems described at the end of Section 2.2.3.1: the addition of indexed_element to term. When we generate code for the new grammar, we find that FIRST(indexed_element) is {IDENTIFIER}, and the code for term becomes:

void term(void) {
    switch (Token.class) {
    case IDENTIFIER: token(IDENTIFIER); break;
    case IDENTIFIER: indexed_element(); break;
    case '(':        parenthesized_expression(); break;
    default:         error();
    }
}
Two different cases are marked with the same case label, which clearly shows the conflict the grammar suffers from: the C code will not even compile. Such a conflict is called an LL(1) conflict, and grammars that are free from them are called 'LL(1) grammars'. It is the task of the parser generator to check for such conflicts, report them, and refrain from generating a parser if any are found. The grammar in Figure 2.53 is LL(1), but the grammar extended with the rule for indexed_element is not: it contains an LL(1) conflict, more in particular a FIRST/FIRST conflict. For this conflict, the parser generator could for example report: 'Alternatives 1 and 2 of term have a FIRST/FIRST conflict on token IDENTIFIER'.
For the non-terminals in the grammar of Figure 2.56 we find the following FIRST and FOLLOW sets:

Rule             FIRST set      FOLLOW set
S → A 'a' 'b'    { 'a' }        { }
A → 'a' | ε      { 'a' ε }      { 'a' }
This yields the parser shown in Figure 2.64. This parser is not LL(1), due to the conflict in the routine for A. Here the first alternative of A is selected on input a, since a is in FIRST(A); but the second, empty, alternative should also be selectable on input a, since a is in FOLLOW(A).

void S(void) {
    switch (Token.class) {
    case 'a':    A(); token('a'); token('b'); break;
    default:     error();
    }
}

void A(void) {
    switch (Token.class) {
    case 'a':    token('a'); break;
    case 'a':    break;
    default:     error();
    }
}
Figure 2.64 A predictive parser for the grammar of Figure 2.56.
Our third example concerned a left-recursive grammar:

expression → expression '-' term | ...

This will certainly cause an LL(1) conflict, for the following reason: the FIRST set of expression will contain the FIRST sets of its non-recursive alternatives (indicated here by ...), but the recursive alternative starts with expression, so its FIRST set will contain the FIRST sets of all the other alternatives: the left-recursive alternative will have a FIRST/FIRST conflict with all the other alternatives.
We see that the LL(1) method predicts the alternative Ak for a non-terminal N when the look-ahead token is in the set

FIRST(Ak) ∪ (if Ak is nullable then FOLLOW(N))
This information must allow the alternative to predict to be identified uniquely, so the above sets for A1...An must all be disjunct. This leads to the following three requirements for a grammar to be LL(1):

- No FIRST/FIRST conflicts:
  For each non-terminal N, the FIRST sets of all its alternatives must be disjunct.
- No FIRST/FOLLOW conflicts:
  For each non-terminal N that has a nullable alternative, FOLLOW(N) must be disjunct from the FIRST sets of all alternatives of N.
- No multiple nullable alternatives:
  No non-terminal can have more than one nullable alternative.

Rather than creating a parser that does not work, as the recursive descent method would, LL(1) parser generation detects the LL(1) conflict(s) and generates no parser at all. This is safer than the more cavalier approach of the recursive descent method, but has a new disadvantage: it leaves the compiler writer to deal with the LL(1) conflicts.
When a grammar is not LL(1), and most are not, there are basically two options: use a stronger parsing method or make the grammar LL(1). Using a stronger parsing method is in principle preferable, since it allows us to leave the grammar intact. Two kinds of stronger parsing methods are available: enhanced LL(1) parsers, which are still top-down, and the bottom-up methods LALR(1) and LR(1). The problem with these is that they may not help: the grammar may not be amenable to any deterministic parsing method. Also, top-down parsers are more convenient to use than bottom-up parsers when context handling is involved, as we will see in Section 3.1. So there may be reason to resort to the second alternative: making the grammar LL(1). LL(1) parsers enhanced by dynamic conflict resolvers are treated in a subsection below.
Making a grammar LL(1) means creating a new grammar which generates the same language as the original non-LL(1) grammar and which is LL(1). The advantage of the new grammar is that it can be used for automatic parser generation; the disadvantage is that it does not construct exactly the right syntax trees, so some semantic patching up will have to be done. There is no hard and fast recipe for making a grammar LL(1); if there were, the parser generator could apply it and the problem would go away. In this section we present some tricks and guidelines. Applying them so that the damage to the resulting syntax tree is minimal requires judgment and ingenuity.
There are three main ways to remove LL(1) conflicts: left-factoring, substitution, and
left-recursion removal.
Left-factoring can be applied when two alternatives start directly with the same grammar symbol, as in:

term → IDENTIFIER | IDENTIFIER '[' expression ']' | ...

Here the common left factor IDENTIFIER is factored out, in the same way as for example the x can be factored out in x*y+x*z, leaving x*(y+z). The resulting grammar fragment is now LL(1), unless of course term itself can be followed by a [ elsewhere in the grammar:

term → IDENTIFIER after_identifier | ...
after_identifier → '[' expression ']' | ε
or more concisely with a repetition operator:

term → IDENTIFIER ( '[' expression ']' )? | ...
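In a predictive parser the gain from this left-factoring is immediately visible: the optional index part can be handled by a simple test on the input token. The sketch below follows the style of Figure 2.61 and assumes its declarations (Token, token(), expression(), error()); indexed elements are not part of that figure's grammar, so the routine is illustrative only.

void term(void) {
    switch (Token.class) {
    case IDENTIFIER:
        token(IDENTIFIER);
        /* after_identifier: '[' expression ']' or empty */
        if (Token.class == '[') {
            token('['); expression(); token(']');
        }
        break;
    case '(':
        parenthesized_expression(); break;
    default:
        error();
    }
}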
Substitution involves replacing a non-terminal N in a right-hand side α of a production rule by the alternatives of N. If N has n alternatives, the right-hand side α is replicated n times, and in each copy N is replaced by a different alternative. For example, the result of substituting the rule

A → a | B C | ε

in S → P A q is:

S → P a q | P B C q | P q

In a sense, substitution is the opposite of factoring. It is used when the conflicting entities are not directly visible; this occurs in indirect conflicts and FIRST/FOLLOW conflicts. The grammar fragment

term → IDENTIFIER | indexed_element | parenthesized_expression
indexed_element → IDENTIFIER '[' expression ']'
exhibits an indirect FIRST/FIRST conflict on the token IDENTIFIER. Substitution of indexed_element in term turns it into a direct conflict, which can then be handled by left-factoring. Something similar occurs in the grammar of Figure 2.56, which has a FIRST/FOLLOW conflict. Substitution of A in S yields:

S → 'a' 'a' 'b' | 'a' 'b'

which can again be made LL(1) by left-factoring.
Left-recursion removal can in principle be performed automatically. The algorithm removes all left-recursion from any grammar, but the problem is that it mangles the grammar beyond recognition. Careful application of the manual technique explained below will also work in most cases, and leave the grammar largely intact. Three types of left-recursion must be distinguished:
- direct left-recursion, in which an alternative of N starts with N;
- indirect left-recursion, in which an alternative of N starts with A, an alternative of A starts with B, and so on, until finally an alternative in this chain brings us back to N;
- hidden left-recursion, in which an alternative of N starts with αN and α can produce ε.

Indirect and hidden left-recursion (and hidden indirect left-recursion!) can usually be turned into direct left-recursion by substitution. We will now see how to remove direct left-recursion. We assume that only one alternative of the left-recursive rule N starts with N; if there are more, left-factoring will reduce them to one. Schematically, N has the form

N → N α | β

in which α represents whatever comes after the N in the left-recursive alternative and β represents the other alternatives. This rule produces the set of strings

β
βα
βαα
βααα
βαααα
...

which immediately suggests the two non-left-recursive rules

N → β N'
N' → α N' | ε
in which N’ produces the repeating tail of N, the set {anI~ 2 0 )It. is easy to verify that these two rules generate the same pattern as shown above. is transformation gives us a technique to remove direct left-recursion. en we apply it to the traditional left-recursive definitionof an arithmetic expression expression -+ expression
/-/
term
I
term
we find that N = a =
P
=
expression term term ’ - l
So the non-left-recursive equivalent is: expression -+ term expression-tail-option expression-tail-option -+ term expression-tail-option ‘-I
I
E
There is no guarantee that repeated application of the above techniques will result in an LL(1) grammar. A not unusual vicious circle is that removal of FIRST/FIRST conflicts through left-factoring results in nullable alternatives, which cause FIRST/FOLLOW conflicts. Removing these through substitution causes new FIRST/FIRST conflicts, and so on. But for many grammars LL(1)-ness can be achieved relatively easily.
Undoing the semantic effects of grammar transformations
While it is often possible to transform our grammar into a new grammar that is acceptable by a parser generator and that generates the same language, the new grammar usually assigns a different structure to strings in the language than our original grammar did. Fortunately, in many cases we are not really interested in the structure but rather in the semantics implied by it. In those cases, it is often possible to move the semantics to so-called marker rules, syntax rules that always produce the empty string and whose only task consists of making sure that the right actions are executed at the right time. The trick is then to carry these marker rules along in the grammar transformations as if they were normal syntax rules. It is convenient to collect all the semantics at the end of an alternative: it is the first place in which we are certain we have all the information. Following this technique, we can express the semantics of our traditional definition of arithmetic expressions as follows in C:

expression(int *e) →
    expression(int *e) '-' term(int *t) {*e -= *t;}
  | term(int *t) {*e = *t;}

We handle the semantics of the expressions as pointers to integers for our demonstration. The C fragments {*e -= *t;} and {*e = *t;} are the marker rules; the first subtracts the value obtained from term from that obtained from expression, the second just copies the value obtained from term to the left-hand side. Note that the pointer to the result is shared between expression on the left and expression on the right; the initial application of the rule expression somewhere else in the grammar will have to supply a pointer to an integer variable. Now we find that

N = expression(int *e)
α = '-' term(int *t) {*e -= *t;}
β = term(int *t) {*e = *t;}
So the semantically corrected non-left-recursive equivalent is:

expression(int *e) →
    term(int *t) {*e = *t;} expression_tail_option(int *e)
expression_tail_option(int *e) →
    '-' term(int *t) {*e -= *t;} expression_tail_option(int *e)
  | ε

This makes sense: the C fragment {*e = *t;} now copies the value obtained from term to a location shared with expression_tail_option; the code {*e -= *t;} does effectively the same as before. If the reader feels that all this is less than elegant patchwork, we agree. Still, the transformations can be performed almost mechanically and few errors are usually introduced. A somewhat less objectionable approach is to rig the markers so that the correct syntax tree is constructed in spite of the transformations, and to leave all semantic processing to the next phase, which can then proceed as if nothing out of the ordinary had happened. In a subsection of Section 2.2.4.6 we show how this can be done in a traditional top-down parser generator.
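In a hand-written recursive descent parser, the marker rules of the transformed grammar simply become ordinary statements placed after the calls. The sketch below uses the token() and Token conventions of Figure 2.61 and assumes a routine term(int *t) that parses a term and leaves its value in *t; it shows that the subtraction comes out left-associative even though the grammar is no longer left-recursive.

void term(int *t);                /* assumed: parses a term and yields its value */
void token(int tk);               /* as in Figure 2.61 */
void expression_tail_option(int *e);

/* expression(int *e) -> term(int *t) {*e = *t;} expression_tail_option(int *e) */
void expression(int *e) {
    int t;
    term(&t);
    *e = t;                       /* marker {*e = *t;} */
    expression_tail_option(e);
}

/* expression_tail_option(int *e) ->
       '-' term(int *t) {*e -= *t;} expression_tail_option(int *e) | epsilon */
void expression_tail_option(int *e) {
    if (Token.class == '-') {
        int t;
        token('-');
        term(&t);
        *e -= t;                  /* marker {*e -= *t;}: left-associative subtraction */
        expression_tail_option(e);
    }
    /* empty alternative: do nothing */
}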
Automatic conflict resolution

There are two ways in which LL parsers can be strengthened: by increasing the look-ahead and by allowing dynamic conflict resolvers.
Distinguishing alternatives not by their first token but by their first two tokens is called LL(2); it helps, for example, to differentiate between IDENTIFIER ( (routine call), IDENTIFIER [ (array element), IDENTIFIER of (field selection), IDENTIFIER + (expression) and perhaps others. A disadvantage of LL(2) is that the parser code can get much bigger. On the other hand, only a few rules need the full power of the two-token look-ahead, so the problem can often be limited. The ANTLR parser generator (Parr and Quong, 1995) computes the required look-ahead for each rule separately: it is LL(k), for varying k. But no amount of look-ahead can resolve left-recursion.
Dynamic conflict resolvers are conditions expressed in some programming language that are attached to alternatives that would otherwise conflict. When the conflict arises during parsing, some of the conditions are evaluated to resolve it. The details depend on the parser generator. The parser generator LLgen (which will be discussed in Section 2.2.4.6) requires a conflict resolver to be placed on the first of two conflicting alternatives. When the parser has to decide between the two, the condition is evaluated and if it yields true, the first alternative is considered to apply. If it yields false, the parser continues with the second alternative, which, of course, may be the first of another pair of conflicting alternatives.
An important question is: what information can be accessed by the dynamic conflict resolvers? After all, this information must be available dynamically during parsing, which may be a problem. The simplest information one can offer is no information at all; even this already helps to solve, for example, the LL(1) conflict in the if-then-else statement in many languages. After left-factoring, the if-else statement in C may have the following form:
if-else-statement → 'if' '(' expression ')' statement else-tail-option
else-tail-option → 'else' statement
    | ε
statement → ... | if-else-statement | ...
in which the rule for else-tail-option has an LL(1) conflict. The reason for this conflict is that the rule has an alternative that produces ε, and both its FIRST set and its FOLLOW set contain the token else. The ensuing LL(1) conflict is solved by attaching to the first alternative a conflict resolver that always returns true:
else-tail-option → %if (1) 'else' statement
    | ε
The static conflict resolver %if (1) can be expressed more appropriately as %prefer in LLgen. A more informative type of information that can be made available easily is one or more look-ahead tokens. Even one token can be very useful: supposing the lexical analyzer maintains a global variable ahead_token, we can write:
basic-expression →
    %if (ahead_token == '(') routine-call
  | %if (ahead_token == '[') indexed-element
  | %if (ahead_token == OF_TOKEN) field-selection
  | identifier

in which all four alternatives start with IDENTIFIER. This implements a poor man's LL(2) parser.
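As an illustration of how such a one-token look-ahead buffer can be used, the following is a minimal sketch in C of the 'poor man's LL(2)' decision above, written in a hand-coded recursive descent style. The token constants, the variables token and ahead_token, and the parse_... routines are assumptions of ours for the sake of the example; they are not LLgen code.

/* Minimal sketch (assumed names): choosing between alternatives that all
   start with IDENTIFIER by peeking at the token after it. */
#include <stdio.h>

enum { IDENTIFIER, LPAREN, LBRACKET, OF_TOKEN, PLUS, EOF_TOKEN };

static int token;        /* the current token class */
static int ahead_token;  /* the one-token look-ahead maintained by the lexer */

/* hypothetical parsing routines for the four alternatives */
static void parse_routine_call(void)    { printf("routine call\n"); }
static void parse_indexed_element(void) { printf("indexed element\n"); }
static void parse_field_selection(void) { printf("field selection\n"); }
static void parse_identifier(void)      { printf("plain identifier\n"); }

/* basic_expression: all four alternatives start with IDENTIFIER, so the
   choice is made on the token that follows it */
static void parse_basic_expression(void) {
    if (token != IDENTIFIER) { printf("syntax error\n"); return; }
    switch (ahead_token) {
    case LPAREN:   parse_routine_call();    break;
    case LBRACKET: parse_indexed_element(); break;
    case OF_TOKEN: parse_field_selection(); break;
    default:       parse_identifier();      break;
    }
}

int main(void) {
    token = IDENTIFIER; ahead_token = LBRACKET;   /* pretend the input is a[... */
    parse_basic_expression();                     /* prints "indexed element"   */
    return 0;
}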
Narrow parsers - in which the actions attached to a node are performed as soon as the node becomes available - can consult much more information in conflict resolvers than broad compilers can, for example symbol table information. This way, the parsing process can be influenced by an arbitrarily remote context, and the parser is no longer context-free. It is not context-sensitive either in the technical sense of the word: it has become a fully-fledged program, of which determinacy and termination are no longer guaranteed. Dynamic conflict resolution is one of those features that, when abused, can lead to big problems, and when used with caution can be a great help.
We have seen that in order to construct an LL(1) parser, we have to compute for each non-terminal N which of its alternatives to predict for each token t in the input. We can arrange these results in a table; for the LL(1) parser of Figure 2.61, we get the table shown in Figure 2.65.
Top of stack/state          Look-ahead token
                            IDENTIFIER             '+'              '('                        ')'   EOF
input                       expression EOF                          expression EOF
expression                  term rest-expression                    term rest-expression
rest-expression                                    '+' expression                              ε     ε
term                        IDENTIFIER                              parenthesized-expression
parenthesized-expression                                            '(' expression ')'

Figure 2.65 Transition table for an LL(1) parser for the grammar of Figure 2.53.
This table looks suspiciously like the transition tables we have seen in the table-controlled lexical analyzers. Even the meaning often seems the same: for example, in the state term, upon seeing a ( we go to the state parenthesized-expression. Occasionally, there is a difference, though: in the state expression, upon seeing an IDENTIFIER, we go to a series of states, term and rest-expression. There is no provision for this in the original finite-state automaton, but we can keep very close to its original flavor by going to the state term and pushing the state rest-expression onto a stack for later treatment. If we consider the state term as the top of the stack, we have replaced the single state of the FSA by a stack of states. Such an automaton is called a push-down automaton. A push-down automaton derived from an LL(1) grammar by the above procedure is deterministic, which means that each entry in the transition table contains only one value: it does not have to try more than one alternative. The stack of states contains both non-terminals and terminals; together they form the prediction to which the present input must conform (or it contains a syntax error). This correspondence is depicted most clearly by showing the prediction stack horizontally above the present input, with the top of the stack at the left. Figure 2.66 shows such an arrangement; in it, the input was (i+i)+i, where i is the character representation of the token IDENTIFIER, and the '(' has just been processed. It is easy to see how the elements on the prediction stack are going to match the input.
Prediction stack:  expression ')' rest-expression EOF
Present input:     IDENTIFIER '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Figure 2.66 Prediction stack and present input in a push-down automaton.
A push-down automaton uses and modifies a push-down prediction stack and the input stream, and consults a transition table Prediction table [Non-terminal, Token]. Only the top of the stack and the first token in the input stream are consulted and affected by the algorithm. The table is two-dimensional and is indexed with non-terminals in one dimension and tokens in the other; the entry indexed with a non-terminal N and a token t either contains the alternative of N that must be predicted when the present input starts with t, or is empty. The automaton starts with the start symbol of the grammar as the only element on the prediction stack, and the token stream as the input. It knows two major and one minor types of moves; which one is applied depends on the top of the prediction stack:
- Prediction: This move applies when the top of the prediction stack is a non-terminal N. It is removed (popped) from the stack, and the transition table entry Prediction table [N, t] is looked up, where t is the first token of the input. If it contains no alternatives, we have found a syntax error in the input. If it contains one alternative of N, then this alternative is pushed onto the prediction stack. The LL(1) property guarantees that the entry will not contain more than one alternative. See Figure 2.67.
- Match: This move applies when the top of the prediction stack is a terminal. It must be equal to the first token of the present input. If it is not, there is a syntax error; if it is,
Figure 2.67 Prediction move in an LL(1) push-down automaton.
both tokens are removed. See Figure 2.68.
- Termination: Parsing terminates when the prediction stack is exhausted. If the input stream is also exhausted, the input has been parsed successfully; if it is not, there is a syntax error.

The push-down automaton repeats the above moves until it either finds a syntax error or terminates successfully. Note that the algorithm as described above does not construct a syntax tree; it is a recognizer only. If we want a syntax tree, we have to use the prediction move to construct nodes for the members of the alternative and connect them to the node that is being expanded. In the match move we have to attach the attributes of the input token to the syntax tree. Unlike the code for the recursive descent parser and the recursive predictive parser, the code for the non-recursive predictive parser is independent of the language; all language dependence is concentrated in the Prediction table []. Outline code for the LL(1) push-down automaton is given in Figure 2.69. It assumes that the input tokens reside in an array Input token [1..]; if the tokens are actually obtained by calling a function like Next input token (), care has to be taken not to read beyond end-of-file.
Figure 2.68 Match move in an LL(1) push-down automaton.
The algorithm terminates successfully when the prediction stack is empty; since the prediction stack can only become empty by matching the EOF token, we know that the input is empty as well. When the stack is not empty, the prediction on the top of it is examined. It is either a terminal, which then has to match the input token, or it is a non-terminal, which then has to lead to a prediction, taking the input token into account. If either of these requirements is not fulfilled, an error message follows; an error recovery algorithm may then be activated. Such algorithms are described in Section 2.2.4.5. It is instructive to see how the automaton arrived at the state of Figure 2.66. Figure 2.70 shows all the moves.

Whether to use an LL(1) predictive parser or an LL(1) push-down automaton is mainly decided by the compiler writer's preference, the general structure of the compiler, and the available software. A predictive parser is more usable in a narrow compiler since it makes combining semantic actions with parsing much easier. The push-down automaton is more important theoretically and much more is known about it; little of this, however, has found its way into compiler writing. Error handling may be easier in a push-down automaton: all available information lies on the stack, and since the stack is actually an array, the information can be inspected and modified directly; in predictive parsers it is hidden in the flow of control.
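To make the push-down automaton just described concrete, the following is a minimal self-contained sketch of it in C, hard-coded for the grammar of Figure 2.53 and the transition table of Figure 2.65; its outline pseudocode appears in Figure 2.69 below. The symbol encodings, the predict() helper and the embedded token stream are our own choices, and no syntax tree is built.

/* Minimal sketch (assumed encodings) of the LL(1) push-down automaton. */
#include <stdio.h>

/* terminals are '(' ')' '+' plus the two below; non-terminals follow them */
enum { IDENT = 256, EOFT, N_INPUT, N_EXPR, N_REST, N_TERM, N_PAREN };

#define MAX_STACK 100
static int stack[MAX_STACK], sp = 0;
static void push(int s) { stack[sp++] = s; }
static int  pop(void)   { return stack[--sp]; }

/* Prediction table: returns the alternative (0-terminated array) for
   non-terminal nt on look-ahead token t, or NULL on a syntax error. */
static const int *predict(int nt, int t) {
    static const int input_alt[] = { N_EXPR, EOFT, 0 };
    static const int expr_alt[]  = { N_TERM, N_REST, 0 };
    static const int rest_plus[] = { '+', N_EXPR, 0 };
    static const int rest_eps[]  = { 0 };                 /* epsilon */
    static const int term_id[]   = { IDENT, 0 };
    static const int term_par[]  = { N_PAREN, 0 };
    static const int paren[]     = { '(', N_EXPR, ')', 0 };

    switch (nt) {
    case N_INPUT: return (t == IDENT || t == '(') ? input_alt : NULL;
    case N_EXPR:  return (t == IDENT || t == '(') ? expr_alt  : NULL;
    case N_REST:  return t == '+' ? rest_plus
                       : (t == ')' || t == EOFT) ? rest_eps : NULL;
    case N_TERM:  return t == IDENT ? term_id
                       : t == '('   ? term_par : NULL;
    case N_PAREN: return t == '(' ? paren : NULL;
    default:      return NULL;
    }
}

int main(void) {
    /* token stream for (i+i)+i, ending in EOF */
    int input[] = { '(', IDENT, '+', IDENT, ')', '+', IDENT, EOFT };
    int idx = 0;

    push(N_INPUT);                       /* the start symbol */
    while (sp > 0) {
        int top = pop();
        if (top < N_INPUT) {             /* terminal: match move */
            if (top != input[idx]) { printf("syntax error\n"); return 1; }
            idx++;
        } else {                         /* non-terminal: prediction move */
            const int *alt = predict(top, input[idx]);
            if (alt == NULL) { printf("syntax error\n"); return 1; }
            int n = 0;
            while (alt[n] != 0) n++;
            while (n > 0) push(alt[--n]);   /* push alternative in reverse */
        }
    }
    printf("input parsed successfully\n");
    return 0;
}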
IMPORT Input token [1..];    // from lexical analyzer

SET Input token index TO 1;
SET Prediction stack TO Empty stack;
PUSH Start symbol ON Prediction stack;

WHILE Prediction stack /= Empty stack:
    Pop top of Prediction stack into Predicted;

    IF Predicted is a Token class:
        // Try a match move:
        IF Predicted = Input token [Input token index] .class:
            SET Input token index TO Input token index + 1;    // matched
        ELSE Predicted /= Input token:
            ERROR "Expected token not found: ", Predicted;
    ELSE Predicted is a Non-terminal:
        // Try a prediction move, using the input token as look-ahead:
        SET Prediction TO Prediction table [Predicted, Input token [Input token index]];
        IF Prediction = Empty:
            ERROR "Token not expected: ", Input token [Input token index];
        ELSE Prediction /= Empty:
            PUSH The symbols of Prediction ON Prediction stack;

Figure 2.69 Predictive parsing with an LL(1) push-down automaton.

Initial situation:
Prediction stack:  input
Input:             '(' IDENTIFIER '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Prediction moves:
Prediction stack:  expression EOF
Prediction stack:  term rest-expression EOF
Prediction stack:  parenthesized-expression rest-expression EOF
Prediction stack:  '(' expression ')' rest-expression EOF
Input:             '(' IDENTIFIER '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Match move on '(':
Prediction stack:  expression ')' rest-expression EOF
Input:             IDENTIFIER '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Figure 2.70 The first few parsing moves for (i+i)+i.

We have two major concerns in syntactic error recovery: to avoid infinite loops and to avoid producing corrupt syntax trees. Neither of these dangers is imaginary. Many compiler writers, including the authors, have written ad-hoc error correction methods only to find that they looped on the very first error. The grammar

S → a c | b S
provides a simple demonstration of the effect; it generates the language b*ac. Now suppose the actual input is c. The prediction is S, which, being a non-terminal, must be replaced by one of its alternatives, in a prediction move. The first alternative, ac, is rejected since the input does not start with a. The alternative bS fails too, since the input does not start with b either. To the naive mind this suggests a way out: predict bS anyhow, insert a b in front of the input, and give an error message 'Token b inserted in line ...'. The inserted b then gets matched to the predicted b, which seems to advance the parsing but in effect brings us back to the original situation. Needless to say, in practice such infinite loops originate from much less obvious interplay of grammar rules.

Faced with the impossibility of choosing a prediction, one can also decide to discard the non-terminal. This, however, will cause the parser to produce a corrupt syntax tree. To see why this is so, return to Figure 2.51 and imagine what would happen if we tried to 'improve' the situation by deleting one of the nodes indicated by hollow dots. A third possibility is to discard tokens from the input until a matching token is found: if you need a b, skip other tokens until you find a b. Although this is guaranteed not to loop, it has two severe problems. Indiscriminate skipping will often skip important structuring tokens like procedure or ), after which our chances for a successful recovery are reduced to nil. Also, when the required token does not occur in the rest of the input at all, we are left with a non-empty prediction and an empty input, and it is not clear how to proceed from there. A fourth possibility is inserting a non-terminal at the front of the prediction, to force a match, but this would again lead to a corrupt syntax tree.

So we need a better strategy, one that guarantees that at least one input token will be consumed to prevent looping and that nothing will be discarded from or inserted into the prediction stack, to prevent corrupting the syntax tree. We will now discuss such a strategy, the acceptable-set method. It is actually a framework for systematically constructing a safe error recovery method (Stirling, 1985). It centers on an 'acceptable set' of tokens, and consists of three steps, all of which are performed after the error has been detected. The three steps are:
- Step 1: construct the acceptable set A from the state of the parser, using some suitable algorithm C; it is required that A contain the end-of-file token;
- Step 2: discard tokens from the input stream until a token tA from the set A is found;
- Step 3: resynchronize the parser by advancing it until it arrives in a state in which it consumes the token tA from the input, using some suitable algorithm R; this prevents looping.

Algorithm C is a parameter to the method, and in principle it can be determined freely. The second step is fixed. Algorithm R, which is used in Step 3 to resynchronize the parser, must fit in with algorithm C used to construct the acceptable set. In practice this means that algorithms C and R have to be designed together. The acceptable set is sometimes called the follow set and the technique follow-set error recovery, but to avoid confusion with the FOLLOW set described in Section 2.2.4.1 and the FOLLOW-set error recovery
described below, we will not use these terms. A wide range of algorithms presents itself for Step 1, but the two simplest possibilities, those that yield the singleton {end-of-file} or the set of all tokens, are unsuitable: all input and no input will be discarded, respectively, and in both cases it is difficult to see how to advance the parser to accept the token tA.

The next possibility is to take the empty algorithm for R. This means that the state of the parser must be corrected by Step 2 alone, and so equates the acceptable set with the set of tokens that is correct at the moment the error is detected. Step 2 skips all tokens until a correct token is found, and parsing can continue immediately. The disadvantage is that this method again has the tendency to throw away important structuring tokens like procedure or ), after which the situation is beyond redemption. The term 'panic mode' for this technique is quite appropriate.

Another option is to have the compiler writer determine the acceptable set by hand. If, for example, expressions in a language are always followed by ')', ';', or ',', we can store this set in a global variable Acceptable set whenever we start parsing an expression. Then, when we detect an error, we skip the input until we find a token that is in Acceptable set (Step 2), discard the fragment of the expression we have already parsed and insert a dummy expression in the parse tree (Step 3), and continue the parser. This is sometimes called the 'acceptable-set method' in a more narrow sense. Although it is not unusual in recursive descent parsers to have the acceptable sets chosen by hand, the choice can also be made automatically: use the FOLLOW sets of the non-terminals. This approach is called FOLLOW-set error recovery (Hartmann, 1977; Pemberton, 1980).

FOLLOW-set methods are relatively easy to implement but have the disadvantage that there is no guarantee that the parser can indeed consume the input token in Step 3. For example, if we are parsing a program in the language C and the input contains a(b + int; c), a syntax error is detected upon seeing the int, which is a keyword, not an identifier, in C. Since we are at that moment parsing an expression, the FOLLOW set does not contain the token int but it does contain a semicolon. So the int is skipped in Step 2 but the semicolon is not. Then a dummy expression is inserted in Step 3 to replace b +. This leaves us with a(-dummy-expression-; c) in which the semicolon still cannot be consumed. The reason is that, although in general expressions may indeed be followed by semicolons, which is why the semicolon is in the FOLLOW set, expressions in parameter lists may not, since a closing parenthesis must intervene according to C syntax. Another problem with FOLLOW sets is that they are often quite small; this results in skipping large chunks of text. So the FOLLOW set is both too large and not large enough to serve as the acceptable set. Both problems can be remedied to a large extent by basing the acceptable set on continuations, as explained in the next section.
Acceptable-set error recovery based on continuations
The push-down automaton implementation of an LL(1) parser shows clearly what material we have to work with when we encounter a syntax error: the prediction stack and the first few tokens of the rest of the input. More in detail, the situation looks as follows:
Prediction stack:  A B C EOF
Input:             i ...
in which we assume for the moment that the prediction starts with a non-terminal, A. Since there is a syntax error, we know that A has no alternative predicted on the input token, but to guarantee a correct parse tree, we have to make sure that the prediction on the stack comes true. Something similar applies if the prediction starts with a terminal.

Now suppose for a moment that the error occurred because the end of input has been reached; this simplifies our problem temporarily by reducing one of the participants, the rest of the input, to a single token, EOF. In this case we have no option but to construct the rest of the parse tree out of thin air, by coming up with predictions for the required non-terminals and by inserting the required terminals. Such a sequence of terminals that will completely fulfill the predictions on the stack is called a continuation of that stack (Röhrich, 1980). A continuation can be constructed for a given stack by replacing each of the non-terminals on the stack by a terminal production of it. So there are almost always infinitely many continuations of a given stack, and any of them leads to an acceptable set in the way explained below. For convenience and to minimize the number of terminals we have to insert, we prefer the shortest continuation: we want the shortest way out. The shortest continuation can be obtained by predicting for each non-terminal on the stack the alternative that produces the shortest string. How we find the alternative that produces the shortest string is explained in the next subsection.

We now imagine feeding the chosen continuation to the parser. This will cause a number of parser moves, leading to a sequence of stack configurations, the last of which terminates the parsing process and completes the parse tree. The above situation could, for example, develop as follows:

A B C EOF
p Q B C EOF      (say A → pQ is the shortest alternative of A)
Q B C EOF        (the inserted p is matched)
q B C EOF        (say Q → q is the shortest alternative of Q)
B C EOF          (the inserted q is matched)
...
EOF
ε                (the always-present EOF is matched; the parsing process ends)
Each of these stack configurations has a FIRST set, which contains the tokens that would be correct if that stack configuration were met. We take the union of all these sets as the acceptable set of the original stack configuration A B C EOF. The acceptable set contains all tokens in the shortest continuation plus the first tokens of all side paths of that continuation. It is important to note that such acceptable sets always include the EOF token; see Exercise 2.45. We now return to our original problem, in which the rest of the input is still present and starts with i. After having determined the acceptable set (Step 1), we do the following:
// Step 1: construct acceptable set:
SET Acceptable set TO Acceptable set of (Prediction stack);

// Step 2: skip unacceptable tokens:
WHILE Input token [Input token index] IS NOT IN Acceptable set:
    MESSAGE "Token skipped: ", Input token [Input token index];
    SET Input token index TO Input token index + 1;
// Step 3: resynchronize the parser:
SET the flag Resynchronized TO False;
WHILE NOT Resynchronized:
    Pop top of Prediction stack into Predicted;
    IF Predicted is a Token class:
        // Try a match move:
        IF Predicted = Input token [Input token index] .class:
            SET Input token index TO Input token index + 1;    // matched
            SET Resynchronized TO True;    // resynchronized!
        ELSE Predicted /= Input token:
            Insert a token of class Predicted, including representation;
            MESSAGE "Token inserted of class ", Predicted;
    ELSE Predicted is a Non-terminal:
        // Do a prediction move:
        SET Prediction TO Prediction table [Predicted, Input token [Input token index]];
        IF Prediction = Empty:
            SET Prediction TO Shortest production table [Predicted];
        // Now Prediction /= Empty:
        PUSH The symbols of Prediction ON Prediction stack;

Figure 2.71 Acceptable-set error recovery in a predictive parser.
- Step 2: skip unacceptable tokens: Zero or more tokens from the input are discarded, in order, until we meet a token that is in the acceptable set. Since the token EOF is always acceptable, this step terminates. Note that we may not need to discard any tokens at all: the present input token may be acceptable in one of the other stack configurations.
- Step 3: resynchronize the parser: We continue parsing with a modified parser. This modified parser first tries the usual predict or match move. If this succeeds the parser is on the rails again and parsing can continue normally, but if the move fails, the modified parser proceeds as follows. For a non-terminal on the top of the prediction stack, it predicts the shortest alternative, and for a terminal it inserts the predicted token. Step 3 is repeated until a move succeeds and the parser is resynchronized. Since the input token was in the 'acceptable set', it is in the FIRST set of one of the stack configurations constructed by the repeated Step 3, so resynchronization is guaranteed.
The code can be found in Figure 2.71. The parser has now accepted one token, and the parse tree is still correct, provided we produced the proper nodes for the non-terminals to be expanded and the tokens to be inserted. We see that this approach requires the user to supply a routine that will create the tokens to be inserted, with their representations, but such a routine is usually easy to write.
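To make Step 1 concrete, the following is a minimal sketch in C that computes the acceptable set of a prediction stack for the grammar of Figure 2.53: it walks the shortest continuation and collects the FIRST set of the symbol on top of each successive configuration, which yields the same union of FIRST sets described above. The FIRST sets and the shortest alternatives are filled in by hand, and all names and encodings are assumptions of ours.

/* Minimal sketch (assumed encodings): acceptable set from a prediction stack. */
#include <stdio.h>

enum { IDENT, PLUS, OPEN, CLOSE, EOF_T, N_TOKENS,     /* terminals        */
       NT_INPUT = 100, NT_EXPR, NT_REST, NT_TERM, NT_PAREN };

/* FIRST sets of the non-terminals (epsilon omitted), -1 terminated */
static const int first_expr[]  = { IDENT, OPEN, -1 };
static const int first_rest[]  = { PLUS, -1 };
static const int first_term[]  = { IDENT, OPEN, -1 };
static const int first_paren[] = { OPEN, -1 };

static const int *first_set(int nt) {
    switch (nt) {
    case NT_INPUT: case NT_EXPR: return first_expr;
    case NT_REST:  return first_rest;
    case NT_TERM:  return first_term;
    default:       return first_paren;
    }
}

/* shortest alternative of each non-terminal, -1 terminated */
static const int short_input[] = { NT_EXPR, EOF_T, -1 };
static const int short_expr[]  = { NT_TERM, NT_REST, -1 };
static const int short_rest[]  = { -1 };                       /* epsilon */
static const int short_term[]  = { IDENT, -1 };
static const int short_paren[] = { OPEN, NT_EXPR, CLOSE, -1 };

static const int *shortest(int nt) {
    switch (nt) {
    case NT_INPUT: return short_input;
    case NT_EXPR:  return short_expr;
    case NT_REST:  return short_rest;
    case NT_TERM:  return short_term;
    default:       return short_paren;
    }
}

/* Step 1: walk the shortest continuation of the stack and mark the FIRST
   set of every configuration met on the way; EOF ends up included. */
static void acceptable_set(const int *stack, int sp, int acceptable[N_TOKENS]) {
    int work[64], n = sp;
    for (int i = 0; i < sp; i++) work[i] = stack[i];   /* copy, top at n-1 */
    for (int t = 0; t < N_TOKENS; t++) acceptable[t] = 0;

    while (n > 0) {
        int top = work[--n];
        if (top < N_TOKENS) {          /* terminal: acceptable, then 'matched' */
            acceptable[top] = 1;
        } else {                       /* non-terminal: add FIRST, then predict */
            const int *f = first_set(top);
            for (int i = 0; f[i] >= 0; i++) acceptable[f[i]] = 1;
            const int *alt = shortest(top);
            int len = 0;
            while (alt[len] >= 0) len++;
            for (int i = len - 1; i >= 0; i--) work[n++] = alt[i];
        }
    }
}

int main(void) {
    /* the prediction stack at the error point in Figure 2.74 below:
       expression ')' rest-expression EOF   (top of stack stored last) */
    int stack[] = { EOF_T, NT_REST, CLOSE, NT_EXPR };
    int acceptable[N_TOKENS];
    const char *names[N_TOKENS] = { "IDENTIFIER", "'+'", "'('", "')'", "EOF" };

    acceptable_set(stack, 4, acceptable);
    printf("acceptable set:");
    for (int t = 0; t < N_TOKENS; t++)
        if (acceptable[t]) printf(" %s", names[t]);
    printf("\n");    /* prints: IDENTIFIER '+' '(' ')' EOF */
    return 0;
}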
Finding the alternative with the shortest production

Each alternative of each non-terminal in a grammar defines in itself a language, a set of strings. We are interested here in the length of the shortest string in each of these languages. Once we have computed these, we know for each non-terminal which of its alternatives produces the shortest string; if two alternatives produce shortest strings of the same length, we simply choose one of them. We then use this information to fill the array Shortest production table [].

The lengths of the shortest productions of all alternatives of all non-terminals can be computed by the closure algorithm in Figure 2.72. It is based on the fact that the length of the shortest production of an alternative N→AB... is the sum of the lengths of the shortest productions of A, B, etc. The initializations 1b and 2b set the minimum lengths of empty alternatives to 0 and those of terminal symbols to 1. All other lengths are set to ∞, so any actual length found will be smaller. The first inference rule says that the shortest length of an alternative is the sum of the shortest lengths of its components; more complicated but fairly obvious rules apply if the alternative includes repetition operators. The second inference rule says that the shortest length of a non-terminal is the minimum of the shortest lengths of its alternatives. Note that we have implemented variables as (name, value) pairs. Figure 2.73 shows the table Shortest production table [] for the grammar of Figure 2.53.

The error recovery steps on parsing (i++i)+i are given in Figure 2.74. The figure starts at the point at which the error is discovered and continues until the parser is on the rails again. Upon detecting the error, we determine the acceptable set of the stack expression ')' rest-expression EOF to be { IDENTIFIER ( + ) EOF }. So, we see that we do not need to skip any tokens, since the + in the input is in the acceptable set: it is unacceptable to normal parsing but is acceptable to the error recovery. Replacing expression and term with their shortest productions brings the terminal IDENTIFIER to the top of the stack. Since it does not match the +, it has to be inserted. It will be matched instantaneously, which brings the non-terminal rest-expression to the top of the stack. Since that non-terminal has a normal prediction for the + symbol, the parser is on the rails again. We see that it has inserted an identifier between the two pluses.
The parser generator LLgen accepts as input a grammar that is more or less LL(1), interspersed with C code. The non-terminals in the grammar can have parameters, and rules can have local variables, both again expressed in C. Formally, the segments of C code correspond to anonymous ε-producing rules and are treated as such, but in practice an LL(1) grammar reads like a program with a flow of control that always chooses the right alternative. In
Data definitions:
1.  A set of pairs of the form (production rule, integer).
2a. A set of pairs of the form (non-terminal, integer).
2b. A set of pairs of the form (terminal, integer).

Initializations:
1a. For each production rule N→A1...An with n > 0 there is a pair (N→A1...An, ∞).
1b. For each production rule N→ε there is a pair (N→ε, 0).
2a. For each non-terminal N there is a pair (N, ∞).
2b. For each terminal T there is a pair (T, 1).

Inference rules:
1. For each production rule N→A1...An with n > 0, if there are pairs (A1, l1) to (An, ln) with all li < ∞, the pair (N→A1...An, lN) must be replaced by a pair (N→A1...An, lnew) where lnew = Σ(i=1..n) li, provided lnew < lN.
2. For each non-terminal N, if there are one or more pairs of the form (N→α, li) with li < ∞, the pair (N, lN) must be replaced by (N, lnew) where lnew is the minimum of the li's, provided lnew < lN.

Figure 2.72 Closure algorithm for computing the lengths of the shortest productions.
Non-terminal                 Shortest alternative          Length
input                        expression EOF                2
expression                   term rest-expression          1
rest-expression              ε                             0
term                         IDENTIFIER                    1
parenthesized-expression     '(' expression ')'            3

Figure 2.73 Shortest production table for the grammar of Figure 2.53.
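The closure algorithm of Figure 2.72 is easily implemented as a fixed-point iteration. The following minimal C sketch, with the grammar of Figure 2.53 hard-coded and encodings of our own choosing, reproduces the lengths shown in Figure 2.73.

/* Minimal sketch (assumed encodings): shortest production lengths. */
#include <stdio.h>
#include <limits.h>

#define INF INT_MAX
enum { N_INPUT, N_EXPR, N_REST, N_TERM, N_PAREN, NT_COUNT };

/* Each alternative lists the number of terminals it contains directly
   plus the non-terminals it refers to (-1 terminated). */
struct alt { int lhs; int n_terminals; int rhs_nts[4]; };

static const struct alt alts[] = {
    { N_INPUT, 1, { N_EXPR, -1 } },            /* input -> expression EOF        */
    { N_EXPR,  0, { N_TERM, N_REST, -1 } },    /* expression -> term rest-expr   */
    { N_REST,  1, { N_EXPR, -1 } },            /* rest-expr -> '+' expression    */
    { N_REST,  0, { -1 } },                    /* rest-expr -> epsilon           */
    { N_TERM,  1, { -1 } },                    /* term -> IDENTIFIER             */
    { N_TERM,  0, { N_PAREN, -1 } },           /* term -> parenthesized-expr     */
    { N_PAREN, 2, { N_EXPR, -1 } },            /* par-expr -> '(' expression ')' */
};
#define N_ALTS (sizeof alts / sizeof alts[0])

int main(void) {
    int len[NT_COUNT];
    for (int i = 0; i < NT_COUNT; i++) len[i] = INF;

    int changed = 1;
    while (changed) {                          /* repeat until nothing changes */
        changed = 0;
        for (size_t a = 0; a < N_ALTS; a++) {
            int sum = alts[a].n_terminals;
            for (int i = 0; alts[a].rhs_nts[i] >= 0; i++) {
                if (len[alts[a].rhs_nts[i]] == INF) { sum = INF; break; }
                sum += len[alts[a].rhs_nts[i]];
            }
            if (sum < len[alts[a].lhs]) { len[alts[a].lhs] = sum; changed = 1; }
        }
    }
    const char *names[] = { "input", "expression", "rest-expression",
                            "term", "parenthesized-expression" };
    for (int i = 0; i < NT_COUNT; i++)         /* prints 2 1 0 1 3 */
        printf("%-25s %d\n", names[i], len[i]);
    return 0;
}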
this model, the C code is executed when the flow of control passes through it. In addition, LLgen features dynamic conflict resolvers, as explained in Section 2.2.4.3, to cover the cases in which the input grammar is not entirely LL(1), and automatic error correction, as explained in Section 2.2.4.5, which edits any syntactically incorrect program into a syntactically correct one. The grammar may be distributed over multiple files, thus allowing a certain degree of modularity. Each module file may also contain C code that belongs specifically to that module; examples are definitions of routines that are called in the C segments in the grammar.
Error detected, since Prediction table [expression, '+'] is empty:
Prediction stack:  expression ')' rest-expression EOF
Input:             '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Shortest production for expression:
Prediction stack:  term rest-expression ')' rest-expression EOF
Input:             '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Shortest production for term:
Prediction stack:  IDENTIFIER rest-expression ')' rest-expression EOF
Input:             '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Token IDENTIFIER inserted in the input and matched:
Prediction stack:  rest-expression ')' rest-expression EOF
Input:             '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Normal prediction for rest-expression, resynchronized:
Prediction stack:  '+' expression ')' rest-expression EOF
Input:             '+' IDENTIFIER ')' '+' IDENTIFIER EOF

Figure 2.74 Some steps in parsing (i++i)+i.
LLgen translates each module file into a source file in C. It also generates source and include files which contain the parsing mechanism, the error correction mechanism and some auxiliary routines. Compiling these and linking the object files results in a parser.

For a simple example of the application of LLgen we turn towards the minimal non-left-recursive grammar for expressions that we derived in Section 2.2.4.3, and which we repeat in Figure 2.75. We have completed the grammar by adding an equally minimal rule for term, one that allows only identifiers having the value 1. In spite of its minimality, the grammar shows all the features we need for our example. It is convenient in LLgen to use the parameters for passing to a rule a pointer to the location in which the rule must deliver its result, as shown.

We need only a few additions and modifications to turn these two rules into working LLgen input, and the result is shown in Figure 2.76. The first thing we need are local variables to store the intermediate results. They are supplied as C code right after the parameters of the non-terminal: this is the {int t;} in the rules expression and expression_tail_option. We also need to modify the actual parameters to suit C syntax; the C code segments remain unchanged. Now the grammar rules themselves are in correct LLgen form. Next, we need a start rule: main. It has one local variable, result, which receives the result of the expression, and whose value is printed when the expression has been parsed. Reading further upward, we find the directive %start, which tells the parser generator that main is the start symbol and that we want its rule converted into a C routine called Main_Program(). The directive %token registers IDENTIFIER as a token; otherwise LLgen would assume it to be a non-terminal from its use in the rule for
expression(int *e) →
    term(int *t) {*e = *t;} expression-tail-option(int *e)
expression-tail-option(int *e) →
    '-' term(int *t) {*e -= *t;} expression-tail-option(int *e)
  | ε
term(int *t) → IDENTIFIER {*t = 1;}

Figure 2.75 Minimal non-left-recursive grammar for expressions.
%lexical get_next_token_class;
%token IDENTIFIER;
%start Main_Program, main;

main {int result;}:
    expression(&result) {printf("result = %d\n", result);}
;

expression(int *e) {int t;}:
    term(&t) {*e = t;} expression_tail_option(e)
;

expression_tail_option(int *e) {int t;}:
    '-' term(&t) {*e -= t;} expression_tail_option(e)
|
;

term(int *t):
    IDENTIFIER {*t = 1;}
;

Figure 2.76 LLgen code for a parser for simple expressions.
term. And finally, the directive %lexical identifies the C routine int get_next_token_class() as the entry point in the lexical analyzer from where to obtain the stream of tokens, or actually token classes. The code from Figure 2.76 resides in a file called parser.g. LLgen converts this file to one called parser.c, which contains a recursive descent parser. The code is essentially similar to that in Figure 2.61,
complicated slightly by the error recovery code. When compiled, it yields the desired parser. The file parser.g also contains some auxiliary code, shown in Figure 2.77. The extra set of braces identifies the enclosed part as C code. The first item in the code is the mandatory C routine main(); it starts the lexical engine and then the generated parser, using the name specified in the %start directive. The rest of the code is dedicated almost exclusively to error recovery support.
{

#include    "lex.h"

int main(void) {
    start_lex();
    Main_Program();
    return 0;
}

Token_Type Last_Token;              /* error recovery support */
int Reissue_Last_Token = 0;         /* idem */

int get_next_token_class(void) {
    if (Reissue_Last_Token) {
        Token = Last_Token;
        Reissue_Last_Token = 0;
    }
    else get_next_token();
    return Token.class;
}

void insert_token(int token_class) {
    Last_Token = Token;
    Reissue_Last_Token = 1;
    Token.class = token_class;
    /* and set the attributes of Token, if any */
}

void print_token(int token_class) {
    switch (token_class) {
    case IDENTIFIER: printf("IDENTIFIER"); break;
    case EOFILE:     printf("<EOF>"); break;
    default:         printf("%c", token_class); break;
    }
}

#include    "LLmessage.i"

}

Figure 2.77 Auxiliary C code for a parser for simple expressions.
LLgen requires the user to supply a routine, LLmessage(int), to assist in the error correction process. The routine LLmessage(int) is called by LLgen when an error has been detected. On the one hand, it allows the user to report the error, on the other it places
an obligation on the user: when a token must be inserted, it is up to the user to construct that token, including its attributes. The int parameter class to LLmessage() falls into one of three categories:
- class > 0: It is the class of a token to be inserted. The user must arrange the situation as if a token of class class had just been read and the token that was actually read were still in the input. In other words, the token stream has to be pushed back over one token. If the lexical analyzer keeps a record of the input stream, this will require negotiations with the lexical analyzer.
- class == 0: The present token, whose class can be found in LLsymb, is skipped by LLgen. If the lexical analyzer keeps a record of the input stream, it must be notified; otherwise no further action is required from the user.
- class == -1: The parsing stack is exhausted, but LLgen found there is still input left. LLgen skips the rest of the input. Again, the user may want to inform the lexical analyzer.
void LLmessage(int class) {
    switch (class) {
    default:
        insert_token(class);
        printf("Missing token ");
        print_token(class);
        printf(" inserted in front of token ");
        print_token(LLsymb);
        printf("\n");
        break;
    case 0:
        printf("Token deleted: ");
        print_token(LLsymb);
        printf("\n");
        break;
    case -1:
        printf("End of input expected, but found token ");
        print_token(LLsymb);
        printf("\n");
        break;
    }
}

Figure 2.78 The routine LLmessage() required by LLgen.
The code for LLmessage() used in Figure 2.77 is shown in Figure 2.78. Pushing back the input stream is the difficult part, but fortunately only one token needs to be pushed back. We avoid negotiating with the lexical analyzer and implement a one-token buffer Last_Token in the routine get_next_token_class(), which is
the usual packaging of the lexical analyzer routine yielding the class of the token. The use of this buffer is controlled by a flag Reissue_Last_Token, which is switched on in the routine insert_token() when the token must be pushed back. When the subsequent call of get_next_token_class() finds the flag on, it reissues the token and switches the flag off.

A sample run with the syntactically correct input i-i-i gives the output

result = -1

and a run with the incorrect input i i-i gives the messages

Token deleted: IDENTIFIER
result = 0
Constructing a correct parse tree with a transformed grammar
In a subsection of Section 2.2.4.3 we suggested that it is possible to construct a correct parse tree even with a transformed grammar. Using techniques similar to the ones used above, we will now indicate how to do this. The original grammar for simple expressions with code for constructing parse trees can be found in Figure 2.79; the definitions of the node types of the parse tree are given in Figure 2.80. Each rule creates the node corresponding to its non-terminal, and has one parameter, a pointer to a location in which to store the pointer to that node. This allows the node for a non-terminal N to be allocated by the rule for N, but it also means that there is one level of indirection more here than meets the eye: the node itself inside expression is represented by the C expression (*ep) rather than by just ep. Memory for the node is allocated at the beginning of each alternative using calls of new_expr(); this routine is defined in Figure 2.80. Next the node type is set. The early allocation of the node allows the further members of an alternative to write the pointers to their nodes in it. All this hinges on the facility of C to manipulate addresses of fields inside records as separate entities.
expression(struct expr **ep):
    { (*ep) = new_expr(); (*ep)->type = '-'; }
    expression(&(*ep)->expr) '-' term(&(*ep)->term)
|
    { (*ep) = new_expr(); (*ep)->type = 'T'; }
    term(&(*ep)->term)
;

term(struct term **tp):
    { (*tp) = new_term(); (*tp)->type = 'I'; }
    IDENTIFIER
;

Figure 2.79 Original grammar with code for constructing a parse tree.
struct expr {
    int type;                 /* '-' or 'T' */
    struct expr *expr;        /* for '-' */
    struct term *term;        /* for '-' and 'T' */
};
#define new_expr() ((struct expr *)malloc(sizeof(struct expr)))

struct term {
    int type;                 /* 'I' only */
};
#define new_term() ((struct term *)malloc(sizeof(struct term)))

extern void print_expr(struct expr *e);
extern void print_term(struct term *t);

Figure 2.80 Data structures for the parse tree.
The code in Figure 2.79 has a serious LL(1) problem: it exhibits hidden left-recursion. The left-recursion of the rule expression is hidden by the C code { (*ep) = new_expr(); (*ep)->type = '-'; }, which is a pseudo-rule producing ε. This hidden left-recursion prevents us from applying the left-recursion removal technique from Section 2.2.4.3. To turn the hidden left-recursion into visible left-recursion, we move the C code to after expression; this requires storing the result of expression temporarily in an auxiliary variable, e_aux. See Figure 2.81, which shows only the new rule for expression; the one for term remains unchanged.
expression(struct expr **ep):
    expression(ep)
    { struct expr *e_aux = (*ep);
      (*ep) = new_expr(); (*ep)->type = '-';
      (*ep)->expr = e_aux; }
    '-' term(&(*ep)->term)
|
    { (*ep) = new_expr(); (*ep)->type = 'T'; }
    term(&(*ep)->term)
;

Figure 2.81 Visibly left-recursive grammar with code for constructing a parse tree.
Now that we have turned the hidden left-recursion into direct left-recursion, we can apply the technique from Section 2.2.4.3. We find that
N = expression(struct expr **ep)
α = { struct expr *e_aux = (*ep);
      (*ep) = new_expr(); (*ep)->type = '-';
      (*ep)->expr = e_aux; }
    '-' term(&(*ep)->term)
β = { (*ep) = new_expr(); (*ep)->type = 'T'; }
    term(&(*ep)->term)
which results in the code shown in Figure 2.82. Figure 2.83 shows what the new code does. The rule expression_tail_option is called with the address (ep) of a pointer (*ep) to the top node collected thus far as a parameter (a). When another term is found in the input, the pointer to the node is held in the auxiliary variable e_aux (b), a new node is inserted above it (c), and the old node and the new term are connected to the new node, which is accessible through ep as the top of the new tree. This technique constructs proper parse trees in spite of the grammar transformation required for LL(1) parsing.
expression(struct expr **ep):
    { (*ep) = new_expr(); (*ep)->type = 'T'; }
    term(&(*ep)->term)
    expression_tail_option(ep)
;

expression_tail_option(struct expr **ep):
    { struct expr *e_aux = (*ep);
      (*ep) = new_expr(); (*ep)->type = '-';
      (*ep)->expr = e_aux; }
    '-' term(&(*ep)->term)
    expression_tail_option(ep)
|
;

Figure 2.82 Adapted LLgen grammar with code for constructing a parse tree.
A sample run with the input i-i-i yields (((I)-I)-I); here i is just an identifier and I is the printed representation of a token of the class IDENTIFIER.
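The routines print_expr() and print_term() declared in Figure 2.80 are not shown above; the following is one possible implementation, an assumption of ours, that prints every expr node fully parenthesized and every IDENTIFIER as I, which is consistent with the shape of the sample output just given.

/* Minimal sketch (assumed behavior) of the printing routines of Figure 2.80. */
#include <stdio.h>

struct term { int type; };                       /* 'I' only, as in Figure 2.80 */
struct expr { int type; struct expr *expr; struct term *term; };

void print_term(struct term *t) {
    printf("I");                                 /* a token of class IDENTIFIER */
}

void print_expr(struct expr *e) {
    printf("(");
    if (e->type == '-') {                        /* node for a subtraction */
        print_expr(e->expr);
        printf("-");
        print_term(e->term);
    } else {                                     /* 'T': node for a plain term */
        print_term(e->term);
    }
    printf(")");
}

int main(void) {
    /* build the tree for i-i by hand and print it: ((I)-I) */
    struct term t1 = { 'I' }, t2 = { 'I' };
    struct expr leaf = { 'T', NULL, &t1 };
    struct expr top  = { '-', &leaf, &t2 };
    print_expr(&top);
    printf("\n");
    return 0;
}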
Figure 2.83 Tree transformation performed by expression_tail_option.

A practically important alternative to the LL(1) techniques discussed so far is bottom-up parsing, which we turn to now. Not all grammars allow the LR(1) or
LALR(1) parser construction technique to result in a parser; those that do not are said to exhibit LR(1) or LALR(1) conflicts, and measures to deal with them are discussed in Section 2.2.5.7. Techniques to incorporate error handling in LR parsers are treated in Section
2.2.5.8. An example of the use of a traditional bottom-up parser generator concludes this section on the creation of bottom-up parsers.

Roadmap
2.2.5 Creating a bottom-up parser automatically
2.2.5.1 LR(0) parsing
2.2.5.3 LR(0) conflicts
2.2.5.4 SLR(1) parsing
2.2.5.5 LR(1) parsing
2.2.5.6 LALR(1) parsing
2.2.5.7 Making a grammar LR(1) - or not
2.2.5.8 Error handling in LR parsers
2.2.5.9 A traditional bottom-up parser generator - yacc/bison
The main task of a bottom-up parser is to find the leftmost node that has not yet been constructed but all of whose children have been constructed. This sequence of children is called the handle, because this is where we get hold of the next node to be constructed. Constructing a parent node for N and connecting the children in the handle to that node is called reducing the handle to N. In Figure 2.52, node 1, terminal t6, and node 2 together form the handle, which has just been reduced to node 3 at the moment the picture was taken. To construct that node we have to find the handle and we have to know to which right-hand side of which non-terminal it corresponds: its reduction rule. It will be clear that finding the handle involves searching both the syntax tree as constructed so far and the input. Once we have found the handle and its reduction rule, our troubles are over: we reduce the handle to the non-terminal of the reduction rule, and restart the parser to find the next handle.

Although there is effectively only one deterministic top-down parsing algorithm, LL(k), there are several different bottom-up parsing algorithms. All these algorithms differ only in the way they find a handle; the last phase, reduction of the handle to a non-terminal, is the same for each of them. We mention the following bottom-up algorithms here:
- precedence parsing: pretty weak, but still used in simple parsers for anything that looks like an arithmetic expression;
- bounded-context: with k tokens left context and m tokens right context; reasonably strong, very popular in the 1970s, especially BC(2,1), but now out of fashion;
- LR(0): theoretically important but too weak to be useful;
- SLR(1): an upgraded version of LR(0), but still fairly weak;
- LR(1): like LR(0) but both very powerful and very memory-consuming; and
- LALR(1): a slightly watered-down version of LR(1), which is both powerful and usable: the workhorse of present-day bottom-up parsing.

We will first concentrate on LR(0), since it shows all the principles in a nutshell. The steps to LR(1) and from there to LALR(1) are then simple. It turns out that finding a handle is not a simple thing to do, and all the above algorithms, with the possible exception of precedence parsing, require so much detail that it is
humanly impossible to write a bottom-up parser by hand: all bottom-up parser writing is done by parser generators.
One of the immediate advantages of bottom-up parsing is that it has no problems with left-recursion. We can therefore improve our grammar of Figure 2.53 so as to generate the proper left-associative syntax tree for the + operator. The result is left-recursive - see Figure 2.84. We have also removed the non-terminal parenthesized-expression by substituting it; the grammar is big enough as it is.
input → expression EOF
expression → term | expression '+' term
term → IDENTIFIER | '(' expression ')'

Figure 2.84 A simple grammar for demonstrating bottom-up parsing.
LR parsers are best explained using diagrams with item sets in them. To keep these diagrams manageable, it is customary to represent each non-terminal by a capital letter and each terminal by itself or by a single lower-case letter. The end-of-input token is traditionally represented by a dollar sign. This form of the grammar is shown in Figure 2.85; we have abbreviated the input to Z, to avoid confusion with the i, which stands for IDENTIFIER.
Z → E $
E → T | E + T
T → i | ( E )

Figure 2.85 An abbreviated form of the simple grammar for bottom-up parsing.
In the beginning of our search for a handle, we have only a vague idea of what the handle can be and we need to keep track of many different hypotheses about it. In lexical analysis, we used dotted items to summarize the state of our search and sets of items to represent sets of hypotheses about the next token. LR parsing uses the same technique: item sets are kept in which each item is a hypothesis about the handle. Where in lexical analysis these item sets are situated between successive characters, here they are between successive grammar symbols. An item N→α•β between two grammar symbols means that we maintain the hypothesis that this αβ is a possible handle, that it is to be reduced to N when actually found applicable, and that the part α has already been recognized directly to the left of this point. When the dot reaches the right end of the item, as in N→αβ•, we have identified a handle. The members of the right-hand side αβ have all been
recognized, since the item has been obtained by moving the dot successively over each member of them. These members can now be collected as the children of a new node N. An item with the dot at the end, which identifies a handle in this way, is called a reduce item.
The various LR parsing methods differ in the exact form of their LR items, but not in the methods of using them. So there are LR(0) items, SLR(1) items, LR(1) items and LALR(1) items, and the methods of their construction differ, but there is essentially only one LR parsing algorithm.

We will first illustrate how LR items are used to do bottom-up parsing. Assume we are interested in the initial item set, the set of hypotheses about the handle we have before the first token. Initially, we know only one node of the tree: the top. This gives us the first possibility for the handle: Z→•E$, which means that if we manage to recognize an E followed by end-of-input, we have found a handle which we can reduce to Z, the top of the syntax tree. But since the dot is still at the beginning of the right-hand side, it also means that we have not seen any of these grammar symbols yet, and that the first thing we need to see is an E. The dot in front of the non-terminal E suggests that we may be looking at the wrong symbol at the moment and that the actual handle may derive from E. This adds two new items to the initial item set, one for each alternative of E: E→•T and E→•E+T, which describe two other hypotheses about the handle. Now we find a dot in front of another non-terminal, T, which suggests that perhaps the handle derives from T; this adds two more items to the initial item set: T→•i and T→•(E). The dot in front of the E in E→•E+T suggests also that the handle could derive from E, but we knew that already, so this introduces no new hypotheses. So our initial item set, s0, contains five hypotheses about the handle:

Z→•E$
E→•T
E→•E+T
T→•i
T→•(E)
As with a lexical analyzer, the initial item set is positioned before the first input symbol:

s0    i      +      i      $
where we have left open spaces between the symbols to accommodate the future item sets. Note that the four additional items in the item set s0 are the result of ε moves, moves made by the handle-searching automaton without consuming input. As before, the ε moves are performed because the dot is in front of something that cannot be matched directly. The construction of the complete LR item set is also very similar to that of a lexical item set: the initial contents of the item set are brought in from outside and the set is completed by an ε closure algorithm. An ε closure algorithm for LR item sets is given in Figure 2.86. To be more precise, it is the ε closure algorithm for LR(0) item sets, and s0 is an LR(0) item set. Other ε closure algorithms will be shown below. The ε closure algorithm expects the initial contents to be brought in from elsewhere. For the initial item set s0 this consists of the item Z→•S$, where S is the start symbol of the grammar and $ represents the end-of-input. The important part is the inference rule: it
Data definitions:
    A set S of LR(0) items.

Initializations:
    S is prefilled externally with one or more LR(0) items.

Inference rules:
    If S holds an item of the form P→α•Nβ, then for each production rule N→γ in G, S must also contain the item N→•γ.

Figure 2.86 ε closure algorithm for LR(0) item sets for a grammar G.
predicts new handle hypotheses from the hypothesis that we are looking for a certain non-terminal, and is sometimes called the prediction rule; it corresponds to an ε move, in that it allows the automaton to move to another state without consuming input. Note that the dotted items plus the prediction rule represent a top-down component in our bottom-up algorithm. The items in an item set form one or more sets of top-down predictions about the handle, ultimately deriving from the start symbol. Since the predictions are kept here as hypotheses in a set rather than being transformed immediately into syntax tree nodes as they are in the LL(1) algorithm, left-recursion does not bother us here.

Using the same technique as with the lexical analyzer, we can now compute the contents of the next item set s1, the one between the i and the +. There is only one item in s0 in which the dot can be moved over an i: T→•i. Doing so gives us the initial contents of the new item set s1: { T→i• }. Applying the prediction rule does not add anything, so this is the new item set. Since it has the dot at the end, it is a reduce item and indicates that we have found a handle. More precisely, it identifies i as the handle, to be reduced to T using the rule T→i. When we perform this reduction and construct the corresponding part of the syntax tree, the input looks schematically as follows:
T
|
i      +      i      $
Having done one reduction, we restart the algorithm, which of course comes up with the same value for s0, but now we are looking at the non-terminal T rather than at the unreduced i. There is only one item in s0 in which the dot can be moved over a T: E→•T. Doing so gives us the initial contents of a new value for s1: { E→T• }. Applying the prediction rule does not add anything, so this is the new item set; it contains one reduce item. After reduction by E→T, the input looks as follows:
E
|
T
|
i      +      i      $
and it is quite satisfying to see the syntax tree grow. Restarting the algorithm, we finally get a really different initial value for s1: the set

Z→E•$
E→E•+T

We now have:
E
|
T
|
i      +      i      $
The next token in the input is a +. There is one item in s1 that has the dot in front of a +: E→E•+T. So the initial contents of s2 are { E→E+•T }. Applying the prediction rule yields two more items, for a total of three for s2:

E→E+•T
T→•i
T→•(E)
Going through the same motions as with s0 and again reducing the i to T, we get:
E
|
T             T
|             |
i      +      i      $
Now there is one item in s2 in which the dot can be carried over a T: E→E+•T. Doing so yields { E→E+T• }, which identifies a new handle, E + T, which is to be reduced to E. So we finally find a case in which our hypothesis that the handle might be E + T is correct. Remember that this hypothesis already occurs in the construction of s0. Performing the reduction we get:
       E
     / | \
    E  +  T
    |     |
    T     i
    |
    i             $
Unlike last time, the next token in the input is now the end-of-input token $. Moving the dot over it gives us s2, { Z+E$ >,which contains one item, a reduce item, shows that a handle has been found, and says that E $ must be reduced to Z:
/l\+
E
l
T
I
T
1i
i
This final reduction completes the syntax tree and ends the parsing process. Note how the LR parsing process (and any bottom-up parsing technique for that matter) structures the input, which is still there in its entirety.
Precomputing the item sets

The above demonstration of LR parsing shows two major features that need to be discussed further: the computation of the item sets and the use of these sets. We will first turn to the computation of the item sets.

The item sets of an LR parser show considerable similarities to those of a lexical analyzer. Their number is finite and not embarrassingly large, and we can define routines Initial item set() and Next item set() with meanings corresponding to those in the lexical analyzer. We can therefore precompute the contents of all the reachable item sets and the values of Initial item set() and Next item set() for all their parameters. Even the bodies of the two routines for LR(0) items, shown in Figures 2.87 and 2.88, are similar to those for the lexical analyzer, as we can see when we compare them to the ones in Figures 2.22 and 2.23. One difference is that LR item sets are moved over grammar symbols, rather than over characters. This is reflected in the first parameter of Next item set(), which now is a Symbol. Another difference is that there is no need to test if S is a basic pattern (compare Figure 2.23). This is because we have restricted ourselves here to grammars in BNF notation, so S cannot be a non-basic pattern; if, however, we allow EBNF, the code in Figure 2.88 will have to take repetition and combination operators into account.
FUNCTION Initial item set RETURNING an item set:
    SET New item set TO Empty;

    // Initial contents - obtain from the start symbol:
    FOR EACH production rule S→α for the start symbol S:
        SET New item set TO New item set + item S→•α;

    RETURN ε closure (New item set);

Figure 2.87 The routine Initial item set for an LR(0) parser.
FUNCTION Next item set (Item set, Symbol) RETURNING an item set:
    SET New item set TO Empty;

    // Initial contents - obtain from token moves:
    FOR EACH item N→α•Sβ IN Item set:
        IF S = Symbol:
            SET New item set TO New item set + item N→αS•β;

    RETURN ε closure (New item set);

Figure 2.88 The routine Next item set for an LR(0) parser.
Calling Initial item set() yields S0, and repeated application of Next item set() gives us the other reachable item sets, in an LR analog of the lexical subset algorithm explained in Section 2.1.6.3. The resulting item sets are shown, together with the transitions between them, in Figure 2.89. The reduce items, the items that indicate that a handle has been found, are marked by a double rim. We recognize the sets S0, S5, S6, S1, S3, S4 and S2 (in that order) from the parsing of i+i; the others will occur in parsing different inputs. The transition table is shown in Figure 2.90. This tabular version of Next item set() is traditionally called the GOTO table in LR parsing. The empty entries stand for the empty set of hypotheses; if this set is obtained while searching for the handle, there is no hypothesis left, no handle can be found, and there is a syntax error. The empty set is also called the error state. It is quite representative that most of the GOTO table is empty; also the non-empty part shows considerable structure. Such LR tables are excellent candidates for transition table compression.
Figure 2.89 Transition diagram for the LR(0) automaton for the grammar of Figure 2.85.
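The subset construction sketched above is easy to program. The following minimal C sketch, with the grammar of Figure 2.85 and all encodings hard-coded by us, computes the reachable LR(0) item sets along the lines of Figures 2.86-2.88 and finds the ten sets S0-S9 of Figure 2.89; the data layout and helper names are our own assumptions.

/* Minimal sketch (assumed encodings): LR(0) item set construction. */
#include <stdio.h>

enum { S_i, S_plus, S_open, S_close, S_eof, S_E, S_T, S_Z, N_SYMBOLS };

struct rule { int lhs; int len; int rhs[3]; };
static const struct rule rules[] = {
    { S_Z, 2, { S_E, S_eof } },              /* Z -> E $   */
    { S_E, 1, { S_T } },                     /* E -> T     */
    { S_E, 3, { S_E, S_plus, S_T } },        /* E -> E + T */
    { S_T, 1, { S_i } },                     /* T -> i     */
    { S_T, 3, { S_open, S_E, S_close } },    /* T -> ( E ) */
};
#define N_RULES 5
#define MAX_ITEMS 32
#define MAX_SETS  32

struct item { int rule, dot; };
struct item_set { int n; struct item items[MAX_ITEMS]; };

static int has_item(const struct item_set *s, int r, int d) {
    for (int i = 0; i < s->n; i++)
        if (s->items[i].rule == r && s->items[i].dot == d) return 1;
    return 0;
}
static void add_item(struct item_set *s, int r, int d) {
    if (!has_item(s, r, d)) { s->items[s->n].rule = r; s->items[s->n].dot = d; s->n++; }
}

/* epsilon closure (Figure 2.86): predict all alternatives of any
   non-terminal that appears right after a dot */
static void closure(struct item_set *s) {
    for (int i = 0; i < s->n; i++) {               /* s->n grows as we add */
        const struct rule *r = &rules[s->items[i].rule];
        if (s->items[i].dot < r->len) {
            int sym = r->rhs[s->items[i].dot];
            for (int j = 0; j < N_RULES; j++)
                if (rules[j].lhs == sym) add_item(s, j, 0);
        }
    }
}

/* Next item set (Figure 2.88): move the dot over 'sym', then take the closure */
static void next_set(const struct item_set *s, int sym, struct item_set *out) {
    out->n = 0;
    for (int i = 0; i < s->n; i++) {
        const struct rule *r = &rules[s->items[i].rule];
        if (s->items[i].dot < r->len && r->rhs[s->items[i].dot] == sym)
            add_item(out, s->items[i].rule, s->items[i].dot + 1);
    }
    closure(out);
}

static int same_set(const struct item_set *a, const struct item_set *b) {
    if (a->n != b->n) return 0;
    for (int i = 0; i < a->n; i++)
        if (!has_item(b, a->items[i].rule, a->items[i].dot)) return 0;
    return 1;
}

int main(void) {
    struct item_set sets[MAX_SETS];
    int n_sets;

    sets[0].n = 0;                        /* Initial item set (Figure 2.87):  */
    add_item(&sets[0], 0, 0);             /* Z -> .E$, completed by closure   */
    closure(&sets[0]);
    n_sets = 1;

    for (int i = 0; i < n_sets; i++) {    /* process every reachable set      */
        for (int sym = 0; sym < N_SYMBOLS; sym++) {
            struct item_set cand;
            next_set(&sets[i], sym, &cand);
            if (cand.n == 0) continue;    /* empty set: error state, not kept */
            int found = -1;
            for (int j = 0; j < n_sets; j++)
                if (same_set(&cand, &sets[j])) { found = j; break; }
            if (found < 0 && n_sets < MAX_SETS) sets[n_sets++] = cand;
        }
    }
    printf("%d reachable LR(0) item sets\n", n_sets);   /* prints 10: S0..S9 */
    return 0;
}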
The use of the item sets differs considerably from that in a lexical analyzer, the reason being that we are dealing with a push-down automaton here rather than with a finite-state automaton. The LR push-down automaton also differs from an LL push-down automaton. Its stack consists of an alternation of states and grammar symbols, starting and ending with a state. The grammar symbols on an LR stack represent the input that has already been reduced. It is convenient to draw LR reduction stacks horizontally with the top to the right:

s0 A1 s1 A2 s2 ... An st

where An is the n-th grammar symbol on the stack and st designates the top of the stack. Like the LL automaton, the LR automaton has two major moves and a minor move, but they are different:
state        GOTO table                                    ACTION table
             i     +     (     )     $     E     T
  0          5           7                 1     6          shift
  1                3                 2                      shift
  2                                                         Z→E$
  3          5           7                       4          shift
  4                                                         E→E+T
  5                                                         T→i
  6                                                         E→T
  7          5           7                 8     6          shift
  8                3           9                            shift
  9                                                         T→(E)

Figure 2.90 GOTO and ACTION tables for the LR(0) automaton for the grammar of Figure 2.85.

- Shift: This move removes the first token from the present input and pushes it onto the stack. A new state is determined using the GOTO table, indexed by the old state and the input token, and is pushed onto the stack. If the new state is the error state, a syntax error has been found.
- Reduce: This move is parameterized with the production rule N→α to be used in the reduction. The grammar symbols in α, together with the states following them, are removed from the stack; in an LR parser they are guaranteed to be there. N is then pushed onto the stack, and the new state is determined using the GOTO table and pushed on top of it. In an LR parser this is guaranteed not to be the error state.
- Termination: The input has been parsed successfully when it has been reduced to the start symbol. If there are tokens left in the input though, there is a syntax error.

The state on top of the stack in an LR(0) parser determines which of these moves is applied. The state on top of the stack indexes the so-called ACTION table, which is comparable to the Class of token recognized() in the lexical analyzer. Like the latter, it tells us whether we have found something or should go on shifting input tokens, and if we found something it tells us what it is. The ACTION table for our grammar is shown as the rightmost column in Figure 2.90. For states that have outgoing arrows it holds the entry 'shift'; for states that contain exactly one reduce item, it holds the corresponding rule. We can now summarize our demonstration of the parsing of i+i in a few lines; see Figure 2.91.

The code for the LR(0) parser can be found in Figure 2.92. Comparison to Figure 2.69 shows a clear similarity to the LL push-down automaton, but there are also considerable differences. Whereas the stack of the LL automaton contains grammar symbols only, the stack of the LR automaton consists of an alternating sequence of states and grammar symbols, starting and ending with a state, as shown, for example, in Figure 2.91 and in
Stack                        Rest of input      Action
S0                           i + i $            shift
S0 i S5                      + i $              reduce by T→i
S0 T S6                      + i $              reduce by E→T
S0 E S1                      + i $              shift
S0 E S1 + S3                 i $                shift
S0 E S1 + S3 i S5            $                  reduce by T→i
S0 E S1 + S3 T S4            $                  reduce by E→E+T
S0 E S1                      $                  shift
S0 E S1 $ S2                                    reduce by Z→E$
S0 Z                                            stop

Figure 2.91 LR(0) parsing of the input i+i.
many other figures. Parsing terminates when the entire input has been reduced to the start symbol of the grammar, and when that start symbol is followed on the stack by the end state; as with the LL(1) automaton this will happen only when the EOF token has also been reduced. Otherwise, the state on top of the stack is looked up in the ACTION table. This results in 'shift', 'reduce using rule N→α', or 'erroneous'. If the new state is 'erroneous' there was a syntax error; this cannot happen in an LR(0) parser, but the possibility is mentioned here for compatibility with other LR parsers. For 'shift', the next input token is stacked and a new state is stacked on top of it. For 'reduce', the grammar symbols in α are popped off the stack, including the intervening states. The non-terminal N is then pushed onto the stack, and a new state is determined by consulting the GOTO table and stacked on top of it. This new state cannot be 'erroneous' in any LR parser (see Exercise 2.48).

Above we stated that bottom-up parsing, unlike top-down parsing, has no problems with left-recursion. On the other hand, bottom-up parsing has a slight problem with right-recursive rules, in that the stack may grow proportionally to the size of the input program; maximum stack size is normally proportional to the logarithm of the program size. This is mainly a problem with parsers with a fixed stack size; since parsing time is already linear in the size of the input, adding another linear component does not much degrade parsing speed. Some details of the problem are considered in Exercise 2.51.
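The following is a minimal sketch in C of the LR(0) push-down automaton, with the GOTO and ACTION tables of Figure 2.90 hard-coded. To keep it short it stacks only the states, since the grammar symbols are implicit in them; that simplification, and the rule and symbol encodings, are our own. Run on the token stream for i+i it performs exactly the moves of Figure 2.91.

/* Minimal sketch (assumed encodings) of the LR(0) push-down automaton. */
#include <stdio.h>

enum { T_i, T_plus, T_open, T_close, T_eof, NT_E, NT_T, N_SYMBOLS };
enum { SHIFT, REDUCE };

struct rule { int lhs; int rhs_len; const char *text; };
static const struct rule rules[] = {
    { -1,   2, "Z -> E $"   },    /* rule 0: reducing by it means acceptance */
    { NT_E, 1, "E -> T"     },
    { NT_E, 3, "E -> E + T" },
    { NT_T, 1, "T -> i"     },
    { NT_T, 3, "T -> ( E )" },
};

/* GOTO table of Figure 2.90; -1 is the empty (error) entry. */
static const int goto_table[10][N_SYMBOLS] = {
    /*         i   +   (   )   $   E   T  */
    /* 0 */ {  5, -1,  7, -1, -1,  1,  6 },
    /* 1 */ { -1,  3, -1, -1,  2, -1, -1 },
    /* 2 */ { -1, -1, -1, -1, -1, -1, -1 },
    /* 3 */ {  5, -1,  7, -1, -1, -1,  4 },
    /* 4 */ { -1, -1, -1, -1, -1, -1, -1 },
    /* 5 */ { -1, -1, -1, -1, -1, -1, -1 },
    /* 6 */ { -1, -1, -1, -1, -1, -1, -1 },
    /* 7 */ {  5, -1,  7, -1, -1,  8,  6 },
    /* 8 */ { -1,  3, -1,  9, -1, -1, -1 },
    /* 9 */ { -1, -1, -1, -1, -1, -1, -1 },
};

/* ACTION table of Figure 2.90: SHIFT, or REDUCE with a rule number. */
static const struct { int kind; int rule; } action[10] = {
    { SHIFT, 0 }, { SHIFT, 0 }, { REDUCE, 0 }, { SHIFT, 0 }, { REDUCE, 2 },
    { REDUCE, 3 }, { REDUCE, 1 }, { SHIFT, 0 }, { SHIFT, 0 }, { REDUCE, 4 },
};

int main(void) {
    int input[] = { T_i, T_plus, T_i, T_eof };    /* the token stream for i+i  */
    int idx = 0;
    int stack[100], sp = 0;                       /* states only; the grammar  */
    stack[sp++] = 0;                              /* symbols are implicit here */

    for (;;) {
        int state = stack[sp - 1];
        if (action[state].kind == SHIFT) {
            int new_state = goto_table[state][input[idx]];
            if (new_state < 0) { printf("syntax error\n"); return 1; }
            printf("shift\n");
            idx++;
            stack[sp++] = new_state;
        } else {                                  /* REDUCE */
            const struct rule *r = &rules[action[state].rule];
            printf("reduce by %s\n", r->text);
            if (action[state].rule == 0) break;   /* reduced to Z: stop */
            sp -= r->rhs_len;                     /* pop the right-hand side */
            stack[sp] = goto_table[stack[sp - 1]][r->lhs];
            sp++;
        }
    }
    printf("input parsed successfully\n");
    return 0;
}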
The above LR(0) method would appear to be a fail-safe method to create a deterministic parser for any grammar, but appearances are deceptive in this case: we selected the grammar carefully for the example to work. We can make a transition diagram for any grammar and we can make a GOTO table for any g r m a r , but we cannot make a deterministic ACTION table for just any grammar. “he innocuous-looking sentence about the construction of the ACTION table rnay have warned the reader; we repeat it here: ‘For states that have outgoing arrows it holds the entry “shift’ ’;for states that contain exactly one reduce item, it holds the corresponding rule.’ This points to two problems: some states m a y have
IMPORT Input token
[l. .l ;
// from the lexical analyzer
SET Input token index 1; TO SET Reduction stack TO Empty stack; PUSH Start state ON Reduction stack; WHILE Reduction stack/= {Start state, Start symbol,End state} SET State TO Top of Reduction stack; SET Action TO Action table [State]; IF Action = "shift": // Do a shift move: SET Shifted token TO Input token [Input token index]; + l; // shifted SET Input token index TO Input token index PUSH Shifted token ON Reduction stack; SET New state TO Goto table [State, Shifted token .class]; // can be Empty PUSH New state ON Reduction stack; ELSE IF Action= (ffreducell, N+a) : // Do a reduction move: Pop the symbolsof a from Reduction stack; // update State SET State TO Top of Reduction stack; PUSH N ON Reduction stack; NI ; SET New state TO Goto table [State, PUSH New state ON Reduction stack; // cannot be Empty ELSE Action = Empty: ; ERROR "Error at token I t , Input token [Input token index] Figure 2.92 LR(0) parsing
with a push-down
automa~on.
both outgoing mows and reduce items; and some states may contain more than one reduce item. The first situation is called a s~i~tmre~uce CO ict, the second a re ict. In both cases the ACTION table contains entries with multiple values and the algo1thm is no longer deterministic. If the ACTION table produced from a grammar in the above way is deterministic (con~ict-free),the g r a is ~called ~ an Veryfew g r ~ m a r sare LR(0). Forexample,a g r ~ a Wr LR(0). Suppose the grammar contains the production rule A+€. be predicted by any item of the form P+a*A p. The first is a reduce item, the secondhas an arrow on A, so we have a shift-reduce conflict. And €-rules are very frequent in grammas. Even modest extensions to our exampleg r ~ a cause r trouble. Suppose we extend it to allow m a y elements in expressions, by adding the productionrule T+i [E J . When we construct the transition diagram, we meet the item set co~espondingto S 5 : T+i T+i
@
@
[E]
and we have a shift-reduce conflict on our hands: the ACTION table requires both a shift and a reduce, and the grammaris no longer LR(0). Or suppose we want to allow assignments in the input by adding the rules Z-+V :=E$ and v+i, where V stands for variable;we want a separate rule for V,since its semantics differs from thatof T+i. Now we find the item set corresponding toS5 to be T+i V+i.
and we have a reduce-reduce conflict, These are very common cases. Note that states that do not contain reduce items cannot cause conflicts: reduce items are needed both for shift-reduce and for reduce-reduce conflicts. For more about the nonexistence of shift-shift conflicts see Exercise2.49, For a run-of-the-mill ~rogramminglanguage g r a m m ~ ,onecanexpecttheLR(0) automaton to have some thousandsof states. With, say, 50 tokens in the language and2 or 4 bytes to represent an entry, the ACTION/GOTO table will require some hundreds of kilobytes. Table compression will reduce this to some tensof kilobytes. So the good news is that LR(0) tables claim only a moderate amount of memory; the bad news is that LR(0) tables are almost certainlyfull of conflicts. The above examples show that the LR(0) method is just too weak to be useful. This is caused by the fact that we try to decide from the transition diagram alone what action to perform, and that we ignore the input: the ACTION table construction uses a zero-token a one-token look-ahead, hencethenameLR(0). There arebasicallythreewaystouse look-ahead, SLR(l), LR( l), andLALR( 1). Allthreemethodsuse a two-dimensional first token of the ACTION table, indexed by the state on the top of the stack and the present input. The construction of the states and the table differ, though.
(~eRemer,1971)) parsing method has little practical significance these days, but we treatit here because we can explain it in a few lines at this stage and because it provides a good stepping stone to the far more important LR(1) method. For one thing it allows us to show a two-dimensional ACTION table of manageable proportions. The SLR(1) method is based on the consideration thata handle should not be reduce to a n o n ~ t e ~ i nNa lif the look-ahead is a token that cannot followN: a reduce item N+a is a~plicableonly if thelook-ahead is in FOLLOW(^. Consequently, SLR(1) hasthe same t~~nsition diagram as LR(0) for a given grmmar, the same GOT0 table, but a different ACTION table. Based on this rule and on the FOLLOW sets ~OLLOW(Z)= { $ } FOLLOW (E) = { ) + $ } ~OLLOW(T)= { ) + $ }
we can construct the SLR(1) ACTION table for the g r a ~ a ofr Figure 2.85. The result is shown in Figure 2.93, in which a reduction to a non-terminal N is indicated only for lookahead tokens inF O L ~ O W ( ~ .
state 0
i
shift shift
4
5 6 7 8
9
$
shift
l. 2 3
-l-
look-ahead token ( 1
shift Z+E$ shift
E+E+T T+i E+T
E+E+T T+i E+T
E+E+T T+i E+T
shift T+ (E)
T+ (E)
shift
shift shift T+ (E)
Figure 2.93 ACTION table for the SLR(1) automaton for the grammar of Figure 2.85.
When we compare the ACTION table in Figure 2.93 to the GOTO table from Figure 2.90, we see that the columns marked withnon-te~inalsare missing; non-terminalsdo not occur in the input and theydo not figure in look-aheads. Where the ACTION table has ‘shift’, the GOTO table has a state number; where the ACTION table has a reduction, the GOT0 table is empty. It is customary to superimpose the ACTION and G0TO tables in the implementation. The combined ACTI0N/~OTOtable has shift entriesof the form sN, which mean ‘shift to state W ; reduce entries rN,which mean ‘reduce using rule number W ; and of course empty entries which mean syntax errors. The A C ~ I O N / ~ Otable T ~ is also called the parse la le. It is shown in Figure 2.94, in which the following numbering of the g r ~ m arules r is used: 1: Z + E $ 2: E + T 3: E + E + T 4: T + i 5: T + ( E )
Note that each alternative counts as a separate rule. Also note that thereis a lot of structure in the ACTION/GOTO table, whichcan be exploited by a compressionalgo~thm. It shouldbe e m p ~ a s i ~ ethat d inspite of theirvisual s i m i l ~ t ythe GOTO and ACTION tables are fundamentally different. The GOTO table is indexed by a state and one grammar symbol that resides on the stack, whereas the ACTION table is indexed by a state and a look-ahead token that resides in the input. That they can be superimposed in the case of a one-token look-ahead is more or less accidental, and the trickis not available for look-ahead lengths other than1. When we now introduce a grammar rule T+i C E I ,we find that the s~ift-redu~e conflict has gone away. The reduce item T+i applies only when the look-ahead is one of
sl
state
i
0
S5
l 2 3
S2
T
s6 S3 rl
S5 r3
4 r3
5 6 r2 r2 7 S5 8 9
stack symbolAook-aheadtoken + ( ) $ E
r5
r5
S7
r3 r4 r2
S4
r4
r4 s8
S7
S3 r5
s6
S9
Figure 2.94 ACTIONlGOTO table for theSLR(1) automaton for the g r a m a s of Figure 2.85.
) , + ,and $ ,sotheACTIONtablecanfreelyspecify table will now contain theline state
5
i
-I-
T+iT+i
o shift
a shiftfor
l
$
T+i
T+i
[
. The SLR(1)
Note the reduction on ],since ] is in the new FOLLOW(T). The ACTION table is deterministic and the grammaris SLR(1). It will be clear that theSLR(1)automaton has the same numberof states as the LR(0) automaton for the same grammar. Also, the ACTIONlGOTO table of the SLR(1) automaton has the same size as the GOT0 table of the LR(0) automaton, but it has fewer empty entries. Experience has shown that SLR(3 )is a considerable improvement over LR(O), but is still far inferior to LR( 1) or LALR( 1). It was a popular method for some years in the early 19’70~~ mainly because its parsing tables are the same size as those of LR(0). It has now been almost completely supersededby LALR(1).
The reason why conflict resolution by FOLLOW set does not work nearly as well as one might wish is thatit replaces the look-aheadof a single itemof a rule N in a given LR state by FOLLOW set of N, which is the union of all the look-aheads of all alternatives of N in all states.LR(1)itemsetsaremore disc~minating:a look-aheadset is keptwitheach separate item, to be used to resolve conflicts when a reduce item has been reached. This greatly increases the strengthof the parser, butalso the size of its parse tables. The LR(1) technique will be demonstrated using the rather artificial grammar shown in Figure 2.95. The grammar has been chosen because, first, it is not LL(1) or SLR(l), so
these simpler techniques are ruled out, and second, it is both LR( 1) and LALR(1), but the two automata differ.
S + A I xb A -+ aslb I B B-+x Figure 2.95 Grammar for demonstrating the LR(1) technique.
The grammarproducesthelanguage { xb, anxbn I n>O}. Thislanguagecan of coursebeparsed bymuchsimplermeans,butthat is besidethepoint: if semantics is attached to the rules of the g r a ~ a of r Figure 2.95, we want a s ~ c t ~ r i of n gthe input in terms of that g r ~ a and r of no other. It is easy to see that theg r a m ~ a is r not LL(I): x is in FIRST@), so it is in FIR and S exhibits a FIRST~IRSTconflict on x. The g r a ~ m a ris not SLR(1) either, which we can see from the SLR(1) automaton shown in Figure 2.96. Since the SLR(1) technique bases its decision to reduce using an item N+a. on the FOLLOW set of N, these FOLLOW sets have been added to each item in set braces. We see that state S2 contains both a shift item, on b, and a reduce item, {b$ }. The SLR(l ) technique tries to solve this conflictby restricting the reduction to those look-aheads that are in FOLLOW(^). ~nfortunately, however, b is in FOLLOW(A), so it is also in FOL~OW(B),resulting in an SLR(1)shift-reduce conflict. The LR(1) technique does not rely on FOLLOW sets, but rather keeps the specific look-ahead with each item. We will write an LR(1) item thus: N+a*P{ci}, in which G is the set of tokens that can follow this specific item. When the dot has reached the end of the item, as in N+aP. {G}, the item is an acceptable reduce item onlyif the look-ahead at that moment is inG; otherwise the itemis ignored. The rules for determiningthelook-aheadsetsaresimple.Thelook-aheadsets of existing items do not change; only when a new item is created, a new look-ahead set must be determined. There are two situations in which this happens.
-When creating the initial item set:
The look-ahead set of the initial items in the initial item set So contains only one token, the end-of-file token (denoted by $),since that is the only token that can follow the start symbol of the grammar. -When doing E moves: The prediction rule creates new items for the alternatives of N in the presence of items of the form P+a.NP{ G}; the look-ahead set of each of these items is FIRST(P{G}), since that is what can follow this specific item in this specific position. Creating new look-ahead sets requiresus to extend our definition of FIRST sets to include suchlook-aheadsets. The extension is simple: if FIRST@) doesnotcontain E, FIRST( P{G}) is just equal to FIRST( p);if p can produce E, FIRST@{G}) contains all the tokens in FIRST(P), excluding E, plus the tokens in G. The E closure algo~thmfor LR(1)
6
8
Pigure 2.96 The SLR(1) auto~atonfor the grammar of Figure 2.95.
items is given in Figure 2.98. Supplying the look-ahead of $ to the start symbol yields the items S-+.A{ $ } and S-+.xb {$ },as shown in So, Figure 2.97. predict in^ items for the A in the first item gives us A-+.aAb {$ } and A-+.B {$1,both of which carry $ as a look-ahead, since that is what can follow theA in the first item, The same applies to thelast item in So:B-+.x {$ }. The first time we seea different look-ahead is in S3, in which the predictionmle for A in the first item yields A-+ aAb {b } and A+.B {b }. Both have a look-ahead b, since FIRST(b {$ })= { b }, The rest of the look-ahead sets in Figure 2.97 do not contain any surprises. We are pleased to see that the shift-reduce conflict has gone: state S2 now has a shift on b and a reduce on $. The other states wereall right already and haveof course not been spoiled by shrinking the look-ahead set. So the grammar of Figure 2.95 is LR(1).
5
Figure 2.97 The LR( 1) ~ u t o ~for ~ the t o grammar ~ of Figure 2.95,
The code for the LR(1) automaton is shown in Figure 2.99. The only difference with the LR(0) automaton in Figure 2.92 is that the Act i o n table is now indexed by the state and the look-ahead symbol. The pattern of Figure 2.99 can also be usedin a straightforward fashion for LR(K) parsers for K > 1, by simply indexing the ACTION table with morelook-aheadsymbols.Ofcourse,theACTIONtablemusthavebeen const~cted accordingly. We see that the LR(1) auto~atonis more disc~minatingthan the SLR(3 ) automaton. In fact, it is so strong that any language that can be parsed from left to right with a onetoken look-aheadin linear time can be parsed using theLR( 1) method: LR( 1) is the strongest possible linear left-to-right parsing method. The reason for this is that it can be shown (Knuth, 1965) that the setof LR items implements the best possiblebread^-first search for handles.
A set S of LR(1) itemsof the formN+a niti~li~~tions: S is prefilled externally with oneor more LR(l )items. In~erencerules: If S holds an item of the form P+a NP{ G } , then for each production rule N+y in G, S must also contain the item N-+*y{ T } , where T = FIRST( P{G}).
F i ~ ~ 2.98 re
E closure
algorithm for LR(l) item sets for a grammar G.
IMPORT Input token [l. .] ;
// from the lexical analyzer
SET Input token index TO 1; SET Reduction stack TO Empty stack; PUSH Start state ON Reduction stack; WHILE Reduction stack/= {Start state, Start symbol, End state} SET State TO Top of Reduction stack; SET Look ahead TO Input token [Input token index] .class; SET Action TO Action table [State, Look ahead]; IF Action = Ifshift": // Do a shift move: SET Shifted token TO Input token [Input token index]; SET Input token index TO Input token index + 1; // shifted PUSH Shifted token ON Reduction stack; SET New state TO Goto table [State, Shifted token .class]; // cannot be Empty PUSH New state ON Reduction stack; ELSE IF Action= ( Irreduce1', N+a) : // Do a reduction move: Pop the symbols of a from Reduction stack; SET State TO Top of Reduction stack; // update State PUSH N ON Reduction stack; NI ; SET New state TO Goto table [State, PUSH New state ON Reduction stack; // cannot be Empty ELSE Action = Empty: ERROR "Error at token I!, Input token [Input token index]; Figure 2.99 LR(1) parsing with a push-down automaton.
It is possible to define an LR(k) parser, with k > l, which does a k-token look-ahead. This change affects the ACTION table only: rather than being indexed by a state and a look-ahead token it is indexed by a state and a look-ahead string of length k. The COT0
table remains unchanged. It is still indexed by a state and one stack symbol, since the symbolinthe COT0 table is notalook-ahead; it alreadyresideson the stack.LR(k > l ) 1) parsers, but only marginallyso. If a g r a m ~ a is r not LR(l), parsers are stronger than LR( chances are slim that it is LR(2). Also, it can be proved that any lan expressed by an LR(k 31) g r ~ a can r be expressed by an LR(1) g r ~ a r LR(k . 31) parsing has some theoretical significance but has never become popular. The increased parsing power of the LR( 1) technique does not come entirely free of charge: LR(1) parsing tables are one or two orders of magnitude larger than SLR(1) parsingtables.WhereastheaveragecompressedSLR(1)automaton for a p r o g r a m ~ n g language will require some tens of kilobytes of memory, LR( 1) tables may require some megabytes of memory, with perhaps ten times that amount required during the construction of the table. This may present little problem in present-day computers, but traditionally compiler writers have been unable or unwilling to use that much memoryjust for parsing, and ways to reduce the LR(1) memory requirements have been sought. This has resulted in the discovery of LALR(1) parsing. Needless to say, memory re~uirementsfor LR(k) ACTION tables with k > 1 are again ordersof magnitude larger. A different imple~entationof LR(1) that reduces the table sizes somewhat has been presented by Fortes Griilvez (1992).
When we look carefully at the states in the LR( 1) automaton in Figure 2.97, we see that some of the item sets are very similar to some other sets, More in particular, S3and Slo are similar in that they are equal if one ignores the look-ahead sets, and so are S4 and S9, S6 and S l l , and Ss and S 12. What remains of the item set of an LR(1) state when one ignores the look-ahead sets is called the care of the LR(1) state. For example, the core of state S2 in Figure 2.97is S -+
B -+
x.b X.
All cores of LR(1) states correspond to LR(0) states. The reason for this is that the contents of the cores are determined only by the results of shifts allowed from other states, These shifts are determinedby the GOT0 table and are not influencedby the look-aheads. So, given an LR(1) state whose core is an LR(0) state, shifts from the item set in it will produce new LR(1) states whose cores are again LR(0) states, regardless of loo~-aheads. ~e see that the LR(1) states are split-up versions of LR(0) states, Of course this fine split is the source of the power of the LR(1) auto~aton,but this power is not needed in each and every state. For example, we could easily combine states S s and S12into one new state Ss,l2holding one item A-+am* {b$ },without in the least compromising the disc~minatorypower of the LR(1) automaton. Note that we combine stateswiththesamecoresonly,andwe do this by adding the look-aheadsets of the corresponding items they contain. Next we lead the transitions away from the old states and to the new state. In our example, the transitions on b in S6 and SIX leading to Ss and S 12 respectively, are moved . states S 8 and S12 can then be removed, reducing the numberof states to lead to S * , J ~The by 1.
S U ~ S E C ? l O2.2.5 ~
r e a t i ~a ~ o t t o ~~-aur s~e ar u t o ~ a t i ~ a l l y 171
Continuing this way, we can reduce the number of states considerably. Due to the possibility of cycles in the LR(1) transition diagrams, the actual algorithm for doing so is much more complicated than shown here (Pager, 1977), but sinceit is not used in practice, we will not giveit in detail.
Figure 2.100 The L K R ( 1 ) auto~ato~ for the g r m a r of Figure 2.95.
It would seem thatif one goes on combining states in the fashion described above, one would very soon combine two (or more) states into a new state that would have a conflict, since after all we are gradually throwing away the look-ahead information that we have just builtuptoavoidsuchconflicts. It turnsoutthat for theaverage p r o g r a ~ n g language grammar this is not true. Better still, one can almost always afford to combineall states with identical cores, thus reducing the number of states to that of the SLR(1) -and LR(0) -automaton. The auto~atonobtained by combining all states of an LR(1) automa-
ton that have the same cores is the The LALR(1) automaton for the grammar of Figure 2.95 is shown We see that our wholesale combining of states has done no damage: the conflict-free, and the grammar is LALR(1), as promised. The item Bretained its look-ahead $, which distinguishes it from the shift on b. The item for B that ) , an item must exist) sits does have a look-ahead of b (since b is in F O L L O ~ ( ~such safely in state $7. The contexts in which these two reductions take place differ so much that the LALR(1) method can keep them apart. It is surprising how well the LALR(1) method works. It is probably the most popular parsing method today, and has been so for at least twenty years. It combines power -it is only marginally weaker than LR(1) -with efficiency -it has the same memory requirements as LR(0). Its disadvantages, which it shares with the other bottom-up methods, will becomeclearinthechapteroncontexthandling,especiallySection3.1.7. Still, one wonders if the LALR method would ever have been discovered (LaLonde et al., 19’7l) if computers in thelate 1960s had not beenso starved of memo^. One reason why the LALR method works so well is that state combination cannot a state S with a shiftcause shift-reduce conflicts. Suppose the LALR(1) automaton has reduce conflict on the token t, S contains least at two items, a shift item A+a*tp{o} and a reduceitem B to2}. The shift item is presentin all theLR( l ) states that have been combined into S, perhaps with different look-aheads. A reduce item B+ye {C T ~ ~ Cwith T ~ } a look-ahead that includes t must be present in at least one LR(1) states, or t would not be in the LALR reduce item look-ahead set of S. implies that this LR(1) state already had a shift-reduce conflict, so the conflict was not caused by co~bining,
Although the chances for a grammartobeLR(1)aremuchlargerthanthose of being SLR(1) or LL( l), one often encounters a grammar that still is not L generallythat the grammar is anibiguous,whichimpliesthatnodeterministicmethod would work. And the reason that one wants to use ambiguous g r a m m ~ is s that they often allow the properties of the language to be stated more simply. An example is the g r a ~ a r fragment for conditional statement in C: if-statement -+ ‘if’ ’(’ expression ’)‘ statement I ‘if’ / ( ’ expression. ’ ) / statement ‘ e l s e ‘ statement statement -+ ... I if-statement 1 ...
which exhibits the ~ n ~ l else in~ problem. Now the state~ent is ambiguous, since it has two syntax trees, as shown in Figures 2.101 and 2.102, The C manual states that the tree in Figure 2.101 is the correct parsing. In the LR( 1) auto~aton, we notice the problemby observing that the state that contains the two items
item 1:
if-statement 3 (' expression 'if' ' item 2: if-statement -+ 'if' ' (' expression
... }
1 ) '
statement * { ... 'else'
l)'
statement * 'else' statement
has a shift-reduce conflict on the token e 1se. Note that such a state must exist. The item marked 'item 1' must occur in at least one state, or else the corresponding if-statement could never be recognized; call this state X, Since the first part of the if-else-state~entis the same as that of the if-statement, the if-else-statement will also lead to state X, since until the e l se has been seen, the parser cannot see any difference. So the item marked 'item 2' must also be present in stateX.
if-statement
'if' ' ('
expression '1'
x > o
statement
if-statement
(' 'if' '
expression '1'
Y ' O
statement 'else' statement
p = 0;
q = 0;
A possible syntax treefor a nested if-state~ent.
~esolvi~~ s~~t-re~uc~ co~~icts Shift-reduce conflicts are adi it ion ally solved in an LR parser generator in a way that is similar to the conflict resolution used in lexical analyzers: the longest possible sequence of g r a symbols ~ ~ is~ taken for reduction.This is verysimpletoimplement:in a shiftreduce conflict do the shift. Note that if there is more than one shift-reduce conflict in the same state, this criterion solves them all. As with the lexical analyzer, this almost always does what one wants. In the caseof the dangling else, the e 1se gets shifted andas a result is paired with the latesti f without an e l S e, thus c o n f o ~ n to g the C manual. So in gen-
if-statement
’if‘
‘(’ expression ‘1’ statement ‘else’ statement
x > o
‘if’
if-statement
q = 0;
‘(’ expression ’)’statement
Y’O
p = 0;
Figure 2,102 An alternative syntax tree for the same nested if-statement.
era1 shift-reduce conflicts are not removed from the grammar but resolved by shift preference in the parser generator. Another useful techniquefor resolving shift-reduce conflicts is the use of precedences between tokens. The word ‘precedence’ is used here in the tra~itionalsense, in which, for example, the multiplication sign hasa higher precedence than the plussign; the notion may be extended to other tokens as well in parsers. This method can be applied only if the reduce item in theconflict ends in a token followed by at most one non-terminal, but many do. In that case we have the following situation which hasa shift-reduce conflict on t: P+aetP{ ...} Q +yuR e {...t ..}
(the shift item) (the reduce item)
where R is either empty or one non-terminal, Now, if the l o o ~ - ~ e is a dt, we perform one of the following three actions: 1.if symbol U has a higher precedence than symbolt, we reduce; this yields a nodeQ containing U and leavest outside of it to the right; 2. if t has a higher precedence than U, we shift; this continues with the node for P which will contain t when recognizedeventually, and leavesU out of it to the left;
3. if both have equal precedence, we also shift (but see Exercise 2.55). This method requires that the user of the parser ene era tor supply precedence information. It allows considerable control over the resolution of shift-reduce conflicts. Note that the dangling else problem can also be solved by giving the e l S e token the same precedence
as the ) token; then we do not have to rely on a built-in preference for shift in^ in a shiftreduce conflict. Reduce-reduce con~ictscorrespond to the situ est token still matches more than one pattern. the same as in a lexical ~ a l y ~ ethe r : textual1 his is easy to imple~entand i s usually
a lexical analyzer in which the lon ost c o ~ m o built-in n resolution rule r a ~ rule m ~in the parser ene era tor satisfactory.
reduction stack and an input token, such ck S, and the input tokenfx is empty: s ~ 1A S IA2...At~t
t.x
cover from the error we need to reach a situation in which this is no longer true. ies are involved, stack and the input, we can consider modify in^ either or both, but just as in Section 2.2. the stack endan~ersour chances of obtainin an LR parser, since removin a correct syntax tree. Actual1 ven worse in ar symbols from the reduction stack implies t h ~ o w i naway ~ parts of the syntax tree that have already been found tobe correct. There are many proposed techni~ue~ to do repairs, almost all of them ~oderatelysuceven search the states on the stack and the next few input tokens d most the promisin cKenzie et al., 1995;
ot to modify
the stack, but this is difficult. Several techni~ueshave
S, allows a shift or reduction on a token, say t,, one can inse and perfom the shift or reduction. ~ n f o ~ u n a t e l this y ? has a good chance of br backto a situationwith the sametopstate s t , and since therest of theinpu history will repeat itself. have seen that the acceptable-set t ~ n i ~ from u e ~Section 2.2.4.5 avoid ~ o d i f y i for L parsers too, but they are less success~ulthe est t~el~selves A naive a ~ p ~ o isa to c ~take the set of correct tokens as the acce able set. This causes the the input one by one until a token found that does have m table, so parsing can continue,butthis panic- ode error ant tokens, and yields ad results. An approach simid e ~ c r i b for e ~ LL parsers in is much more co~plicated
parsers tend to modify the stack.
The best known method is the one used by the LALR(l) parser generator yacc (Poonen, 197’7). The method requires some non-terminals to be chosen as error-recoveri~~ ~ o ~ S; these areusuallythe‘bignames’fromthegrammar: declaration, expression,etc. If a syntax error is detected while constructing a node for an errorrecovering non-terminal, say R, the idea is to give up the entire attempt to construct that node, construct a d u m y node instead that has the proper attributes, and discard tokens from the input until one is found that indicates the end of the damaged production of R in the input. Needless to say, finding the end of the damaged productionis the risky part, g r a m m ~writeraddsthealternative Thisidea is implementedasfollows.The erroneous to the right-hand sideof one or more non-terminals, thereby marking them as non-te~inalsthat are licensed to produce a d u m y syntax subtree. During the construction of the LR states, each state that containsan item of the form in which R is an e~or-recover in^ non-te~inal,is marked as ‘e~or-recovering’. When a syntax error occurs, the top of the stack exhibits a state S, and the present input starts with a token tx, such that ACTION[s,, t,] is empty. See Figure 2.103, in which we assume thatR was defined as R
-+ G
H I
I
erroneous
andthatwehavealreadyrecognizedandreducedthe G and H. The pseudo-ter~nal erroneous-R represents the d u ~ node y that is allowed as an alternative of R.
R-> .G H R -> . erroneous-R
F i ~ 2.103 ~ ~ LR e error recovery
-detecting the error.
The error recovery starts by removing elements from the top of the stack one by one until it finds an ~~ror-recover in^ state. See Figure 2.104, where the algorithm finds the e~or~recovering state S,. Note that this action removes correctly parsed nodes that could have become part of the tree for R. We now construct the d u m y node erroneous-R for R, push R onto the stack and use the GOT0 table to determine the new state on top of
-
R-> .G H I R -5 . erroneous-R G->
.....
Figure 2.104 LR error recovery -finding an error recovery state.
p-> p->
. t y ... ‘t,
...
...
~~
Figure 2.105 LR error recovery -repairing the stack.
0
~-~ p-> I
.
. t z ...
.
Figure 2.106 LR error recovery -repairing the input.
the stack. Since the error-recovering state contains the item N+a@RP, we can be certain that the new state is not empty, as shown in Figure2.105. The new state S, defines a set of
I p-> t , .
1
...
I l
f . ,
l
...
w e 2.307 LR esror recovery -restart in^ the passer.
acceptable tokens, tokens for which the row ACTION[s,,,..] contains a non-empty entry; these are the tokens that are acceptable in sz. We then discard tokens from the input until we find a token t2 that is in the acceptable set and can therefore follow R. Note that this action attempts to remove the rest of the production of R from the input; see Figure 2.106. Nowatleast one parsing step canbetaken,since ACTION[s,, tz.] is not empty" This prevents looping. The final situation is depicted in Figure 2.107. The procedure described here cannot loop, restricts the damage to the syntax tree toa known place and has a reasonable chance of getting the parser on the rails again. There is a risk, however, that it will discard an important token and derail the parser further. Also, the rest of the compiler must be based on the g r a ~ m a as r extended with the alte~atives erroneous in all error-recovering non-te~inals.In the above example that mean^ that all code that processes nodesof type R must allow the possibility that the nodeis actually a ~ u m m ynode erroneous-R.
se Probably the most widely used parser generatoris yacc, which started as a UNIX utility in the mid-~970sand has since seen more than twenty years of service in many co~pilation and coilversion projects. Yacc is an LALR( 1) parser generator. The name stands for 'Yet Another Compiler Compiler', but it is not a compiler compiler in that it generates ratherthancompilers. One recentproblemwith yacc is thatitgenerates C rath ANSI C; a yacc look-alike called ~ ~ ~provided u n , by GNU, remedies this prob~em.The yacc code shown in this section has been tested usingbison. The most striking difference between top-down and bottom-up parsing is that where determines the correct alternative right at the beginning and then works its way through it, bottom-up pars in^ considers collections of alte~ativessimultaneously andonlydecidesatthelastpossiblemomentonthecorrectalternative. Althou~hthis open~essof mind increases the stren th of the method considera~ly,it makes it much more difficult to execute code. In fact code can only be executed safelyat the end of an a l t e r n ~ ~ tive,whenitsapplicabilityhasbeenfirmlyestablished.This also rulesouttheus parameters since it would be unclear when (or even whether) to evaluate them andto pass
Yacc ' S approach to this is to associate with each ember exa whichshouldbeset by thatmemberwhen it hasbeen recognized. means that when the entire alte~ativeof a non-ter~inalN has been re place and can be used to construct the para~eterfor N. The $ 2 , .. Sn,for the n mem~ersof an alternative;thecount . The para~eterassociated with the rule non-te~inalsitself is e full yacc code for constructing parse treesfor simple expressions is shown in . The code at the end of the first alternative of expresS ion allocates a n yields its address as the parameter for express ion. Next, it sets the type and the two pointer fields to the para~eterof the first member and the third membe~,respectively.Thesecond l~emberis the t e r ~ i n a lsymbol - ;its parameter is notused.The sion and in term are similar. ments in the second alternativeof e e parameters. Since the p ~ a ~ e t e are rs All this raises questions about the types sing stack, they all have to be of the implemented as an may that parallels theLA. r will want to associate different data because t sametype. This is inco~~enient, s ~ c t ~ ~ with r e s diff~rentnon-ter~nals. A way out is provided by imp p~ametersas ~~nions of the various ata structures. Yacc is aware of this union to be ~ e f i n by e ~ the user, thr gh a %union keyword. R e f e ~ i nt ~ twostructuresdeclared inside the %union,withtags expr %type state~entsas§ociate the entry tag~edexpr in the union with the non-te~inal n. andtheentry term withthe n o n - t e ~ n a lterm. This allows yacc and
t ma in are similar to those of the g r ~ ~ pro a r ary C code. This code is very s the sameoutput as the LLgen example on correct (1)
parse error
Thisconcludesourdiscussion of the first sta e of thecompilationprocess -textual analysis: the conversion from characters in a so rce file to abstract syntax tree. ~e have seen that the conversion takes places in two major steps separated by a minor one. The aracters into tokens and then structure the sequence n thetwomajorsteps,someassortedlanguaged e p e ~ d e character ~t and token ~ a n i ~ u l a t i omay n take place, to ~ e ~ o r m p r e idenli~in~ tifier identi~ication,macro process in^, file inclusion,and conditional major steps are based on more or less automated pattern p at chin^, usin sionsand context-free rammars res~ectively. Important a l g o r i t ~ ~inboth s stepsuse
%{
#include
"tree,h"
%}
%union { struct expr *expr; struct term *term;
1
%type <expr> expression; %type
term; %token IDENTIFIER %start
main
main : expression {print-expr($l)
; printf (ll\nrl);}
I
expression: expression term {$$ = new-expro; $$->type=
I
I
-
!
;
$$->expr = $1; $$->term= $3;)
term
{$$ = new-expro; $$->type= IT1; $$->term = $l;}
I
term: IDENTIF~E~ {$$ = new-term(); $$->type = 1 1 ~ ; ) I
Figure 2.108 Yacc code for constructing parse trees.
'items', which are simple data s t ~ c t u r e sused to record partial pattern matches. We have also seen that the main unsolved problem in textual analysis is the handling of syntactically incorrect input; only ad-hoc techniques are available. A very high-level view of the relationships of the techniques is givenin Figure 2.110.
#include
'Ilex.h"
int main(void) { start-lex ( ) ; yyparse (1 ; return 0;
/* routine generated by yacc */
1
int yylex(void) { get-next-token (1 ; return Token.class;
1
Figure 2.109 Auxiliary code for theyacc parser for simple expressions.
Lexical analysis Top-down Bottom-up
Decision on reduce items: finite-state automata
Syntax analysis Decision on first Decision on first chartoken: LL(1) method acter: manual method Decision on reduce items: LR techniques
Figure 2.110 A very high-level view of text analysis techniques.
-Lexical analysis turns
a stream of characters into a stream of tokens; syntax analysis turns a stream of tokens into a parse tree, or, more probably, an abstract syntax tree. Together they undo the lineari~ationthe program suffered in being written out sequentially. -An abstract syntax tree is a version of the syntax tree in which only the semantically importantnodesareretained.What is 'semanticallyimportant' is uptothecompiler writer. -Source program processing starts by reading the entire program into a character buffer. This sim~lifiesmemory management, token isolation, file position tracking, and error reporting.
-Standardize nevvline characters as soon as you see them. -A token consists of a number (its class), and a string (its representation); it should also include position-trac~nginformation.
-The form of the tokens in a source language is described by patterns in a special fomalism; the patterns are called regular expressions. Complicated regular expressions can be simplifie~by naming parts of them and reusing theparts; a set of named re sions is called a regular description. -A lexical analyzer is a repeating pattern matcher that will cut up the input stream into tokens l~atchingthe token patternsof the source language. -Ambiguous patterns are resolved by accepting the longest match. If that fails, the order of the patterns is used to break the tie. -Lexical analyzers can be writtenby hand or generated automatically, in both cases based on the s~ecificationof the tokens through regular expressions. andwritten lexical analyzers make a first decision based on the first character of the and use ad-hoc code thereafter, xical analyzer is the only part of the compiler that sees each character of the source ~ r o ~ r a m as;a result, it performs an order of ~agnitudemore actions that the rest of the compiler phases. -Much computation in a lexical analyzer is done by side-effect-free functions on a finite domain. The results of such computations can be determined statically by precomputation and stored ina table. The co~putationcan then be replaced by table lookup, greatly increasing the efficiency. -The resulting tables require and allow table compression. -~eneratedlexical analyzers represent their knowledge as a set of items. An item is a namedfullyparenthesizedregularexpressionwith a dotsomewherein it. Thepart before the dot matches the last part of the input scanned; the part after the dot must match the first part of the rest of the input for the item to succeed. -Scanning one Character results in an item being transformed into zero, one, or more new item. This transition is called a shift. The set of items kept by the lexical analyzer is t r a n s f o ~ e dinto another setof items by a shift over a character. -The item sets are called states and thetransfo~ationsare called state transitions. -An item with the dot at the end, called a reduce item, signals a possible token found, but the end of a longer token may still be ahead. When the item set becomes empty, there are no more tokens to be expected, and the most recent reduce item identifies the token to be matched and reduced. -All this item manipulation can be avoided by preco~putingthe states and their transitions. This is possible since there are a finite number of characters and a finite number of item sets; it becomes feasible when we limit the precom~utationto those item sets that can occur in practice: thestates. -The states, the transition table, andthe transition m ~ c h ~ itogether s l ~ are called a fimitestate automaton, FSA. -~eneratedlexical analyzers based on FSAs are very efficient, and are standard, althou handvvritten lexical analyzers can comeclose. -Transitiontablesconsistmainly of emptyentries. y canbecompressed by cutting them into strips,row-wiseorcolumn-wise,and fi thevaluesinone strip into the
,by shifting one with respect to the other; the starting positions of recorded and used to retrieve entries. Some trick must be appliedto resolve the v a l ~ e ~ o l e -Inanothercompressionscheme,thestripsareoupedintoclusters,the l~embersof which do not ~ ~ t e ~ fwith e r e each other, using gr coloring techni~~es. All embers of lexical analysis have to be identified further before in the symbol table. This idenrd identification, macro processds indexed by strings. The string is the ion about the identifier. using ~ashing. ays can be im~~emented a1 compilation, and file incluck of input buffers. can be i m p l e ~ e n t eusing ~ file inclusion and macro processin a form of token ins~rtion,between the lexical and the syntax anal
ere are two ways of doing -down and bottom-up. op-down ~arsingtries gram prod~ctionprocess; bottom-up parsing tries to rollback the proa ~ t o ~ a t i c a l l yfro^ a p-downparsers can bewrittenmanuallyorbegenerated context-free grammas. -A handw~ttentop-downparserconsists of a set of recursiveroutines,eachroutine closely to a rule in the r a m a r . Such a parser is called a recursive deshis techni~ueworks for a restricted set of g r ~ a r only; s the restrictions down parsers use precomputation of the decisions that predictive recursive desce~tparsers take dynamical~y. ~ n a ~ b i g u o transition us tables are obtained for s LL( 1) g r a ~ m a r only. -~ o n s t ~ c t i of o nthe table is based on the F1 ST and FOLLOW sets of the non-terminals. kens any ~ r o d u c t i of o ~N can start with, andE if N produces the ( N ) contains all tokensthat can follow anyprod~ctionof N. -The transitloe inco~oratedin a recursive descent passer to yield a predicwhich the parsing stack coincides with the routine calling stack; or be l) push-down automaton, in which the stackis an explicit array. can be r e ~ o v e dby left-factorin substitutio~,and left-r~cursionrernomas, and can be resolved by having dynamic conflict resolvers in the 'I
can recover from syntax errors by plotting a shortest path out, deletillg tok~nsfrom the rest of the in ut until one is found that is acceptable on that path, and
then following that path until that token can be accepted. This is called acceptable-set error recovery. -~ottom-upparsers work by repeatedly identifying a handle, The handle is the list of children of the last node that was expanded in producing the program. Once found, the bottom-up parser reducesit to the parent node and repeats the process. -Finding the handleis the problem; there are many approximative techniques. -The LR parsingtechniquesuseitemsets of proposedhandles.Theirbehaviorwith respect to shift (over a token) is similar, their reduction decisioncriteria differ. -In LR(0) parsing any reduce item (=item with the dot at the end) causes a reduction. In SLR( 1) parsing a reduce item N+a causes a reduction only if the look-ahead token is in the FOLLOW set of N. In LR( 1) parsing a reduce item N+a* {CT}causes a reduction only if the look-ahead token is in CT,a small set of tokens computed especially for that occurrence of the item. -Like the generated lexical analyzer, the LR parser can perform a shift over the next token or a reduce by a given grammar rule. The decision is found by consulting the ACTION table, which can be produced by precomputation on the item sets. If a shift is prescribed, the new state can be found by consulting the GOT0 table, which can be LR parserswith a one-token look-ahead, the precomputedinthesameway.For ACTION and GOT0 tables can be superimposed. -The LALR(1) item sets and tables are obtained by combining those LR(1) item sets that differ in look-ahead sets only. This reduces the table sizes to those of LR(0) parsers, but, remarkably, keeps almost all parsing power. -An LR item set has a shiftheduce conflict if one item in it orders a shift and another a reduce, taking look-ahead into account. An LR item set has a reduceheduce conflict if two items init order two different reduces, taking look-ahead into account. -LR shif~reduceconflicts can be resolved by always preferring shift over reduce; LR reduce/reduce conflicts can be resolved by accepting the longest sequence of tokens for the reduce action. The precedence of operators can also help. -Error recovery in anLR parser is difficult, since much of the information it gathers is of a tentative nature. In one approach, some non-terminals are declared error-recovering by the compiler writer. When an error occurs, states are removed from the stack until a state is uncoveredthatallows a shift onan error-recovering non-terminal R; next, a dummy node R is inserted; finally, input tokens are skipped until one is found that is acceptable in the new state. This attempts to remove all traces of the production of R and replaces it witha dummy R.
The use of finite-state automata for lexical analysis was first described by Johnson et al. (1968)andtheuse of LL(1) was first described by Lewisand Stems (Lewis 11 and Steams, 1968), although in both cases the ideas wereolder, LR(k) parsing was invented by nuth (1965). Lexical analysis and parsing are covered to varying degrees in all compiler design books, but few books are dedicated solely to them. We mention here a practice-oriented
book by Crune and Jacobs (1990) and two theoretical books, one by Sippu and SoisalonSoininen (1990) and the other by Ah0 and Ullman (1973), both in two volumes. A book by Chapman (1987) givesa detailed treatment of LR parsing. There are a num~erof good toexcellentcommercial and publicdomain lexical analyzer generators and parser generators. Information about them can be found in the postingsin the c ~ ~ ~ne~sgroup, . c which ~ are ~ much ~ more ~ ~up toe date~thanany ~ printed text can be.
2.1
Section 2.1 .l advises to read the program text with a single system call. Actually, you usually need three: oneto find out the size of the input file, one to allocate space for it, and one to read it, Write a program for your favorite operating system that reads a file into memory, and counts the number of occurrences of the character sequence abc+abc.Try to malse it as fast as possible. Note: the sequences may overlap.
2.2
On your favorite system and programming language, time the process of reading a large file using the language-supplied character read routine. Compare this time to asking the system for the size of the file, allocating the space, and reading the file using one call of the languag~-suppliedmass read routine.
2.3
Usingyourfavoritesystemand p r o g r a ~ i n glanguage, create a file of size 256 which contains all 256 different 8-bit characters. Read it character by character, and as a block. What do you get?
2.
Somebody in a compiler construction project suggests solving the newline problem by systematically replacing all newlines by spaces, since they mean the same anyway. Why is this almost certainly wrong?
2.5
Give a regular expression for all sequences of Os and 1 s that (a)contain exactly 2 1s. (b) contain no consecutive 1s. (c) contain an even numberof 1 S.
2.6
Why would thedot pattern (.)usually exclude the newline (Figure 2.4)?
2.7
Whatdoestheregularexpression erroneous? Are they ambiguous?
2.
(from Stuart Broad) The following is a highly simplified grammar for URLs, assuming proper definitions for1 etter and digit.
a ? * mean?And
a**? Aretheseexpressions
I URL . l t label label ”+ letter ( letgit-hyphen-string? letgit ) ? letgit-hyphen-string ”+ letgit-hyphen I letgit-hyphen letgit-hy~hen-string letgit-hyphen ”+ letgit I 11-11
URL ”+ label
ewritethe skip-layout-and-comm~nt routineofFigure2.8toallow nested comments.
for
The c o ~ m e n skipping t scheme of Figure 2.8 suffices for single-character coml~entdelimiters. However, multi-character comment-deli~tersrequire some more attention. Write a ski~-layo~t-an~-comm~nt routine for C, where comments are delimited by ‘/*’and ‘*/’,and don’t nest, The end of the s~bsectionon table compression in Section 2.1 5.1 leaves single array of 256 bytes, charbit S E 1. Since programs contain onlyA acters in therange 32 through126,plusnewlineandpetab,somebodyproposes togainanotherfactor of 2andreducethearrayto a of 128. What is your reaction? Explain why is there a FOR EACH statement in Figure 2.16 rather thanjust: IF the input matchesT,”+R,
...
over Length:
The text distinguishes ‘shift items’ with the dot in front of a basic pattern, ‘reduce items’ with the dot at the end, and ‘non-basic items’ with the dot in front of a re sLlbexpressio~. What about items with the dot just before the closingp~enthesisof a p ~ e n t h e s i subexpression? ~e~ uppose you are to extend an existing lexical analyzer generator witha basic pattern which matches two consecutive occurrences of the same characters, for example aa, ==, or ,,. How would you implement this (notso) basic pattern?
g,
Argue the correctnessof some of the dot motion rulesof Figure 2.19, at use regular expressions, for exampleSCML, add a third composiwith R &R2 ~ e a ~ i that n g both R and R 2 must occur but that they may occur in any order; so R I &R2 is equivalent to R I R 2 IR 2R Show the E move rules for thiscomposiin a fashionsimilartothoseinFigure2.19,starting from the item T-+ Show that the closure algorithmfor dotted items (Figure 2.25) terminates.
that ‘our closure al orithm terminates after having erated five sets, outof a possible 64’. late keywords in a file. A keyword is any sequence of letters delophes: ‘begin‘i ekeyword begin. eware of non-letters between apose~pressiol~s for the process, and construct the F .31 using marking by state (rather than by charles to be compressed often contain many rows 0, 3, and 7 of Figure 2.90: -t-
state i 0
3 7
(
$
)
E
T
1
5 5 5
shift shift shift
6 4
7 7
6
8
empty entries -and thus more compressibility -can be obtained by assignin to one of the rows in such a group the role of ‘principal9 and reducing the others to the differen~ewith the principal. Taking row 7 for the principal, we can simplify the table to: state principal i 0 3 7
-I-
(
7
)
$
E
T
l 4
7 5
7
8
6
shift
If, upon re~ieva19an empty entry is obtained from a row that has a p~ncipal,the be obtained from that principal. Fill in the detail to turn this idea (l) table of Figure 2.94 in two ways: using rowdisplace~entwith nd using column dis~lacementwith ~ a r ~ by n state. g ilar lexical analyzer generator to generatea filter that removes r a m files. One problem is that the comment starter /* may ments may be arbitrarily long and most even if it is su~sequentlydiscarded, so ge buffers, which are not supplied by all of lex or flex to enerate~lexical analy start condition feature c o n s u ~ the e c o l ~ ~ eline n t by line. guages, for example Algol 68, feature a token class similar to is largely similar to the formats used in C p r i n t f ( ) calls.
For example, $3d$ described the formatting of an integer value in 3 digits. Additionally, numbers in formats may be dynamic expressions: integers f o ~ a t t e dunder $n.(2*a ) d$ will have 2 *a digits. Design a lexical analyzer that will handle this. int 1: the dynamic expressions can, of course, contain function calls that have formats as parameters, recursively. Hint 2: this is not trivial. Will the function Identify ( ) in Figure 2.44 still work if the hash function maps all identifiers onto the same number? Why? Suppose your symbol tablehas to accommodate between 10 and1000 identifiers as a rule, but is also required occasionally to handle10 million identifiers reasonablyefficiently.Alwaysallocating a hashtable of size 10 million is out of thequestion. Design an appropriate hash table algorithm. An adviserto a compilerconstructionproject insists thattheprogrammatically correct way to do macro processing is in a separate phase between reading the program and lexical analysis. Show this person the errors of his or her ways.
7 In the subsection on back-calls in Section 2.1.12.1, we need a back-call because the process of recognizing the identifier mac overruns the end of the identifier by one character. The handwritten lexical analyzer in Section2. 1.5 also overruns the endof an identifier. Why do we not need a back-call there? Give a code segment (in some ad hoc notation) that uses N generic items and that willcause a piece of code tobegenerated2N-1timesundergenerics by macro expansion. r easy.Forexample,thesuffixrule Constructing a suffix g r a ~ a is terminal A "+ B C D is: A-suffix "+ B-suffix C D
1
C D
I
C-suffix D
I
D
I
for the nonD-suffix:
Using this technique, construct the suffix grammar for the grammar of Figure 2.84. Try to make the resulting suffix grammar LALR( l)and check this property using an LALR( 1) parser generator. Use the resulting parser torecognize tails of productions of the grammarof Figure 2.84. Add parse tree constructing codeto the recursive descent recognizerof Figure 2.54. (a)Construct a (non-predictive) S "+ '('
S ')'
I
recursive descent parserfor the grammar
')'
Will it parse correctly? ( S t ) E. (b)Repeat for (c)Repeat for S-+' ( ' S I ) ' I ) I
~
I
I&.
Why is the correct associativity of the addition operator + (in the grammar of Figure 2.53) less important than thatof the subtraction operator-?
Exercises
I89
2.33 Naive recursive descent parsing of expressions with n levels of precedence requires n routines in the generated parser. Devise a technique to combine then routines into one routine, which gets the precedence as a parameter. Modify this code to replace recursive calls to the same precedence level by repetition, so that only calls to parse expressions of higher precedence remain.
Add parse tree constructingcode to the predictive recognizer in Figure 2.61. Naively generated predictive parsers often contain useless code. For example, the entire switchmechanismintheroutine parenthesized-expression () in Figure 2.61 is superfluous, and so is the default : error () ;case in the routine terrn ( ) . Design rules toeliminate these inefficiencies. 2.36 Answer the questions of Exercise 2.31 for a predictive recursive descent parser. 2.37 (a) Devise thecriteria for a g r a m a to allow parsing witha non-predictive recursive descent parser. Call such a grammar NE” (b) Would you create a predictive or non-predictive recursive descent parser for an NPRD grammar?
The grammar in Figure 2.1l 1 describes a simplified version of declarations in C,
declaration: decl-specifiers init-declarator? decl-specifiers: type-specifier decl-specifiers? type-specifier: lint’ I ‘long’ init-declarator: declarator initializer? declarator: IDENTIFIER I declarator ’(‘ ‘1’ I declarator ‘[’ ‘ l ’ initializer: ’ expression I initializer-list I initializer-list initializer-list: expression I initializer-listinitializer-list I initializer-list ’}‘ l
;
’
= I
l { !
‘ { l
l } ’
l } ‘
’ { l
Figure 2.111 A si~plifiedgrammar for declarations in C.
(a)Showhowthisgrammarproducesthedeclaration long inti = {1, 2 } ; (b) Make this grammar LL( 1) under the unrealistic -assumption that expression is a single token. (c) Retrieve the full grammar of the variable declaration in C from the manual and make it LL(1). (Much more difficult.)
-
(a) Construct the transition tableof the LL(1) push-down automatonfor the g r a ~ a r
epeat, but with the above definitionof B replaced by B + b
1
E
Complete the parsing started in Figure 2.70. e t e ~ i n where e exactly the prediction stackis located in a predictive parser. ~ ~ l ~ - L L~( dl )v, a npca ~r ~~i ntopic ~ : (a) The LL( l ) methoddescribedinthisbookusesthe F O L L O ~set of a nonterminal N to decide when to predict a nullable production of N. As in the SLR( L) method, the F ~ L L O set ~is too coarse an approximation since it includes any token that can ever follow N , whereas we are interested in the setof tokens that can follow N on the actual prediction stack during parsing. Give a simple grammar in which this makes a difference. (b)Wecaneasilyfindtheexacttokensetthatcanactual1 terminal 7‘ on the prediction stack f T, a 1: it is FIRST(a), exact token set to improve our prediction? (c) ~e can inco~oratethe exact follow set of each prediction stack entry into the LL( 1) ~ush-downautomaton by expandingictionstack entries to ( g r a ~ m a r symbol,tokenset)pairs.Inanalogytotheautomaton,thesetokensetsare called ‘look-ahead sets’. Design rules for computing the look-ahead sets in the predictions for the stack element(N, a)for production rulesN+P. (d) The LL(l) method that uses the look-aheads described here rather than the FOLLOW set is called ‘full-LL( l)” Show that full-LL(1) provides better error detection than strong”LL(l),in the sense that it will not incorrectly predict a nullable alternative. Give an example using the grammar from part(a). (e) Show that there is no full-LL(1) grammar that is not also strongstruct a grammar that has a F I R S T ~ O L L Oconflict ~ when set, such that the conflict goes away in all situations when using th look-ahead set. (f) Show that there are full-LL(2) g r a m m ~ sthat are not strong-LL(2). sider a non-ter~inalwith two alternatives, one producing the empty string and one producing one token.
Using the grammar of Figure 2.53 and some tables provided in the text, determine the acceptable set of the LL(1) parsing stack parenthesized-expression rest-expression EOF.
Consider the automatic computation of the acceptable set based on continuations, as explained in Section 2.2.4.5. The text suggests that upon finding an error, the parser goes through all the motions it would go through if the input were exhausted. This suggests a lot of work. Devise a simpler method to compute the acceptable set of a prediction stack configuration. Hint: note that the order in which the symbols sit on the stack is immaterial for the value of the acceptable set.
Devise rules for the conversion of the resulting parse trees to correct trees with respect to the original grammar, in such a way that the conversion can be automated, or show that this cannot be done.
Construct the item sets and their transitions for the grammar. (Note: '(', not ')', in the second alternative.)
Show that when the LR parser calls for a 'reduce using N→α', the top of the stack does indeed contain the members of α in the correct order, and that when the reduce move has been performed by replacing α by N, the state pushed on top of it cannot be 'erroneous'.
Construct the LR(1) automata for the grammar S→xS | x, and show that right recursion causes linear growth of the stack: in particular show that with this grammar the stack grows proportionally to the length of the input. Compare the behavior in this respect with that of the left-recursive version of the grammar.
(a) Determine which pairs of items can coexist in an LR item set. (b) Consider an item set that contains an item of the form
Q→•P. (c) Show that no look-ahead can make the item set in part (b) conflict-free.
Construct an unambiguous grammar fragment that implements the C interpretation of the dangling else. Hint: since the ambiguity occurs when the then-part of an if-statement contains an if-statement without an else-part, introduce two new non-terminals, short-if-statement for an if-statement without an else-part, and statement-but-not-short-if with the obvious meaning; additional new non-terminals may be needed.
Refer to the subsection on resolving shift-reduce conflicts in Section 2.2.5.7 on making a grammar LR(1), where precedence information about operators is used to help resolve shift-reduce conflicts. In addition to having precedences, operators can be left- or right-associative. For example, the expression a+b+c must be grouped as (a+b)+c, but a**b**c, in which the ** represents the exponentiation operator, must be grouped as a**(b**c), a convention arising from the fact that (a**b)**c would simply be equal to a**(b*c). So, addition is left-associative and exponentiation is right-associative. Incorporate associativity into the shift-reduce conflict-resolving rules stated in the text.
(a) Show that the grammar for type in some programming language, shown in Figure 2.112, exhibits a reduce/reduce conflict.

type              → actual-type | virtual-type
actual-type       → actual-basic-type actual-size
virtual-type      → virtual-basic-type virtual-size
actual-basic-type → 'int' | 'char'
actual-size       → '[' NUMBER ']'
virtual-basic-type → 'int' | 'char' | 'void'
virtual-size      → '[' ']'

Figure 2.112 Sample grammar for type.
(b) Make the grammar LALR(1); check your answer using an LALR(1) parser generator. (c) Add code that constructs the proper parse tree in spite of the transformation.
2.57 Consider a variant of the grammar from Figure 2.5 in which A is error-recovering:
S → A | x b
A → a A b | B | erroneous
B → x
How will the LR(1) parser for this grammar react to empty input? What will the resulting parse tree be?
2.58 LR error recovery with stack modification throws away trees that have already been constructed. What happens to pointers that already point into these trees from elsewhere?
2.59 History of parsing: Study Samelson and Bauer's paper (1960), which introduces the use of a stack in parsing, and write a summary of it.
The lexical analysis and parsing described in Chapter 2, applied to a program text, result in an abstract syntax tree with a minimal but important degree of annotation: the Token.class and Token.repr attributes supplied by the lexical analyzer as the initial attributes of the terminals in the leaf nodes of the AST. For example, a token representing an integer has the class 'integer' and its value derives from the token representation; a token representing an identifier has the class 'identifier', but completion of further attributes may have to wait until the identification mechanism has done its work.
Lexical analysis and parsing together perform the context-free processing of the source program, which means that they analyze and check features that can be analyzed and checked either locally or in a nesting fashion. Other features, for example checking the number of parameters in a call to a routine against the number of parameters in its declaration, do not fall into this category. They require establishing and checking long-range relationships, which is the domain of context handling.
Context handling is required for two different purposes: to check context conditions imposed by the language specification and to collect information for semantic processing. In an extremely clean compiler, two different phases would be assigned to this: first all language-required context checking would be done, then the input program would be declared contextually correct, and only then would the collection of other information start. The techniques used are, however, exactly the same, and it would be artificial to distinguish the two aspects on a technical level. After all, when we try to find out if a given array parameter A to a routine has more than one dimension, it makes no difference whether we do so because the language forbids multi-dimensional array parameters and we have to give an error message if A has more than one dimension, or because we can generate better code when we know that it has only one.
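As a minimal illustration of such a long-range check, a routine that compares the number of actual parameters in a call against the number of formal parameters in the declaration it refers to might look like this in C. The struct layouts and field names are invented for the example, not taken from this book:

#include <stdio.h>

/* Hypothetical AST fragments; the field names are illustrative only. */
struct Declaration { const char *name; int number_of_parameters; };
struct Call { struct Declaration *callee; int number_of_actuals; };

/* Context check: the call must supply exactly as many actual parameters
   as the declaration of the routine specifies. */
void check_parameter_count(const struct Call *call) {
    if (call->number_of_actuals != call->callee->number_of_parameters) {
        fprintf(stderr,
            "error: routine %s called with %d parameters, declared with %d\n",
            call->callee->name,
            call->number_of_actuals,
            call->callee->number_of_parameters);
    }
}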
The data needed for these analyses and checks is stored as attributes in the nodes of the AST. Whether the attributes are physically stored there or actually reside elsewhere, for example in the local variables of an analyzing routine, is more or less a matter of implementation and efficiency considerations; since our prime focus in this book is implementation, we will treat the attributes as residing in the nodes.
The computations required by context handling can be specified conveniently by being attached to the context-free grammar that is already being used for parsing, although such rules are still often written by hand. To this end the context-free grammar is extended with two features:
- With each grammar symbol S, terminal or non-terminal, zero or more attributes are associated; like the fields in a record or the parameters of a routine, they consist of a name and a type. Room for
the actual attributes is allocated automatically in each node that is created for S in the abstract syntax tree. The attributes are used to hold information about the semantics attached to that specific node. So all nodes in the AST that correspond to the same grammar symbol S have the same formal attributes, but their values - the actual attributes - may differ.
- With each production rule N→M1...Mn, a set of attribute evaluation rules is associated - the evaluation rules - which express some attribute values of the left-hand side N and of the members of the right-hand side Mi in terms of other attribute values of these. These evaluation rules also check the context conditions and issue warning and error messages. Note that evaluation rules are associated with production rules rather than with non-terminals. This is reasonable since the evaluation rules are concerned with the attributes of the members Mi, which are production-rule-specific.
In addition, the attributes have to fulfill the following requirement:
- The attributes of each grammar symbol N are divided into two groups, called inherited attributes and synthesized attributes; the evaluation rules for all production rules of N can count on the values of the inherited attributes of N to be set by the parent node, and have themselves the obligation to set the synthesized attributes of N. Note that the requirement concerns grammar symbols rather than production rules. This is again reasonable, since in any position in the AST in which an N node produced by one production rule for N occurs, a node produced by any other production rule of N may occur, and they should all have the same attribute structure.
The requirements apply to all alternatives of all grammar symbols, and more in particular to all Mi in the production rule N→M1...Mn. As a result, the evaluation rules for a production rule N→M1...Mn can count on the values of the synthesized attributes of Mi to be set by Mi, and have the obligation to set the inherited attributes of Mi, for all 1 ≤ i ≤ n.
In addition to its attributes, the node for A in an AST built for the production rule A → B C D will contain three pointers which connect it to its children, the nodes for B, C, and D, and possibly a pointer that connects it back to its parent. These pointers have been omitted in Figure 3.1, to avoid clutter.
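In an implementation such a node might be declared roughly as follows in C. This is a sketch only; the attribute fields shown are placeholders, not those of any particular grammar:

/* Node for the production rule A -> B C D, with room for its attributes
   and pointers connecting it to the rest of the tree. */
struct Node_A {
    /* inherited attributes, set by the parent */
    struct SymbolTable *inh_symbol_table;
    /* synthesized attributes, set by the evaluation rules of this node */
    struct Type *syn_type;

    /* pointers to the children */
    struct Node_B *child_B;
    struct Node_C *child_C;
    struct Node_D *child_D;
    void *parent;   /* optional back pointer to the parent node */
};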
Figure 3.1 Data flow in a node with attributes.
The evaluation rules for the production rule A → B C D have the obligation to set the values of the attributes at the ends of the outgoing arrows in two directions: upwards to the synthesized attributes of A, and downwards to the inherited attributes of B, C, and D. In turn, the evaluation rules can count on the parent of A to supply information downward by setting the inherited attributes of A, and on A's children B, C, and D to supply information upward by setting their synthesized attributes, as indicated by the incoming arrows. In total this results in data flow from the inherited to the synthesized attributes of A. Since the same rules apply to B, C, and D, they too provide data flow from their inherited to their synthesized attributes, under control of their respective attribute evaluation rules. This data flow is shown as dotted arrows in Figure 3.1. We also observe that the attribute evaluation rules of A can cause data to flow from the synthesized attributes of B to its inherited attributes, perhaps even passing through C and/or D. Since the same applies to A, it too can expect data to flow from its synthesized attributes to its inherited attributes, through its parent. This data flow too is shown as a dotted arrow in the diagram.
It seems reasonable to call the inherited attributes input parameters and the synthesized attributes output parameters, but some caution is required. Input and output suggest some temporal order, with input coming before output, but it is quite possible for some of the synthesized attributes to be set before some of the inherited ones. Still, the similarity is a strong one, and we will meet below variants of the general attribute grammars in which the terms 'input parameters' and 'output parameters' are fully justified.
A simple example of a practical attribute grammar rule is shown in Figure 3.2; it
describes the declaration of constants in a Pascal-like language. The grammar part is in a fairly representative notation, the rules part is in a format similar to that used in the algorithm outlines in this book. The attribute grammar uses the non-terminals Defined-identifier and Expression, the headings of which are also shown.
Constant-definition(INH old symbol table, SYN new symbol table) →
    'CONST' Defined-identifier '=' Expression ';'
    ATTRIBUTE RULES:
        SET Expression .symbol table TO Constant-definition .old symbol table;
        SET Constant-definition .new symbol table TO
            Updated symbol table (
                Constant-definition .old symbol table,
                Defined-identifier .name,
                Checked type of Constant-definition (Expression .type),
                Expression .value
            );

Defined-identifier(SYN name) → ...

Expression(INH symbol table, SYN type, SYN value) → ...

Figure 3.2 A simple attribute rule for Constant-definition.
The attribute grammar shows that nodes created for the grammar rule Constant-definition have two attributes, old symbol table and new symbol table. The first is an inherited attribute and represents the symbol table before the application of the constant definition, and the second is a synthesized attribute representing the symbol table after the identifier has been entered into it. Next comes the only alternative of the grammar rule for Constant-definition, followed by a segment containing attribute evaluation rules. The first evaluation rule sets the inherited attribute symbol table of Expression equal to the inherited attribute Constant-definition .old symbol table, so the evaluation rules for Expression can consult it to determine the synthesized attributes type and value of Expression. We see that symbol names from the grammar can be used as identifiers in the evaluation rules: the identifier Expression stands for any node created for the rule Expression, and the attributes of that node are accessed as if they were fields in a record - which in fact they are in most implementations. The second evaluation rule creates a new symbol table and assigns it to Constant-definition .new symbol table. It does this by calling a function, Updated symbol table (), which has the declaration
FUNCTION Updated symbol table (Symbol table, Name, Type, Value)
    RETURNING a symbol table;
It takes the Symbol table and adds to it a constant identifier with the given Name, Type, and Value, if that is possible; it then returns the new symbol table. If the constant cannot be added to the symbol table because of context conditions - there may be an identifier there already with the same name and the same scope - the routine gives an error message and returns the unmodified symbol table. Also note that, although the order of the evaluation rules may suggest otherwise, that order is in fact immaterial: the execution order of the evaluation rules is not determined by their textual position but rather by the availability of their operands. A second remark is that the non-terminal Defined-identifier is used rather than just Identifier. The difference between the two is that a defining identifier has only one thing to contribute: its name; an applied identifier, on the other hand, brings a wealth of information in addition to its name: type, kind (whether it is a constant, variable, parameter, etc.), possibly a value, allocation information, etc. A third remark concerns the use of
Checked type of Constant-definition (Expression .type) in the call of Updated symbol table: the check on the type of the constant definition is performed by a function call rather than being written out in the attribute rule itself.
A final remark concerns the way non-terminal symbols and terminal symbols are treated. In the original theory as published by Knuth, terminal symbols have no attributes; in practice, however, the lexical analyzer already supplies synthesized attributes for them, and since the conversion of a token representation to a synthesized attribute could be controlled by an inherited attribute, it is reasonable for terminal symbols to have inherited attributes as well.
We will now look into means of evaluating the attributes. One problem with this is the possibility of an infinite loop in the computations. Normally it is the responsibility of the programmer - in this case the compiler writer - not to write infinite loops, but when one provides a high-level mechanism, one hopes to be able to give a bit more support. And this is indeed possible: there is an algorithm for loop detection in attribute grammars. We will have a look at it in Section 3.1.3.2; remarkably, it also leads to a more effective way of attribute evaluation. Our main tool in understanding these algorithms is the dependency graph, which we will discuss in the following section.

Roadmap
3.1 Attribute grammars
3.1.1 Dependency graphs
3.1.2 Attribute evaluation
3.1.3 Cycle handling
3.1.4 Attribute allocation
3.1.5 Multi-visit attribute grammars
3.1.6 Summary of the types of attribute grammars
3.1.7 L-attributed grammars
3.1.8 S-attributed grammars
3.1.9 Equivalence of L-attributed and S-attributed grammars
3.1.10 Extended grammar notations and attribute grammars
Each node in a syntax tree corresponds to a production rule N→M1...Mn; it is labeled with the symbol N and contains the attributes of N and n pointers to nodes labeled with M1 through Mn. It is useful and customary to represent the attributes in a node for a given production rule by a simple diagram, called a dependency graph. The inherited attributes of N are represented by named boxes on the left, the synthesized attributes by named boxes on the right. The diagram for an alternative consists of two levels, the top depicting the left-hand side of the grammar rule and the bottom the right-hand side. The top level shows one non-terminal with attributes, the bottom level zero or more grammar symbols, also with attributes. Data flow is indicated by arrows leading from the source attributes to the destination attributes. Figure 3.3 shows the dependency graph for the only production rule for Constant-definition; note that the dependency graph is based on the abstract syntax tree. The short incoming and outgoing arrows are not part of the dependency graph but indicate the communication of this node with the surrounding nodes.
Actually, the use of the term 'dependency graph' for diagrams like the one in Figure 3.3 is misleading: the arrows show the data flow, not the dependency, since the dependencies point in the other direction. If data flows from variable a to variable b, then b is dependent on a. Dependencies are sometimes given in the form of pairs; a pair (a, b) then means that b depends on a, but it is often more useful to read it as 'data flows from a to b' or as 'a is a prerequisite to b'. Unfortunately, 'dependency graph' is the standard term for the graph of the data flow (see, for example, Aho, Sethi and Ullman (1986), page 284), and in order to avoid heaping confusion on confusion we will follow this convention. In short, an attribute dependency graph contains data-flow arrows.
Figure 3.3 Dependency graph of the rule for Constant-definition from Figure 3.2.
Expression(INH symbol table, SYN type, SYN value) → Number
    ATTRIBUTE RULES:
        SET Expression .type TO Number .type;
        SET Expression .value TO Number .value;

Figure 3.4 Trivial attribute grammar for Expression.
Using the trivial attribute grammar for Expression from Figure 3.4, we can now construct the complete data-flow graph for the Constant-definition

CONST pi = 3.14159265;
The result is shown in Figure 3.5. Normally, the semantics of an expression depends on the contents of the symbol table, which is why the symbol table is an inherited attribute of Expression. The semantics of a number, however, is independent of the symbol table; this explains the arrow going nowhere in the middle of the diagram in our example.
Figure 3.5 Sample attributed syntax tree with data flow.
To make the above approach work, we need a system that will
- create the abstract syntax tree,
- allocate space for the attributes in each node in the tree,
- fill the attributes of the terminals in the tree with values derived from the representations of the terminals,
- execute evaluation rules of the nodes to assign values to attributes until no new values can be assigned, and do this in the right order, so that no attribute value will be used before it is available and that each attribute will get a value once,
- detect when it cannot do so.

Such a system is called an attribute evaluator. Figure 3.6 shows the attributed syntax tree from Figure 3.5 after attribute evaluation has been performed.
We have seen that a grammar rule in an attribute grammar consists of a syntax segment, which, in addition to the BNF items, supplies a declaration for the attributes, and a rules segment for each alternative, specified at the end of the latter; see Figure 3.2. The BNF segment is straightforward, but a question arises as to exactly what the attribute evaluator user can and has to write in the rules segment. The answer depends very much on the attribute system one uses. Simple systems allow only assignments of the form
Figure 3.6 The attributed syntax tree from Figure 3.5 after attribute evaluation.
as in the example above. This makes it easy for the system to check that the code fulfills its obligations: setting all synthesized attributes of the left-hand side and all inherited attributes of the members of the right-hand side. The simplest way to run these assignments is to traverse the syntax tree repeatedly, at each node performing all assignments that can be performed
when we visit it, and repeat this process until all synthesized attributes of the root have been given a value. An assignment is possible when all attributes needed for the assignment have already been given a value. It will be clear that this algorithm is wasteful of computer time, but for educational purposes it has several advantages. First, it shows convincingly that general attribute evaluation is indeed algorithmically possible; second, it is relatively easy to implement; and third, it provides a good stepping stone to more realistic attribute evaluation. The method is dynamic attribute evaluation, in which the order in which the attributes are evaluated is determined at run time of the compiler; this is opposed to static attribute evaluation, in which the order in which the attributes are evaluated is fixed in advance during compiler generation. (The term 'static attribute evaluation order' would actually be more appropriate, since it is the evaluation order that is static rather than the evaluation.)
Number "3 Digit-Seq Base-Tag Digit-Seq "3 Digit-Seq Digit Digit Digit "3 Digit-Token // 0 1 2 3 4 5 Base-Tag "3 'B' I 'D'
6
7 8 9
Figure 3.7 A context-free grammar for octaland decimal numbers.
The strength of attribute grammars lies in the fact that they can transport information from anywhere in the parse tree to anywhere else, in a controlled way. To demonstrate the attribute evaluation method, we use a simple attribute grammar that exploits this possibility. It is shown in Figure 3.8 and calculates the value of integral numbers, in octal or decimal notation; the context-free version is given in Figure 3.7. If the number, which consists of a sequence of Digits, is followed by the Base-Tag 'B' it is to be interpreted as octal; if it is followed by a 'D' it is decimal. So 17B has the value 15, 17D has the value 17, and 18B is an error. Each Digit and the Base-Tag are all considered separate tokens for this example. The point is that the processing of the Digits depends on a token (B or D) elsewhere, which means that the information of the Base-Tag must be distributed over all the digits. This models the distribution of information from any node in the AST to any other node. The multiplication and addition in the rules section of the first alternative of Digit-Seq in Figure 3.8 do the real work. The index [1] in Digit-Seq [1] is needed to distinguish this Digit-Seq from the Digit-Seq in the header. A context check is done in the attribute rules for Digit to make sure that the digit found lies within the range of the base indicated. Contextually improper input is detected and corrected by passing the value of the digit through a testing function Checked digit value, the code of which is shown in Figure 3.9. For example, the input 18B draws the error message 8 cannot be a digit in base 8, and the attributes are reset to show
Number(SYN value) → Digit-Seq Base-Tag
    ATTRIBUTE RULES:
        SET Digit-Seq .base TO Base-Tag .base;
        SET Number .value TO Digit-Seq .value;

Digit-Seq(INH base, SYN value) →
    Digit-Seq [1] Digit
    ATTRIBUTE RULES:
        SET Digit-Seq [1] .base TO Digit-Seq .base;
        SET Digit .base TO Digit-Seq .base;
        SET Digit-Seq .value TO
            Digit-Seq [1] .value * Digit-Seq .base + Digit .value;
  | Digit
    ATTRIBUTE RULES:
        SET Digit .base TO Digit-Seq .base;
        SET Digit-Seq .value TO Digit .value;

Digit(INH base, SYN value) → Digit-Token
    ATTRIBUTE RULES:
        SET Digit .value TO Checked digit value (
            Value-of (Digit-Token .repr [0]) - Value-of ('0'),
            Digit .base
        );

Base-Tag(SYN base) →
    'B'
    ATTRIBUTE RULES:
        SET Base-Tag .base TO 8;
  | 'D'
    ATTRIBUTE RULES:
        SET Base-Tag .base TO 10;

Figure 3.8 An attribute grammar for octal and decimal numbers.
the situation that would result from the correct input 17B, thus safeguarding the rest of the compiler against contextually incorrect data. The dependency graphs of Number, Digit-Seq, Digit, and Base-Tag can be found in Figures 3.10 through 3.13.
The attribute grammar code as given in Figure 3.8 is very heavy and verbose. In particular, many of the qualifiers (text parts like the Digit-Seq . in Digit-Seq .base) could be inferred from the contexts, and many assignments are just copy operations between attributes of the same name in different nodes. Practical attribute grammars have abbreviation techniques for these and other repetitive code structures, and in such a
FUNCTION Checked digit value (Token value, Base) RETURNING an integer:
    IF Token value < Base: RETURN Token value;
    ELSE Token value >= Base:
        ERROR "Token ", Token value, " cannot be a digit in base ", Base;
        RETURN Base - 1;

Figure 3.9 The function Checked digit value.
Figure 3.10 The dependency graph of Number.
Figure 3.11 The two dependency graphs of Digit-Seq.
Figure 3.12 The dependency graph of Digit.
Figure 3.13 The two dependency graphs of Base-Tag.
system the rule for Digit-Seq could, for example, look as follows:

Digit-Seq(INH base, SYN value) →
    Digit-Seq (base, value) Digit (base, value)
    ATTRIBUTE RULES:
        SET value TO Digit-Seq .value * base + Digit .value;
  | Digit (base, value)
"his is indeed a considerable si~plificationover Figure 3.8. The style of Figure 3.8 has the advantage of being explicit, unambiguous, and not influenced towards any pa~icular system, and is preferable when many non-terminals have attributes with identical names. But when no mis~nderstandin can arise in small examples we will use the above abbreviated notation. To implement the data-flow ~ a c h i n ein the way explained above, we have to visit all nodes of the data de~endencygraph. ~isitingall nodes of a graph usually requires some care to avoid infinite loops, but a simple solutio^ is available in this case since the nodes are also linked in the parse tree, which is loop-free. y visiting all nodes in the parse tree we auto~aticallyvisit all nodes inthe data dependency graph, and we can visit all nodes in the parse tree by aver sing it recursively. Now our algorithm at each node is very simple: try to perfom all the ass rules section for that node, traverse the children, and when return' o perform all the assign~ents rules the section. in The attribute rited values downwards; pre-visit as post-visit the
assignments harvest the synthesized attributes of the children and propagate them upwards.
PROCEDURE Evaluate for Digit-Seq alternative-1 (
    pointer to digit-seq node Digit-Seq,
    pointer to digit-seq alt-1 node Digit-Seq alt-1
):
    // Propagate attributes:
    Propagate for Digit-Seq alternative-1 (Digit-Seq, Digit-Seq alt-1);

    // Traverse subtrees:
    Evaluate for Digit-Seq (Digit-Seq alt-1 .Digit-Seq);
    Evaluate for Digit (Digit-Seq alt-1 .Digit);

    // Propagate attributes:
    Propagate for Digit-Seq alternative-1 (Digit-Seq, Digit-Seq alt-1);

PROCEDURE Propagate for Digit-Seq alternative-1 (
    pointer to digit-seq node Digit-Seq,
    pointer to digit-seq alt-1 node Digit-Seq alt-1
):
    IF Digit-Seq alt-1 .Digit-Seq .base is not set
    AND Digit-Seq .base is set:
        SET Digit-Seq alt-1 .Digit-Seq .base TO Digit-Seq .base;
    IF Digit-Seq alt-1 .Digit .base is not set
    AND Digit-Seq .base is set:
        SET Digit-Seq alt-1 .Digit .base TO Digit-Seq .base;
    IF Digit-Seq .value is not set
    AND Digit-Seq alt-1 .Digit-Seq .value is set
    AND Digit-Seq .base is set
    AND Digit-Seq alt-1 .Digit .value is set:
        SET Digit-Seq .value TO
            Digit-Seq alt-1 .Digit-Seq .value * Digit-Seq .base +
            Digit-Seq alt-1 .Digit .value;

Figure 3.14 Data-flow code for the first alternative of Digit-Seq.
Outline code for the evaluation of nodes representing the first alternative of Digit-Seq is given in Figure 3.14. The code consists of two routines: one, Evaluate for Digit-Seq alternative-1, which organizes the assignment attempts and the recursive traversals, and one, Propagate for Digit-Seq alternative-1, which attempts the actual assignments. Both get two parameters: a pointer to the Digit-Seq node itself and a pointer, Digit-Seq alt-1, to a record containing the pointers to the children of the node. The type of this pointer is digit-seq alt-1 node, since we are working on nodes that represent the first alternative of the grammar rule for Digit-Seq. The two pointers represent the two levels in dependency graph diagrams like the one in Figure 3.3.
The routine Evaluate for Digit-Seq alternative-1 is called by a routine Evaluate for Digit-Seq when this routine finds that the Digit-Seq node it is called for derives its first alternative. The code in Evaluate for Digit-Seq alternative-1 is straightforward. The first IF statement in Propagate for Digit-Seq alternative-1 corresponds to the assignment

SET Digit-Seq [1] .base TO Digit-Seq .base;
in the rules section of Digit-Seq in Figure 3.8. It shows the same assignment, now expressed as
SET Digit-Seq alt-1 .Digit-Seq .base TO Digit-Seq .base;
but preceded by a test for appropriateness. The assignment is appropriate only if the destination value has not yet been set and the source value(s) are available. A more elaborate version of the same principle can be seen in the third IF statement. All this means, of course, that attributes have to be implemented in such a way that one can test if their values have been set.
The overall driver, shown in Figure 3.15, calls the routine Evaluate for Number repeatedly, until the attribute Number .value is set. Each such call will cause a complete recursive traversal of the syntax tree, transporting values down and up as available. For a 'normal' attribute grammar, this process converges in a few rounds. Actually, for the present example it always stops after two rounds, since the traversals work from left to right and the grammar describes a two-pass process. A call of the resulting program with input 567B prints

Evaluate for Number called
Evaluate for Number called
Number .value = 375
The above data-flow implementation, charming as it is, has a number of drawbacks. First, if there is a cycle in the computations, the attribute evaluator will loop. Second, the produced code may not be large, but it does a lot of work; with some restrictions on the attribute grammar, much simpler evaluation techniques become possible. There is much theory about both problems, and we will discuss the essentials of them in Sections 3.1.3.2 and 3.1.5.
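To give an impression of how much simpler evaluation can be for a restricted grammar: in the number grammar of Figure 3.8 the base flows strictly from right to left once and all values flow left to right, so a hand-written evaluator can do all the work in a single scan over the token representation. The following C sketch mirrors the attribute rules directly; it is an illustration only, reading a character string rather than an AST, and assumes a non-empty representation ending in 'B' or 'D':

#include <stdio.h>
#include <string.h>

/* Compute the value of a number such as "567B" or "17D", mirroring the
   attribute rules of Figure 3.8.  An out-of-range digit is reported and
   clipped to base - 1, as in Checked digit value. */
long number_value(const char *repr) {
    size_t len = strlen(repr);
    int base = (repr[len - 1] == 'B') ? 8 : 10;   /* Base-Tag .base */
    long value = 0;                               /* Digit-Seq .value */
    for (size_t i = 0; i + 1 < len; i++) {
        int digit = repr[i] - '0';                /* Digit .value before the check */
        if (digit >= base) {
            fprintf(stderr, "%d cannot be a digit in base %d\n", digit, base);
            digit = base - 1;                     /* as in Checked digit value */
        }
        value = value * base + digit;             /* the rule of Digit-Seq */
    }
    return value;
}

/* number_value("567B") yields 375; number_value("18B") reports an error
   and yields 15, the value of the corrected input 17B. */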
PROCEDURE Run the data-flow attribute evaluator on node Number:
    WHILE Number .value is not set:
        // Report progress:
        PRINT "Evaluate for Number called";
        Evaluate for Number (Number);

    // Print one attribute:
    PRINT "Number .value = ", Number .value;

Figure 3.15 Driver for the data-flow code.
There is another, almost equally naive, method of dynamic attribute evaluation, which we want to mention here, since it shows an upper bound for the time required to do dynamic attribute evaluation. In this method, we link all attributes in the parse tree into a linked list, sort this linked list topologically according to the data dependencies, and perform the assignments in the sorted order. If there are n attributes and d data dependencies, sorting them topologically costs O(n+d); the subsequent assignments cost O(n). The topological sort will also reveal any (dynamic) cycles. For more about topological sort, see below.
The difference between normal sorting and topological sorting is that the normal sort works with a comparison operator that yields the values 'smaller', 'equal', and 'larger', whereas the comparison operator of the topological sort can also yield the value 'don't care': normal sorting uses a total ordering, topological sorting a partial ordering. Elements that compare as 'don't care' may occur in any order in the ordered result. The topological sort is especially useful when the comparison represents a dependency of some kind: the ordered result will be such that no element in it is dependent on a later element and each element will be preceded by all its prerequisites. This means that the elements can be produced, computed, assigned, or whatever, in their topological order.
A topological sort can be performed recursively in time proportional to O(n+d), where n is the number of elements and d the number of dependencies, as follows. Take an arbitrary element not yet in the ordered result, recursively find all elements it is dependent on, and put these in the ordered result in the proper order. Now we can append the element we started with, since all elements it depends on precede it. Repeat until all elements are in the ordered result. For an outline algorithm see Figure 3.16. It assumes that the set of nodes that a given node is dependent on can be found in a time proportional to the size of that set.
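A directly executable C version of this recursive scheme (see also the outline in Figure 3.16 below) is sketched here. For brevity it uses a dense dependency matrix, which makes the predecessor scan O(n) per node; an adjacency-list representation would give the O(n+d) bound stated above, and cycle detection would need an additional 'in progress' mark:

#define MAX_NODES 100

/* depends_on[n][m] != 0 means: node n is dependent on node m,
   so m must precede n in the ordered result. */
static int depends_on[MAX_NODES][MAX_NODES];
static int number_of_nodes;

static int ordered[MAX_NODES];     /* the ordered result */
static int ordered_count;
static int in_ordered[MAX_NODES];  /* marks nodes already appended */

static void append_node_and_predecessors(int node) {
    /* First append the predecessors of node: */
    for (int m = 0; m < number_of_nodes; m++) {
        if (depends_on[node][m] && !in_ordered[m]) {
            append_node_and_predecessors(m);
        }
    }
    ordered[ordered_count++] = node;
    in_ordered[node] = 1;
}

static void topological_sort(void) {
    ordered_count = 0;
    for (int n = 0; n < number_of_nodes; n++) {
        if (!in_ordered[n]) {
            append_node_and_predecessors(n);
        }
    }
}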
To prevent the attribute evaluator from looping, cycles in the evaluation computations must
be detected. We must distinguish between static and dynamic cycle detection. In dynamic cycle detection, the cycle is detected during the evaluation of the attributes in an actual syntax tree; it shows that there is a cycle in a particular tree. Static cycle detection looks at the attribute grammar and from it deduces whether any tree that it produces can ever exhibit a cycle: it covers all trees. In other words: if dynamic cycle detection finds that there is no cycle in a particular tree, then all we know is that that particular tree has no cycle; if static cycle detection finds that there is no cycle in an attribute grammar, then we know
FUNCTION Topological sort of (a set Set) RETURNING a list:
    SET List TO Empty list;
    WHILE there is a Node in Set but not in List:
        Append Node and its predecessors to List;
    RETURN List;

PROCEDURE Append Node and its predecessors to List:
    // First append the predecessors of Node:
    FOR EACH Node-1 IN the Set of nodes that Node is dependent on:
        IF Node-1 is not in List:
            Append Node-1 and its predecessors to List;
    Append Node to List;

Figure 3.16 Outline code for a simple implementation of topological sort.
that no tree produced by that grammar will ever exhibit a cycle. Clearly static cycle detection is much more valuable than dynamic cycle detection; unsurprisingly, it is also much more difficult.
There is a simple way to dynamically detect a cycle in the above data-flow implementation, but it is inelegant: if the syntax tree has N attributes and more than N rounds are found to be required for obtaining an answer, there must be a cycle. The reasoning is simple: if there is no cycle, each round will compute at least one attribute value, so the process will terminate after at most N rounds; if it does not, there is a cycle. Even though this brute-force approach works, the general problem with dynamic cycle detection remains: in the end we have to give an error message saying something like 'Compiler failure due to a data dependency cycle in the attribute grammar', which is embarrassing. It is far preferable to do static cycle checking; if we reject during compiler construction any attribute grammar that can ever produce a cycle, we will not be caught in the above situation.
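In the driver of Figure 3.15, this brute-force check amounts to bounding the loop by the number of attributes in the tree. A C-style sketch, in which the named routines are stand-ins for the generated evaluator rather than code from this book:

#include <stdio.h>

struct Node;   /* the AST node type of the generated evaluator */

/* Hypothetical interface to the generated data-flow evaluator. */
extern int  number_of_attributes_in(struct Node *root);
extern int  root_attributes_are_set(struct Node *root);
extern void evaluate_one_round(struct Node *root);  /* one full tree traversal */

/* Returns 0 on success, -1 if a data dependency cycle must be present:
   without a cycle, each round sets at least one attribute, so at most
   N rounds are needed for N attributes. */
int run_data_flow_evaluator(struct Node *root) {
    int max_rounds = number_of_attributes_in(root);
    for (int round = 0; round < max_rounds; round++) {
        if (root_attributes_are_set(root)) return 0;
        evaluate_one_round(root);
    }
    if (root_attributes_are_set(root)) return 0;
    fprintf(stderr,
        "Compiler failure due to a data dependency cycle in the attribute grammar\n");
    return -1;
}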
As a first step in designing an algorithm to detect the possibility of an attribute dependency cycle in any tree produced by a given attribute grammar, we ask ourselves how such a cycle can exist at all. A cycle cannot originate directly from a dependency graph of a production rule P, for the following reason: the attribute evaluation rules assign values to one set of attributes, the inherited attributes of the children of P and the synthesized attributes of P, using the values of another set, the synthesized attributes of the children of P and the inherited attributes of P. And these two sets are disjoint, have no element in common, so no cycle can exist. For an attribute dependency cycle to exist, the data flow has to leave the node, pass through some part of the tree and return to the node, perhaps repeat this process several times to different parts of the tree and then return to the attribute it started from. It can
leave downward through an inherited attribute of a child, into the tree that hangs from this node, and then it must return from that tree through a synthesized attribute of that child; or it can leave towards the parent through one of its synthesized attributes, into the rest of the tree, after which it must return from the parent through one of its inherited attributes. Or it can do both in succession, repeatedly, in any combination. Figure 3.17 shows a long, possibly circular, data-flow path. It starts from an inherited attribute of node N, descends into the tree below N, passes twice through one of the subtrees at the bottom and once through the other, climbs back to a synthesized attribute of N, continues to climb into the rest of the tree, where it first passes through a sibling tree of N at the left and then through one at the right, after which it returns to node N, where it lands at an inherited attribute. If this is the same inherited attribute the data flow started from, there is a dependency cycle in this particular tree. The main point is that to form a dependency cycle the data flow has to leave the node, sneak its way through the tree and return to the same attribute. It is this behavior that we want to catch at compiler construction time.
Figure 3.17 A fairly long, possibly circular, data-flow path.
Figure 3.17 shows that there are two kinds of dependencies between the attributes of a non-terminal N: from inherited to synthesized and from synthesized to inherited. The first is called an IS-dependency and stems from all the subtrees that can be found under N; there are infinitely many of these, so we need a summary of the dependencies they can generate. The second is called an SI-dependency and originates from all the trees of which N can be a node; there are also infinitely many of these. The summary of the dependencies between the attributes of a non-terminal can be collected in an IS-SI graph, an example of which is shown in Figure 3.18. Since IS-dependencies stem from things that happen below nodes for N and SI-dependencies from things that happen above nodes for N, it is convenient to draw the dependencies (in data-flow direction!) in those same positions. The IS-SI graphs are used as follows to find cycles in the attribute dependencies of a
Figure 3.18 An example of an IS-SI graph.
grammar. Suppose we are given the dependency graph for a production rule N→PQ (see Figure 3.19), and the complete IS-SI graphs of the children P and Q in it; then we can obtain the IS-dependencies of N caused by N→PQ by adding the dependencies in the IS-SI graphs of P and Q to the dependency graph of N→PQ and taking the transitive closure of the dependencies. This transitive closure uses the inference rule that if data flows from attribute a to attribute b and from attribute b to attribute c, then data flows from attribute a to attribute c.
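The closure itself is the standard construction over a boolean relation; a small C sketch over a data-flow matrix (purely illustrative, with the attributes simply numbered):

#define MAX_ATTRS 64

/* flows[a][b] != 0 means: data flows from attribute a to attribute b. */
static int flows[MAX_ATTRS][MAX_ATTRS];
static int number_of_attrs;

/* Apply the inference rule: if data flows from a to b and from b to c,
   then data flows from a to c. */
static void transitive_closure(void) {
    for (int b = 0; b < number_of_attrs; b++)
        for (int a = 0; a < number_of_attrs; a++)
            if (flows[a][b])
                for (int c = 0; c < number_of_attrs; c++)
                    if (flows[b][c])
                        flows[a][c] = 1;
}

/* After the closure, a cycle exists exactly if some attribute reaches itself. */
static int contains_cycle(void) {
    for (int a = 0; a < number_of_attrs; a++)
        if (flows[a][a]) return 1;
    return 0;
}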
Figure 3.19 The dependency graph for the production rule N→PQ.
The reason is as follows. At attribute evaluation time, all data flow enters the node through the inherited attributes of N, may pass through trees produced by P and/or Q in any order, and emerges into the node, where it may end up in synthesized attributes. Since the IS-SI graphs of P and Q summarize all possible data paths through all possible trees produced by P and Q, and since the dependency graph of N→PQ already showed the fixed direct dependencies within that rule, the effects of all data paths in trees below N→PQ are known. Next we take the transitive closure of the dependencies. This has two effects: first, if there is a possible cycle in the tree below N including the node for N→PQ, it will show up here; and second, it gives us all data-flow paths that lead from the inherited attributes of N in N→PQ to its synthesized attributes. If we do this for all production rules for N, we obtain the complete set of IS-dependencies of N. Likewise, if we had all dependency graphs of all production rules in which N is a child, and the complete IS-SI graphs of all the other non-terminals in those production
rules, we could in the same manner detect any cycle that runs through a tree of which N is a child, and obtain all SI-dependencies of N. Together this leads to the IS-SI graph of N and the detection of all cycles involving N. Initially, however, we do not have any complete IS-SI graphs. So we start with empty IS-SI graphs, perform the transitive closure algorithm on each production rule in turn, and repeat this process until no more changes occur to the IS-SI graphs. The first sweep through the production rules will find all IS- and SI-dependencies that follow directly from the dependency graphs, and each following sweep will collect more dependencies, until all have been found. Then, if no IS-SI graph exhibits a cycle, the attribute grammar is non-cyclic and is incapable of producing an AST with a circular attribute dependency path. We will examine the algorithm more in detail and then see why it cannot miss any dependencies.
SET the flag Something was changed TO True;

// Initialization step:
FOR EACH terminal T IN Attribute grammar:
    SET the IS-SI graph of T TO T's dependency graph;
FOR EACH non-terminal N IN Attribute grammar:
    SET the IS-SI graph of N TO the empty set;

WHILE Something was changed:
    SET Something was changed TO False;
    FOR EACH production rule P = M0 → M1 ... Mn IN Attribute grammar:
        // Construct the dependency graph copy D:
        SET the dependency graph D TO a copy of the dependency graph of P;
        // Add the dependencies already found for M0 ... Mn:
        FOR EACH M IN M0 ... Mn:
            FOR EACH dependency d IN the IS-SI graph of M:
                Insert d in D;

        // Use the dependency graph D:
        Compute all induced dependencies in D by transitive closure;
        IF D contains a cycle:
            ERROR "cycle found in production", P;

        // Propagate the newly discovered dependencies:
        FOR EACH M IN M0 ... Mn:
            FOR EACH dependency d IN D SUCH THAT
                    the attributes in d are attributes of M:
                IF there is no dependency d in the IS-SI graph of M:
                    Enter a dependency d into the IS-SI graph of M;
                    SET Something was changed TO True;

Figure 3.20 Outline of the strong-cyclicity test for an attribute grammar.
An outline of the algorithm is given in Figure 3.20. It examines each production rule in turn, takes a copy of its dependency graph, merges in the dependencies already known from the IS-SI graphs of the non-terminal and its children, and takes the transitive closure of the dependencies. If a cycle is discovered, an error message is given. Then the algorithm updates the IS-SI graphs of the non-terminal and its children with any newly discovered dependencies. As a result of this, the process is repeated, since still more dependencies may be found. Figures 3.19 through 3.23 show the actions of one such step. The dependencies in Figure 3.19 derive from the attribute evaluation rules given for N→PQ; they are immutable, so we use a working copy of them in D. The IS-SI graphs collected so far are shown in Figure 3.21. They contain IS-dependencies in N, P, and Q; these may originate directly from the dependency graphs of these non-terminals, or they may have been found by previous rounds of the algorithm. The diagrams also contain one SI-dependency, from N.s1 to N.i2; it must originate from a previous round of the algorithm, since the dependency graphs of rules for a non-terminal do not contain assignments to the inherited attributes of that non-terminal. The value of the synthesized attribute of Q does not depend on its inherited attribute, so it is either generated inside Q or derives from a terminal symbol in Q; this is shown as an arrow starting from nowhere.
Figure 3.21 The IS-SI graphs of N, P, and Q collected so far.
The dotted lines in Figure 3.22 show the result of merging the IS-SI graphs of N, P, and Q into D. Taking the transitive closure adds many more dependencies, but we have drawn only those that connect two attributes of the same non-terminal. There are two of these: one IS-dependency of N, starting from N.i1 (because of a path through P), and one SI-dependency of Q. These are added to the IS-SI graphs of N and Q, respectively, resulting in Figure 3.23.
We now want to show that the algorithm of Figure 3.20 cannot miss cycles that might occur in actual trees. The algorithm may, however, detect some cycles that cannot occur in actual
Figure 3.22 Transitive closure over the dependencies of N, P, Q, and D.
Figure 3.23 The new IS-SI graphs of N, P, and Q.
trees, as we will see below. Suppose the algorithm has declared the attribute grammar to be cycle-free, and we still find a tree T with a cyclic attribute dependency path P in it. We shall now show that this leads to a contradiction. We first take an arbitrary node N on the path, and consider the parts of the path inside N. If the path does not leave N anywhere, it just follows the dependencies of the dependency graph of N; since the path is circular, the dependency graph of N itself must then contain a cycle, which is impossible. So the path has to leave the node somewhere. It does so through an attribute of the parent or a child node, and then returns through another attribute of that same node; there may be more than one node with that property. Now for at least one of these nodes, the attributes connected by the path leaving and returning to N are not connected by a dependency arc in the IS-SI graph of N: if all were connected they would form a cycle in the IS-SI graph, which would have been detected. Call this node G, and the attributes A1 and A2. Next we shift our attention to node G. A1 and A2 cannot be connected in the IS-SI graph of G, since if they were, the dependency would have been copied to the IS-SI graph of N. So it is obvious that the dependency between A1 and A2 cannot be a direct depen-
dency in the dependency graph of G. We are forced to conclude that the path continues through the tree, and that G too must have at least one parent or child node H, different from N, through which the circular path leaves G and returns to it, through attributes that are not connected by a dependency arc in the IS-SI graph of G: if they were all connected, the transitive closure step would have added the dependency between A1 and A2. The same reasoning applies to H, and so on. This procedure crosses off all nodes as possible sources of circularity, so the hypothetical circular path P cannot exist, which leads to our claim that the algorithm of Figure 3.20 cannot miss cycles.
An attribute grammar in which no cycles are detected by the algorithm of Figure 3.20 is called strongly non-cyclic. The algorithm presented here is actually too pessimistic about cyclicity: it may detect cycles where none can materialize. The reason is that the algorithm assumes that when the data flow from an attribute of node N passes through N's child Mk more than once, it can find a different subtree there on each occasion. This is the result of merging into D in Figure 3.20 the IS-SI graph of Mk, which represents the data flow through all possible subtrees for Mk. This assumption is clearly incorrect, and it occasionally allows dependencies to be detected that cannot occur in an actual tree, leading to false cyclicity messages. A correct algorithm exists, and uses a set of IS-SI graphs for each non-terminal, rather than a single IS-SI graph. Each IS-SI graph in the set describes a combination of dependencies that can actually occur in a tree; the union of the IS-SI graphs in the set for N yields the single IS-SI graph used for N in the algorithm of Figure 3.20, in the same way as the union of the look-ahead sets of the items for N yields the FOLLOW set of N. In principle, the correct algorithm is exponential in the maximum number of members in any grammar rule, but tests (Räihä and Saarinen, 1982) have shown that cyclicity testing for practical attribute grammars is quite feasible. A grammar that shows no cycles under the correct algorithm is called non-cyclic. Almost all grammars that are non-cyclic are also strongly non-cyclic, so in practice the simpler, heuristic, algorithm of Figure 3.20 is completely satisfactory. Still, it is not difficult to construct a non-cyclic but not strongly non-cyclic attribute grammar, as is shown in Exercise 3.5.
The data-flow technique from Section 3.1.2 enables us to create very general attribute evaluators easily, and the circularity test shown here allows us to make sure that they will not loop. It is, however, felt that this full generality is not always necessary and that there is room for less general but much more efficient attribute evaluation methods. We will cover three levels of simplification: multi-visit attribute grammars (Section 3.1.5), L-attributed grammars (Section 3.1.7), and S-attributed grammars (Section 3.1.8). The latter two are specially important since they do not need the full syntax tree to be stored, and are therefore suitable for narrow compilers.
So far we have assumed that the attributes of a node are allocated in that node, like fields in a record. For simple attributes - integers, pointers to types, etc. - this is satisfactory, but for large values, for example the environment, this is clearly undesirable. The easiest solution is to implement the routine that updates the environment such that it delivers a pointer
to the new environment. This pointer can then point to a pair containing the update and the pointer to the old environment; this pair would be stored in global memory, hidden from the attribute grammar. The implementation suggested here requires a lookup time linear in the size of the environment, but better solutions are available.
Another problem is that many attributes are just copies of other attributes on a higher or lower level in the syntax tree, and that much information is replicated many times, requiring time for the copying and using up memory. Choosing a good form for the abstract syntax tree already alleviates the problem considerably. Many attributes are used in a stack-like fashion only and can be allocated very profitably on a stack (Jazayeri and Pozefsky, 1981). Also, there is extensive literature on techniques for reducing the memory requirements further (op den Akker and Sluiman, 1991; Engelfriet and de Jong, 1990; Kastens, 1987; Farrow and Yellin, 1986). Simpler attribute allocation mechanisms are possible for the more restricted attribute grammar types discussed below.
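Returning to the environment representation mentioned at the start of this section, a minimal C sketch (with names invented for the example) stores each update in a cell that points back to the previous environment; lookup then walks the chain, which is what makes it linear in the size of the environment:

#include <stdlib.h>
#include <string.h>

/* One update cell: the new binding plus a pointer to the old environment. */
struct Environment {
    const char *name;
    struct Declaration *declaration;   /* whatever information is attached */
    const struct Environment *older;   /* the environment before this update */
};

/* Deliver a pointer to the new environment; the old one is left untouched,
   so attributes that still point to it remain valid. */
const struct Environment *updated_environment(
        const struct Environment *env,
        const char *name, struct Declaration *declaration) {
    struct Environment *update = malloc(sizeof *update);
    update->name = name;
    update->declaration = declaration;
    update->older = env;
    return update;
}

/* Linear lookup; better solutions (hashing, persistent trees) exist. */
struct Declaration *look_up(const struct Environment *env, const char *name) {
    for (; env != NULL; env = env->older)
        if (strcmp(env->name, name) == 0) return env->declaration;
    return NULL;
}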
Now that we have seen a solution to the cyclicity problem for attribute grammars, we turn to their efficiency problems. The dynamic evaluation of attributes exhibits some serious inefficiencies: values must repeatedly be tested for availability; the complicated flow of control causes much overhead; and repeated traversals over the syntax tree may be needed to obtain all desired attribute values.
The above problems can be avoided by having a fixed evaluation sequence, implemented as program code, for each production rule of each non-terminal N; this implements a form of static attribute evaluation. The task of such a code sequence is to evaluate the attributes of a node P, which represents a production rule N→M1M2.... The attribute values needed to do so can be obtained in two ways:
- The code can visit a child C of P to obtain the values of some of C's synthesized attributes, while supplying some of C's inherited attribute values to enable C to compute those synthesized attributes.
- It can leave for the parent of P to obtain the values of some of P's own inherited attributes, while supplying some of P's own synthesized attributes to enable the parent to compute those inherited attributes.
Since there is no point in computing an attribute before it is needed, the computation of the required attributes can be placed just before the point at which the flow of control leaves the node for the parent or for a child. So there are basically two kinds of visits:

    Supply a set of inherited attribute values to a child Mi
    Visit child Mi
    Harvest a set of synthesized attribute values supplied by Mi

and

    Supply a set of synthesized attribute values to the parent
    Leave to the parent
    Harvest a set of inherited attribute values supplied by the parent
This reduces the possibilities for the visiting code of a production rule N→M1...Mn to the scheme shown in Figure 3.24.
// Visit 1 from the parent: flow of control from parent enters here.
// The parent has set some inherited attributes, the set IN1.

// Visit some children Mk, Ml, ...:
Compute some inherited attributes of Mk, the set (IMk)1;
Visit Mk for the first time;
// Mk returns with some of its synthesized attributes evaluated.

Compute some inherited attributes of Ml, the set (IMl)1;
Visit Ml for the first time;
// Ml returns with some of its synthesized attributes evaluated.

// Perhaps visit some more children, including possibly Mk or
// Ml again, while supplying the proper inherited attributes
// and obtaining synthesized attributes in return.

// End of the visits to children.
Compute some of N's synthesized attributes, the set SN1;
Leave to the parent;
// End of visit 1 from the parent.

// Visit 2 from the parent: flow of control re-enters here.
// The parent has set some inherited attributes, the set IN2.
// ...
// Again visit some children while supplying inherited
// attributes and obtaining synthesized attributes in return.
Compute some of N's synthesized attributes, the set SN2;
Leave to the parent;
// End of visit 2 from the parent.

// ...
// Perhaps code for some more visits 3..n from the parent,
// supplying sets IN3 to INn and yielding
// sets SN3 to SNn.

Figure 3.24 Outline code for multi-visit attribute evaluation.
This scheme is called multi-visit attribute evaluation: the flow of control pays multiple visits to each node, according to a scheme fixed at compiler generation time. It can be implemented as a tree-walker, which executes the code sequentially and moves the flow of control to the children or the parent as indicated; it will need a stack to leave to the correct position in the parent. Alternatively, and more usually, multi-visit attribute evaluation is implemented by recursive descent. Each visit from the parent is then implemented as a separate routine, a visiting routine, which evaluates the appropriate attribute rules and calls the appropriate visit routines of the children. In this implementation, the 'leave to parent' at the end of each visit is implemented as a return statement and the leave stack is accommodated in the return stack.
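In the recursive-descent implementation, each visit is simply a function. A schematic two-visit skeleton in C for a node with children Mk and Ml is sketched below; the node layouts, attribute names, and the attribute computations are all invented for the illustration, standing in for the sets IN1, SN1, IN2, SN2 of the text:

/* Hypothetical node layouts; the fields stand in for the attribute sets. */
struct NodeMk { int inh_a; int syn_b; };
struct NodeMl { int inh_c; int syn_d; };
struct NodeN  { int in1, sn1, in2, sn2; struct NodeMk *mk; struct NodeMl *ml; };

extern void visit1_Mk(struct NodeMk *node);   /* generated elsewhere */
extern void visit1_Ml(struct NodeMl *node);

/* Visit 1 from the parent: the parent has already set node->in1. */
void visit1_N(struct NodeN *node) {
    node->mk->inh_a = node->in1;        /* compute (IMk)1 */
    visit1_Mk(node->mk);                /* first visit to Mk */
    node->ml->inh_c = node->mk->syn_b;  /* compute (IMl)1 from (SMk)1 */
    visit1_Ml(node->ml);                /* first visit to Ml */
    node->sn1 = node->ml->syn_d;        /* compute SN1 */
    /* 'leave to the parent' is just the return statement */
}

/* Visit 2 from the parent: the parent has set node->in2 in the meantime. */
void visit2_N(struct NodeN *node) {
    node->sn2 = node->in2 + node->sn1;  /* compute SN2 */
}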
c " " " " " _
\
I
f i -thvisitto
...- '
I
c - " " " " " " " " _ _
N
(ZMk)~ ,\
h'
(SMk~"""..
" " " " " " " " " "
I'
@
i
'*
igure 3.25 The i-th visit to a nodeN, visiting two children, M, and Ml.
Figure 3.25 shows a diagram of the i-th visit to a node for the production rule N→M1M2..., during which the routine for that node visits two of its children, Mk and Ml. The flow of control is indicated by the numbered dotted arrows, the data flow by the solid arrows. In analogy to the notation INi for the set of inherited attributes to be supplied to N on the i-th visit, the notation (IMk)i indicates the set of inherited attributes to be supplied to Mk on the i-th visit. The parent of the node has prepared for the visit by computing the inherited attributes in the set INi, and these are supplied to the node for N (1). Assuming
that the first thing the i-th visit to a node of that type has to do is to perform the h-th visit to Mk (2), the routine computes the inherited attributes (IMk)h (3), using the data dependencies from the dependency graph for the production rule N→M1M2.... These are passed to the node of type Mk, and its h-th visiting routine is called (4). This call returns with the synthesized attributes in the set (SMk)h (5). One of these is combined with an attribute value from INi (6) to produce the inherited attributes (IMl)j (7) for the j-th visit to Ml; the visit (8) supplies back the values of the attributes in (SMl)j (9). Finally the synthesized attributes in SNi are computed (10), and the routine returns (11). Note that during the visits to Mk and Ml the flow of control ((4) and (8)) and the data flow (solid arrows) coincide; this is because we cannot see what happens inside these visits.
An important observation about the sets IN1..n and SN1..n is in order here. INi is associated with the start of the i-th visit by the parent and SNi with the i-th leave to the parent. The parent of the node for N must of course adhere to this interface, but the parent does not know which production rule for N has produced the child it is about to visit. So the sets IN1..n and SN1..n must be the same for all production rules for N: they are a property of the non-terminal N rather than of each separate production rule for N. Similarly, all visiting routines for production rules in the grammar that contain the non-terminal N in the right-hand side must call the visiting routines of N in the same order 1..n. If N occurs more than once in one production rule, each occurrence must get its own visiting sequence, which must consist of routine calls in that same order 1..n. It should also be pointed out that there is no reason why one single visiting routine could not visit a child more than once. The visits can even be consecutive, if dependencies in other production rules require more than one visit in general.
To obtain a multi-visit attribute evaluator, we will first show that once we know acceptable IN and SN sets for all non-terminals we can construct a multi-visit attribute evaluator, and we will then see how to obtain such sets.
The above outline of the multiple visits to a node for a production rule N→M1...Mn partitions the attributes of N into a list of pairs of sets (IN1, SN1), (IN2, SN2), ..., (INn, SNn) for what is called an n-visit. Visit i uses the attributes in INi, which were set by the parent, visits some children some number of times in some order, and returns after having set the attributes in SNi. The sets IN1..n together must contain all inherited attributes of N, and SN1..n all its synthesized attributes, since each attribute must in the end receive a value some way or another. None of the INi and SNi can be empty, except IN1 and perhaps SNn. We can see this as follows. If an INi were empty, the visit from the parent it is associated with would not supply any new information, and the visit could be combined with the previous visit. The only exception is the first visit from the parent, since that one has no previous visit. If an SNi were empty, the leave to the parent it is associated with would not supply any new information to the parent, and the leave would be useless. An exception might be the last visit to a child, if the only purpose of that visit is an action that does not influence the attributes, for example producing an error message. But actually that is an improper use of attribute grammars, since in theory even error messages should be collected in an attribute and produced as a synthesized attribute of the start symbol.
Given an acceptable partitioning (INi, SNi) for i = 1..n, it is relatively simple to generate the corresponding multi-visit attribute evaluator. We will now consider how this can be done, and will at the same time see what the properties of an 'acceptable' partitioning are.

The evaluator we are about to construct consists of a set of recursive routines. There are n routines for each production rule P: N→M1... of non-terminal N, one for each of the n visits, with n determined by N. So if there are p production rules for N, there will be a total of p×n visit routines for N. Assuming that P is the k-th alternative of N, a possible name for the routine for the i-th visit to that alternative might be Visit_i to N alternative_k. During this i-th visit, it calls the visit routines of some of the M1... in P.

When a routine calls the i-th visit routine of a node N, it knows statically that it is called for a node of type N, but it still has to find out dynamically which alternative of N is represented by this particular node. Only then can the routine for the i-th visit to the k-th alternative of N be called. So the routine Visit_i to N() contains calls to the routines Visit_i to N alternative_k(), as shown in Figure 3.26, for all required values of k.

PROCEDURE Visit_i to N (pointer to an N node Node):
    SELECT Node .type:
        CASE alternative_1: Visit_i to N alternative_1 (Node);
        ...
        CASE alternative_k: Visit_i to N alternative_k (Node);
        ...
Figure 3.26 Structure of an i-th visit routine for N.
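The dispatch structure of Figure 3.26 is easy to render in C. The following sketch is not part of the demo compiler; the node type N_node and its alternative tags are invented for the occasion:

/* Illustrative C rendering of the dispatch routine of Figure 3.26.
   N_node and its alternative tags are hypothetical. */
#include <stdlib.h>

enum N_alternative {N_ALTERNATIVE_1, N_ALTERNATIVE_2};

typedef struct N_node {
    enum N_alternative type;        /* which production rule produced this node */
    /* attributes and pointers to children would follow here */
} N_node;

static void Visit_1_to_N_alternative_1(N_node *node) {/* generated visit code */}
static void Visit_1_to_N_alternative_2(N_node *node) {/* generated visit code */}

/* First visit to a node for N: find out dynamically which alternative it is. */
void Visit_1_to_N(N_node *node) {
    switch (node->type) {
    case N_ALTERNATIVE_1: Visit_1_to_N_alternative_1(node); break;
    case N_ALTERNATIVE_2: Visit_1_to_N_alternative_2(node); break;
    default: abort();               /* cannot occur in a well-formed tree */
    }
}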
We will now discuss how we can determine which visit routines to call, in which order, inside a visiting routine Visit_i to N alternative_k(), based on information gathered during the generation of the routines Visit_h to N() for 1 ≤ h < i.
The j-th visit to Mx requires the inherited attributes in (IMx)j to be available. Part of them may be in E, part of them must still be computed using the attribute evaluation rules of P. These rules may require the values of other attributes, and so on. If all these attributes are in E or can be computed from attributes that are in E, the routine Visit_j to Mx() can be called. If so, we generate code for the evaluation of the required attributes and for the call to Visit_j to Mx(). The routine Visit_j to Mx() itself has a form similar to that in Figure 3.26, and has to be generated too. When the code we are now generating is run and the call to the visit routine returns, it will have set the values of the attributes in (SMx)j. We can therefore add these to E, and repeat the process with the enlarged E.

When no more code for visits to children can be generated, we are about to end the generation of the routine Visit_i to N alternative_k(), but before doing so we generate code to evaluate the attributes in SNi, to return them to the parent. Here we meet a problem: we can do so only if those evaluations are allowed by the dependencies in P between the attributes; otherwise the code generation for the multi-visit attribute evaluator gets stuck. And there is no a priori reason why all the previous evaluations would allow the attributes in SNi to be computed. This leads us to the definition of an acceptable partitioning: a partitioning is acceptable if the attribute evaluator generation process described here does not get stuck. So we have shown what we claimed above: having an acceptable partitioning allows us to generate a multi-visit attribute evaluator.

Unfortunately, this only solves half the problem: it is not at all obvious how to obtain an acceptable partitioning for an attribute grammar. To solve the other half of the problem, we start by observing that the heart of the problem is the interaction between the attribute dependencies and the order imposed by the given partitioning. The partitioning forces the attributes to be evaluated in a particular order, and as such constitutes an additional set of data dependencies. More in particular, all attributes of N in INi must be evaluated before all those in SNi, and all attributes in SNi must be evaluated before all those in INi+1. So, by using the given partitioning, we effectively introduce the corresponding data dependencies.

To test the acceptability of the given partitioning, we add the data dependencies arising from the partitioning to the data dependency graphs of the productions of N, for all non-terminals N, and then run the cycle-testing algorithm of Figure 3.20 again, to see if the attribute system is still cycle-free. If the algorithm finds a cycle, the set of partitionings is not acceptable, but if there are no cycles, the corresponding code adheres both to the visit sequence requirements and to the data dependencies in the attribute evaluation rules, which is exactly the kind of code we are after. Most attribute grammars have at least one acceptable set of partitionings. This is not surprising, since a grammar symbol N usually represents some kind of semantic unit, connecting some input concept to some output concept, and it is to be expected that in each production rule for N the information flows roughly in the same way.

Now we know what an acceptable partitioning is and how to recognize one; the remaining question is how to get one, since it is fairly clear that a random partitioning will almost certainly cause cycles. Going through all possible partitionings is possible in principle, since the sets are finite and non-empty, but the algorithm would take many times the lifetime of the universe even for a simple grammar; it only shows that the problem is solvable. Fortunately, there is a heuristic that will in the large majority of cases find an acceptable partitioning and that runs in linear time: the construction algorithm for ordered attribute evaluators. This construction is based on late evaluation ordering of the attributes in the IS-SI graphs we have already computed above in our test for non-cyclicity.
Since the IS-SI graph of N summarizes the possible data flows from inherited to synthesized attributes and vice versa, it is already close to a partitioning for N. Any partitioning for N must of course conform to the IS-SI graph, but the IS-SI graph does not generally determine a partitioning completely. For example, the IS-SI graph of Figure 3.18 allows two partitionings: ({I1, I3}, {S1}), ({I2}, {S2}) and ({I1}, {S1}), ({I2, I3}, {S2}). Now the idea behind ordered attribute grammars is that the later an attribute is evaluated, the smaller the chance that its evaluation will cause a cycle. This suggests that the second partitioning is preferable.

This late evaluation idea is used as follows to derive a partitioning from an IS-SI graph. We want attributes to be evaluated as late as possible; the attribute evaluated last cannot have any other attribute depending on it, so its node in the IS-SI graph cannot have outgoing data-flow arrows. This observation can be used to find the synthesized attributes in SNlast; note that we cannot write SNn, since we do not know yet the value of n, the number of visits required. SNlast contains all synthesized attributes in the IS-SI graph on which no other attributes depend; these are exactly those that have no outgoing arrows. Next, we remove the attributes in SNlast from the IS-SI graph. This exposes a layer of inherited attributes that have no outgoing data-flow arrows; these make up INlast, and are removed from the IS-SI graph. This process is repeated for the pair (INlast-1, SNlast-1), and so on, until the IS-SI graph has been consumed completely. Note that this makes all the sets in the partitioning non-empty except perhaps for IN1, the last set to be created: it may find the IS-SI graph empty already. We observe that this algorithm indeed produces the partitioning ({I1}, {S1}), ({I2, I3}, {S2}) for the IS-SI graph of Figure 3.18.

The above algorithms can be performed without problems for any strongly cycle-free attribute grammar, and will provide us with attribute partitionings for all symbols in the grammar. Moreover, the partitioning for each non-terminal N conforms to the IS-SI graph for N, since it was derived from it. So, adding the data dependencies arising from the partitioning to the IS-SI graph of N will not cause any direct cycle inside that IS-SI graph. But still the fact remains that dependencies are added, and these may cause larger cycles, cycles involving more than one non-terminal, to arise. So, before we can start generating code, we have to run the cycle-testing algorithm again. If the test does not find any cycles, the grammar is an ordered attribute grammar and the partitionings used can be used to generate attribute evaluation code. This code will
- not loop on any parse tree, since the final set of IS-SI graphs was shown to be cycle-free;
- never use an attribute whose value has not yet been set, since the moment an attribute is evaluated is determined by the partitionings, and the partitionings conform to the IS-SI graphs used, and so to the dependencies;
- evaluate the correct values before each visit to a node and before each return from it, since the code scheme in Figure 3.24 obeys the partitioning.
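To make the late-evaluation layering described above more concrete, here is a small sketch in C of how the partitioning of a single IS-SI graph might be computed. The graph representation, the array bounds and the names are assumptions made for the example; they are not part of any existing evaluator generator.

/* Late-evaluation partitioning of one IS-SI graph (sketch).
   depends[i][j] != 0 means there is a data-flow arrow from attribute i to j. */
#include <stdio.h>

#define MAX_ATTRS 16

typedef struct is_si_graph {
    int n_attrs;
    int is_synthesized[MAX_ATTRS];      /* 1 = synthesized, 0 = inherited */
    int depends[MAX_ATTRS][MAX_ATTRS];  /* data-flow arrows i -> j */
    const char *name[MAX_ATTRS];
} IS_SI_graph;

/* Peel off one layer: all remaining attributes of the requested kind that
   have no outgoing arrows to remaining attributes. Returns how many. */
static int peel_layer(const IS_SI_graph *g, int removed[], int synthesized) {
    int candidate[MAX_ATTRS] = {0};
    int peeled = 0;
    for (int i = 0; i < g->n_attrs; i++) {
        if (removed[i] || g->is_synthesized[i] != synthesized) continue;
        int has_outgoing = 0;
        for (int j = 0; j < g->n_attrs; j++)
            if (!removed[j] && g->depends[i][j]) has_outgoing = 1;
        if (!has_outgoing) candidate[i] = 1;
    }
    for (int i = 0; i < g->n_attrs; i++)
        if (candidate[i]) { printf(" %s", g->name[i]); removed[i] = 1; peeled++; }
    return peeled;
}

/* Build the partitioning from the last pair (IN_last, SN_last) backwards. */
void late_evaluation_partitioning(const IS_SI_graph *g) {
    int removed[MAX_ATTRS] = {0}, left = g->n_attrs, layer = 0;
    while (left > 0) {
        printf("SN[last-%d]:", layer);
        int s = peel_layer(g, removed, 1);   /* synthesized layer */
        printf("\nIN[last-%d]:", layer);
        int i = peel_layer(g, removed, 0);   /* inherited layer */
        printf("\n");
        if (s + i == 0) { printf("cycle in the IS-SI graph\n"); return; }
        left -= s + i; layer++;
    }
}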
Very many, not to say almost all, attribute grammars that one writes naturally turn out to be ordered, which makes the notion of an ordered attribute grammar a very useful one. We have explained the technique using terms like the k-th visit out of n visits, which somehow suggests that considerable numbers of visits may occur. We found it advantageous to imagine that for a while, while trying to understand the algorithms, since thinking so made it easier to focus on the general case. But in practice visit numbers larger than 3 are rare; most of the nodes need to be visited only once, some may need two visits, a small minority may need three visits, and in most attribute grammars no node needs to be visited four times. Of course it is possible to construct a grammar with a non-terminal X whose nodes require, say, 10 visits, but one should realize that its partitioning consists of 20 non-overlapping sets, IN1..10 and SN1..10, and that only the set IN1 may be empty. So X will have to have at least 9 inherited attributes and 10 synthesized attributes. This is not the kind of non-terminal one normally meets during compiler construction.
The ordered attribute grammar for the octal/decimal example
We will now apply the ordered attribute grammar technique to our attribute grammar of Figure 3.8, to obtain a multi-visit attribute evaluator for that grammar. We will at the same time show how the order of the calls to visit routines inside one Visit_i to N alternative_k() routine is determined.

The IS-SI graphs of the non-terminals in the grammar, Number, Digit_Seq, Digit, and Base_Tag, are constructed easily; the results are shown in Figure 3.27. We find no cycles during their construction and see that there are no SI-dependencies: this reflects the fact that no non-terminal has a synthesized attribute whose value is propagated through the rest of the tree to return to the node it originates from. The next step is to construct the partitionings. Again, this is easy to do, since each IS-SI graph contains at most one inherited and one synthesized attribute. The table in Figure 3.28 shows the results.

As we have seen above, the j-th visit to a node of type Mx in Figure 3.24 is a building block for setting the attributes in (SMx)j:
// Require the attributes needed to compute the
// attributes in (IMx)j to be set;
Compute the set (IMx)j;
Visit child Mx for the j-th time;
// Child Mx returns with the set (SMx)j evaluated.
but it can only be applied in an environment in which the values of the attributes in (IMx)j are available or can be evaluated. With this knowledge we can now construct the code for the first (and only) visit to nodes of the type Number. Number has only one alternative, Number alternative_1, so the code we are about to generate will be part of a routine Visit_1 to Number alternative_1(). The alternative consists of a Digit_Seq and a Base_Tag. The set E of attributes that have already been evaluated is empty at this point, and next_visit_number of Digit_Seq and next_visit_number of Base_Tag are both zero. The building block for visiting Digit_Seq is
Figure 3.27 The IS-SI graphs of the non-terminals from the grammar of Figure 3.8.

                IN1         SN1
Number                      value
Digit_Seq       base        value
Digit           base        value
Base_Tag                    base

Figure 3.28 Partitionings of the attributes of the grammar of Figure 3.8.
// Requires Number alt_1 .Base_Tag .base to be set.
// Compute the attributes in IN1 of Digit_Seq, the set { base }:
SET Number alt_1 .Digit_Seq .base TO Number alt_1 .Base_Tag .base;
// Visit Digit_Seq for the first time:
Visit_1 to Digit_Seq (Number alt_1 .Digit_Seq);
// Digit_Seq returns with its SN1, the set { value }, evaluated;
// it supplies Number alt_1 .Digit_Seq .value.

and the one for Base_Tag is

// Requires nothing.
// Compute the attributes in IN1 of Base_Tag, the set { }:
// Visit Base_Tag for the first time:
Visit_1 to Base_Tag (Number alt_1 .Base_Tag);
// Base_Tag returns with its SN1, the set { base }, evaluated;
// it supplies Number alt_1 .Base_Tag .base.
Their data requirements have been shown as comments in the first line; they derive from the set IN1 of Digit_Seq and Base_Tag, as transformed by the data dependencies of the attribute evaluation rules. For example, IN1 of Digit_Seq says that the first visit requires Number alt_1 .Digit_Seq .base to be set. The attribute evaluation rule for this is

SET Number alt_1 .Digit_Seq .base TO Number alt_1 .Base_Tag .base;

whose data dependency requires Number alt_1 .Base_Tag .base to be set. The value of Number alt_1 .Base_Tag .base is not in E at the moment, so the building block for visiting Digit_Seq cannot be generated at this moment.

Next we turn to the building block for visiting Base_Tag, also shown above. This building block requires no attribute values to be available, so we can generate code for it. The SN1 of Base_Tag shows that the building block sets the value of Number alt_1 .Base_Tag .base, so Number alt_1 .Base_Tag .base is added to E. This frees the way for the building block for visiting Digit_Seq, code for which is generated next. The set SN1 of Digit_Seq consists of the attribute value, so we can add Number alt_1 .Digit_Seq .value to E.

There are no more visits to generate code for, and we now have to wrap up the routine Visit_1 to Number alternative_1(). The set SN1 of Number contains the attribute value, so code for setting Number .value must be generated. The attribute evaluation rule in Figure 3.8 shows that Number .value is just a copy of Number alt_1 .Digit_Seq .value, which is available, since it is in E. So the code can be generated, and the attribute grammar turns out to be an ordered attribute grammar, at least as far as Number is concerned. All these considerations result in the code of Figure 3.29. Note that we have effectively been doing a topological sort on the building blocks, using the data dependencies to compare building blocks.

For good measure, and to allow comparison with the corresponding routine for the data-flow machine in Figure 3.14, we give the code for visiting the first alternative of Digit_Seq in Figure 3.30. In this routine, the order in which the two children are visited is immaterial, since the data dependencies are obeyed both in the order (Digit_Seq, Digit) and in the order (Digit, Digit_Seq). Similar conflict-free constructions are possible for Digit and Base_Tag, so the grammar of Figure 3.8 is indeed an ordered attribute grammar, and we have constructed automatically an attribute evaluator for it. The above code indeed visits each node of the integer number only once.
PROCEDURE Visit_1 to Number alternative_1 (
    pointer to number node Number,
    pointer to number alt_1 node Number alt_1
):
    // Visit 1 from the parent: flow of control from the parent enters here.
    // The parent has set the attributes in IN1 of Number, the set { }.

    // Visit some children:
    // Compute the attributes in IN1 of Base_Tag, the set { }:
    // Visit Base_Tag for the first time:
    Visit_1 to Base_Tag (Number alt_1 .Base_Tag);
    // Base_Tag returns with its SN1, the set { base }, evaluated.

    // Compute the attributes in IN1 of Digit_Seq, the set { base }:
    SET Number alt_1 .Digit_Seq .base TO Number alt_1 .Base_Tag .base;
    // Visit Digit_Seq for the first time:
    Visit_1 to Digit_Seq (Number alt_1 .Digit_Seq);
    // Digit_Seq returns with its SN1, the set { value }, evaluated.
    // End of the visits to children.

    // Compute the attributes in SN1 of Number, the set { value }:
    SET Number .value TO Number alt_1 .Digit_Seq .value;

Figure 3.29 Visiting code for Number nodes.
Of course, numbers of the form [0-9]+[BD] can be and normally are handled by the lexical analyzer, but that is beside the point. The point is, however, that
-the grammar for Number is representative of those language constructs in which information from further on in the text must be used,
- the algorithms for ordered attribute evaluation have found out automatically that no node needs to be visited more than once in this case, provided the nodes are visited in the right order.
See Exercises 3.6 and 3.7 for situations in which more than one visit is necessary.
The above construction was driven by the contents of the partitioning sets and the data dependencies of the attribute evaluation rules. This suggests a somewhat simpler way of constructing the evaluator while avoiding testing the partitionings for being appropriate:
-Construct the IS-SI graphs while testing for circularities.
- Construct from the IS-SI graphs the partitionings, using late evaluation.
- Construct the code for the visiting routines, starting from the obligation to set the attributes in SNk and working backwards from there, using the data dependencies and the IN and SN sets of the building blocks supplied by the other visit routines as our guideline.
If we can construct all visit routine bodies without violating the data dependencies, we have proved that the grammar was ordered and have at the same time obtained the
PROCEDURE Visit_1 to Digit_Seq alternative_1 (
    pointer to digit_seq node Digit_Seq,
    pointer to digit_seq alt_1 node Digit_Seq alt_1
):
    // Visit 1 from the parent: flow of control from the parent enters here.
    // The parent has set the attributes in IN1 of Digit_Seq, the set { base }.

    // Visit some children:
    // Compute the attributes in IN1 of Digit_Seq, the set { base }:
    SET Digit_Seq alt_1 .Digit_Seq .base TO Digit_Seq .base;
    // Visit Digit_Seq for the first time:
    Visit_1 to Digit_Seq (Digit_Seq alt_1 .Digit_Seq);
    // Digit_Seq returns with its SN1, the set { value }, evaluated.

    // Compute the attributes in IN1 of Digit, the set { base }:
    SET Digit_Seq alt_1 .Digit .base TO Digit_Seq .base;
    // Visit Digit for the first time:
    Visit_1 to Digit (Digit_Seq alt_1 .Digit);
    // Digit returns with its SN1, the set { value }, evaluated.
    // End of the visits to children.

    // Compute the attributes in SN1 of Digit_Seq, the set { value }:
    SET Digit_Seq .value TO
        Digit_Seq alt_1 .Digit_Seq .value * Digit_Seq .base + Digit_Seq alt_1 .Digit .value;

Figure 3.30 Visiting code for Digit_Seq alternative_1 nodes.
multi-visit attribute evaluation code. This technique is more in line with the usual compiler construction approach: just try to generate correct, efficient code; if you can, you win, no questions asked. Farrow (1984) discusses a more complicated technique that creates attribute evaluators for almost any non-cyclic attribute grammar, ordered or not.
There is a series of restrictions that reduce the most general attribute grammars to ordered attribute grammars. The important point about these restrictions is that they increase considerably the algorithmic tractability of the grammars but are almost no obstacle to the compiler writer who uses the attribute grammar.

The first restriction is that all synthesized attributes of a production and all inherited attributes of its children must get values assigned to them in the production. Without this restriction, the attribute grammar is not even well-formed.

The second is that no tree produced by the grammar may have a cycle in the attribute dependencies. This property is tested by constructing for each non-terminal N a summary, the IS-SI graph set, of the data-flow possibilities through all subtrees deriving from N. The test for this property is exponential in the number of attributes in a non-terminal, and identifies the non-cyclic attribute grammars. In spite of its exponential time requirement the test is feasible for 'normal' attribute grammars on present-day computers.

The third restriction is that the grammar still be non-cyclic even if a single IS-SI graph is used per non-terminal rather than an IS-SI graph set. The test for this property is linear in the number of attributes in a non-terminal, and identifies the strongly non-cyclic attribute grammars.

The fourth restriction requires that the attributes can be evaluated using the multi-visit scheme of Figure 3.24. This leads to the multi-visit attribute grammars. These grammars have a partitioning for the attributes of each non-terminal, as described above. Testing whether an attribute grammar is multi-visit is exponential in the total number of attributes in the worst case, and therefore prohibitively expensive (in the worst case).

The fifth restriction is that the partitionings constructed heuristically using the late evaluation criterion turn out to be acceptable and not create any new cycles. This yields the ordered attribute grammars. The test is O(n²), where n is the number of attributes per non-terminal, if implemented naively, and O(n ln n) in theory, but since n is usually small, this makes little difference.

Each of these restrictions is a real restriction, in that the class it defines is a proper subclass of the class above it. So there are grammars that are non-cyclic but not strongly non-cyclic, strongly non-cyclic but not multi-visit, and multi-visit but not ordered. But these 'difference' classes are very small, and for all practical purposes the above classes form a single class, 'the attribute grammars'.

In the following two sections we will discuss two classes of attribute grammars that result from far more serious restrictions: the 'L-attributed grammars', in which an inherited attribute of a child of a non-terminal N may depend only on synthesized attributes of children to the left of it in the production rule for N and on the inherited attributes of N itself; and the 'S-attributed grammars', which cannot have inherited attributes at all.
The parsing process constructs the nodes in the syntax tree in left-to-right order: first the parent node and then the children in top-down parsing; and first the children and then the parent node in bottom-up parsing. It is interesting to consider attribute grammars that can match this behavior: attribute grammars which allow the attributes to be evaluated in one left-to-right traversal of the syntax tree. Such grammars are called L-attributed grammars. An L-attributed grammar is characterized by the fact that no dependency graph of any of its production rules has a data-flow arrow that points from a child to that same child or to a child to the left of it. Many programming language grammars are L-attributed; this is not surprising, since the left-to-right information flow inherent in them helps programmers in reading and understanding the resulting programs. An example is the dependency graph of the rule for Constant_definition in Figure 3.3, in which no information flows from Expression to Defined_identifier. The human reader, like the parser and the attribute evaluator, arrives at a Constant_definition with a symbol table, sees the defined identifier and the expression, combines the two in the symbol table, and leaves the Constant_definition behind. An example of an attribute grammar that is not L-attributed is the Number grammar from Figure 3.8: the data-flow arrow for base points to the left, and in principle the reader has to read the entire digit sequence to find the B or D which tells how to interpret the sequence. Only the fact that a human reader can grasp the entire number in one glance saves him or her from this effort; computers are less fortunate.

The L-attributed property has an important consequence for the processing of the syntax tree: once work on a node has started, no part of the compiler will need to return to one of the node's siblings on the left to do processing there. The parser is finished with them, and all their attributes have been computed already. Only the data that the nodes contain in the form of synthesized attributes are still important. Figure 3.31 shows part of a parse tree for an L-attributed grammar.
Figure 3.31 Data flow in part of a parse tree for an L-attributed grammar.
We assume that the attribute evaluator is working on node C2, which is the second child of node B3, which is the third child of node A; whether A is the top or the child of another node is immaterial. The upward arrows represent the data flow of the synthesized attributes of the children; they all point to the right or to the synthesized attributes of the parent. All inherited attributes are already available when work on a node starts, and can be passed to any child that needs them. They are shown as downward arrows in the diagram. Figure 3.31 shows that when the evaluator is working on node C2, only two sets of attributes play a role:
- all attributes of the nodes that lie on the path from the top to the node being processed: C2, B3, and A,
- the synthesized attributes of the left siblings of those nodes: C1, B1, B2, and any left siblings of A not shown in the diagram.
More in particular, no role is played by the children of the left siblings of C2, B3, and A, since all computations in them have already been performed and the results are summarized in their synthesized attributes. Nor do the right siblings of C2, B3, and A play a role,
since their synthesized attributes have no influence yet. The attributes of C2, B3, and A reside in the corresponding nodes; work on these nodes has already started but has not yet finished. The same is not true for the left siblings of C2, B3, and A, since the work on them is finished; all that is left of them are their synthesized attributes. Now, if we could find a place to store the data synthesized by these left siblings, we could discard each node in left-to-right order, after the parser has created it and the attribute evaluator has computed its attributes. That would mean that we do not need to construct the entire syntax tree but can restrict ourselves at all times to the nodes that lie on the path from the top to the node being processed. Everything to the left of that path has been processed and, except for the synthesized attributes of the left siblings, discarded; everything to the right of it has not been touched yet. A place to store the synthesized attributes of left siblings is easily found: we store them in the parent node. The inherited attributes remain in the nodes they belong to and their values are transported down along the path from the top to the node being processed. This structure is exactly what top-down parsing provides. This correspondence allows us to write the attribute processing code between the various members, to be performed when parsing passes through.
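For readers who prefer code over diagrams, the correspondence can be sketched in a hand-written recursive descent parser: an inherited attribute becomes a parameter of the parsing routine, and a synthesized attribute becomes its return value. The grammar fragment, the symbol table type, and the helper routines below are invented for the example and are not LLgen output:

/* Sketch of L-attributed evaluation during top-down parsing. */
typedef struct symbol_table symbol_table;        /* assumed to be defined elsewhere */
extern int look_up_value(symbol_table *st, const char *name);
extern const char *parse_identifier(void);       /* consumes one identifier token */

/* term(INH sym_tab), returning its synthesized value attribute. */
static int term(symbol_table *sym_tab) {
    const char *repr = parse_identifier();
    return look_up_value(sym_tab, repr);          /* synthesized attribute */
}

/* expression -> term ...: the symbol table is inherited and passed on,
   the value is synthesized on the way back up. */
static int expression(symbol_table *sym_tab) {
    int value = term(sym_tab);
    /* ... an optional tail would be parsed here, updating value ... */
    return value;
}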
An example of a system for handling L-attributed grammars is LLgen; LLgen was explained in Section 2.2.4.6, but the sample code in Figure 2.76 featured synthesized attributes only, representing the values of the expression and its subexpressions. Figure 3.32 includes an inherited attribute as well: a symbol table which contains the representations of some identifiers, together with the integer values associated with these identifiers. This symbol table is produced as a synthesized attribute by the non-terminal declarations in the rule main, which processes one or more identifier declarations. The symbol table is then passed as an inherited attribute down through expression and expression_tail_option, to be used finally in term to look up the value of the identifier found. This results in the synthesized attribute *t, which is then passed on upwards. For example, the input b = 9, c = 5; b - c, passed to the program produced by LLgen from the grammar in Figure 3.32, yields the output result = 4. Note that synthesized attributes in LLgen are implemented as pointers passed as inherited attributes, but this is purely an implementation trick of LLgen to accommodate the C language, which does not feature output parameters.

The coordination of parsing and attribute evaluation is a great simplification compared to multi-visit attribute evaluation, but is of course applicable to a much smaller class of attribute grammars. Many attribute grammars can be doctored to become L-attributed grammars, and it is up to the compiler constructor to decide whether to leave the grammar intact and use an ordered attribute evaluator generator, or to modify the grammar to adapt it to a system like LLgen. In earlier days much of compiler design consisted of finding ways to allow the - implicit - attribute grammar to be handled by a handwritten left-to-right evaluator, to avoid handwritten multi-visit processing.

The L-attributed technique allows a more technical definition of a narrow compiler than the one given in Section 1.4.1. A narrow compiler is a compiler, based formally or informally on some form of L-attributed grammar, that does not save substantially more
{
    #include    "symbol_table.h"
}
%lexical get_next_token_class;
%token IDENTIFIER;
%token DIGIT;
%start Main_Program, main;

main {symbol_table sym_tab; int result;}:
    {init_symbol_table(&sym_tab);}
    declarations(sym_tab)
    expression(sym_tab, &result)
    {printf("result = %d\n", result);}
;

declarations(symbol_table sym_tab):
    declaration(sym_tab) [ ',' declaration(sym_tab) ]* ';'
;

declaration(symbol_table sym_tab) {symbol_entry *sym_ent;}:
    IDENTIFIER {sym_ent = look_up(sym_tab, Token.repr);}
    '=' DIGIT  {sym_ent->value = Token.repr - '0';}
;

expression(symbol_table sym_tab; int *e) {int t;}:
    term(sym_tab, &t) {*e = t;}
    expression_tail_option(sym_tab, e)
;

expression_tail_option(symbol_table sym_tab; int *e) {int t;}:
    '-' term(sym_tab, &t) {*e -= t;}
    expression_tail_option(sym_tab, e)
|
;

term(symbol_table sym_tab; int *t):
    IDENTIFIER {*t = look_up(sym_tab, Token.repr)->value;}
;

Figure 3.32 LLgen code for an L-attributed grammar for simple expressions.
information than that which is present on the path from the top to the node being processed. In most cases, the length of that path is proportional to ln n, where n is the length of the program, whereas the size of the entire AST is proportional to n. This, and the intuitive appeal of L-attributed grammars, explains the popularity of narrow compilers.
We have seen that the attribute evaluation in L-attributed grammars can be incorporated conveniently in top-down parsing, but its implementation using bottom-up parsing is less obvious. In fact, it seems impossible. The problem lies in the inherited attributes, which must be passed down from parent nodes to children nodes. The problem is that in bottom-up parsing the parent nodes are identified and created only after all of their children have been processed, so there is just no place from where to pass down any inherited attributes when they are needed. Yet the most famous LALR(1) parser generator, yacc, does it anyway, and it is interesting to see how it accomplishes this feat.

As explained in Section 2.2.5.2, a bottom-up parser has a stack of shifted terminals and reduced non-terminals; we parallel this stack with an attribute stack which contains the attributes of each stack element in that same order. The problem is to fill the inherited attributes, since code has to be executed for it. Code in a bottom-up parser can only be executed at the end of an alternative, when the corresponding item has been fully recognized and is being reduced. But now we want to execute code in the middle:

A → B {C.inh_attr := f(B.syn_attr);} C

in which B.syn_attr is a synthesized attribute of B and C.inh_attr is an inherited attribute of C. The trick is to attach the code to an ε-rule introduced for the purpose, say A_action1:

A → B A_action1 C
A_action1 → ε {C.inh_attr := f(B.syn_attr);}

Yacc does this automatically and also remembers the context of A_action1, so B.syn_attr and C.inh_attr can be identified in spite of them having been moved out of their scopes by the above transformation.

Now the code in A_action1 is at the end of an alternative and can be executed when the item A_action1 → ε is reduced. This works, but the problem is that after this transformation the grammar may no longer be LALR(1): introducing ε-rules is bad for bottom-up parsers. The parser will work only if the item

A → B • A_action1 C

is the only one in the set of hypotheses at that point. Only then can the parser be confident that this is the item and that the code can be executed. This also ensures that the parent node is A, so the parser knows already that it is going to construct a parent node A. These are severe requirements. Fortunately, there are many grammars with only a small number of inherited attributes, so the method is still useful.

There are a number of additional tricks to get cooperation between attribute evaluation and bottom-up parsing. One is to lay out the attribute stack so that the one and only synthesized attribute of one node is in the same position as the one and only inherited attribute of the next node. This way no code needs to be executed in between, and the problem of executing code in the middle of a grammar rule is avoided. See the yacc or bison manual for details and notation.
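The attribute stack itself can be sketched as follows; the layout and the names are assumptions made for the example and do not follow any particular parser generator:

/* Sketch of an attribute stack paralleling the parse stack of a
   bottom-up parser. At a reduction, the attribute of the k-th member
   of the right-hand side sits at a fixed offset below the stack top. */
#define STACK_SIZE 100

typedef struct attribute { int value; } Attribute;

static Attribute attr_stack[STACK_SIZE];
static int attr_sp = 0;                 /* parallels the parse stack pointer */

/* Reduction for a rule with three members, e.g. A -> B '-' C:
   $1 is attr_stack[attr_sp - 3], $3 is attr_stack[attr_sp - 1]. */
static void reduce_subtraction(void) {
    Attribute result;
    result.value = attr_stack[attr_sp - 3].value - attr_stack[attr_sp - 1].value;
    attr_sp -= 3;                       /* pop the right-hand side */
    attr_stack[attr_sp++] = result;     /* push the attribute of the left-hand side */
}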
If inherited attributes are a problem, let's get rid of them. This gives S-attributed grammars, which are characterized by having no inherited attributes at all. It is remarkable how much can still be done within this restriction. In fact, anything that can be done in an L-attributed grammar can be done in an S-attributed grammar, as we will show in Section 3.1.9. Now life is easy for bottom-up parsers. Each child node stacks its synthesized attributes, and the code at the end of an alternative of the parent scoops them all up, processes them, and replaces them by the resulting synthesized attributes of the parent. A typical example of an S-attributed grammar can be found in the yacc code in Figure 2.108. The code at the end of the first alternative of expression:

{$$ = new_expr(); $$->type = '-'; $$->expr = $1; $$->term = $3;}
picks up the synthesized attributes of the children, $1 and $3, and combines them into the synthesized attribute of the parent, $$. For historical reasons, yacc grammar rules each have exactly one synthesized attribute; if more than one synthesized attribute has to be returned, they have to be combined into a record, which then forms the only attribute. This is comparable to functions allowing only one return value in most programming languages.
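For example, a node that must yield both a type and a value could wrap them in a single C record, which would then be the one attribute stacked for that symbol; in yacc such a record would typically become a member of the %union. The names below are invented for the illustration:

/* One record combining several synthesized attributes into a single one. */
typedef struct expr_attrs {
    int  type;      /* e.g. a small type code */
    long value;     /* constant value, if known */
} Expr_attrs;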
It is relatively easy to convert an L-attributed grammar into an S-attributed grammar, but, as is usual with grammar transformations, this conversion does not improve its looks. The basic trick is to delay any computation that cannot be done now to a later moment when it can be done. More in particular, any computation that would need inherited attributes is replaced by the creation of a data structure specifying that computation and all its synthesized attributes. This data structure (or a pointer to it) is passed on as a synthesized attribute up to the level where the missing inherited attributes are available, either as constants or as synthesized attributes of nodes at that level. Then we do the computation.

The traditional example of this technique is the processing of variable declarations in a C-like language; an example of such a declaration is int i, j;. When inherited attributes are available, this processing can be described easily by the L-attributed grammar in Figure 3.33. Here the rule Type_Declarator produces a synthesized attribute type, which is then passed on as an inherited attribute to Declared_Idf_Sequence and Declared_Idf. It is combined in the latter with the representation provided by Idf, and the combination is added to the symbol table. In the absence of inherited attributes, Declared_Idf can do only one thing: yield the repr as a synthesized attribute, as shown in Figure 3.34. The various reprs resulting from the occurrences of Declared_Idf in Declared_Idf_Sequence are collected into a data structure, which is yielded as the synthesized attribute repr list. Finally this list reaches the level on which the type is known and where the delayed computations can be performed.

It will be clear that this technique can in principle be used to eliminate all inherited attributes, at the expense of introducing more synthesized attributes and moving more code
Declaration →
    Type_Declarator(type) Declared_Idf_Sequence(type) ';'
;

Declared_Idf_Sequence(INH type) →
    Declared_Idf(type)
|
    Declared_Idf_Sequence(type) ',' Declared_Idf(type)
;

Declared_Idf(INH type) →
    Idf(repr)
    ATTRIBUTE RULES:
        Add to symbol table (repr, type);
;

Figure 3.33 Sketch of an L-attributed grammar for Declaration.
Declaration →
    Type_Declarator(type) Declared_Idf_Sequence(repr list) ';'
    ATTRIBUTE RULES:
        FOR EACH repr IN repr list:
            Add to symbol table (repr, type);
;

Declared_Idf_Sequence(SYN repr list) →
    Declared_Idf(repr)
    ATTRIBUTE RULES:
        SET repr list TO Convert to list (repr);
|
    Declared_Idf_Sequence(old repr list) ',' Declared_Idf(repr)
    ATTRIBUTE RULES:
        SET repr list TO Append to list (old repr list, repr);
;

Declared_Idf(SYN repr) →
    Idf(repr)
;

Figure 3.34 Sketch of an S-attributed grammar for Declaration.
up the tree. In this way, any L-attributed grammar can be converted into an S-attributed one. Of course, in some cases, some of the attribute code will have to be moved right to the top of the tree, in which case the conversion automatically creates a separate postprocessing phase. This shows that in principle one scan over the input is enough.
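The delay technique of Figure 3.34 translates readily into C: while the type is not yet available the identifier representations are merely collected, and the postponed symbol-table insertions are performed once the type is known higher up in the tree. The list type and the routine names below are assumptions made for the example:

/* Delayed processing of declared identifiers (sketch). */
#include <stdlib.h>

struct repr_list { const char *repr; struct repr_list *next; };

/* Collect one more representation; done where the type is still unknown. */
static struct repr_list *append_repr(struct repr_list *list, const char *repr) {
    struct repr_list *node = malloc(sizeof *node);
    node->repr = repr; node->next = list;
    return node;
}

extern void add_to_symbol_table(const char *repr, const char *type);

/* Performed at the Declaration level, where the type is finally known. */
static void apply_type(struct repr_list *list, const char *type) {
    for (; list != 0; list = list->next)
        add_to_symbol_table(list->repr, type);
}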
The transformation from L-attributed to S-attributed grammars seems attractive: it allows stronger, bottom-up, parsing methods to be used for the more convenient L-attributed grammars. Unfortunately, the transformation is practically feasible for small problems only, and serious problems soon arise. For example, attempts to eliminate the entire symbol table as an inherited attribute (as used in Figure 3.2) lead to a scheme in which at the end of each visibility range the identifiers used in it are compared to those declared in it, and any identifiers not accounted for are passed on upwards to surrounding visibility ranges. Also, much information has to be carried around to provide relevant error messages. See Exercise 3.12 for a possibility to automate the process. Note that the code in Figures 3.33 and 3.34 dodges the problem by having the symbol table as a hidden variable, outside the domain of attribute grammars.
Notations like E .attr for an attribute deriving from grammar symbol E break down if there is more than one E in the grammar rule. A possible solution is to use E[1], E[2], etc., for the children and E for the non-terminal itself, as we did for Digit_Seq in Figure 3.8. More serious problems arise when the right-hand side is allowed to contain regular expressions over the grammar symbols, as in EBNF notation. Given an attribute grammar rule

Declaration_Sequence(SYN symbol table) →
    Declaration*
    ATTRIBUTE RULES:

it is less than clear how the attribute evaluation code could access the symbol tables produced by the individual Declarations, to combine them into a single symbol table. Actually, it is not even clear exactly what kind of node must be generated for a rule with a variable number of children. As a result, most general attribute grammar systems do not allow EBNF-like notations. If the system has its own attribute rule language, another option is to extend this language with data access operations to match the EBNF extensions.

L-attributed and S-attributed grammars have fewer problems here, since one can just write the pertinent code inside the repeated part. This is the approach taken in LLgen, and a possible form of the above rule for Declaration_Sequence in LLgen would be

Declaration_Sequence(struct Symbol_Table *Symbol_Table)
    {struct Symbol_Table st;}:
    {Clear_Symbol_Table(Symbol_Table);}
    [
        Declaration(&st)
        {Merge_Symbol_Tables(Symbol_Table, &st);}
    ]*
;

given proper declarations of the routines Clear_Symbol_Table() and Merge_Symbol_Tables(). Note that LLgen uses square brackets [ ] for the grouping of grammatical constructs, to avoid confusion with the parentheses ( ) used for passing attributes to rules.
This concludes our discussion of grammar-based context handling. In this approach, the context is stored in attributes, and the grammatical basis allows the processing to be completely automatic (for attribute grammars) or largely automatic (for L- and S-attributed grammars). Figure 3.35 summarizes the possible attribute value flow through the AST for ordered attribute grammars, L-attributed grammars, and S-attributed grammars. Values may flow along branches from anywhere to anywhere in ordered attribute grammars, up one branch and then down the next in L-attributed grammars, and upwards only in S-attributed grammars.
Figure 3.35 Pictorial comparison of three types of attribute grammars: ordered, L-attributed, and S-attributed.
We will now discuss some manual methods, in which the context is stored in ad-hoc data structures, not intimately connected with the grammar rules. Of course most of the data structures are still associated with nodes of the AST, since the AST is the only representation of the program that we have.
Although attribute grammars are the only means we have at the moment for generating context processing programs automatically, the more advanced attribute evaluation techniques for them are still in their infancy, and much context processing is done at a lower level, by writing code in a traditional language like C or C++. We will give here two methods to collect context information from the AST; one is completely manual and the other uses some reusable software. Whether this collected information is then stored in the nodes (as with an attribute grammar), stored in compiler tables, or consumed immediately is immaterial here: since it is all handy-work, it is up to the compiler writer to decide where to put the information. The two methods are symbolic interpretation and data-flow equations. Both start from the AST as produced by the syntax analysis, possibly already annotated to a certain extent, but both require more flow-of-control information than the AST provides directly. In particular, we need to know for each node its possible flow-of-control successor or successors. Although it is in principle possible to determine these successors while checking the context information, it is much more convenient to have the flow-of-control information available in each node in the form of successor pointers. These pointers link the nodes in the AST together in an additional data structure, the control flow graph.

Roadmap
3.2 Manual methods
3.2.1 Threading the AST
3.2.2 Symbolic interpretation
3.2.3 Data-flow equations
3.2.4 Interprocedural data-flow analysis
3.2.5 Carrying the information upstream - live analysis
3.2.6 Comparing symbolic interpretation and data-flow equations
3.2.1 Threading the AST

The control flow graph can be constructed statically by threading the tree, as follows. A separate threading routine is written for each node type; it gets a pointer to the node to be processed as a parameter, determines which production rule of N describes the node, and calls the threading routines of its children, in the order that corresponds to the flow of control through that part of the AST. The set of routines maintains a global variable Last node pointer, which points to the last node processed on the control flow path, the dynamically last node. When a new node N on the control path is met during the recursive traversal, its address is stored in Last node pointer .successor, and Last node pointer is set to point to N. Using this technique, the threading routine for a binary expression could have the following form:

PROCEDURE Thread binary expression (Expr node pointer):
    Thread expression (Expr node pointer .left operand);
    Thread expression (Expr node pointer .right operand);
    // link this node to the dynamically last node
    SET Last node pointer .successor TO Expr node pointer;
    // make this node the new dynamically last node
    SET Last node pointer TO Expr node pointer;
This makes the present node the successor of the last node of the right operand, and registers it as the new dynamically last node. Figure 3.36 shows the AST and the control flow graph for the expression b*b - 4*a*c, where the pointers that make up the AST are shown as solid lines and the control flow graph is shown using arrows. Note that the first node visited by the threading
Figure 3.36 Control flow graph for the expression b*b - 4*a*c (initial and final situation).
is the - node at the top, but that the first node in the control flow graph is the leftmost b, as it should be. Statically, the - node is the first node, but dynamically, at run time, the leftmost b is the first node.

Threading code in C for the demo compiler from Section 1.2 is shown in Figure 3.37. The threading code for a node representing a digit is trivial; that for a binary expression node derives directly from the code for Thread binary expression given above. Since there is no first dynamically last node, a dummy node is used to play that role temporarily. At the end of the threading, the thread is terminated properly; its start is retrieved from the dummy node and stored in the global variable Thread_start, to be used by a subsequent interpreter or code generator.

A complication arises if the flow of control exits in more than one place from the tree below a node. This happens, for example, with the if-statement. Actually, there are two problems here. The first is that the node that corresponds to the run-time then/else decision has two successors rather than one, and the second is that when we reach the node dynamically following the entire if-statement, its address must be recorded in the dynamically last nodes of both the then-part and the else-part. So a single variable Last node pointer is no longer sufficient.

The first problem can only be solved by just storing two successor pointers in the if-node; this makes the if-node different from the other nodes, but in any graph that is more complicated than a linked list, some node will have to store more than one pointer. One way to solve the second problem is to replace Last node pointer by a set of last nodes, each of which will be filled in when the dynamically next node in the control flow path is found. But it is often more convenient to construct a special join node to merge the diverging flow of control. Such a node is then part of the control flow graph without being part of the AST; we will see in Section 3.2.2 that it can play a useful role in context checking.
#include    "parser.h"    /* for types AST_node and Expression */
#include    "thread.h"    /* for self check */
                          /* PRIVATE */
static AST_node *Last_node;

static void Thread_expression(Expression *expr) {
    switch (expr->type) {
    case 'D':
        Last_node->successor = expr; Last_node = expr;
        break;
    case 'P':
        Thread_expression(expr->left);
        Thread_expression(expr->right);
        Last_node->successor = expr; Last_node = expr;
        break;
    }
}
                          /* PUBLIC */
AST_node *Thread_start;

void Thread_AST(AST_node *icode) {
    AST_node Dummy_node;

    Last_node = &Dummy_node;
    Thread_expression(icode);
    Last_node->successor = (AST_node *)0;
    Thread_start = Dummy_node.successor;
}
Figure 3.37 Threading code for the demo compiler from Section 1.2.
The threading routine for an if-statement could then have the form shown in Figure 3.38. The if-node passed as a parameter has two successor pointers, true successor and false successor. Note that these differ from the then part and else part pointers; the part pointers point to the tops of the corresponding syntax subtrees, whereas the successor pointers point to the dynamically first nodes in these subtrees. The code starts by threading the expression which is the condition in the if-statement; next, the if-node itself is linked in as the dynamically next node, Last node pointer having been set by Thread expression to point to the dynamically last node in the expression. To prepare for processing the then- and else-parts, an End_if join node is created, to be used to combine the control flows from both branches of the if-statement and to serve as a link to the node that dynamically follows the if-statement.

Since the if-node does not have a single successor field, it cannot be used as a last node, so we use a local auxiliary node Aux last node to catch the pointers to the dynamically first nodes in the then- and else-parts. The call of Thread block (If node pointer .then part) will put the pointer to its dynamically first node in Aux last node, from where it is picked up and assigned to If node pointer .true successor in the next statement. Finally, the end of the then-part will have the end-if join node set as its successor.
PROCEDURE Thread if statement (If node pointer):
    Thread expression (If node pointer .condition);
    SET Last node pointer .successor TO If node pointer;
    SET End if join node TO Generate join node ();

    SET Last node pointer TO address of a local node Aux last node;
    Thread block (If node pointer .then part);
    SET If node pointer .true successor TO Aux last node .successor;
    SET Last node pointer .successor TO address of End if join node;

    SET Last node pointer TO address of Aux last node;
    Thread block (If node pointer .else part);
    SET If node pointer .false successor TO Aux last node .successor;
    SET Last node pointer .successor TO address of End if join node;

    SET Last node pointer TO address of End if join node;

Figure 3.38 Sample threading routine for if-statements.
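A C version in the style of Figure 3.37 might look roughly as follows. The field names of the if-node (condition, then_part, else_part, true_successor, false_successor) and the helper routines New_join_node() and Thread_block() are assumptions made for this sketch, since the demo compiler of Section 1.2 has no if-statements:

/* Sketch of if-statement threading in C, building on Figure 3.37. */
static void Thread_if_statement(AST_node *if_node) {
    AST_node Aux_last_node;                   /* catches the first node of each branch */
    AST_node *end_if_join = New_join_node();  /* assumed constructor for the join node */

    Thread_expression(if_node->condition);
    Last_node->successor = if_node;           /* the if-node follows the condition */

    Last_node = &Aux_last_node;
    Thread_block(if_node->then_part);         /* assumed threader for statement blocks */
    if_node->true_successor = Aux_last_node.successor;
    Last_node->successor = end_if_join;

    Last_node = &Aux_last_node;
    Thread_block(if_node->else_part);
    if_node->false_successor = Aux_last_node.successor;
    Last_node->successor = end_if_join;

    Last_node = end_if_join;                  /* continue the thread after the join */
}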
Figure 3.39 AST of an if-statement before threading.
Given the AST from Figure 3.39, the routine will thread it as shown in Figure 3.40. Note that the Last node pointer pointer has been moved to point to the end-if join node.
Threading the AST can also be expressed by means of an attribute grammar.
Figure 3.40 AST and control flow graph of an if-statement after threading.
The successor pointers are then implemented as inherited attributes. Moreover, each node gets an additional synthesized attribute first, which is set by the evaluation rules to the pointer to the first node to be executed in the tree.
If_statement(INH successor, SYN first) →
    'IF' Condition 'THEN' Then_part 'ELSE' Else_part 'END' 'IF'
    ATTRIBUTE RULES:
        SET If_statement .first TO Condition .first;
        SET Condition .true successor TO Then_part .first;
        SET Condition .false successor TO Else_part .first;
        SET Then_part .successor TO If_statement .successor;
        SET Else_part .successor TO If_statement .successor;

Figure 3.41 Threading an if-statement using attribute rules.
The threading rules for an if-statement are given in Figure 3.41. In this example we assume that there is a special node type Condition (as suggested by the rule in Figure 3.41), the run-time semantics of which is to evaluate the Boolean expression and to transfer control to true successor or false successor, as the case may be.
It is often useful to implement the control flow graph as a doubly-linked graph, in which each link consists of a pointer pair: one from the node to the successor and
one from the successor to the node. This way, each node contains a set of pointers to its dynamic successor(s) and a set of pointers to its dynamic predecessor(s). This arrangement gives the algorithms working on the control flow graph great freedom of movement, which will prove especially useful when processing data-flow equations. The doubly-linked control flow graph of an if-statement is shown in Figure 3.42.
Figure 3.42 AST and doubly-linked control flow graph of an if-statement.
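A possible C representation of such a doubly-linked node is sketched below; the fixed-size arrays and the field names are assumptions chosen for brevity:

/* Sketch of a doubly-linked control flow graph node. */
#define MAX_EDGES 2   /* an if-node has two successors, a join node two predecessors */

typedef struct CFG_node {
    struct CFG_node *successors[MAX_EDGES];
    int n_successors;
    struct CFG_node *predecessors[MAX_EDGES];
    int n_predecessors;
} CFG_node;

/* Link two nodes in both directions. */
static void link(CFG_node *from, CFG_node *to) {
    from->successors[from->n_successors++] = to;
    to->predecessors[to->n_predecessors++] = from;
}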
No threading is possible in a narrow compiler, for the simple reason that there is no AST to thread. Correspondingly less context handling can be done than in a broad compiler. Still, since parsing of programs in imperative languages tends to follow the flow of control, some checking can be done. Also, context handling that cannot be avoided, for example strong type checking, is usually based on information collected in the symbol table.

Now that we have seen means to construct the complete control flow graph of a program, we are in a position to discuss two manual methods of context handling: symbolic interpretation, which tries to mimic the behavior of the program at run time in order to collect context information, and data-flow equations, a semi-automated, restricted form of symbolic interpretation. Other methods are possible. An unusual approach to data-flow analysis, based on a grammatical paradigm different from that of the attribute grammars, is given by Uhl and Horspool (1994), in which the kinds of information to be gathered by data-flow analysis must be specified and the processing is automatic.
3.2.2 Symbolic interpretation

When a program is executed, the control follows one possible path through the control flow graph. The code executed at the nodes is not the attribute rules code, which represents (compile-time) context relations, but code that represents the run-time semantics of the node. For example, the attribute evaluation code in the if-statement node of Figure 3.41 is mainly concerned with updating the AST and with passing around information about the if-statement. At run time, however, the code executed by an if-statement node is a simple jump to the then- or else-part, depending on a condition bit computed just before.

The run-time behavior of the code at each node is determined by the values of the variables it finds at run time upon entering the code, and the behavior determines those values again upon leaving the code. Much contextual information about variables can be deduced statically by simulating this run-time process at compile time, in a technique called symbolic interpretation or simulation on the stack. To do so, we attach a representation of the run-time stack to each arrow in the control flow graph. In principle, this compile-time representation of the run-time stack holds an entry for each identifier visible at that point in the program, regardless of whether the corresponding entity will indeed be put on the stack at run time. In practice we are mostly interested in variables and constants, so most entries will concern these. The entry summarizes all compile-time information we have about the variable or the constant, at the moment that at run time the control is following the arrow in the control flow graph. Such information could, for example, tell whether it has been initialized or not, or even what its value is. The stack representations at the entry to a node and at its exit are connected by the semantics of that node.

Figure 3.43 shows the stack representations in the control flow graph of an if-statement similar to the one in Figure 3.40. We assume that we arrive with a stack containing two variables, x and y, and that the stack representation indicates that x is initialized and y has the value 5; so we can be certain that when the program is run and the flow of control arrives at the if-statement, x will be initialized and y will have the value 5. We also assume that the condition is y > 0. The flow of control arrives first at the node for y and leaves it with the value of y put on the stack. Next it comes to the 0, which is also stacked, and then to the operator >, which unstacks both operands and replaces them by the value true. Note that all these actions can be performed at compile time, thanks to the fact that the value of y is known. Now we arrive at the if-node, which unstacks the condition and uses the value to decide that only the then-part will ever be executed; the else-part can be marked as unreachable and no code will need to be generated for it. Still, we depart for both branches, armed with the same stack representation, and we check them both, since it is usual to give compile-time error messages even for errors that occur in unreachable code.

The outline of a routine Symbolically interpret an if statement is given in Figure 3.44. It receives two parameters, describing the stack representation and the 'if' node. First it symbolically interprets the condition. This yields a new stack representation, which holds the condition on top. The condition is unstacked, and the resulting stack representation is then used to obtain the stack representations at the ends of the then- and the else-parts. Finally the routine merges these stack representations and yields the resulting stack representation.
Figure 3.43 Stack representations in the control flow graph of an if-statement.
FUNCTION Symbolically interpret an if statement (
    Stack representation, If node
) RETURNING a stack representation:
    SET New stack representation TO
        Symbolically interpret a condition (
            Stack representation, If node .condition
        );
    Discard top entry from New stack representation;
    RETURN Merge stack representations (
        Symbolically interpret a statement sequence (
            New stack representation, If node .then part
        ),
        Symbolically interpret a statement sequence (
            New stack representation, If node .else part
        )
    );
Figure 3.44 Outline of a routine Symbolically interpret an if statement.
The actual code will contain more details. For example, it will have to check for the presence of the else-part, since the original if-statement may have been else-less; and depending on how the stack representation is implemented, it may need to pass a separate copy to each branch of the if-statement.
It will be clear that many properties can be propagated in this way through the control flow graph, and that the information obtained can be very useful, both for context checks and for doing optimizations. In fact, this is how some implementations of the context-checking program lint operate. Symbolic interpretation in one form or another was already in use early on; for example, Naur (1965) used symbolic interpretation to do type checking. Yet the technique was not described in the mainstream literature until the mid-1970s (King, 1976); it was just one of those things one did.
We will now consider the check for uninitialized variables in more detail, using two variants of symbolic interpretation. The first, simple symbolic interpretation, works in one scan from routine entrance to routine exit and applies to structured programs and a restricted class of properties only; a program is structured when it consists of flow-of-control structures with one entry point and one exit point each. The second variant, full symbolic interpretation, works in the presence of any kind of flow of control and for a wider class of properties. The fundamental difference between the two is that simple symbolic interpretation follows the AST closely: for each node it analyzes its children in the order in which they occur in the syntax, and the stack representations are processed in the same order. This restricts the method to structured programs and to simple properties, but allows it to be applied in a narrow compiler. Full symbolic interpretation, on the other hand, follows the threading of the AST as computed in Section 3.2.1. This covers the entire AST, and since the threading of the AST may and usually will contain cycles, a closure algorithm is needed to compute the full required information. In short, the difference between full and simple symbolic interpretation is the same as that between general attribute grammars and L-attributed grammars.
3.2.2.1 Simple symbolic interpretation

To check for the use of uninitialized variables using simple symbolic interpretation, we make a compile-time representation of the local stack (including the parameter stack) and follow this representation through the entire routine, updating it as we go. The representation can be implemented conveniently as a linked list of (name, properties) pairs, a 'property list'. The list starts off as empty, or, if there are parameters, as initialized with the parameters with their properties: Initialized for IN and INOUT parameters and Uninitialized for OUT parameters. We also maintain a return list, in which we combine the stack representations as found at return statements and at routine exit. We then follow the arrows in the control flow graph, all the while updating our representation. The precise actions required at each node type depend of course on the source language, but they are usually fairly obvious; we indicate them briefly here.
When a declaration is met, the declared name is added to the list, with
status Initialized if there was an initialization in the declaration, and Uninitialized otherwise.
When the flow of control splits, for example at an if-statement node, a copy is made of the original list; one copy is followed on its route through the then-part, the other through the else-part; and at the end-if node the two lists are merged. Merging is trivial except in the case in which a variable obtained a value in one branch but not in the other. In that case the status of the variable is set to May be initialized. The status May be initialized is equal to Uninitialized for most purposes, since one cannot rely on the value being present at run time, but a different error message can be given for its use. Note that the status should actually be called 'May be initialized and also may not be initialized'. The same technique applies to case statements.
When an assignment is met, the status of the destination variable is set to Initialized, after processing the source expression first, since it may contain the same variable. When the value of a variable is used, usually in an expression, its status is checked, and if it is not Initialized, a message is given: an error message if the status is Uninitialized, since the error is certain to happen when the code is executed, and a warning for May be initialized, since the code may actually still be all right. An example of C code with this property is

    /* y is still uninitialized here */
    if (x >= 0) {y = 0;}
    if (x >= 0) {z = y;}

Here the status of y after the first statement is May be initialized. This causes a warning concerning the use of y in the second statement, but the error cannot materialize, since the controlled part of the second statement will only be executed if x >= 0. In that case the controlled part of the first statement will also have been executed, initializing y.
When we meet a node describing a routine call, we need not do anything at all in principle: we are considering information on the run-time stack only, and the called routine cannot touch our run-time stack. If, however, the routine has IN and/or INOUT parameters, these have to be treated as if they were used in an expression, and any INOUT and OUT parameters have to be treated as destinations in an assignment.
When we meet a for-statement, we pass through the computations of the bounds and the initialization of the controlled variable. We then make a copy of the list, which we call the loop-exit list. This list collects the information in force at the exit of the loop. We pass the original list through the body of the for-statement, and combine the result with the loop-exit list, as shown in Figure 3.45. The combination with the loop-exit list represents the possibility that the loop body was executed zero times. Note that we ignore here the back jump to the beginning of the for-statement, that is, the possibility that the loop body was executed more than once. We will see below why this is allowed.
When we find an exit-loop statement inside a loop, we merge the list we have collected at that moment into the loop-exit list. We then continue with the empty list. When we find an exit-loop statement outside any loop, we give an error message. When we find a return statement, we merge the present list into the return list, and continue with the empty list. We do the same when we reach the end of the routine, since a return statement is implied there. When all stack representations have been computed, we
Figure 3.45 Stack representations in the control flow graph of a for-statement.
check the return list to see if all OUT parameters have obtained a value, and give an error message if they have not. And when we reach the end node of the routine, we check all variable identifiers in the list. If one has the status Uninitialized, it was never initialized, and a warning can be given.
The above technique can be refined in many ways. Bounds in for-statements are often constants, either literal or named. If so, their values will often prove that the loop will be performed at least once. In that case the original list should not be merged into the loop-exit list, to avoid inappropriate messages. The same applies to the well-known C idioms for infinite loops:
    for (;;) ...
    while (1) ...
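To make the merging at an end-if node concrete, the following C sketch may help; it is ours, not the book's, and the names var_entry, merge_status, and merge_representations are invented for illustration. It assumes the property list has been flattened into an array of (name, status) pairs, one copy of which went through each branch.

    /* Initialization statuses, in the order used in the text. */
    enum status { UNINITIALIZED, MAYBE_INITIALIZED, INITIALIZED };

    struct var_entry {
        const char *name;
        enum status status;
    };

    /* Merge rule for one variable at an end-if node: equal statuses stay;
       a variable initialized in only one branch weakens to May be initialized. */
    static enum status merge_status(enum status a, enum status b) {
        if (a == b) return a;
        return MAYBE_INITIALIZED;
    }

    /* Merge two stack representations containing the same variables in the
       same order; the result is written back into the first one. */
    static void merge_representations(struct var_entry *then_list,
                                      const struct var_entry *else_list,
                                      int n_vars) {
        for (int i = 0; i < n_vars; i++) {
            then_list[i].status =
                merge_status(then_list[i].status, else_list[i].status);
        }
    }

The same merge_status function can serve at the end of a case statement and when combining a list with the loop-exit list of a for-statement.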
Once we have a system of symbolic interpretation in place in our compiler, we can easily extend it to fit special requirements of, and possibilities offered by, the source language. One possibility is to do similar accounting to see if a variable, constant, field selector, etc. is used at all. A second possibility is to replace the status Initialized by the value, or even the set of values, the variable may hold, a technique called constant propagation. This information can be used for at least two purposes: to identify variables that are actually used as constants in languages that do not have constant declarations, and to get a tighter grip on the tests in for- and while-loops. Both may improve the code that can be generated. Yet another, more substantial, possibility is to do last-def analysis, as discussed in Section 3.2.2.3.
When we try to implement constant propagation using the above technique, however, we run into problems. Consider the segment of a C program in Figure 3.46. Applying the above simple symbolic interpretation technique yields that i has the value 0 at the if-statement, so the test i > 0 can be evaluated at compile time and yields 0 (false). Consequently, an optimizer might conclude that the body of the if-statement, the call to printf(), can be removed since it will not be executed. This is patently wrong.
    int i = 0;

    while (some condition) {
        if (i > 0) printf("Loop reentered: i = %d\n", i);
        i++;
    }

Figure 3.46 Value set analysis in the presence of a loop statement.
It is therefore interesting to examine the situations in which, and the kinds of properties for which, simple symbolic interpretation as explained above will work. Basically, there are four requirements for simple symbolic interpretation to work; motivation for these requirements will be given below.
1. The program must consist of flow-of-control structures with one entry point and one exit point only.
2. The values of the property must form a lattice, which means that the values can be ordered in a sequence v1..vn such that there is no operation that will transform vj into vi with i < j; we will write vi ≤ vj.
The first requirement allows each control structure to be treated in isolation, with the property being analyzed well-defined at the entry point of the structure and at its exit. The
other three requirements allow us to ignore the jump back to the beginning of loop control structures, as we can see as follows. We call the value of the property at the entrance of the loop body vin and that at the exit vout. The lattice requirement guarantees that vin ≤ vout. The requirement on merging guarantees that when we merge the vout from the first round through the loop back into vin, to obtain a value vnew at the start of the loop body, vnew is at least as large as vout. If we were now to scan the loop body for the second time, we would undertake actions based on vnew. But it follows from requirement 4 that such actions are superfluous, because of the actions already performed during the first scan. So there is no point in performing a second scan through the loop body, nor is there a need to consider the jump back to the beginning of the loop construct.
The initialization property with values v1 = Uninitialized, v2 = May be initialized, and v3 = Initialized fulfills these requirements, since the initialization status can only progress from left to right over these values, and the actions on Uninitialized (error messages) render those on May be initialized (warning messages) superfluous, which again supersede those on Initialized.
If these four requirements are not fulfilled, it is necessary to use full symbolic interpretation, which avoids the above short-cuts. We will now demonstrate this, using the presence of jumps as an example.
3.2.2.2 Full symbolic interpretation

Goto statements cannot be handled by simple symbolic interpretation, since they violate requirement 1 in the previous section. To handle goto statements, we need full symbolic interpretation. Full symbolic interpretation consists of performing the simple symbolic interpretation algorithm repeatedly until no more changes in the values of the properties occur, in closure algorithm fashion. We will now consider the details for our example, the check for uninitialized variables.
We need an additional separate list for each label in the routine; these lists start off empty. We perform the simple symbolic interpretation algorithm as usual, plus the actions needed at jumps and labels. Each time we meet a jump to a label L, we merge our present list into L's list and continue with the empty list. When we meet the label L itself, we merge in our present list and continue with the result. This assembles in the list for L the merger of the situations at all positions from which L can be reached; this is what we can count on in terms of statuses of variables at label L. But not quite! If we first meet the label L and then a jump to it, the list at L was not complete when we passed it, since it was still going to be modified by that jump. So when we are at the end of the routine, we have to run the simple symbolic interpretation algorithm again, using the lists we have already assembled for the labels. We have to repeat this until nothing changes any more. Only then can we be certain that we have found all paths by which a variable can arrive uninitialized at a given label.
There are several things to note here. The first is that full symbolic interpretation is a closure algorithm, an outline of which is shown in Figure 3.47; actually it is a family of closure algorithms, the details of which depend on the node types, the source language, etc. Note that the inference rules allow information to be inferred backwards, from outgoing arrow to incoming arrow; an example is 'there is no function call on any path from here to the end of the routine'. Implemented naively, such inference rules lead to considerable
Data definitions:
    Stack representations, with entries for every item we are interested in.

Initializations:
    1. Empty stack representations are attached to all arrows in the control flow graph residing in the threaded AST.
    2. Some stack representations at strategic points are initialized in accordance with properties of the source language; for example, the stack representations of input parameters are initialized to Initialized.

Inference rules:
    For each node type, source-language-dependent rules allow inferences to be made, adding information to the stack representation on the outgoing arrows based on those on the incoming arrows and the node itself, and vice versa.

Figure 3.47 Full symbolic interpretation as a closure algorithm.
inefficiency; the situation is re-examined in Section 3.2.5. The second point is that in full symbolic interpretation we have to postpone the actions on the initialization status until all information has been obtained, unlike the case of simple symbolic interpretation, where requirement 4 allowed us to act immediately. A separate traversal at the end of the algorithm is needed to perform the actions. Next we note that the simple symbolic interpretation algorithm without jumps can be run in one scan, simultaneously with the rest of the processing in a narrow compiler, and that the full algorithm with the jumps cannot: the tree for the routine has to be visited repeatedly. So checking initialization in the presence of jumps is fundamentally more difficult than in their absence.
But the most important thing to note is that although full symbolic interpretation removes almost all the requirements listed in the previous section, it does not solve all problems. We want the algorithm to terminate, but that is not at all certain. When trying naively to establish the set of values possible for i in Figure 3.46, we first find the set { 0 }. The statement i++ then turns this into the set { 0, 1 }. Merging this with the { 0 } at the loop entrance yields { 0, 1 }. The statement i++ now turns this into the set { 0, 1, 2 }, and so on, and the process never terminates. The formal requirements to be imposed on the property examined have been analyzed by Wegbreit (1975); the precise requirements are fairly complicated, but in practice it is usually not difficult to see whether a certain property can be determined. It is evident that the property 'the complete set of possible values' of a variable cannot be determined at compile time in all cases. A good approximation is 'a set of at most two values, or any value'. The set of two values allows a source language variable that is used as a Boolean to be recognized in a language that does not feature Booleans. If we use this property in the analysis of the code in Figure 3.46, we find the property values { 0 }, { 0, 1 }, and 'any value' for i. This last property value does not change any more, and the process ter-
minates. A sophisticated treatment of generalized constant propagation, both intraprocedural and interprocedural, is given by Verbrugge, Co and Hendren (1996), with special attention to convergence. See Exercise 3.21 for an analysis of constant propagation by symbolic interpretation and by data-flow equations.
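The approximation 'a set of at most two values, or any value' can be implemented as a small fixed-size set that widens to a top value once a third distinct value appears; that is what makes the analysis of Figure 3.46 terminate. The C sketch below is ours, not the book's, and the names value_set, add_value, and merge_value_sets are invented for illustration.

    #include <stdbool.h>

    /* A value set holding at most two known values; 'any' is the top
       element of the lattice and absorbs everything merged into it. */
    struct value_set {
        bool any;       /* true: any value possible */
        int  count;     /* number of known values, 0..2 */
        long value[2];  /* the known values, valid if !any */
    };

    /* Add one value; widen to 'any' when a third distinct value shows up. */
    static void add_value(struct value_set *s, long v) {
        if (s->any) return;
        for (int i = 0; i < s->count; i++)
            if (s->value[i] == v) return;          /* already present */
        if (s->count == 2) { s->any = true; s->count = 0; return; }
        s->value[s->count++] = v;
    }

    /* Merge at a join point in the control flow graph. */
    static void merge_value_sets(struct value_set *dst,
                                 const struct value_set *src) {
        if (src->any) { dst->any = true; dst->count = 0; return; }
        for (int i = 0; i < src->count; i++)
            add_value(dst, src->value[i]);
    }

With this lattice, the sets found for i in Figure 3.46 stabilize after a few rounds: { 0 }, then { 0, 1 }, then 'any value', after which nothing changes any more.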
3.2.2.3 Last-def analysis

Last-def analysis attaches to each use of a variable V pointers to all the places the value of V used here could have come from; these are the last places where the value of V has been defined before arriving at this use of V along any path in the control flow graph. Hence the term 'last def', short for 'last definition'. It is also called reaching-definitions analysis. The word 'definition' is used here rather than 'assignment' since there are other language constructs besides assignments that cause the value of a variable to be changed: a variable can be passed as an OUT parameter to a routine, it can occur in a read statement in some languages, its address can have been taken and turned into a pointer and a definition of the value under that or a similar pointer can take place, etc. All these rank as 'definitions'.
A definition of a variable V in a node n is said to reach a node p where V is used, if there is a path from n to p through the control flow graph on which the value of V is not redefined. This explains the name 'reaching definitions analysis': the definitions reaching each node are determined. Last-def information is useful for code generation, in particular for register allocation.
The information can be obtained by full symbolic interpretation, as follows. A set of last defs is kept for each variable V in the stack representation. If an assignment to V is encountered at a node n, the set is replaced by the singleton { n }; if two stack representations are merged, for example at an end-if node, the union of the sets is formed and propagated as the new last-def information of V. Similar rules apply for loops and other flow-of-control constructs. Full symbolic interpretation is required, since last-def information violates requirement 4 above: going through a loop body for the first time, we may not have seen all last defs yet, since an assignment to a variable V at the end of a loop body may be part of the last-def set at a use of V at the beginning of the loop body, and actions taken on insufficient information do not make later actions superfluous.
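A possible shape for this bookkeeping in C is sketched below; it is ours, not the book's, and the names last_def_set, record_definition, and merge_last_defs are invented. Only the two operations just described are shown: replacing the set at a definition, and forming the union at a merge point.

    /* Last-def (reaching definitions) information for one variable: the set
       of control flow graph nodes whose definition of the variable may be
       the most recent one.  Nodes are identified here by small integers;
       a real compiler would use node pointers. */
    struct last_def_set {
        int count;
        int node[16];        /* crude fixed-size set, enough for a sketch */
    };

    /* A definition at node n kills all earlier definitions:
       the set becomes the singleton { n }. */
    static void record_definition(struct last_def_set *s, int n) {
        s->count = 1;
        s->node[0] = n;
    }

    /* At a join point, for example an end-if node, the union of the sets
       arriving along the merging paths is formed. */
    static void merge_last_defs(struct last_def_set *dst,
                                const struct last_def_set *src) {
        for (int i = 0; i < src->count; i++) {
            int present = 0;
            for (int j = 0; j < dst->count; j++)
                if (dst->node[j] == src->node[i]) { present = 1; break; }
            if (!present && dst->count < 16)
                dst->node[dst->count++] = src->node[i];
        }
    }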
Data-flow equations are a half-way automation of full symbolic interpretation, in which the stack representation is replaced by a collection of sets, the semantics of a node is described more formally, and the interpretation is replaced by a built-in and fixed propagation mechanism. Two set variables are associated with each node N in the control flow graph, the input set IN(N) and the output set OUT(N). Together they replace the stack representation; both start off empty and are computed by the propagation mechanism. In addition, two constant sets GEN(N) and KILL(N) are defined, which describe the semantics of the
node. Their contents are derived from the information in the node. The IN and OUT sets contain static information about the run-time situation at the node; examples are 'Variable x is equal to 1 here', 'There has not been a remote procedure call on any path from the routine entry to here', 'Definitions for the variable y reach here from nodes N1 and N2', and 'The global variable line_count has been modified since routine entry'. We see that the sets can contain any information that the stack representations in symbolic interpretation can contain, and other bits of information as well.
Since the interpretation mechanism is missing in the data-flow approach, nodes whose semantics modify the stack size are not handled easily in setting up the data-flow equations. Prime examples are the nodes occurring in expressions: a node + will remove two operands from the stack and then push one entry onto it. There is no reasonable way to express this in the data-flow equations. The practical solution to this problem is to combine groups of control flow nodes into single data-flow nodes, such that the data-flow nodes have no net stack effect. The most obvious example is the assignment, which consists of a control flow graph resulting from the source expression, a variable node representing the destination, and the assignment node itself. For data-flow equation purposes this entire set of control flow nodes is considered a single node, with one IN, one OUT, one GEN, and one KILL set. Figure 3.48a shows the control flow graph of the assignment x := y + 3; Figure 3.48b shows the assignment as a single node.
Figure 3.48 An assignment as a full control flow graph and as a single node.
Traditionally, IN and OUT sets are defined only at the beginnings and ends of basic blocks, and data-flow equations are used only to connect the output conditions of basic blocks to the input conditions of other basic blocks. (A basic block is a sequence of assignments with the flow of control entering at the beginning of the first assignment and leaving at the end of the last assignment; basic blocks are treated more extensively in Section 4.2.5.) A separate mechanism is then used to combine the information about the assignments inside the basic block, and since that mechanism has to deal with assignments
only, it can be simpler than general data-flow equations. Any such mechanism is, however, a simplification of, or equivalent to, the data-flow equation mechanism, and any combination of information about the assignments can be expressed in IN, OUT, GEN, and KILL sets. We will therefore use the more general approach here and consider the AST node rather than the basic block as the unit of data-flow information specification.
When control passes through node N at run time, the state of the program is probably changed. This change corresponds at compile time to the removal of some information items from the set at N and the addition of some other items. It is convenient to keep these two sets separated: the set KILL(N) contains the items removed by the node N and the set GEN(N) contains the items added by it. A typical example of an information item in a GEN set is 'Variable x is equal to variable y here' for the assignment node x := y. The same node has the item 'Variable x is equal to * here' in its KILL set, which is actually a finite representation of an infinite set of items. How such items are used will be shown in the next paragraph. The actual data-flow equations are the same for all nodes and are shown in Figure 3.49.
IN(N)  = ∪ OUT(M)  over all M that are dynamic predecessors of N
OUT(N) = (IN(N) \ KILL(N)) ∪ GEN(N)

Figure 3.49 Data-flow equations for a node N.
The first equation tells us that the information at the entrance of a node N is equal to the union of the information at the exits of all dynamic predecessors of N. This is obviously true, since no information is lost in going from the end of a predecessor of a node to the node itself. More colorful names for this union are the meet or join operator. The second equation means that the information at the exit of a node N is in principle equal to that at its entrance, except that all information in the KILL set has been removed from it and all information from the GEN set has been added to it. The order of removing and adding is important: first the information being invalidated must be removed, then the new information must be added. Suppose, for example, we arrive at a node x := y with the IN set { 'Variable x is equal to 0 here' }. The KILL set of the node contains the item 'Variable x is equal to * here', the GEN set contains 'Variable x is equal to y here'. First, all items in the IN set that are also in the KILL set are erased. The item 'Variable x is equal to * here' represents an infinite number of items, including 'Variable x is equal to 0 here', so this item is erased. Next, the items from the GEN set are added; there is only one item there, 'Variable x is equal to y here'. So the OUT set is { 'Variable x is equal to y here' }.
The data-flow equations from Figure 3.49 seem to imply that the sets are just normal
sets and that the ∪ symbol and the \ symbol represent the usual set union and set difference operations, but the above explanation already suggests otherwise. Indeed the ∪ and \ symbols should be read more properly as information union and information difference operators, and their exact workings depend very much on the kind of information they process. For example, if the information items are of the form 'Variable V may be uninitialized here', the ∪ in the first data-flow equation can be interpreted as a set union, since V can be uninitialized at a given node N if it can be uninitialized at the exit of even one of N's predecessors. But if the information items say 'Variable V is guaranteed to have a value here', the ∪ operator must be interpreted as set intersection, since for the value of V to be guaranteed at node N it must be guaranteed at the exits of all its predecessors. And merging information items of the type 'The value of variable x lies between i and j' requires special code that has little to do with set unions. Still, it is often possible to choose the semantics of the information items so that ∪ can be implemented as set union and \ as set difference, as shown below. We shall therefore stick to the traditional notation of Figure 3.49. For a more liberal interpretation see Morel (1984), who incorporates the different meanings of information union and information difference in a single theory, extends it to global optimization, and applies it to suppress some run-time checks in Ada.
There is a third data-flow equation in addition to the two shown in Figure 3.49, although the term 'zeroth data-flow equation' would probably be more appropriate. It defines the IN set of the first node of the routine as the set of information items established by the parameters of the routine. More in particular, each IN and INOUT parameter gives rise to an item 'Parameter Pi has a value here'. It is convenient to add control flow arrows from all return statements in the routine to the end node of the routine, and to make the OUT sets of the return statements, which are normally empty, equal to their IN sets. The KILL set of the end node contains any item concerned with variables local to the routine. This way the routine has one entry point and one exit point, and all information valid at routine exit is collected in the OUT set of the end node; see Figure 3.50. This streamlining of the external aspects of the data flow of a routine is helpful in interprocedural data-flow analysis, as we will see below.
The combining, sifting, and adding of information items described above may look cumbersome, but techniques exist to create very efficient implementations. In practice, most of the information items are Boolean in nature: 'Variable x has been given a value here' is an example. Such items can be stored in one bit each, packed efficiently in machine words, and manipulated using Boolean instructions. This approach leads to an extremely efficient implementation, an example of which we will see below. More complicated items are manipulated using ad-hoc code. If it is, for example, decided that information items of the type 'Variable x has a value in the range M to N here' are required, data representations for such items in the sets and for the ranges they refer to must be designed, and data-flow code must be written that knows how to create, merge, and examine such ranges.
So, usually the IN, OUT, KILL, and GEN sets contain bit sets that are manipulated by Boolean machine instructions, and, in addition to these, perhaps some ad-hoc items that are manipulated by ad-hoc code.
Figure 3.50 Data-flow details at routine entry and exit; at routine entry, IN := all value parameters have values.
Solving the data-flow equations
The first data-flow equation tells us how to obtain the IN set of a node when we know the OUT sets of all nodes, and the second data-flow equation tells us how to obtain the OUT set of a node if we know its IN set (and its GEN and KILL sets, but those are constants). This suggests the almost trivial closure algorithm for establishing the values of all IN and OUT sets shown in Figure 3.51. The closure algorithm can be implemented by traversing the control flow graph repeatedly and computing the IN and OUT sets of the nodes visited. Once we have performed a complete traversal of the control flow graph in which no IN or OUT set changed, we have found the solution to the set of equations. We then know the values of the IN sets of all nodes and can use this information for context checking and code generation. Note that the predecessors of a node are easy to find if the control flow graph is doubly-linked, as described in Section 3.2.1 and shown in Figure 3.42.
Figures 3.52 through 3.54 show data-flow propagation through an if-statement, using bit patterns to represent the information. The meanings of the bits shown in Figure 3.52 are chosen so that the information union in the data-flow equations can be implemented as a Boolean OR, and the information difference as a set difference; the set difference is in turn implemented as a Boolean AND NOT. The initialization status of a variable is coded in two bits; the first means 'may be uninitialized', the second means 'may be initialized'. Figure 3.53 gives examples of their application. For example, if the first bit is on and the second is off, the possibility of being uninitialized is left open but the possibility of being initialized is excluded; so the variable is guaranteed to be uninitialized.
Data definitions:
    1. Constant KILL and GEN sets for each node.
    2. Variable IN and OUT sets for each node.

Initializations:
    1. The IN set of the top node is initialized with information established externally.
    2. For all other nodes N, IN(N) and OUT(N) are set to empty.

Inference rules:
    1. For any node N, IN(N) should contain ∪ OUT(M) over all M that are dynamic predecessors of N.
    2. For any node N, OUT(N) should contain (IN(N) \ KILL(N)) ∪ GEN(N).

Figure 3.51 Closure algorithm for solving the data-flow equations.
This corresponds to the status Uninitialized in Section 3.2.2.1. Note that the negation of 'may be initialized' is not 'may be uninitialized' nor 'may not be initialized'; it is 'cannot be initialized': trivalent logic is not easily expressed in natural language. If both bits are on, both possibilities are present; this corresponds to the status May be initialized in Section 3.2.2.1. Both bits cannot be off at the same time: it cannot be impossible for the variable to be uninitialized and at the same time impossible for it to be initialized; or, put more simply, there is no fourth possibility in trivalent logic.
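Decoding the two bits back into the three statuses of Section 3.2.2.1 is a three-line case analysis. The C sketch below is ours; the names init_status and decode_status are invented for illustration.

    /* Statuses of Section 3.2.2.1, recovered from the two-bit encoding:
       may_be_uninit is the first bit, may_be_init the second. */
    enum init_status { INITIALIZED, MAY_BE_INITIALIZED, UNINITIALIZED };

    static enum init_status decode_status(int may_be_uninit, int may_be_init) {
        /* The combination 0 0 cannot occur: there is no fourth possibility. */
        if (may_be_uninit && may_be_init) return MAY_BE_INITIALIZED;
        if (may_be_uninit) return UNINITIALIZED;
        return INITIALIZED;
    }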
Figure 3.52 Bit patterns for the properties of the variables x and y; from left to right the bits mean 'x may be uninitialized', 'x may be initialized', 'y may be uninitialized', and 'y may be initialized'.
Figure 3.54 shows how the bits and the information they carry are propagated through both branches of the if-statement
Figure 3.53 Examples of bit patterns for properties of the variables x and y (for example: 'x is guaranteed to have a value', 'y may or may not have a value', 'x is guaranteed not to have a value'; the combination 00 for a variable is an error).
    if y > 0 then x := y else y := 0 end if
Admittedly it is hard to think of a program in which this statement would occur, since it does not have any reasonable effect, but examples that are both illustrative and reasonable are much larger. We assume that x is uninitialized at the entry to this statement and that y is initialized. So the bit pattern at entry is 1 0 0 1. Since the decision node does not affect either variable, this pattern is still the same at its exit. When the first data-flow equation is used to construct the IN set of x := y, it combines the sets from all the predecessors of this node, of which there is only one, the decision node. So the IN set of x := y is again 1 0 0 1. Its KILL and GEN sets reflect the fact that the node represents an assignment to x; it also uses y, but that usage does not affect the bits for y. So its KILL set is 1 0 0 0, which tells us to remove the possibility that x is uninitialized, and does not affect y; and its GEN set is 0 1 0 0, which tells us to add the possibility that x is initialized. Using the second data-flow equation, they yield the new OUT set of the node, 0 1 0 1, in which both x and y are guaranteed to be initialized. Similar but slightly different things happen in the right branch, since there the assignment is to y.
The first data-flow equation for the end-if node requires us to combine the bit patterns at all its predecessors. The final result is the bit pattern 1 1 0 1, which says that x may or may not be initialized and that y is initialized.
The above description assumes that we visit all the nodes by traversing the control flow graph, much in the same way as we did in symbolic interpretation, but it is important to note that this is in no way necessary; it is useful for efficiency only. Since all actions are purely local, we can visit the nodes in any permutation we like, as long as we stick to the rule that we repeat our visits until nothing changes any more. Still, since the data-flow equations transport information in the direction of the control flow, it is convenient to follow the latter.
Note that the data-flow algorithm in itself collects information only. It does no checking and gives no error messages or warnings. A subsequent traversal, or more likely several subsequent traversals, are needed to utilize the information. One such traversal can check for the use of uninitialized variables. Suppose the if-statement in Figure 3.54 is followed by a node z := x; the traversal visiting this node will then find the IN set to be the bit pattern 1 1 0 1, the first two bits of which mean that x may or may not be initialized. Since
Figure 3.54 Data-flow propagation through an if-statement.
the node uses the value of x, a message saying something like 'Variable x may not have a value in assignment z := x' can be issued.
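The closure algorithm of Figure 3.51, specialized to bit sets, fits in a few lines of C. The sketch below is ours and not from the book; the names df_node and solve_forward are invented. It assumes the nodes are stored in an array, with node 0 as the entry node, and that each node records the indices of its dynamic predecessors.

    #define MAX_PREDS 4

    /* One data-flow node: constant GEN and KILL sets, variable IN and OUT
       sets, and the indices of its dynamic predecessors.  Each bit stands
       for one information item, for example 'x may be initialized here'. */
    struct df_node {
        unsigned gen, kill;
        unsigned in, out;
        int      pred[MAX_PREDS];
        int      n_preds;
    };

    /* Solve the forward data-flow equations of Figure 3.49 by repeated
       passes over all nodes until no IN or OUT set changes any more. */
    static void solve_forward(struct df_node node[], int n_nodes,
                              unsigned entry_info) {
        for (int n = 0; n < n_nodes; n++)
            node[n].in = node[n].out = 0;          /* all sets start empty */

        int changed = 1;
        while (changed) {
            changed = 0;
            for (int n = 0; n < n_nodes; n++) {
                unsigned in = (n == 0 ? entry_info : 0);
                for (int p = 0; p < node[n].n_preds; p++)
                    in |= node[node[n].pred[p]].out;   /* union is OR */
                unsigned out = (in & ~node[n].kill) | node[n].gen;
                if (in != node[n].in || out != node[n].out) {
                    node[n].in = in;
                    node[n].out = out;
                    changed = 1;
                }
            }
        }
    }

A subsequent traversal over the nodes can then inspect the computed IN sets, for example to issue the warning about z := x mentioned above.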
Interprocedural data-flow analysis is data-flow analysis between routines, as opposed to data-flow analysis inside routines. Such data flows in two directions, from the caller to the callee in a routine call, and from callee to caller in a return statement. The resulting information seldom serves context checking and is mostly useful for optimization purposes.
Symbolic interpretation can handle both kinds of information. One can collect information about the parameters of all calls to a given routine R by extracting it from the stack representations at the calls. This information can then be used to set the stack representations of the IN and INOUT parameters of R, and carried into the routine by symbolic interpretation of R. A useful piece of information uncovered by combining the stack representations at all calls to R could, for example, be that its second parameter always has
the value 0. It is almost certain that this information can be used in the symbolic interpretation of R, to simplify the code generated for R. In fact, R can be instantiated for the case that its second parameter is 0. Now one might wonder why a programmer would endow a routine with a parameter and then always supply the same value for it, and whether it is reasonable for the compiler writer to spend effort on detecting such cases. Actually, there are two good reasons why such a situation might arise. First, the routine may have been written for a more general application and be reused in the present source code in a more restricted context. Second, the routine may have served abstraction only and is called only once. About the only information that can be passed backwards from the called routine to the caller by symbolic interpretation is that an INOUT or OUT parameter is always set to a given value, but this is less probable.
The same techniques can be applied when processing data-flow equations. Routines usually have a unique entry node, and the set-up shown in Figure 3.50 provides each routine with a unique exit node. Collected information from the IN sets of all calls can be entered as the IN set of the entry node, and the OUT set of the exit node can be returned as the OUT set of the calls. Information about global variables is especially interesting in this case. If, for example, an information item 'No global variable has been read or written' is entered in the IN set of the entry node of a routine R and it survives until its exit node, we seem to have shown that R has no side effects and that its result depends exclusively on its parameters. But our conclusion is only correct if the same analysis is also done for all routines called directly or indirectly by R and the results are fed back to R. If one of those routines does access a global variable, the information item will not show up in the OUT set of its exit node, and if we feed back the results to the callers and repeat this process, eventually it will disappear from the OUT sets of the exit nodes of all routines that directly or indirectly access a global variable.
One problem with interprocedural data-flow analysis is that we may not know which routine is being called in a given call. For example, the call may invoke a routine under a pointer, or a virtual function in an object-oriented language; the first type of call is also known as an 'indirect routine call'. In both cases, the call can invoke any of a set of routines, rather than one specific routine. We will call this set the 'candidate set'; the smaller the candidate set, the better the quality of the data-flow analysis will be. In the case of an indirect call to a routine of type T, it is safe to take as candidate set the set of all routines of type T, but often we can do better: if we can obtain a list of all routines of type T whose addresses are ever taken by the program, we can restrict the candidate set to these. The candidate set for a call to a virtual function V is the set of all functions that override V. In both cases, symbolic execution may be able to restrict the candidate set even further.
A second problem with interprocedural data-flow analysis is that it works best when we have the control flow graphs of all routines in the entire program at our disposal; only then are we certain that we see all calls to a given routine.
Having all control flow graphs available at the same moment, however, conflicts with separate compilation of modules or packages. After all, the point in separate compilation is that only a small part of the program needs to be available. Also, the control flow graphs of libraries are usually not avail-
able. Both problems can be solved to a certain extent by having the compiler produce files with control flow graph information in addition to the usual compiler output. Most libraries do not contain calls to user programs, which reduces the problem, but some do: a memory allocation package might, for example, call a user routine Report insufficient memory when it runs irreparably out of memory.
Both symbolic interpretation and data-flow equations follow information as it flows 'forward' through the control flow graph; they collect information from the preceding nodes and can deposit it at the present node. Mathematically speaking, this statement is nonsense, since there is no concept of 'forward' in a graph: one can easily run in circles. Still, control flow graphs are a special kind of graph in that they have one specific entry node and one specific exit node; this does give them a general notion of direction.
There are some items of interest that can be determined best (or only) by following the control flow backwards. One prominent example of such information is the liveness of variables. A variable is live at a given node N in the control flow graph if the value it holds is used on at least one path further through the control flow graph from N; otherwise it is dead. Note that we are concerned with the use of a particular value of V, not with the use of the variable V itself. As a result, a variable can have more than one live range, each starting at an assignment of a value to the variable and ending at a node at which the value is used for the last time.
During code generation it is important to know whether a variable is live or dead at a given node in the code, since if it is dead, the memory allocated to it can be reused. This is especially important if the variable resides in a register, since from that node on, the register can be used for other purposes. For another application, suppose that the variable contains a pointer and that the compiled program uses garbage collection in its memory management. It is then advantageous to generate code that assigns a null pointer to the variable as soon as it becomes dead, since this may allow the garbage collector to free the memory the original pointer pointed to.
The start of the live range of a variable V is marked by a node that contains a definition of V, where 'definition' is used in the sense of defining V's value, as in Section 3.2.2.3. The end of the live range is marked by a node that contains the last use of the value of V, in the sense that on no path from that node will the value be used again. The problem is that this node is hard to recognize, since there is nothing special about it. We only know that a node contains the last use of the value of V if on all paths from that node we either reach the end of the scope of V or meet the start of another live range.
Information about the future use of variable values cannot be obtained in a straightforward way using the above methods of symbolic interpretation or data-flow equations. Fortunately, the methods can be modified so that they can solve this and other 'backward flow' problems, and we will discuss these modifications in the following two sections. We demonstrate the techniques using the C code segment from Figure 3.55. The assignments x = ... and y = ... define the values of x and y; the print statements use the values of the variables shown. Code fragments indicated by ... do not define any values and are subject to the restrictions shown in the accompanying comments.
    {
        int x;

        x = ...;            /* code fragment 1, does not use x */
        if (...) {
            ...             /* code fragment 2, does not use x */
            print(x);       /* code fragment 3, uses x */
            ...             /* code fragment 4, does not use x */
        } else {
            int y;

            ...             /* code fragment 5, does not use x, y */
            print(x+3);     /* code fragment 6, uses x, but not y */
            ...             /* code fragment 7, does not use x, y */
            y = ...;        /* code fragment 8, does not use x, y */
            ...             /* code fragment 9, does not use x, y */
            print(y);       /* code fragment 10, uses y but not x */
            ...             /* code fragment 11, does not use x, y */
        }
        x = ...;            /* code fragment 12, does not use x */
        ...                 /* code fragment 13, does not use x */
        print(x*x);         /* code fragment 14, uses x */
        ...                 /* code fragment 15, does not use x */
    }

Figure 3.55 A segment of C code to demonstrate live analysis.
Where such a restriction is attached to an assignment, it concerns the expression in the assignment.
Live analysis by symbolic interpretation

Since symbolic interpretation follows the control flow graph, it has no way of looking ahead and finding out whether there is another use of the value of a given variable V further on, and so it has no way to set an attribute Is last use of V of the node it is visiting. The general solution to this kind of problem is to collect the addresses of the values we cannot compute yet, and to fill them in when we can. Such lists of addresses are called backpatch lists, and the activity of filling in values when the time is ripe is called backpatching.
In this case backpatching means that for each variable V we keep in our stack representation a set of pointers to the nodes that contain the latest, most recent uses of the value of V; note that when looking backwards from a node we can have more than one most recent use, provided they are along different paths. Now, when we arrive at a node that uses the value of V, we set the attributes Is last use of V of the nodes in the backpatch list for V to false, and set the same attribute of the present node to true. The rationale is that we assume that each use is the last use, until we are proven wrong by a subsequent use.
It is in the nature of backpatching that both the pointer sets and the attributes referred to by the pointers in these sets change as the algorithm progresses. We will therefore sup-
ply a few snapshots to demonstrate the algorithm. Part of the control flow graph for the block from Figure 3.55, with live analysis using backpatch lists for the first few nodes, is given in Figure 3.56. It shows an attribute LU-x for 'Last Use of x' in node 1; this attribute has been set to true, since we know of no later use yet. The stack representation contains a variable BPLU-x for 'Backpatch list for the Last Use of x'. Initially its value is the empty set, but when the symbolic interpretation passes node 1, it is set to a singleton containing a pointer to node 1. For the moment we follow the 'true' branch of the if-statement, still carrying the variable BPLU-x in our stack representation. When the symbolic interpretation reaches the node print(x) (node 3) and finds a new use of the variable x, the attribute LU-x of the node under the pointer in BPLU-x (node 1) is set to false, and subsequently BPLU-x itself is set to the singleton {3}. The new situation is depicted in Figure 3.57.
Figure 3.56 The first few steps in live analysis for Figure 3.55, using backpatch lists.
Figure 3.58 shows the situation after the symbolic interpretation has also finished the 'false' branch of the if-statement. It has entered the 'false' branch with a second copy of the stack representation, the first one having gone down the 'true' branch. Passing the node print(x+3) (node 6) has caused the LU-x attribute of node 1 to be set to false for the second time. Furthermore, the LU-y attributes of nodes 8 and 10 have been set in a fashion similar to that of nodes 1, 2, and 3, using a backpatch list BPLU-y. The two stack representations merge at the end-if node, which results in BPLU-x now holding a set of two pointers, {3, 6}, and in BPLU-y being removed, due to leaving the scope of y. If we were now to find another use of x, the LU-x attributes of both nodes 3 and 6 would be cleared. But the next node, node 12, is an assignment to x that does not use x in the source
Figure 3.57 Live analysis for Figure 3.55, after a few steps.
expression. So the stack representation variable BPLU-x gets reset to {12}, and the LU-x attributes of nodes 3 and 6 remain set to true, thus signaling two last uses. The rest of the process is straightforward and is not shown here.
Live analysis in the presence of jumps can be performed by the same technique as used in Section 3.2.2.2 for checking the use of uninitialized variables in the presence of jumps. Suppose there is a label retry at the if-statement, just after code fragment 1, and suppose code fragment 11 ends in goto retry;. The stack representation kept for the label retry will contain an entry BPLU-x, which on the first pass will be set to {1}, as in Figure 3.56. When the symbolic execution reaches the goto retry;, the value of BPLU-x at node 11 will be merged into that in the stack representation for retry, thus setting BPLU-x to {1, 6}. A second round through the algorithm will carry this value to nodes 3 and 6, and finding new uses of x at these nodes will cause the algorithm to set the LU-x attributes of nodes 1 and 6 to false. So the algorithm correctly determines that the use of x in node 6 is no longer the last use.
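The backpatch-list bookkeeping for one variable might look as follows in C; this is our sketch, not the book's code, and the names cfg_node, backpatch_list, record_use, and merge_backpatch are invented. Only the two central operations are shown: processing a new use of the variable, and merging two copies of the stack representation at an end-if node.

    /* A control flow graph node with the attribute the algorithm fills in. */
    struct cfg_node {
        int node_number;
        int is_last_use_of_x;       /* the LU-x attribute in the text */
    };

    /* The backpatch list BPLU-x kept in the stack representation:
       pointers to the nodes currently believed to hold the last use. */
    struct backpatch_list {
        struct cfg_node *node[8];
        int count;
    };

    /* A new use of x at use_node: the previously recorded uses are proven
       not to be last uses, and the new node becomes the only candidate. */
    static void record_use(struct backpatch_list *bp, struct cfg_node *use_node) {
        for (int i = 0; i < bp->count; i++)
            bp->node[i]->is_last_use_of_x = 0;   /* proven wrong */
        use_node->is_last_use_of_x = 1;          /* assumed last, for now */
        bp->node[0] = use_node;
        bp->count = 1;
    }

    /* Merging two copies of the stack representation at an end-if node:
       the last uses along both branches remain candidates. */
    static void merge_backpatch(struct backpatch_list *dst,
                                const struct backpatch_list *src) {
        for (int i = 0; i < src->count && dst->count < 8; i++)
            dst->node[dst->count++] = src->node[i];
    }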
Information about what happens further on in the control flow graph can only be obtained by following the graph backwards. To this end we need a backwards-operating version of the data-flow equations, as shown in Figure 3.59. Basically, the sets IN and OUT have changed roles and 'predecessor' has changed to 'successor'. Note that the KILL information still has to be erased first, before the GEN information is merged in. Nodes that assign a value to V have KILL = { 'V is live here' }
Figure 3.58 Live analysis for Figure 3.55, merging at the end-if node.
and an empty GEN set, and nodes that use the value of V have an empty KILL set and a GEN set { 'V is live here' }.
The control flow graph for the block from Figure 3.55, with live analysis using backward data-flow equations, is given in Figure 3.60. We implement the OUT and IN sets as two bits, the first one meaning 'x is live here' and the second meaning 'y is live here'. To get a uniform representation of the information sets, we maintain a bit for each variable declared in the routine, even in nodes where the variable does not exist. We start by setting the bits in the OUT set of the bottommost node in Figure 3.60 to
OUT(N) = ∪ IN(M)  over all M that are dynamic successors of N
IN(N)  = (OUT(N) \ KILL(N)) ∪ GEN(N)

Figure 3.59 Backwards data-flow equations for a node N.
00: the first bit is 0 because x certainly is not live at the end of its scope, and the second bit is 0 because y does not exist there. Following the control flow graph backwards, we find that the first change comes at the node for print(x*x). This node uses x, so its GEN set contains the item 'x is live here', or, in bits, 10. Applying the second data-flow equation shown above, we find that the IN set of this node becomes 10. The next node that effects a change is an assignment to x; this makes its GEN set empty, and its KILL set contains the item 'x is live here'. After application of the second data-flow equation, the IN set is 00, as shown in Figure 3.60. Continuing this way, we propagate the bits upwards, splitting them at the end-if node and merging them at the if-node, until we reach the beginning of the block with the bits 00.
Several observations are in order here. The union in the data-flow equations can be implemented as a simple Boolean OR, since if a variable is live along one path from a node it is live at that node. Normally, when we reach the top node of the scope of a variable, its liveness in the IN set is equal to 0, since a variable should not be live at the very top of its scope; if we find its liveness to be 1, its value may be used before it has been set, and a warning message can be given. The nodes with the last use of the value of a variable V can be recognized by having a 1 for the liveness of V in the IN set and a 0 in the OUT set. If we find a node with an assignment to a variable V and the liveness of V is 0 both in the IN set and in the OUT set, the assignment can be deleted, since no node is going to use the result. Note that the source expression in the assignment can only be deleted at the same time if we can prove that it has no side effects.
It is important to note that the diagram does not contain the bit combination 11, so there is no node at which both variables are live. This means that they can share the same register or memory location. Indeed, the live range of x in the right branch of the if-statement stops at the statement print(x+3) and does not overlap the live range of y.
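For liveness, the per-node work of Figure 3.59 reduces to a few bit operations. The following C sketch is ours (the names live_node, live_transfer, and the two test functions are invented); it also shows how the last-use and dead-assignment observations above can be read off the computed IN and OUT sets.

    /* Liveness bits for the two variables of Figure 3.55:
       bit 0 = 'x is live here', bit 1 = 'y is live here'. */
    #define LIVE_X 1u
    #define LIVE_Y 2u

    struct live_node {
        unsigned gen, kill;   /* a use of V sets GEN, an assignment to V sets KILL */
        unsigned in, out;
    };

    /* Backward transfer function of Figure 3.59 for one node, given the
       union of the IN sets of its dynamic successors. */
    static void live_transfer(struct live_node *n, unsigned succ_in_union) {
        n->out = succ_in_union;                  /* OUT = union over successors */
        n->in  = (n->out & ~n->kill) | n->gen;   /* IN = (OUT \ KILL) u GEN */
    }

    /* A node holds the last use of x if x is live on entry but dead on exit. */
    static int is_last_use_of_x(const struct live_node *n) {
        return (n->in & LIVE_X) && !(n->out & LIVE_X);
    }

    /* An assignment to x is dead code if x is live neither on entry nor on exit. */
    static int is_dead_assignment_to_x(const struct live_node *n) {
        return (n->kill & LIVE_X) && !(n->in & LIVE_X) && !(n->out & LIVE_X);
    }

As with the forward case, live_transfer would be applied repeatedly, following the graph against the flow of control, until no IN or OUT set changes any more.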
It is interesting to compare the merits of symbolic interpretation and data-flow equations. Symbolic interpretation is more intuitive and appealing in simple cases (well-structured flow graphs); data-flow equations can retrieve more information and can handle complicated cases ('bad' flow graphs) better. Symbolic interpretation can handle the flow of
Figure 3.60 Live analysis for Figure 3.55 using backward data-flow equations.
information inside expressions more easily than data-flow equations can. Symbolic interpretation fits in nicely with narrow compilers and L-attributed grammars, since only slightly more is needed than what is already available in the nodes on the path from the top to the node where the processing is going on. Data-flow equations require the entire flow graph to be in memory (which is why they can handle complicated cases) and require all attributes to be present in all nodes (which is why they collect more information). Although symbolic interpretation can handle arbitrary flow graphs, the algorithm then begins to approach that of data-flow equations and loses much of its attraction.
This concludes our discussion of context-handling methods. We have seen that they serve to annotate the AST of the source program with information that can be obtained by combining bits of information from arbitrarily far away in the AST, in short, from the context. These annotations are required for checking the contextual correctness of the source program and for the intermediate and low-level code generation processes that must follow. The automated context-handling methods are based on attribute grammars, which come mainly in three flavors: multi-visit ordered, L-attributed, and S-attributed. The manual context-handling methods are based on abstract symbolic interpretation, and come mainly in three variants: full symbolic interpretation on the stack, simple symbolic interpretation, and data-flow equations.
Summary

- Lexical analysis establishes local relationships between characters, syntax analysis establishes nesting relationships between tokens, and context handling establishes long-range relationships between AST nodes.
- Conceptually, the data about these long-range relationships is stored in the attributes of the nodes; implementation-wise, part of it may be stored in symbol tables and other tables.
- All context handling is based on a data-flow machine, and all context-handling techniques are ways to implement that data-flow machine.
- The starting information for context handling is the AST and the classes and representations of the tokens that are its leaves.
- Context handlers can be written by hand or generated automatically from attribute grammars.
- Each non-terminal and terminal in an attribute grammar has its own specific set of formal attributes. A formal attribute is a named property. An (actual) attribute is a named property and its value; it is a (name, value) pair.
- Each node for a non-terminal or terminal S in an AST has the formal attributes of S; their values may and usually will differ.
- With each production rule for S, a set of attribute evaluation rules is associated, which set the synthesized attributes of S and the inherited attributes of S's children, while using the inherited attributes of S and the synthesized attributes of S's children.
- The attribute evaluation rules of a production rule P for S determine data dependencies between the attributes of S and those of the children of P. These data dependencies can be represented in a dependency graph for P.
- The inherited attributes correspond to input parameters and the synthesized attributes to output parameters, but they need not be computed in that order.
- Given an AST, the attribute rules allow us to compute more and more attributes, starting from the attributes of the tokens, until all attributes have been computed or a loop in the attribute dependencies has been detected.
- A naive way of implementing the attribute evaluation process is to visit all nodes repeatedly and execute at each visit those attribute rules for which the attribute values they use are available and the attribute values they set are not yet available. This is dynamic attribute evaluation.
- Dynamic attribute evaluation is inefficient, and its naive implementation does not terminate if there is a cycle in the attribute dependencies.
- Static attribute evaluation determines the attribute evaluation order of any AST at compiler construction time, rather than at compiler run time. It is efficient and detects cycles at compiler generation time, but it is more complicated.
- Static attribute evaluation order determination is based on IS-SI graphs and late evaluation by topological sort. All these properties of the attribute grammar can be determined at compiler construction time.
- The nodes in the IS-SI graph of a non-terminal N are the attributes of N, and the arrows in it represent the summarized data dependencies between them. The arrows are summaries of all data dependencies that can result from any tree in which a node for N occurs. The important point is that this summary can be determined at compiler construction time, long before any AST is actually constructed.
- The IS-SI graph of N depends on the dependency graphs of the production rules in which N occurs and the IS-SI graphs of the other non-terminals in these production rules. This defines recurrence relations between all IS-SI graphs in the grammar. The recurrence relations are solved by transitive closure to determine all IS-SI graphs.
- If there is an evaluation cycle in the attribute grammar, an attribute will depend on itself, and at least one of the IS-SI graphs will exhibit a cycle. This provides cycle detection at compiler construction time; it allows avoiding constructing compilers that will loop on some programs.
- A multi-visit attribute evaluator visits a node for non-terminal N one or more times; the number is fixed at compiler construction time. At the start of the i-th visit, some inherited attributes have been freshly set, the set INi; at the end some synthesized attributes have been freshly set, the set SNi. This defines an attribute partitioning (INi, SNi) for i = 1..n for each non-terminal N, leading to an n-visit. The INi together comprise all inherited attributes of N, the SNi all synthesized attributes.
Exercises

For each of the following items, indicate whether it belongs to a non-terminal or to a node of a non-terminal: ...

The distinction between synthesized and inherited attributes is presented as a requirement in this chapter. Explore what happens when ...

Consider the topological sort algorithm of Figure 3.16 when there is a cycle; modify the algorithm so that ...

Consider the attribute grammar of Figure 3.61. Construct the IS-SI graph of A and show that the grammar contains a cycle.
    S(SYN s) -> A(i1, s1)
        ATTRIBUTE RULES:
            SET i1 TO s1; SET s TO s1;

    A(INH i1, SYN s1) -> A(i2, s2) 'a'
        ATTRIBUTE RULES:
            SET i2 TO i1; SET s1 TO s2;
      | B(i2, s2)
        ATTRIBUTE RULES:
            SET i2 TO i1; SET s1 TO s2;

    B(INH i, SYN s) -> 'b'
        ATTRIBUTE RULES:
            SET s TO i;

Figure 3.61 Attribute grammar for Exercise 3.4.
- ... attribute grammar ... for T and U.
- ... for the if-statement.

Figure: Example program for a very busy expression.
Figure: The status of the modules in compilation and interpretation: largely language- and paradigm-independent versus language- and paradigm-dependent.
The interpreter has to manipulate data values defined in the program being interpreted. The types and sizes of these values are not known at the time the interpreter is written, which makes it necessary to implement these values in the interpreter as records that specify the type of the run-time value, its size, and the value itself. A pointer to such a record can then serve as 'the value' during interpretation.
Figure 4.2 shows a value of type Complex_Number as the programmer sees it; Figure 4.3 shows a possible representation of the same value in a recursive interpreter. Records that correspond to run-time values are marked with a V; each of them is self-identifying through its type field. The data representation consists of two parts: the value-specific part and the part that is common to all values of type Complex_Number. The first part provides the actual values, which may differ from entity to entity; the second part describes the type of the value, which is the same for all values of type Complex_Number. These data structures will contain additional information in an actual interpreter, specifying for example the source program file name and line number at which the value or the type originated.
Figure 4.2 A value of type Complex_Number as the programmer sees it.
This pointer is part of the value and should never be copied separately: if a copy is required, the entire record must be copied, and the pointer to the result is the new value. If the record contains other values, these must also be copied. In a recursive interpreter, which is slow anyway, it is probably worthwhile to stick to this representation even for the most basic values, for example integers and Booleans. Doing so makes processing much more easy and uniform.
Another important feature is the status indicator; it is used to direct the flow of control. Its primary component is the mode of operation of the interpreter. This is an enumeration value; its normal value is something like Normal mode, indicating sequential flow of control, but other values are available, to indicate jumps, returns, and possibly other forms of flow of control. Its second component is a value in the wider sense of the word, to supply more information about the non-sequential flow of control. This may be a value for the mode Return mode, an exception name plus possible values for the mode Exception mode, and a label for the mode Jump mode. The status indicator may also contain the file name and line number of the text in which it was set, and possibly other debugging information.
Each routine checks the status indicator after each call to another routine, to see how to carry on. If the status indicator is Normal mode, the routine carries on normally. Otherwise, it checks to see if the mode is one that it should handle; if it is, it does so.
Figure 4.3 Representation of the value of Figure 4.2 inside the recursive interpreter, with fields such as 'value' and 'next'.
    PROCEDURE Elaborate return with expression statement:
        SET Result TO Evaluate expression (...);
        IF Status .mode /= Normal mode: RETURN;
        SET Status .mode TO Return mode;
        SET Status .value TO Result;
    PROCEDURE Elaborate if statement (If node):
        SET Result TO Evaluate condition (If node .condition);
        IF Result .boolean .value = True:
            Elaborate statement (If node .then part);
        ELSE Result .boolean .value = False:
            // Check if there is an else-part at all:
            IF If node .else part /= No node:
                Elaborate statement (If node .else part);
            ELSE If node .else part = No node:
                SET Status .mode TO Normal mode;

Figure 4.4 Outline of a routine for recursively interpreting an if-statement.
Memoization is related to precomputation. Whereas in precomputation we compute the results of a function F for all its possible parameters before the function F has ever been called in reality, in memoization we record the parameters and the result, and find the result by table lookup when a call for F with the same parameters comes along again. In both cases we restrict ourselves to pure functions: functions whose results do not depend on external data and that have no side effects.
The usual implementation is such that the function remembers the values of the parameters it has been called with, together with the results it has yielded for them, using a hash table or some other efficient data structure. Upon each call, it checks to see if these parameters have already occurred before, and if so it immediately returns the stored answer. Storing a value in a dynamically created data structure may not be as cheap as simple indexing, but the point is that looking up an answer can be done quickly, whereas the time needed for evaluating the function can be much larger. Memoization is especially valuable in algorithms in which properties of nodes depend on those of other nodes, a very frequent case. Such algorithms can store the property in the node itself, once it has been established by the algorithm. When the property is needed again, it can be retrieved from the node rather than recomputed. This technique can often turn an algorithm with non-linear complexity into a linear one, as is shown, for example, in ...
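As a small illustration of the memoization idea (not taken from the book's demo compiler), the following C sketch caches the results of a hypothetical pure function fib(); a simple linear-scan table stands in for the hash table mentioned above.

    #include <stdio.h>

    /* Hypothetical memoized pure function; a real implementation would
       use a hash table rather than this linear-scan table. */
    #define TABLE_SIZE 1024
    static struct {int arg; long result; int in_use;} memo[TABLE_SIZE];

    long fib(int n) {
        /* Have we been called with this argument before? */
        for (int i = 0; i < TABLE_SIZE && memo[i].in_use; i++) {
            if (memo[i].arg == n) return memo[i].result;
        }
        /* No: compute the result the expensive way. */
        long result = (n < 2) ? n : fib(n - 1) + fib(n - 2);
        /* Record the (argument, result) pair for later calls. */
        for (int i = 0; i < TABLE_SIZE; i++) {
            if (!memo[i].in_use) {
                memo[i].arg = n; memo[i].result = result; memo[i].in_use = 1;
                break;
            }
        }
        return result;
    }

    int main(void) {
        printf("%ld\n", fib(40));    /* fast because intermediate results are reused */
        return 0;
    }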
The active-node pointer is much like the instruction pointer in a CPU, except that it is set explicitly rather than incremented implicitly. Figure 4.5 shows the outline of the main loop of an iterative interpreter. It contains a case statement that selects the proper code segment for the active node, based on its type. One code segment is shown, the one for if-statements. We see that it is simpler than the corresponding recursive code in Figure 4.4: the condition code has already been evaluated since it precedes the if node in the threaded AST; there is no need to check the type of the condition code since the full annotation has done full type checking; and calling the interpreter for the proper branch of the if-statement is replaced by setting the active-node pointer correctly. Code segments for the other nodes are usually equally straightforward.
The data structures inside an iterative interpreter resemble much more those inside a compiled program than those inside a recursive interpreter. There will be an array holding
    WHILE Active node .type /= End of program type:
        SELECT Active node .type:
            CASE ...
            CASE If type:
                // We arrive here after the condition has been evaluated;
                // the Boolean result is on the working stack.
                SET Value TO Pop working stack ();
                IF Value .boolean .value = True:
                    SET Active node TO Active node .true successor;
                ELSE Value .boolean .value = False:
                    IF Active node .false successor /= No node:
                        SET Active node TO Active node .false successor;
                    ELSE Active node .false successor = No node:
                        SET Active node TO Active node .successor;
            CASE ...

Figure 4.5 Sketch of the main loop of an iterative interpreter, showing the code for an if-statement.
the global data of the source program, if the source language allows these. If the source ... The iterative interpreter starts by threading ... The interpreter can record an arbitrary amount of information for a variable by storing it in shadow memory: arrays parallel to the memory arrays maintained by the interpreter. Each byte in the shadow array holds properties of the corresponding byte in the memory array. Examples of such properties are 'This byte belongs to a read-only array', 'This byte is part of the routine call linkage', etc. The 256 different values provided by one byte for this purpose are usually enough but not ample. The shadow data can be used for interpreter-time checking, for example to detect the
    #include "parser.h"    /* for types AST_node and Expression */
    #include "thread.h"    /* for Thread_AST() and Thread_start */
    #include "stack.h"     /* for Push() and Pop() */
    #include "backend.h"   /* for self check */

    /* PRIVATE */
    static AST_node *Active_node_pointer;

    static void Interpret_iteratively(void) {
        while (Active_node_pointer != 0) {
            /* there is only one node type, Expression: */
            Expression *expr = Active_node_pointer;
            switch (expr->type) {
            case 'D':
                Push(expr->value);
                break;
            case 'P': {
                int e_left = Pop(); int e_right = Pop();
                switch (expr->oper) {
                case '+': Push(e_left + e_right); break;
                case '*': Push(e_left * e_right); break;
                }}
                break;
            }
            Active_node_pointer = Active_node_pointer->successor;
        }
        printf("%d\n", Pop());                /* print the result */
    }

    /* PUBLIC */
    void Process(AST_node *icode) {
        Thread_AST(icode);
        Active_node_pointer = Thread_start;
        Interpret_iteratively();
    }

Figure 4.6 An iterative interpreter for the demo compiler of Section 1.2.
use of uninitialized memory, incorrect use of system data, and other mishaps, in ... context checking. An advantage ... when faster processing is desired. Some iterative interpreters also store the AST in a single array; there are several reasons for doing so. One is that storing it in a single array makes it ... more than once without ... Another is that a more compact representation ... puts the successor of a node ..., so that it becomes profitable to omit the ... than recursive interpreters.
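Returning to the shadow memory mentioned above, a minimal sketch of the idea follows; the array sizes and the property bits are assumptions made for the example, not part of any particular interpreter.

    #include <stdio.h>
    #include <stdlib.h>

    #define MEMORY_SIZE 65536

    /* Interpreter-maintained memory and its shadow; each shadow byte
       records properties of the corresponding memory byte. */
    static unsigned char Memory[MEMORY_SIZE];
    static unsigned char Shadow[MEMORY_SIZE];

    #define INITIALIZED 0x01       /* hypothetical property bits */
    #define READ_ONLY   0x02

    static unsigned char Read_byte(int address) {
        if (!(Shadow[address] & INITIALIZED)) {
            fprintf(stderr, "read of uninitialized byte at %d\n", address);
            exit(1);
        }
        return Memory[address];
    }

    static void Write_byte(int address, unsigned char value) {
        if (Shadow[address] & READ_ONLY) {
            fprintf(stderr, "write to read-only byte at %d\n", address);
            exit(1);
        }
        Memory[address] = value;
        Shadow[address] |= INITIALIZED;
    }

    int main(void) {
        Write_byte(100, 42);
        printf("%d\n", Read_byte(100));    /* prints 42 */
        return Read_byte(101);             /* caught: byte 101 was never written */
    }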
Figure: Storing the AST in an array (a) and as pseudo-instructions (b).
Compilation produces object code from the intermediate code tree through tree rewriting. The basis of code generation is the systematic replacement of nodes and subtrees of the AST by target code segments, in such a way that the semantics is preserved. It is followed by a linearization phase, which produces a linear sequence of instructions from the rewritten AST. The replacement process is controlled by the data-flow and flow-of-control requirements of the target code segments. The mental image of the gradual transformation, in which at each stage the semantics remains unchanged, is conceptually a powerful aid in writing correct code generators. Tree rewriting is also applied in other file conversion problems, for example for the conversion of SGML and similar texts to displayable format (..., 1998).
As a demonstration of code generation by tree rewriting, suppose we have constructed the AST for the expression

    a := (b[4*c + d] * 2) + 9;

in which a, c, and d are integer variables and b is a byte array in memory. The AST is shown in Figure 4.9. Assume that the compiler has decided that the variables a, c, and d are in registers, and that the array indexing operator [] for byte arrays has been expanded using the memory access operator mem. The AST in that situation is shown in Figure 4.9.
Figure 4.9 Two ASTs for the expression a := (b[4*c + d] * 2) + 9.
On the machine side we assume that we have two machine instructions:

- Load_Address M[Ri],C,Rd, which loads the address of the Ri-th element of the array M into Rd, where the size of the elements of M is C bytes;
- Load_Byte (M+Ro)[Ri],C,Rd, which loads the byte contents of the Ri-th element of the array at M plus offset Ro into Rd, where the other parameters have the same meaning.

These instructions are representative of the Pentium instructions leal and movzbl. We represent these instructions in the form of ASTs as well, as shown in Figure 4.10. We first replace the bottom right part of the original AST by the Load_Byte instruction, obtained from the second instruction by equating M with b, Ro with Rd, Ri with Rc, C with 4, and using a temporary register Rt as the result register:
Figure 4.10 Two sample instructions with their ASTs.

[The original AST with its bottom right part replaced by Load_Byte (b+Rd)[Rc],4,Rt.]
Next we replace the top part by the Load_Address instruction, obtained from the first instruction by equating M with 9, Ri with Rt, C with 2, and using Ra, which holds the variable a, as the result register:
[The AST after the second rewrite: Load_Address 9[Rt],2,Ra on top of Load_Byte (b+Rd)[Rc],4,Rt.]
Note that the fixed address of the (pseudo-)array in the Load_Address instruction is specified explicitly as 9. After linearization this yields the code sequence

    Load_Byte    (b+Rd)[Rc],4,Rt
    Load_Address 9[Rt],2,Ra
... affect the ... than others. ... results in expressions. Each of these restrictions cuts away a very large part of the search space, thus making the code generation process more manageable; it will be clear that in each case we may lose opportunities for optimization. Algorithms exist for many combinations of restrictions, some of them with great sophistication. We will discuss a representative selection of these algorithms, in their basic form, in the following sections. The literature contains many, many more algorithms, often of a very advanced nature. Straightforward application of the techniques described in these sections will yield a reasonably optimizing compiler but not more than that. The production of a really good code generator is a subject that could easily fill an entire book; a good starting point is Muchnick (1997). An extreme application of the first restriction is supercompilation, in which the size of the code to be translated is so severely limited that exhaustive search becomes feasible. This technique and its remarkable results are discussed briefly below.
When generating code, it is often profitable to preprocess the intermediate code in order to do efficiency-increasing program transformations, and to postprocess the generated code to remove remaining inefficiencies. The latter is called peephole optimization.
Figure 4.11 Overall structure of a code generator.

In summary, code generation is performed in three phases, as shown in Figure 4.11:
- preprocessing, in which AST node patterns are replaced by other ('better') AST node patterns,
- code generation proper, in which AST node patterns are replaced by target code sequences, and
- postprocessing, in which target code sequences are replaced by other ('better') target code sequences.
Roadmap
4.2 Code generation
4.2.1 Avoiding code generation altogether
...
Code generation and dynamic programming
4.2.9 Evaluation of code generation techniques
4.2.10 Debugging of code optimizers
Around code generation:
4.2.11 Preprocessing the intermediate code
4.2.12 Postprocessing the target code
4.2.13 Machine code generation
... the AST of the source program. This can be achieved by ... the AST, if the operator ... segments and data structures ... compiler interface ... compiler is a valid option. ... Normally ... may contain expressions that have ... may require code at run time ... if-then statements, multi-way selections, function calls, exception handling, procedure calls, etc. If we are translating to real machine code ... in library routines than ... constructs, for example array indexing. ... (Sections 4.2.2 through 4.2.7); the division between ... are more ea...
    #include "parser.h"    /* for types AST_node and Expression */
    #include "thread.h"    /* for Thread_AST() and Thread_start */
    #include "backend.h"   /* for self check */

    /* PRIVATE */
    static AST_node *Active_node_pointer;

    static void Trivial_code_generation(void) {
        printf("#include \"stack.h\"\nint main(void) {\n");
        while (Active_node_pointer != 0) {
            /* there is only one node type, Expression: */
            Expression *expr = Active_node_pointer;
            switch (expr->type) {
            case 'D':
                printf("Push(%d);\n", expr->value);
                break;
            case 'P':
                printf("{\n\
    int e_left = Pop(); int e_right = Pop();\n\
    switch (%d) {\n\
    case '+': Push(e_left + e_right); break;\n\
    case '*': Push(e_left * e_right); break;\n\
    }}\n", expr->oper);
                break;
            }
            Active_node_pointer = Active_node_pointer->successor;
        }
        printf("printf(\"%%d\\n\", Pop()); /* print the result */\n");
        printf("return 0;}\n");
    }

    /* PUBLIC */
    void Process(AST_node *icode) {
        Thread_AST(icode);
        Active_node_pointer = Thread_start;
        Trivial_code_generation();
    }

Figure 4.12 A trivial code generator for the demo compiler of Section 1.2.
in the interpreter, and Expression_P for the case 'P'. Only those interpreter routines that are actually used by a program are included in the threaded code. The characteristic advantage of threaded code is that it is small. It is mainly used in process control and embedded systems, to control hardware with limited memory, for example palmtops and telephones. One can write threaded code by hand, but threaded code can also be generated very well by a compiler.
    #include "stack.h"
    int main(void) {
        Push(7);
        Push(1);
        Push(5);
        {
            int e_left = Pop(); int e_right = Pop();
            switch (43) {
            case '+': Push(e_left + e_right); break;
            case '*': Push(e_left * e_right); break;
            }}
        {
            int e_left = Pop(); int e_right = Pop();
            switch (42) {
            case '+': Push(e_left + e_right); break;
            case '*': Push(e_left * e_right); break;
            }}
        printf("%d\n", Pop()); /* print the result */
        return 0;}

Figure 4.13 Code for (7*(1+5)) generated by the code generator of Figure 4.12.
    int main(void) {
        Expression_D(7);
        Expression_D(1);
        Expression_D(5);
        Expression_P(43);    /* 43 = ASCII value of '+' */
        Expression_P(42);    /* 42 = ASCII value of '*' */
        Print();
        return 0;
    }

Figure 4.14 Possible threaded code for (7*(1+5)).
Threaded code was first ... in a variety of contexts ... the PDP-11 (1973) ... and Walker, 1980; ... If further code size reduction is desired, the routines can be numbered and the calls replaced by an array of routine numbers; if there are no more than 256 different routines, one byte per call suffices (see Exercise 4.5). Since each routine has a known number of parameters, and all parameters derive from fields in the AST and are thus constants known to the code generator, the parameters can be incorporated into the
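The following sketch shows what such a numbering scheme might look like for the (7*(1+5)) example; the routine numbers and the in-line parameter encoding are invented for the illustration and are not taken from the book.

    #include <stdio.h>
    #include "stack.h"            /* for Push() and Pop(), as in Figure 4.15 */

    enum {EXPRESSION_D, EXPRESSION_P, PRINT, STOP};    /* hypothetical routine numbers */

    void Expression_D(int digit) { Push(digit); }
    void Expression_P(int oper) {
        int e_left = Pop(); int e_right = Pop();
        switch (oper) {
        case '+': Push(e_left + e_right); break;
        case '*': Push(e_left * e_right); break;
        }
    }
    void Print(void) { printf("%d\n", Pop()); }

    /* Threaded code for (7*(1+5)): one byte per routine number,
       with the parameters stored in-line. */
    static unsigned char code[] = {
        EXPRESSION_D, 7, EXPRESSION_D, 1, EXPRESSION_D, 5,
        EXPRESSION_P, '+', EXPRESSION_P, '*', PRINT, STOP
    };

    int main(void) {
        for (int pc = 0; code[pc] != STOP; ) {
            switch (code[pc++]) {
            case EXPRESSION_D: Expression_D(code[pc++]); break;
            case EXPRESSION_P: Expression_P(code[pc++]); break;
            case PRINT: Print(); break;
            }
        }
        return 0;
    }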
    #include "stack.h"

    void Expression_D(int digit) {
        Push(digit);
    }

    void Expression_P(int oper) {
        int e_left = Pop(); int e_right = Pop();
        switch (oper) {
        case '+': Push(e_left + e_right); break;
        case '*': Push(e_left * e_right); break;
        }
    }

    void Print(void) {
        printf("%d\n", Pop());
    }

Figure 4.15 Routines for the threaded code for (7*(1+5)).
... generation has become completely blurred. Actually, the above technique ..., since the code segments from the interpreter ... operations, effect jumps, etc. ... the primitive routines need to be ... interpreter ... machine ...
Figure 4.16 Partial evaluation in a segment of the code generator.

Figure 4.17 Code for (7*(1+5)) generated by the code generator of Figure 4.16.
M.C. (Maurits Cornelis) Escher (1898-1972), Dutch artist known for his intriguing drawings and paintings.
Figure 4.18 Foreground (run-now) view of partially evaluated code.

Figure 4.19 Background (run-later) view of partially evaluated code.
... program generation, and the more compiler-construction oriented book ... An extensive example of generating an object code segment by manual partial evaluation can be found in Section 8.
In simple code generation, a fixed translation to the target code is chosen for each possible node type. During code generation, the nodes in the AST are then replaced by their translations, and the AST is linearized by following the data flow in expressions and the flow of control elsewhere. Since the correctness of this scheme depends very much on the interface conventions between the translations of the nodes, it is important to keep these interface conventions simple; but, as usual, more complicated conventions allow more efficient translations.
Simple code generation requires local decisions only, and is therefore suitable for narrow compilers. With respect to machine types, it is particularly applicable to two somewhat similar models, the pure stack machine and the pure register machine. The pure stack machine uses a stack to store and manipulate values; it has two types of instructions, those that move or copy values between the top of the stack and elsewhere, and those that do operations on the top element or elements of the stack.
The stack machine has two important data administration pointers: the stack pointer SP, which points to the top of the stack, and the base pointer BP, which points to the beginning of the region on the stack where the local variables are stored; see Figure 4.20. It may have other data administration pointers, for example a pointer to the global data area and a stack area limit pointer, but these play no direct role in simple code generation.
Figure 4.20 Data administration in a simple stack machine.
For our explanation we assume a very simple stack machine, one in which all stack entries are integers and which features only the machine instructions summarized in Figure 4.21. We do not concern ourselves with stack overflow here; on many machines stack overflow results in a synchronous interrupt, which allows the operating system to intervene. Push_Const c pushes the constant c (incorporated in the machine instruction) onto the top of the stack; this action raises the stack pointer by 1. Push_Local i pushes a copy of the value of the i-th local variable on the top of the stack; i is incorporated in the machine instruction, but BP is added to it to find the stack element; this raises the stack pointer by 1. Store_Local i removes the top element from the stack and stores its value in the i-th local variable; this lowers the stack pointer by 1. Add_Top2 removes the top two elements from the stack, adds them, and pushes the result back onto the stack; this action lowers the stack pointer by 1. Subtr_Top2 and Mult_Top2 do similar things; note the order of the operands in Subtr_Top2: the deeper stack entry is the left operand since it was pushed first. Suppose p is a local variable; then the code for p := p+5 is
    Instruction        Actions
    Push_Const c       SP:=SP+1; stack[SP]:=c;
    Push_Local i       SP:=SP+1; stack[SP]:=stack[BP+i];
    Store_Local i      stack[BP+i]:=stack[SP]; SP:=SP-1;
    Add_Top2           stack[SP-1]:=stack[SP-1]+stack[SP]; SP:=SP-1;
    Subtr_Top2         stack[SP-1]:=stack[SP-1]-stack[SP]; SP:=SP-1;
    Mult_Top2          stack[SP-1]:=stack[SP-1]*stack[SP]; SP:=SP-1;

Figure 4.21 Stack machine instructions.
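To make the semantics of Figure 4.21 concrete, here is a small simulator for this stack machine written in C; the instruction encoding and the stack size are assumptions made for the sketch, not part of the machine model itself.

    #include <stdio.h>

    enum Opcode {PUSH_CONST, PUSH_LOCAL, STORE_LOCAL, ADD_TOP2, SUBTR_TOP2, MULT_TOP2};
    struct Instr {enum Opcode op; int operand;};  /* operand unused by the _Top2 instructions */

    static int stack[1000];
    static int SP = -1;          /* index of the top of the stack */
    static int BP = 0;           /* start of the local-variable region */

    static void Execute(const struct Instr *code, int length) {
        for (int i = 0; i < length; i++) {
            switch (code[i].op) {
            case PUSH_CONST:  SP++; stack[SP] = code[i].operand; break;
            case PUSH_LOCAL:  SP++; stack[SP] = stack[BP + code[i].operand]; break;
            case STORE_LOCAL: stack[BP + code[i].operand] = stack[SP]; SP--; break;
            case ADD_TOP2:    stack[SP-1] = stack[SP-1] + stack[SP]; SP--; break;
            case SUBTR_TOP2:  stack[SP-1] = stack[SP-1] - stack[SP]; SP--; break;
            case MULT_TOP2:   stack[SP-1] = stack[SP-1] * stack[SP]; SP--; break;
            }
        }
    }

    int main(void) {
        /* p := p + 5, with p as local variable number 0, initially 10. */
        struct Instr code[] =
            {{PUSH_LOCAL, 0}, {PUSH_CONST, 5}, {ADD_TOP2, 0}, {STORE_LOCAL, 0}};
        SP = 0;                        /* the stack holds one local variable */
        stack[BP + 0] = 10;
        Execute(code, 4);
        printf("%d\n", stack[BP + 0]); /* prints 15 */
        return 0;
    }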
    Push_Local  #p    // Push value of #p-th local onto stack.
    Push_Const  5     // Push value 5 onto stack.
    Add_Top2          // Add top two elements.
    Store_Local #p    // Pop and store result back in #p-th local.

in which #p is the position number of p among the local variables. Note that the operands of the machine instructions are all compile-time constants: the operand of Push_Local and Store_Local is not the value of p, which is a run-time quantity, but the position of p, a compile-time constant. The stack machine model has been made popular
by the ... Since all modern machines, with the exception of RISC machines, have such instructions, this model still has wide applicability. Its main disadvantage is that ...
The pure register machine has a memory to store values in, a set of registers to perform operations in, and two sets of instructions. One set contains instructions to copy values between the registers and memory; the instructions in the other set perform operations on the values in two registers and leave the result in one of them. In our simple register machine the registers can only store values of type integer; the instructions are summarized in Figure 4.22. The instruction names used here consist of two parts. The first part can be Load, Add, Subtr, or Mult, all of which imply a register as the target, or Store, which has a memory location as the target. The second part specifies the type of the other operand: Const, Reg, or Mem. For example, an instruction Add_Const 5,R3 would add the constant 5 to the contents of register 3. The above instruction names have been chosen for their explanatory value; they do not derive from any existing assembly language. Each assembler has its own set of instruction names, most of them rather terse.
Two remarks are in order here. The first is that the rightmost operand in the instructions is the destination of the operation, in accordance with most assembler conventions; note that this is a property of those assembly languages, not of the machine instructions themselves. In two-register instructions, the destination register doubles as the
    Instruction          Actions
    Load_Const c,Rn      Rn := c;
    Load_Mem   x,Rn      Rn := x;
    Store_Reg  Rn,x      x  := Rn;
    Add_Reg    Rm,Rn     Rn := Rn + Rm;
    Subtr_Reg  Rm,Rn     Rn := Rn - Rm;
    Mult_Reg   Rm,Rn     Rn := Rn * Rm;

Figure 4.22 Register machine instructions.
first source register of the operation during execution; this is a property of the machine instructions of a pure register machine.
The second remark is that the above notation Load_Mem x,Rn with semantics Rn := x is misleading. We should actually have written

    Instruction          Actions
    Load_Mem   &x,Rn     Rn := *(&x);

in which &x is the address of x in memory. Just as we have to write Push_Local #b, in which #b is the variable number of b, to push the value of b onto the stack, we should, in principle, write Load_Mem &x,R1 to load the value of x into R1. The point is of course that machine instructions can contain constants only: the load-constant instruction contains the constant value directly, the load-memory and store-memory instructions contain constant addresses that allow them to access the values of the variables. Usually assembly languages consider the address indication & to be implicit in load and store instructions, making forms like Load_Mem x,R1 the normal way to load the value of a variable into a register; its semantics is Rn := *(&x), in which the address operator & is provided by the assembler or compiler at compile time and the dereference operator * by the instruction at run time. The code for p := p+5 on a register-memory machine would be:

    Load_Mem   p,R1
    Load_Const 5,R2
    Add_Reg    R2,R1
    Store_Reg  R1,p

in which p represents the address of the variable p. The most prominent representative of this model is the ... Since all modern machines have registers, the model is very relevant. Its main problem is that the number of registers is limited.
We can now generate stack machine code for arithmetic expressions. As an example we use the expression b*b - 4*(a*c); its AST is shown in Figure 4.23.
Figure 4.23 The abstract syntax tree for b*b - 4*(a*c).
Next we consider the ASTs that belong to the stack machine instructions from Figure 4.21.
Figure 4.24 The abstract syntax trees for the stack machine instructions.
With the interface convention that operands are supplied to and retrieved from the top of the stack, the rewriting is quite trivial: each machine instruction corresponds exactly to one node type; see Figure 4.24. As a result, the rewriting of the tree is also trivial: each node is replaced by its straightforward translation; see Figure 4.25, in which #a, #b, and #c are the variable numbers (stack positions) of a, b, and c.
Figure 4.25 The abstract syntax tree for b*b - 4*(a*c) rewritten.
The only thing that is left to be done is to order the instructions. The conventions that an operand leaves its result on the top of the stack and that an operation may only be issued when its operand(s) are on the top of the stack immediately suggest a simple evaluation order: the depth-first visit. The depth-first visit has the property that it first visits all the children of a node and then immediately afterwards the node itself; since the children have put their results on the stack (as per convention), the parent can now find them there and can use them to produce its result. In other words, the depth-first visit coincides with the data-flow arrows in the AST of an expression. So we arrive at the code generation algorithm shown in Figure 4.26, in which the procedure Emit() produces its parameter(s) in the proper instruction format.
    PROCEDURE Generate code (Node):
        SELECT Node .type:
            CASE Constant type: Emit ("Push_Const" Node .value);
            CASE LocalVar type: Emit ("Push_Local" Node .number);
            CASE StoreLocal type: Emit ("Store_Local" Node .number);
            CASE Add type:
                Generate code (Node .left);
                Generate code (Node .right);
                Emit ("Add_Top2");
            CASE Subtract type:
                Generate code (Node .left);
                Generate code (Node .right);
                Emit ("Subtr_Top2");
            CASE Multiply type:
                Generate code (Node .left);
                Generate code (Node .right);
                Emit ("Mult_Top2");

Figure 4.26 Depth-first code generation for a stack machine.
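For concreteness, the following is one possible C rendering of the algorithm of Figure 4.26; the Node structure and its field names are assumptions made for this sketch and do not come from the demo compiler.

    #include <stdio.h>

    enum Node_type {CONSTANT, LOCAL_VAR, STORE_LOCAL, ADD, SUBTRACT, MULTIPLY};
    struct Node {
        enum Node_type type;
        int value;                  /* for Constant nodes */
        int number;                 /* local-variable number */
        struct Node *left, *right;  /* for operator nodes */
    };

    static void Generate_code(const struct Node *node) {
        switch (node->type) {
        case CONSTANT:    printf("Push_Const %d\n", node->value); break;
        case LOCAL_VAR:   printf("Push_Local #%d\n", node->number); break;
        case STORE_LOCAL: printf("Store_Local #%d\n", node->number); break;
        case ADD:
            Generate_code(node->left); Generate_code(node->right);
            printf("Add_Top2\n"); break;
        case SUBTRACT:
            Generate_code(node->left); Generate_code(node->right);
            printf("Subtr_Top2\n"); break;
        case MULTIPLY:
            Generate_code(node->left); Generate_code(node->right);
            printf("Mult_Top2\n"); break;
        }
    }

    int main(void) {
        /* The subtree b*b, with b as local variable number 2. */
        struct Node b  = {LOCAL_VAR, 0, 2, 0, 0};
        struct Node bb = {MULTIPLY, 0, 0, &b, &b};
        Generate_code(&bb);    /* emits Push_Local #2, Push_Local #2, Mult_Top2 */
        return 0;
    }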
Applying this algorithm to the top node in Figure 4.25 yields the following code sequence:

    Push_Local #b
    Push_Local #b
    Mult_Top2
    Push_Const 4
    Push_Local #a
    Push_Local #c
    Mult_Top2
    Mult_Top2
    Subtr_Top2
The successive stack configurations that occur when this sequence is executed are shown in Figure 4.27, in which values appear in their symbolic form. The part of the stack on which expressions are evaluated is called the 'working stack'; it is treated more extensively
Figure 4.27 Successive stack configurations for b*b - 4*(a*c).
Much of what was said about code generation for the stack machine applies to the register machine as well. The ASTs of the register machine instructions from Figure 4.22 can be found in Figure 4.28; no inputs or outputs are mentioned except for the result registers.
Figure 4.28 The abstract syntax trees for the register machine instructions.
The interface conventions are that, with the exception of the top instruction, the output register of an instruction must be used immediately as an input register of the parent instruction in the AST, and that, for the moment at least, the two input registers of an instruction must be different. Note that as a result of the convention to name the destination last in assembly instructions, the two-operand instructions mention their operands in an order reversed from that in which they appear in the ASTs: these instructions mention their second source register first, since the first register is the same as the destination, which is mentioned last. Unfortunately, this may occasionally lead to some confusion.
... R1 are available as auxiliary registers.
Register allocation is now easy; see Figure 4.29, in which Target is a register number and Aux is a set of register numbers. Less accurately, we will refer to Target as a register and to Aux as a set of registers. The code for the leaves in the expression tree is straightforward: just emit the load instruction using the target register. The code for an operation node starts with code for the left child, using the same parameters as the parent: all auxiliary registers are still available and the result must arrive in the target register. For the right child the situation is different: one register, Target, is now occupied, holding the result of the left tree. We therefore pick a register from the auxiliary set, Target 2, and generate code for the right child with Target 2 as its target register.
    PROCEDURE Generate code (Node, a register Target, a register set Aux):
        SELECT Node .type:
            CASE Constant type:
                Emit ("Load_Const " Node .value ",R" Target);
            CASE Variable type:
                Emit ("Load_Mem " Node .address ",R" Target);
            CASE ...
            CASE Add type:
                Generate code (Node .left, Target, Aux);
                SET Target 2 TO An arbitrary element of Aux;
                SET Aux 2 TO Aux \ Target 2;
                // the \ denotes the set difference operation
                Generate code (Node .right, Target 2, Aux 2);
                Emit ("Add_Reg R" Target 2 ",R" Target);
            CASE ...

Figure 4.29 Simple code generation with register allocation.
The operation instruction emitted at the end then leaves the result in Target and frees Target 2. So when we leave the routine, all auxiliary registers are free again. Actually, no set manipulation is necessary in this case, because the set can be implemented as a stack of the numbers 1 to the number of available registers, as is done in Figure 4.30.
The code generated for our example tree is shown in Figure 4.31; Figure 4.32 shows the contents of the registers during the execution of this code. The similarity with Figure 4.27 is immediate: ...
(The problem of loading b twice is dealt with in the subsection on common subexpressions, Section 4.2.5.1.) The reason is that one register gets tied up while the subtree a*c is being computed. If we had treated the right ... result of the first child. So it is advantageous to generate code for the heavier child first. ... The weight of a node can be computed from the weights of its children, ...
    PROCEDURE Generate code (Node, a register number Target):
        SELECT Node .type:
            CASE Constant type:
                Emit ("Load_Const " Node .value ",R" Target);
            CASE Variable type:
                Emit ("Load_Mem " Node .address ",R" Target);
            CASE ...
            CASE Add type:
                Generate code (Node .left, Target);
                Generate code (Node .right, Target+1);
                Emit ("Add_Reg R" Target+1 ",R" Target);
            CASE ...

Figure 4.30 Simple code generation with register numbering.
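A C sketch of the register-numbering variant of Figure 4.30 follows; the Expr structure, the symbolic addresses, and the operator table are assumptions made for the example.

    #include <stdio.h>

    enum Op {CONST_NODE, VAR_NODE, ADD_NODE, MULT_NODE, SUBTR_NODE};
    struct Expr {enum Op op; int value; const char *address; struct Expr *left, *right;};

    static const char *reg_instr[] = {"", "", "Add_Reg", "Mult_Reg", "Subtr_Reg"};

    static void Generate_code(const struct Expr *e, int target) {
        switch (e->op) {
        case CONST_NODE: printf("Load_Const %d,R%d\n", e->value, target); break;
        case VAR_NODE:   printf("Load_Mem %s,R%d\n", e->address, target); break;
        default:                                  /* binary operators */
            Generate_code(e->left, target);
            Generate_code(e->right, target + 1);
            printf("%s R%d,R%d\n", reg_instr[e->op], target + 1, target); break;
        }
    }

    int main(void) {
        /* b*b - 4*(a*c), the expression of Figure 4.23 */
        struct Expr a = {VAR_NODE, 0, "a"}, b = {VAR_NODE, 0, "b"}, c = {VAR_NODE, 0, "c"};
        struct Expr four    = {CONST_NODE, 4, 0};
        struct Expr bb      = {MULT_NODE, 0, 0, &b, &b};
        struct Expr ac      = {MULT_NODE, 0, 0, &a, &c};
        struct Expr four_ac = {MULT_NODE, 0, 0, &four, &ac};
        struct Expr root    = {SUBTR_NODE, 0, 0, &bb, &four_ac};
        Generate_code(&root, 1);    /* reproduces the code of Figure 4.31 */
        return 0;
    }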
    Load_Mem   b,R1
    Load_Mem   b,R2
    Mult_Reg   R2,R1
    Load_Const 4,R2
    Load_Mem   a,R3
    Load_Mem   c,R4
    Mult_Reg   R4,R3
    Mult_Reg   R3,R2
    Subtr_Reg  R2,R1

Figure 4.31 Register machine code for the expression b*b - 4*(a*c).
The weight of a subtree can be determined simply by a depth-first prescan, as shown in Figure 4.34. If the left tree is heavier, we compile it first. Holding its result costs one register while doing the second tree, which costs Required right registers, together Required right+1; but since Required left > Required right, Required right+1 cannot be larger than Required left, so Required left registers suffice. The same applies vice versa to the right tree if it is heavier. If both weights are equal, we require one extra register. This technique is sometimes called Sethi-Ullman numbering, after its designers (Sethi and Ullman, 1970). Figure 4.35 shows the AST for b*b - 4*(a*c) with the weights attached. We see that the tree a*c is heavier than ..., and we see that this leads to the code shown in Figure 4.33.
The above computations generalize to operations with n operands. An example of such an operation is a routine call with n parameters, under the not unusual convention that
Figure 4.32 Successive register contents for b*b - 4*(a*c).
    Load_Mem   b,R1
    Load_Mem   b,R2
    Mult_Reg   R2,R1
    Load_Mem   a,R2
    Load_Mem   c,R3
    Mult_Reg   R3,R2
    Load_Const 4,R3
    Mult_Reg   R3,R2
    Subtr_Reg  R2,R1

Figure 4.33 Weighted register machine code for the expression b*b - 4*(a*c).
all parameters must be passed in registers (for n smaller than some reasonable number). Following the argument that each finished operand takes away one register, registers will be used most economically if the parameter trees are sorted according to weight, the heaviest
    FUNCTION Weight of (Node) RETURNING an integer:
        SELECT Node .type:
            CASE Constant type: RETURN 1;
            CASE Variable type: RETURN 1;
            CASE ...
            CASE Add type:
                SET Required left TO Weight of (Node .left);
                SET Required right TO Weight of (Node .right);
                IF Required left > Required right: RETURN Required left;
                IF Required left < Required right: RETURN Required right;
                // Required left = Required right
                RETURN Required left + 1;
            CASE ...

Figure 4.34 Register requirements (weight) of a node.
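A C sketch of the weight computation of Figure 4.34 (Sethi-Ullman numbering); the node layout is an assumption made for the example.

    /* A leaf (constant or variable) requires one register. */
    struct Expr_node {int is_leaf; struct Expr_node *left, *right;};

    int Weight_of(const struct Expr_node *node) {
        if (node->is_leaf) return 1;
        int required_left  = Weight_of(node->left);
        int required_right = Weight_of(node->right);
        if (required_left > required_right) return required_left;
        if (required_left < required_right) return required_right;
        return required_left + 1;    /* equal weights: one extra register is needed */
    }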
Figure 4.35 AST for b*b - 4*(a*c) with register weights.
first, and processed in that order (Appel and Supowit, 1987). If the sorted order is E_1, E_2, ..., E_n, then the compilation of tree 1 requires E_1+0 registers, that of tree 2 requires E_2+1 registers, and that of tree n requires E_n+n-1 registers. The total number of registers required for the node is the maximum of these terms; in a formula:

    max (E_k + k - 1)    for k = 1..n
For n = 2, this reduces to the IF-statements in Figure 4.34. Suppose, for example, we have a routine with three parameters, to be delivered in registers R1, R2, and R3, with actual parameters of weights W1=1, W2=4, and W3=2. After sorting the weights, we conclude that we must process the parameters in the order 2, 3, 1. The computation
    Parameter number N                          2    3    1
    Weight E_N                                  4    2    1
    Registers occupied when starting N          0    1    2
    Registers required for parameter N          4    3    3

shows that the maximum is 4, so we need 4 registers for the code generation of the parameters. Since we now need the first compiled expression to deliver its result in register 2, we can no longer use a simple register numbering scheme like the one in the code of Figure 4.30, but must rather use a set, as in the original code of Figure 4.29. The process and its results are shown in Figure 4.36.
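The computation translates easily into code; the following sketch sorts the weights in decreasing order and evaluates max(E_k + k - 1), using the weights of the three-parameter example above.

    #include <stdio.h>
    #include <stdlib.h>

    static int decreasing(const void *a, const void *b) {
        return *(const int *)b - *(const int *)a;
    }

    /* Number of registers needed for n operand trees with the given weights. */
    static int Registers_needed(int *weight, int n) {
        qsort(weight, n, sizeof(int), decreasing);
        int max = 0;
        for (int k = 1; k <= n; k++) {
            int needed = weight[k-1] + (k - 1);      /* E_k + k - 1 */
            if (needed > max) max = needed;
        }
        return max;
    }

    int main(void) {
        int weight[] = {1, 4, 2};                    /* W1, W2, W3 from the example */
        printf("%d\n", Registers_needed(weight, 3)); /* prints 4 */
        return 0;
    }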
Figure 4.36 Evaluation order of three parameter trees.
Even the most inattentive reader will by now have noticed that we have swept a very important problem under the rug: the expression to be translated may require more registers than the machine possesses or than the compiler designer has set aside for expression evaluation. If that happens, one or more values from registers have to be stored in memory locations, called temporaries, to be retrieved later. One says that the contents of these registers are spilled, or, less accurately but more commonly, that the registers are spilled; and a technique for choosing which register(s) to spill is called a register spilling technique. No optimal technique is known (except for exhaustive search), and new and better techniques are still being developed. The simple method described here is based on the observation that the tree for a very complicated expression has a top region in which the weights are higher than the number of registers we have; from this top region a number of trees dangle, the weights of which are equal to or smaller than the number of registers. We can detach these trees from the original tree and assign their results to temporary variables. This leaves us with a set of temporary variables with expressions for which we can generate code, since we have enough registers, plus a damaged original tree, to which we repeat the process. An outline of the code generation process is shown in Figure 4.37.
    PROCEDURE Generate code for large trees (Node, Target register):
        SET Auxiliary register set TO
            Available register set \ Target register;
        WHILE Node /= No node:
            Compute the weights of all nodes of the tree of Node;
            SET Tree node TO Maximal non-large tree (Node);
            Generate code (Tree node, Target register, Auxiliary register set);
            IF Tree node /= Node:
                SET Temporary location TO Next free temporary location();
                Emit ("Store R" Target register ",T" Temporary location);
                Replace Tree node by a reference to Temporary location;
                Return any temporary locations in the tree of Tree node
                    to the pool of free temporary locations;
            ELSE Tree node = Node:
                Return any temporary locations in the tree of Node
                    to the pool of free temporary locations;
                SET Node TO No node;

    FUNCTION Maximal non-large tree (Node) RETURNING a node:
        IF Node .weight <= Size of Auxiliary register set: RETURN Node;
        IF Node .left .weight > Size of Auxiliary register set:
            RETURN Maximal non-large tree (Node .left);
        ELSE Node .right .weight >= Size of Auxiliary register set:
            RETURN Maximal non-large tree (Node .right);

Figure 4.37 Code generation for large trees.
The method uses the set of available registers and a pool of temporary variables in memory. The main routine repeatedly finds a subtree that can be compiled using no more than the available registers, and generates code for it, which yields the result in the Target register. If the subtree was the entire tree, the code generation process is complete. Otherwise, a temporary location is chosen, code for moving the contents of the Target register to that location is emitted, and the subtree is replaced by a reference to that temporary location. (If replacing the subtree is impossible because the expression tree is an unalterable part of an AST, we have to make a copy first.) The process of compiling subtrees continues until the entire tree has been consumed.
The auxiliary function Maximal non-large tree (Node) first checks if the tree of its parameter Node can already be compiled with the available registers; if so, the non-large tree has been found. Otherwise, at least one of the children of Node must require at least all the available registers. The function then looks for a non-large tree in the left or the right child; since the register requirements decrease going down the tree, such a tree will eventually be found.
Figure 4.38 shows the code generated for our sample tree when compiled with 2 registers; one register is spilled, to temporary variable T1.
    Load_Mem   a,R1
    Load_Mem   c,R2
    Mult_Reg   R2,R1
    Load_Const 4,R2
    Mult_Reg   R2,R1
    Store_Reg  R1,T1
    Load_Mem   b,R1
    Load_Mem   b,R2
    Mult_Reg   R2,R1
    Load_Mem   T1,R2
    Subtr_Reg  R2,R1

Figure 4.38 Code generated for b*b - 4*(a*c) with only 2 registers.
A few words may be said about the number of registers that a compiler designer should reserve for expressions. Experience shows (Yuval, 1977) that for handwritten programs 4 or 5 registers are enough to avoid spilling almost completely. A problem is, however, that generated programs can and indeed do contain arbitrarily complex expressions, for which 4 or 5 registers will not suffice. But considering that such generated programs would probably cause spilling even if much larger numbers of registers were set aside for expressions, reserving 4 or 5 registers still seems a good policy.
In addition to the pure register machine instructions described above, many register machines have instructions for combining the contents of a register with that of a memory location. An example is an instruction Add_Mem x,R1 for adding the contents of memory location x to R1. The above techniques are easily adapted to include these new instructions. For example, a memory location as a right operand now requires zero registers rather than one; this reduces the weights of the trees. The new tree is shown in Figure 4.39, and the resulting new code in Figure 4.40. We see that the algorithm now processes the subtree 4*a*c first, and that the produced code differs completely from that of Figure 4.33.
There are a few simple techniques for allocating registers for the entire routine we are compiling. The simplest is to set aside a fixed number of registers L for the first L local variables and to use the rest of the available registers as working registers. The available registers are those that are not needed for other purposes (stack limit pointer, heap pointer, activation record base pointer, etc.).
Figure 4.39 Register-weighted tree for a memory-register machine.
    Load_Const 4,R2
    Load_Mem   a,R1
    Mult_Mem   c,R1
    Mult_Reg   R1,R2
    Load_Mem   b,R1
    Mult_Mem   b,R1
    Subtr_Reg  R2,R1

Figure 4.40 Code for the register-weighted tree for a memory-register machine.
With a little bit of effort we can do better: if we set aside L registers for local variables, we do not necessarily have to give them to the first L such variables, but we may be able to make a better choice. For example, the C language allows local variables to have the storage attribute register, and priority can be given to these variables when handing out registers. A more sophisticated approach is to use usage counts (Freiburghouse, 1974). A usage count is an estimate of how frequently a variable is used. The idea is that it is best to keep the most frequently used variables in registers. Frequency estimates can be obtained from static or dynamic profiles. See below for more on profiling information.
The problem with these and all other procedure-wide register allocation schemes is that they assign a register to a variable even in those regions of the routine in which the variable is not used. In Section 4.2.7 we will see a method to solve this problem.
Evaluation of simple code generation

Quite generally speaking, and as a very rough estimate, simple code generation loses about a factor of three over a reasonably good optimizing compiler. This badly quantified statement means that it would be surprising if a reasonable optimization effort did not bring a factor of two of improvement, and that it would be equally surprising if an improvement factor of six could be reached without extensive effort. Such an extensive effort would probably be spent in many small but often complicated refinements, each yielding a small improvement.
Profiling information

The honest, labor-intensive way of obtaining statistical information about code usage is by dynamic profiling. Statements are inserted, manually or automatically, into the program, which produce a record of which parts of the code are executed: the program is instrumented. The instrumented program is then run on a representative set of input data and the records are gathered and condensed into the desired statistical usage data.
In practice it is simpler to do static profiling, based on the simple flow-of-control traffic rule which says that the amount of traffic entering a node equals the amount leaving it; this is the flow-of-control equivalent of Kirchhoff's laws of electric circuits (Knuth and Stevenson, 1973). The flow entering the procedure body is set to, say, 1. At if-statements part of the incoming stream passes through the then-part and part through the else-part; loops are (re)entered 9 out of 10 times; etc. This gives a set of linear equations, which can be solved, resulting in estimated frequencies for all the basic blocks. See Exercises 4.9 and 4.10.
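As a hypothetical illustration of such a set of equations (the numbers are invented for the example): suppose the procedure body is a loop whose body contains an if-statement, the loop is re-entered 9 out of 10 times, and the then- and else-parts are taken to be equally likely. Setting the flow entering the procedure to 1 gives

    flow(procedure entry) = 1
    flow(loop body)       = 10 * flow(procedure entry)   (re-entered 9 out of 10 times)
    flow(then part)       = 0.5 * flow(loop body)
    flow(else part)       = 0.5 * flow(loop body)

which solves to an estimated execution frequency of 10 for the loop body and 5 for each of the two branches.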
... these techniques here, between the simple and the more advanced compilation ...
As we recall, the idea of symbolic interpretation was to have an approximate representation of the stack at the entry of each node and to transform it into an approximate representation at its exit. The stack representation was approximate in that it usually recorded information items like 'x is initialized' rather than 'x has the value 3'; such an approximation is, of course, ... at compile time. After ... been initialized, but we have no ... The present technique uses the same idea but does keep the ..., extended to include the stack, the registers ... We will call such a representation ... regvar descriptor for short. Now, if the effect of a node ..., we do so. This is, for example, ... the assignment x := 3, whose effect can be
recorded exactly in the regvar descriptor as 'x = 3'. If, however, we cannot, for some reason, record the effect of a node exactly in the regvar descriptor, we solve the problem by generating code for the node and recording its effect in the regvar descriptor. When confronted with an assignment x := read_real() we are forced to generate code for it. Suppose in our compiler we call a function by using a Call_Routine machine instruction, and suppose further that we have decided that a function returns its result in register R1. We then generate the code Call_Routine read_real and record in the regvar descriptor 'The value of x is in R1'. Together they implement the effect of the node x := read_real() exactly.
In this way, the regvar descriptor gets to contain detailed information about which registers are free, what each of the other registers contains, where the present values of the local and temporary variables can be found, etc. These data can then be used to produce better code for the next node. Consider, for example, the code segment x := read_real(); y := x * x. At entry to the second assignment, the regvar descriptor contains 'The value of x is in R1'. Suppose register R4 is free. Now the second assignment can be translated simply as Load_Reg R1,R4; Mult_Reg R1,R4, which enters a second item into the regvar descriptor, 'The value of y is in R4'. Note that the resulting code

    Call_Routine read_real
    Load_Reg     R1,R4
    Mult_Reg     R1,R4
does not access the memory locations of x and y at all. If we have sufficient registers, the values of x and y will never have to be stored in memory. This technique combines very well with live analysis: when we leave the live range of a variable, we can delete all information about it from the regvar descriptor, which will probably free a register.
Note that a register can contain the value of more than one variable: after a := b := <expression>, the register that received the value of the expression contains the present values of both a and b. Likewise the value of a variable can sometimes be found in more than one place: after the generated code Load_Mem x,R3, the value of x can be found both in the location x and in register R3.
The regvar descriptor can be implemented as a set of information items as suggested above, but it is more usual to base its implementation on the fact that the regvar descriptor has to answer three questions:
- where can the value of a variable V be found?
- what does register R contain?
- which registers are free?

It is traditionally implemented as a set of three data structures:
It is traditionally implemented asa set of three data structures:
escriptors, addressed by register numbers, whose n-th entry conon what register n contains; eseriptors (also known as a dress d~scriptors),addressed by variable names, whose entryV contains information indicating where the valueof variable V can be found; and
n
-a set of free registers. The advantage is that answers to questions are available directly, the disadvantage inserting and removing information may require updating three data structures. When this technique concentrates mainly on the registers, it is called
is that
As explained at the beginning of Section 4.2, instruction selection, register allocation, and instruction ordering are intertwined, and finding the optimal rewriting of the AST with available instruction templates is NP-complete (Aho et al., 1977; Bruno and Sethi, 1976). We present here three techniques that each address a part of the problem. The first, basic blocks, is mainly concerned with optimization, instruction selection, and instruction ordering in limited parts of the AST. The second, bottom-up tree rewriting, discussed in Section 4.2.6, shows how a very good instruction selector can be generated automatically for very general instruction sets and cost functions, under the assumption that enough registers are available. The third, register allocation by graph coloring, discussed in Section 4.2.7, explains a good and very general heuristic for register allocation. In an actual compiler some of these techniques would be combined with each other and/or with ad-hoc approaches. We will turn first to the use of basic blocks.
Goto statements, routine calls, and other breaks in the flow of control are complicating factors in code generation. This is certainly so in narrow compilers, in which neither the code from which a jump to the present code may have originated nor the code to which control is transferred is available, so no analysis can be done. But it is also true in a broad compiler: the required code may be available (or in the case of a routine it may not), but information about contents of registers and memory locations will still have to be merged at the join nodes in the flow of control, and, as explained in Section 3.2.2.1, this merge may have to be performed iteratively. Such join nodes occur in many places even in well-structured programs and in the absence of user-written jumps: the join node of the flow of control from the then-part and the else-part at the end of an if-else statement is an example.
The desire to do code generation in more 'quiet' parts of the AST has led to the idea of basic blocks. A basic block is a part of the control graph that contains no splits (jumps) or combines (labels). It is usual to consider only maximal basic blocks, which cannot be extended by including adjacent nodes without violating the definition of a basic block. A maximal basic block starts at a label or at the beginning of the routine and ends just before a jump or jump-like node, a label, or the end of the routine. A routine call terminates a basic block, after the parameters have been evaluated and stored in their required locations. Since jumps have been excluded, the control flow inside a basic block cannot contain cycles.
In the imperative languages, basic blocks consist exclusively of expressions and assignments, which follow each other sequentially. In practice this is also true for functional and logic languages, since when they are compiled, imperative code is generated for them.
The effect of an assignment in a basic block may be local to that block, in which case the resulting value is not used anywhere else and the variable is dead at the end of the basic
block, or it may be non-local, in which case the variable is an output variable of the basic block. In general, one needs to do routine-wide live analysis to obtain this information, but sometimes simpler means suffice: the scope rules of C tell us that at the end of the basic block in Figure 4.41 n is dead.
    {
        int n;
        n = a + 1;
        x = b + n*n + c;
        n = n + 1;
        y = d * n;
    }

Figure 4.41 Sample basic block in C.
If we do not have this information (as is likely in a narrow compiler), we have to assume that all variables are live at basic block end; they are all output variables. Similarly, last-def analysis (as explained in Section 3.2.2.3) can give us information about the values of input variables to a basic block. Both types of information can allow us to generate better code; of the two, knowledge about the output variables is more important. A basic block is usually required to deliver its results in specific places: variables in specified memory locations and routine parameters in specified registers or places on the stack.
We will now look at one way to generate code for a basic block. Our code generation proceeds in two steps. First we convert the AST and the control flow implied in it into a dependency graph; unlike the AST, the dependency graph is a directed acyclic graph. We then rewrite the dependency graph to code. We use the basic block of Figure 4.41 as an example; its AST is shown in Figure 4.42. It is convenient to draw the AST for an assignment with the source as the left branch and the destination as the right branch; to emphasize the inversion, we write the traditional assignment operator := as =:. The C program text in Figure 4.41 shows clearly that n is a local variable and is dead at block exit. We assume that the values of x and y are used elsewhere: x and y are live at block exit; it is immaterial whether we know this because of a preceding live analysis or just assume it because we know nothing about them.
Until now, we have threaded ASTs to obtain control flow graphs, which are then used to make certain that code is generated in the right order. But the restrictions imposed by the control flow graph are often more severe than necessary: actually only the data dependencies have to be obeyed. For example, the control flow graph for a + b defines that a must be evaluated before b, whereas the data dependency allows a and b to be evaluated in any order. As a result, it is easier to generate good code from a data dependency graph
Figure 4.42 AST of the sample basic block.
than from a control flow graph. Although in both cases any topological ordering consistent with the interface conventions of the templates is acceptable, the control flow graph generally defines the order precisely and leaves no freedom to the topological ordering, whereas the data dependency graph often leaves considerable freedom.
One of the most important properties of a basic block is that its AST including its control flow graph is acyclic and can easily be converted into a data dependency graph, which is advantageous for code generation. There are two main sources of data dependencies in the AST of a basic block:

- data flow inside expressions. The resulting data dependencies come in two varieties, downward from an assignment operator to the destination, and upward from the operands to all other operators. The generated code must implement this data flow (and of course the operations on these data).
- data flow from values assigned to variables to the use of the values of these variables in further nodes. The resulting data dependencies need not be supported by code, since the data flow is effected by having the data stored in a machine location, from where it is retrieved later. The order of the assignments to the variables, as implied by the flow of control, must be obeyed, however. The implied flow of control is simple, since basic blocks by definition contain only sequential flow of control.

For a third source of data dependencies, concerning pointers, see Section 4.2.5.3.
Three observations are in order here:

- The order of the evaluation of operations in expressions is immaterial, as long as the data dependencies inside the expressions are respected.
- If the value of a variable V is used more than once in a basic block, the order of these uses is immaterial, as long as each use comes after the assignment it depends on and before the next assignment to V.
- The order in which the assignments to variables are executed is immaterial, as long as all assignments to a specific variable V are executed in sequential, left-to-right, order.

These considerations give us a simple algorithm to convert the AST of a basic block to a data dependency graph.

1. Replace the arcs that connect the nodes in the AST of the basic block by data dependency arrows. The arrows between assignment nodes and their destinations in the expressions in the AST point from destination node to assignment node; the other arrows point from the parent nodes downward.
2. Insert a data dependency arrow from each variable V used as an operand to the assignment that set its value, or to the beginning of the basic block if V was an input variable. This dependency reflects the fact that a value stays in a variable until replaced.
3. Insert a data dependency arrow from each assignment to a variable V to the previous assignment to V, if present. This dependency reflects the fact that an assignment to a variable replaces the old value of that variable.
4. Designate the nodes that describe the output values as roots of the graph. From a data dependency point of view, they are the primary interesting results from which all other interesting results derive.
5. Remove the ;-nodes and their arrows. The effects of the flow of control specified by them have been taken over by the data dependencies added in steps 2 and 3.
Figure 4.43 shows the resulting data dependency graph.
Figure 4.43 Data dependency graph for the sample basic block.
Next, we realize that an assignment in the data dependency graph just passes on the value and can be short-circuited. Also, we can eliminate from the graph all nodes not reachable through at least one of the roots. These simplifications yield the final data dependency graph, redrawn in Figure 4.44.
Figure 4.44 Cleaned-up data dependency graph for the sample basic block.
Note that the only roots of the graph are the external dependencies for x and y. Note also that if we happened to know that x and y were dead at block exit too, the entire data dependency graph would disappear automatically. Figure 4.44 has the pleasant property that it specifies the semantics of the basic block precisely: all required nodes and data dependencies are present and no node or data dependency is superfluous.
Before going into techniques for converting the dependency graph into efficient machine instructions, however, we have to discuss two further issues concerning basic blocks and dependency graphs. The first is an important optimization, common subexpression elimination, and the second is the traditional representation of basic blocks and dependency graphs as triples.
Common subexpression elimination

Experience has shown that many basic blocks contain common subexpressions: expressions that occur more than once in the basic block and evaluate to the same value at each occurrence. Common subexpressions originate from repeated subexpressions in the source code, for example

    x = a*a + 2*a*b + b*b;
    y = a*a - 2*a*b + b*b;
which contains three common subexpressions. This may come as a surprise to C or Java programmers, who are used to factoring out common subexpressions almost without thinking:
    double quads = a*a + b*b;
    double cross_prod = 2*a*b;
    x = quads + cross_prod;
    y = quads - cross_prod;
but such solutions are less convenient in a language that does not allow declarations in sub-blocks. Also, common subexpressions can be generated by the intermediate code generation phase for many constructs in many languages, including C. For example, the C expression a[i] + b[i], in which a and b are arrays of 4-byte integers, is translated into

    *(a + 4*i) + *(b + 4*i)
which features the common subexpression 4*i.
Identifying and combining common subexpressions for the purpose of computing them only once is useful, since doing so results in smaller and faster code, but this only works when the value of the expression is the same at each occurrence. Equal subexpressions in a basic block are not necessarily common subexpressions. For example, the source code

    x = a*a + 2*a*b + b*b;
    a = b = 0;
    y = a*a - 2*a*b + b*b;
still contains three pairs of equal subexpressions, but they no longer evaluate to the same value, due to the intervening assignments, and do not qualify as 'common subexpressions'. The effect of the assignments cannot be seen easily in the AST, but shows up immediately in the data dependency graph of the basic block, since the a's and b's in the third line have different dependencies from those in the first line. This means that common subexpressions cannot be detected right away in the AST, but their detection has to wait until the data dependency graph has been constructed.
Once we have the data dependency graph, finding the common subexpressions is simple. The rule is that two nodes that have the operands, the operator, and the dependencies in common can be combined into one node. This reduces the number of operations, and thus the number of machine instructions to be generated. Note that we have already met a simple version of this rule: two nodes that have the operand and its dependencies in common can be combined into one node. It was this rule that allowed us to short-circuit the assignments and eliminate the variable n in the transformation from Figure 4.43 to Figure 4.44.
Consider the basic block in Figure 4.45, which was derived from the one in Figure 4.41 by replacing n by n*n in the third assignment. Figure 4.46 shows its data dependency graph at the moment that the common variables with identical dependencies have already been eliminated. This results in the presence of two nodes with identical operators (*), identical operands (the + node), and identical data dependencies, again on the + node. The two nodes can be combined (Figure 4.47), resulting in the elimination of the common subexpression.
Detecting that two or more nodes in a graph are the same is usually implemented by storing some representation of each node in a hash table. If the hash value of a node
{
1
int n; n x n y
= a + l ; = b + n*n + c; = n*n -t 1; = d * n ;
/* subexpression n*n .., */ /* ... incommon */
Figure 4.45 Basic block in C with common sL~b~xpre~sion.
b
a
X
Y
+
*
*
*
1
1
Figure 4.46 Data dependency graph with common subexpression.
depends on its operands, its operator, and its dependencies,c o m ~ o nnodes will hash to the same value. As is usual with hashing algorithms, an additional check is needed to see if they really fulfill the r e q u ~ e ~ e ~ t s , As with almost all optimi~ationtechniques, the usefulness of common subexpression elimination depends on the source language and the source program, and it is difficult to give figures, but most compiler writers find it useful enough to include it in their compilers. T~aditionally,data dependency graphs are implemented as arrays of triples. A record with three fields representing an operator with its two operands, and corresponds to an operator node in the data dependency graph. If the operator is monadic, the second operand is left empty. The operands can be constants, variables, and indexes to other triples. These indexes to other triples replace the pointers that connect the nodes in the data dependency graph, Figure 4.48 shows the array of triples co~e~ponding to the data dependency graph of Figure 4.44.
X
Y
+
*
+
c
+
d
7 Cleaned-up data dependency graph withcommon subexpression eliminated.
position 1 2 3 4 5
6 7
8
triple a + l @l * @l b + @2 @3
+
c
=: x @l + l d * @6 @4
@7 =: TyT
Figure 4.48 The data dependency graphof Figure 4.44 as an array of triples.
~eneratinginstructions from a data dependency graph is very similar to doing so from an AST: the nodes are rewritten by machine instruction templates and the result is linearized. The main difference is that the data dependency graph allows much more leeway in the order of the inst~ctionsthan the AST, since the latter reflects the full sequential specification inherent in imperative languages. So we will tryto exploit this reater freedom. In this section we assume a 'register-memory machine', a machine with reg op := rnern inst~ctionsin addition to the reg op := reg inst~ctionsof the pure register machine, and we restrict our generated code to such inst~ctions,to reduce the complexity of the code generation. n e available machine inst~ctionsallow most of the nodes to be rewritten simply by a single appropriate machine inst~ction,and we can concentrate on lineari-
zation and register allocation. We will turn to the linearization first, and leave the register allocation to the next subsection.
inear~zutionof the ~ a t ~ a e ~ e n ~~e rn ac ~~ h Wehaveseenattheb of Section 4.2 thatany lineari~ationobtained by atopological ordering of theins is acceptableas far ascorrectness is concerned,butthat for optimization purposes some orderings are better than others. In the absence of other criteria,twoorderingssuggestthemselves,correspondingtoearlyevaluationandtolate evaluation, respectively. In the early evaluation orde~ng,code for a node is issued as soon as the code for all of its operands has been issued. In the late evaluation ordering, codefor a node is issued as late as possible. It turns out that early evaluation ordering tends to require more registers than late evaluation ordering. The reason is clear: early evaluation ordering creates values as soon as possible, which may be long before they are used, and these values have to be kept in registers. We will therefore concentrate on late evaluation t is useful to distinguish between the notionof ‘late’ evaluation used here rather than themorecommonnotion of ‘lazy’evaluation. The difference is that‘lazyevaluation’ impliesthatwehopetoavoidtheactionatall,whichisclearlyadvantageous;in‘late evaluation’ we know beforehand that we will have to perform the action anyway, but we find it advanta~eousto perform it as late as possible, usually because fewer resources are tied up that way. The same considerations applied in Section 3.1 5 . 3 , where we tried to evaluate the attributes as late as possible. ven within the late evaluation ordering there is still a lot of freedom, and we will exploit this freedom to adapt the ordering to the characterof our machine instr~~ctions. We observe that re~ister-me~ory machines allow very efficient ‘ladder’ sequences like Load-Mem Add-Mem Mu1t-Mem Subtr-Mem
a ,R1 b,R1 c ,R1 d,R1
for the expression ( ( (a+-b) *c ) -d) ,and we would like our ordering algorithm to produce such sequences. To this end first we define an cea in data dependency graph: 1.Each root nodeof the graph is an available ladder sequence. 2. If an available ladder sequence S ends in an operation node N, then S extended with N’s left operand is also an available ladder sequence. 3. Ifan available ladder sequenceS ends in an operation nodeN whose operator is c o m ~ u tative -~ e a n i n gthat the left and right operand can be interchanged without affecting the result -then S extended withN’s right operand is also an available ladder sequence.
In other words, available ladder sequences startat root nodes, continue normally along left operands but inay continue along the right operand for commutative operators, may stop anywhere, and must stop at leaves. Code generated for a given ladder sequence starts at its last node, by loading a leaf variable if the sequence ends in a leaf, or an i n t e ~ e d i a t evalue if the sequence ends ear-
lier. Working backwards along the sequence, code is then generatedfor each of the operation nodes. Finally the resulting value is stored as indicated in the root node. For example, the code generatedfor the ladder sequencex,+,+,b in Figure 4.44 would be Load-Mem Add-Reg Add-Mern Store-Reg
b, R1
I1,Rl c,R1 Rl,x
assuming that the anonymous right operand of the + is available in some register ‘Intermediate l ’). The actual rewriting is shown in Figure4.49.
11
(for
X
Store-Reg Rl,x
+
Add Add
b
I1
7
l
Memc,R1
7
Reg I1,Rl
Load-Mem b,R1 Rewriting and ordering a ladder sequence.
The following simple heuristic ordering algorithm tries to combine the identification of suchladdersequenceswithlateevaluation.Basically, it repeatedlyfindsaladder sequence from among those that could be issued last, issues code for it, and removes it in reverse order and the last from the graph. As a result, the instructions are identified instruction of the entire sequence is the first to be determined. To delay the issues of register allocation, we will use pseudo-registers during the ~inearization. like normal registers, except that we assume that there are enoughof them. We will see in the next subsection how the pseudo-registers canbe mapped onto real registers or memory locations. However, the register used inside the ladder sequence must be a real register or the whole plan fails,so we do not want to run the risk thatit gets assigned to memory during register allocation. Fo~unately,since the ladder register is loaded at the beginning of the resulting code sequence and is stored at the endof the code sequence, the live ranges of the registers in the different laddersdo not overlap, and the same real register,for example RI,can be usedfor each of them. The algorithm consists of the following five steps. 1. Find an acceptable ladder sequence S that has the property that none of its nodes has more than one incoming data dependency. 2. If any operand of a node N in S is not a leaf but another node M , associate a new pseudo-register R with M if it does not have one already; use R as the operand in the
code generated for N and makeM an additional root of the dependency graph. 3. Generate code for the ladder sequenceS, using RI as the ladder register. 4. Remove the ladder sequenceS from the data dependency graph. 5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code. In step 1 we want to select a ladder sequence for which we can generate code il~mediately in a last-to-first sense. The intermediate values in a ladder sequence can only be used by code that will be executed later. Since we generate code from last to first, we cannot erate the code for a ladder sequence S until all code that uses i n t e ~ e d i a t evalues from S hasalreadybeengenerated. So anysequencethathas i n c o ~ i ndata ~ dependencieswill have to wait until the code that causes the depend~ncieshas been generated and removed from the dependency graph, together with its dependencies. This explains the ‘incoming data dependency’ part in step 1. Step 2 does a simple-~ndedform of register allocation. The other steps speakfor themselves.
X
Y
*
+
c
d
+
‘50 Cleaned-up data dependency graph for the sample basic block.
Returning to Figure 4.44, which is repeated here for convenience (Figure 4.50), we see that there are two available ladder sequences without ~ultiple inco~ing data dependencies: x,+,+,*,in which we have followed the right operand of the second addition; and y, *, +. It makes no difference to the algorithm which one we process first; we will on the weak grounds thatwe are generati~gcode laststart here with the sequence y, *,-I-, to-first, and y is the rightmost root of the dependency graph. T’he left operand of the node + in the sequence y, *, + is not a leaf but another node, the + of a + 1, and we associate the first free pseudo-register X1 with it. We make X1 an additional root of the dependency graph. So we obtain the following code:
Load-Reg Add-Cons Mu1t-Mem Store-Reg
X1 ,R1 tl ,R1 d ,R1 R1 ,y
Figure 4.51 showsthedependencygraph removed,
after theaboveladdersequencehasbeen
x t
+
G
+
Fi~ure4.51 Data dependency graph after removal of the first ladder sequence.
The nextavailableladdersequencecomprisesthenodes S , +, +, *. Wecannot include the + node of a + 1 in this sequence, since it has three incoming data dependencies ratherthan one. The operands of the final node * arenotleaves, but they do not require a new pseudo-register, since they are already associated with the pseudo-register XI. So the generated code is straightfor~ard: X1 ,R1 Load-Reg Mu1 t-Reg X1 ,R1 b,Rl Add-Mem c,Rl Add-Mem R1,x Store-Reg
Removal of this second ladder sequence from the dependency graph yields the graph shown in Figure 4.52. The available ladder sequence comprises both nodes: X1 and +;it rewrites to the following code: Load-Mem Add-Const Load-Reg
a ,R1 l,R1 R1 ,X1
I
Data dependency graph after removalof the second ladder sequence.
Removing the above ladder sequence removes all nodes from the dependency graph, and we have completed thisstage of the code generation. The result is in Figure 4.53.
Load-Mem Add-Cons Load-Reg
a,R1 t1,Rl R1 X1 I
X1 R1 Load-Reg Mu1 t-Reg x1 R1 Add-Mem b,R1 Add-Mern clR1 R1,x Store-Reg I
Load-Reg Add-Const Mu1t-Mem Store-Reg
X1 ,R1 1,Rl d lR1 RllY
igure 4.53 Pseudo-register target code generatedfor the basic block.
e~ister alloc~tion for the line~rizedcode One thing remains to be done: the pseudo-registers have to be mapped onto real registers or, failing that, to memory locations. There are several ways to do so. One simple method, which requires no further analysis,is the following. We map the pseudo-registers onto real registers in the order of appearance, and when we run out of registers, we map the remaining ones onto memory locations. Note that mapping pseudo-registers to memo^ locations is consistent with their usage in the instructions. For a machine with at least two registers, R1 and R2,the resulting codeis shown in Figure4.54. Note the instruction sequence Load-Reg R1 ,R2 ; Load-Reg R2,Rl,in which thesecondinstructioneffectivelydoesnothing,Such‘stupid’instructionsaregenerated often during code generation, usually on the boundary between two segments of the code. There are at least three ways to deal with such inst~ctions: im~roving the code generation
Load-Mem Add-Cons Load-Reg
a ,R1 t1 ,R1 R1 ,R2
Load-Reg Mult-Reg Add-Mem Add-Mem Store-Reg
R2 ,R1 R2 ,R1 b,R1 c,Rl R1 ,x
Load-Reg Add-Const Mult-Mem Store-Reg
R2 ,R1 1 ,R1 d,R1 R1,y
Code generated for the program segment of Figure 4.41.
algorithm; doing register t r ~ c ~ ias ~ gexplained , in the last paragraph of Section 4.2.4.3; and doing~ e e ~ ~ u ~Z tei ~ i z ~ as t i will o ~ ,be explained in Section 4.2.12. A more general and better way to map pseudo-registers onto real ones involves doing moreanalysis. NOWthatthedependencygraphhasbeenlinearizedwecanapply live ~ ~ ~ Z y as s i described s, in Section 3.2.5, to determine the live ranges of the pseudo-registers, and apply the algorithmsfrom Section 4.2.7 todo register allocation. For comparison, the code generated by thefull optimizing version of the GNU C compiler gcc is shown in Figure 4.55, converted to the notation used in this chapter. We see that i s has avoided both Load-Reg R 2 ,R1 instructions, possibly using register tracking.
Load-Mem Add-Const Load-Reg
a ,R1 1,Rl R1,
Mult-Reg Add-Mem Add-Mem Store-Reg
R1 ,R2 b,R2 c,R2 R2 ,x
Add-Const Mu1t-Mem Store-Reg
1 ,R1 d ,R1 Rl,y
R2
Code generatedby the GNU C compiler, gcc.
Pointers cause two different problems for the dependency graph const~ctionin the above sections. First, assignment under a pointer may change the value of a variable in a subsequent expression: in a = x * y ; *p = 3; b = x * y ;
x * y is not a common subexpression if p happe~sto point to x or y. retrieved from under a pointer may change after an assignment: in a = *p * g; b = 3; e = *p * g ;
*p * q is not a common subexpression if p happens to point tob.
Static data-flow analysis may help to determine if the inte~erencecondition holds, but that does not solve the problem entirely. If we find that the condition holds, or if, in the more usual case, we cannot determine that it does not hold, we have to take the interference into account in the dependency graph construction.Ifwe do this, the subsequentcode generation algorithm of Section 4.2.5.2 will automatically generate correct code for the basic block. 'The intederence caused by an assignment under a pointer in an expression can be inco~oratedin the dependency graph by recognizing that it makes any variable used in a subseq~entexpression dependent on that assignment. These extra data de~endenciescan be added to the dependency graph. Likewise, the result of re~ievinga value from under a pointer is dependent on all preceding assignments. Figure 4.56 shows a basic block similar to that in Figure 4.41,except that the second ~~~e~ x rather than to x. The data ~ependencygraph in Figure 4.57 a s s i ~ ~ m eassigns nt features two additional data dependencies, leadi from the variables n and d in the third andfourthexpressiontotheassignmentunder ointer. Theassignmentitself is marked with a *;note that the x is a normal input operand to this assi nment operation, andthat its data dependency is downward.
{
1
int n; n = a + l ; *x = b + n*n n = n + l ; y = d * n ;
+
c;
Figure 4.56 Sample basic block with an assign~entunder a pointer.
-.
-.
_"""""
7 Data d e p e ~ d e n c ~raph with an a ~ ~ i g ~ under ~ e natpointer.
Since the n inthethird e~pressionhasmore data expression two, it is not a c o ~ o subexpression, n two. As a result, the variable n cannot be e ~ i l ~ i n a h, Figure 4.58. Where the dependency grap ence x, +,+, *,this sequence is now not av has an i n c o ~ i ndata ~ depe~dence.The only available s the co~espondingcode and rernovi the = :* node. This makes the sequ ingthenode n, sincethelatterhastwoincornin sequence is n,= :,+. The r e s u l t i ~code ~ can be foun The codefeatures a pseudo-inst~ction Ins~uction
Store-Indirect-Mem
whichstores the contents of
Actions
R,,x
" x :=R,;
r R, underthepointerfound i ave such an instruction, but
ee that the code d replaced by more expensiv
x
Y
*
a
1
Cleaned-up data dependency graph with an ~ s s i ~ n under ~ e ~at pointer.
a,R1
Load-Mem Add-Cons t Store-Reg
R1,n
Load-Mem Mu1t-Mem Add-Mem Add-Mem Store-Indirect-Mem
n,R1 m,R1 b,R1 c,R1 Rl,x
Load-Mem Add-Cons t Mu1t"em Store-Reg
n,R1
1,r1
1,r1
d,R1 R1,Y
,59 Target code generatedfor the basic blockof Figure 4.56.
ter' in C, no pointer to it can be obtained,so no assign~~ent under a pointer can affect e same applies to local variables in languages in whichno point~rsto local variables can be obtained. Also, if the source language has strong typing, one can restrict the added dependencies to variables of the same type as that of the pointer under w h i c ~the assignof variables an assign~entcan possibly ment took place, since that type defines the set
P affect. This concludes our discussion of the heuristic optimized code generation for basic e will now turn to a very efficient method to generate optimal code for expressions.
3 . 2 we have shown how bottom-up tree rewriting can convertan A hmetic ex~ressioninto an inst~ctiontree which can then be linearized. In o used only very simple machine instructions, with the result that the tree rew cesswascompletely deter~inistic. In practice,however,machinesoftenhave a great variety of i~structions,simple ones and CO icated ones, m d better code can if allavailable inst~ctionsareutilized.chinesoftenhaveseveralhu ~ a c h i n e i n s t ~ c t i ooften n s , each with ten or more addressing modes, and it eous if' codegenerators for suchmachinescouldbederived from a concise descriptionratherthanwrittenbyhand. It turnsoutthatthe com~inationof ottom-up pattern matching and d y n a ~ i cprograml~ingexplained below allows precis el^
small set of instructions of a varied nature; the setis more or less ~achines,large enough to show the principles involved and small eexplanationmanageable. For each inst~uctionweshow tics in the form of a formula, its cost of exec t r q units, its name, both a~breviatedand in full, and a number label in ourpatternmatchingalgorithm. Since wewillbematchin 11, each node in the AST of an instruction has been given a label: instruction, the simple label goes to its top node and the other nodes are labele tosomescheme.For examp the Nult-scaledits only subnode is labeled .1. We will call the ee, becausewewillusethes Ts as patternsin a p s an aside, the cost figures in igure 4.60 suggest that on ~ e m o r ycosts 3 units,multiplicatiocosts 4 units,addition is apparently done in parallel with other CPU activities, and if an instruction contains tw m~ltiplications,their activities overlap a great deal, Such conditions and theco~espondin i~egularitiesin the cost structure are fairly common.If the cost structure of the i n s t ~ c t i o ~ set is such that the cost of each inst~ctionis simply the sum of the costs of its a p p ~ ~ n t s no gain in choosing co~bined inst~ctions, and simple code generat real-world machines are more baroque, for better or for worse. The AST contains three types of operands: mem, which indicates the co~tentso C S t,which indicates a constant; and reg, which indicates the conte h inst~ctionyields its (single) result in a register, which is used as t nother instruction7 or yields the final result of the expression to be compiled. The instr~ctionset shown here has been restricted to addition and multiplicatio instructions only; this is sufficient to show the algorithms. The 'scaled register' i
Rn
#l est
#2
mem
Load-Const est, R,
load constant
cost = 1
Load-Mem mem, R,,
load from memory
cost = 3
Add-Mem mem R,
add from memory
cost = 3
Add-Reg RI,R,
add registers
cost = 1
Mult-Mem nzem, R,
multiplyfrommemorycost
I
#4
/+\
R,
R1 12
*
#
/\
Mu1
#7
/+\
t-Reg R,, R,
Add-Scaled-Reg cst ,R, R, I
=6
multiply registers
cost = 4
addscaledregister
cost = 4
R,
est/*\
R,
B
Mul /*\
t-Scaled-Regcst ,R, ,R,
multiplyscaledregistercost
Sample inst~ction patterns for BURS code
en era ti on.
=5
tions #7 and #8 are s o ~ e w h a unnatural, t and are i n t r o ~ u c eonly ~ for ote that it is quite simple to describe an inst~ctionusi inear nature of the AST; this is necessary if we inst~ctionsto an auto~aticcode g e n e ~ ~ t ogenerator. r ~nstruction be specified bya line containing fourse~icolon-separatedfields: reg +:= (cst*regl); Add-Scaled-Reg cst,regl,reg; 4; Add scaled register
The firstfieldcontains eno i n f o ~ a t i o nto co~struc thesymbolic inst~ctiontoissued;thethird field is an e yields the cost of the inst~ction;and the fourth field is the be used in diagnostics, etc. The third field is an expression, to be evaluated inst~ctionis considered, rather than a fixed constant. an i n s t ~ c t i dependent o~ on the context.
(cst ==
1.
II
cst == 2
II
cst == 4) ? 3
: 4
Another form of context could be a compiler flag that indicate should optimize for program size rather thanfor p r o g r a ~speed. then be: O p t i ~ i ~ ~ F o ~? ~3 p :e (cst e~
>
in which the 3 is an indication of the time consu~ptionof arethe instructio~sizes for smallandnon-smallvalue numbers suggest thatc s t is stored in one byte if it fits in 7 not u n ~ ~ s u~angement). al ing to generate code fo
e. A rewrite of th described by attaching the instruction node labels to the n to see that there are many possible rewritesof our input tr ure 4.60. For e ~ a ~ p lFigure e, 4.62 shows a naive rewrite, wh ,and #6 only;these co~espondto those of a pure results in 7 inst~ctionsand its cost is 17 units, using line~ization,as obtained following the t e c h ~ i ~ uexplai es register allocation in~ection4.2.4.2, is shown in Figure 4.63. ure 4.64 illust~atesanother rewrite possibility. st-fit algo~thm:starting from the top, was chosen,andtheoper theoperatorsinthetree r e ~ u i r e ~ e n of t s that inst~ction. Thisforce rewrite is better than the naive one: it uses 4 cost is l 4 units. On the other hand, the top-down l
b
/+\* 4
/\ /\ *
8
a
Input tree for the BURS code generation.
+ #4
l
#l 4
/\R l l
R #l 8
#2 a
Naive rewrite of the input tree.
Load-Cons t Load-Hem Mult-Reg Load-Const Mult-Reg Load-Mem Add-Reg
8,R1 a,R2 R2 ,R1 4 I R2 R1 I R2 b, R1 R2 ,R1 Total
;1 ;3 ;4 ;1 ;4 ;3
unit units units unit units units ; 1 unit = 17 units
3 Code resul~ingfrom the naive rewrite.
rewrite the top of the tree in such a way that no rewrites can be found for the bottom parts; This discussion identifies two main problems:
#2
i: b
* #5
4 /*\
/\
'i
#l 8
Top-down largest-fit rewrite of the input tree.
Load-Const Mu1t"em Load-Mem Add-Scaled-Reg
8,Rl a,Rl b,R2 4,Rl,R2 Total
;1 ;6 ;3 ;4
unit units units units = 14 units
.65 Code resulting from the top-down largest-fit rewrite.
do we find all possible rewrites, and how do we represent them? It will be clear t we do not fancy listing them all! do wefindthebest/cheapestrewriteamongall ~ossibilities,~ r e f e ~ in ~ btime l~ mear in the sizeof the expression to be translated?
y a form of botto~-uppattern m . Thistechnique is known as a in particular, the code is generat tree: s bottom-up and identifies possible instruci s top-downandselectsat
each nodeone llected during the previous scan; bottom-up and emits the inst~ctionsin the
correct linearized order. ach of the scans can be i~plementedas a recursive visit, the first and the third ones as post-order visits and the secondas a pre-order visit. The instructi0~-collectingscan is themost i~teresting,andfourvariantswillbe
first variant finds all possible inst~ctionsusing item sets s all possible instructions using a tree automaton (Section ists of the first variant followed by a bottom-up scan that identifies the best tions using dynamic programming, (Section 4.2,6.3), and the final onecome third into a single efficient bottom-up scan (Section 4.2.6.4).
tween each pair of characters a set of in which a position was marked by a nized from the part we still hoped to
d at thebe~inningof Section 4.2.6. resent in the label set at node N in attern tree I (including the node L ) N. Moreover, we hope to be able pattern tree co~espondsto a reg-
o ~ ~ ~ i o n d e t which e ~ i n inst~ctions es can accept the result as an operand. For register, its top node cannot be to be inamemorylocation. the location in which the result mber #7 as an example. The igure 4.66a. When the label e write the compound label re 4.66b. When there is no after a character from the
The most obvious one thesecondonatree, ble token s t a ~ i n gat a
c
#7-
>reg
#7.1
The dotted trees c o ~ ~ s p o n dto i n#'7+reg ~ and to #"7.l.
given position; we then make that decision state. In treematchingwekeep all po reaches the top of the input tree and we line code for this bottom-u~pattern rec the co~espondingtyp
final and rest
node: the operand should bea memory loca only one such pattern: the variable can be there are twolabels here, +mem The rewrite ~ossibilitiesfor the n +c st for the constant itself and # 1+r The label sets for nodes 8 and a are obtained similar~y. The lower * node is next and its label set is more interest in^. We scan the set of pattern trees again for nodes that could match this node: should have two operands. We find five such nodes: see that we can match our node to the top node of pattern tree #5:
-its left operand is required to be a re~ister,and indeed the label ant 8 in the input tree shows thata register can be found erand is required to be a memory location, the presenc tree is confir~edby the label +mem in the node with variablea. This match results in the additionof a label #5+reg to our node. Next we match our node to the t node of inst~~ction # ~ the : rig required to bea register, and the label can be made available. Next we rec requires a constant for its left operand, which is c o n f i ~ ~ ebyd the l
PROCEDURE Bottom-up pattern matching (Node) : IF Node isan operation: Bottom-up pattern matching (Node .left); Bottom-up pattern matching (Node .right); SET Node .label set TO Label set (Node); for ELSE IF Nodeis a constant: 0 ; SET Node .label set TO Label set for constant ELSE Node is a variable: SET Node .label set TO Label set for variable 0 ; FUNCTION Label set for (Node) RETURNING a label set: SET Label set TO the Empty set; FOR EACH Label IN the Machine label set: FOR EACH Left labelIN Node .left .label set: FOR EACH Right label IN Node .right .label set: IF Label .operator= Node .operator AND Label .operand= Left label .result AND Label .second operand= Right label .result: Insert Label into Label set; RETURN Label set; F ~ C T I O NLabel set for constant(1 RETURNING a label set: SET Label setTO { (No operator, No location, No location, I1Constantr1) }; FOR EACH Label IN the Machine label set: IF Label .operator= lrLoadfl AND Label .operand= "Constantrr : Insert Label into Labelset; RETURN Label set; F ~ C T I O NLabel set for variable( ) RETURNING a label set: SET Label setTO { (No operator, NO location, No location, lrMemorylf) }; FOR EACH Label IN the Machine label set: IF Label .operator= rlLoadll AND Label .operand= lrMemoryrl : Insert Label into Labelset; RETURN Label set; .67 Outline code for bottom-up pattern matching in trees.
node, and a register as its right operand, which is also there; this adds the label the same reasoning we recognize subnode lf.8.1,but we fail to match node #8 to our node: its leftoperand is aregister,which is availableatthe 4 node,but its rightoperand is #$ .l,and #B. 1 is not in the label set of the a node. e next node to be visited by the bottom-~~p pattern matcher is the higher * node, where the situation is similar to that at the lower * node, an where we i ~ ~ e d i a t erecogly nize the top node of i ~ s t ~ c t i kt6 o ~and s the subnode #7.1. ut here we also recognize the top node of i ~ s t ~ c t i #8: o n the left operandof this top nodeis a register, whichis available,
TYPE operator: llLoadll +‘, ’*’ ; TYPE location: llConstantll llMemoryll llRegisterll a label; I
// anode ina TYPE label: an operator FIELD operator; a location FIELD operand; a location FIELD second operand; a location FIELD result;
pattern. tree
8 Types for bottom-up pattern recognitionin trees.
and is #8.1, which is indeed in the label set of the right operand of e the left operand allows a constant and the right operand allows a ude subnode ft8.1. e top nodes of inst~ctions#4 and #7 for the top node of the input tree is now easy.
#4 ->reg
/\ /y +
#7->req
- #6->reg ti7.1
b
#.-;m;ng
*
g
*
4
8
->cst #l ->reg
# 5 ->reg #6 ->reg #7.1
a ->mem 112 ->reg
Label sets resultingfrom bottom-up pattern matching.
t we have obtained in the aboveinst~ction-collect in^ scan is
theinputtreewithsets of possible re~ritinginstructions concise recipe for const~uctin~ tree rewrites using a subse uent node of theinputtreegivesusthechoice of rewrit or by i n s t r ~ c t i o#7. ~ We could, for example, decide to rewrite by forces the b node and the lower * node to be rewritten to registers and the high in place. The label set at the b node supplies only one rewrit #2, but that at the lower * node allows two possibili hoosing inst~ction#5 resultsintherewritesh
#6 causes an additional rewrite of the a node using instrucucceeded in obtaining a succinct representation of all possible
ssible for the pattern set to be insuf~cientto match a given input empty set of rewrite labels at some node, in which case the ck. In practice, however, this is a non-problem since all real mall' inst~ctionsthattheyalonewillsufficetorewriteany Note, for example, that the instructions #I, ft2, #4, and #6 f rewriting any expression tree consisting of constants, varications.Also,the BURS auto~atonconstruction algo~thm allows us to detect this situatio~statically, at compiler con-
t the algorithm sketched above perfoms at each node an amount t of the size of the input tree. Also, the amount of space used to by the n u ~ b e of r possible labels andis also independent of the sequently, the algorith~is linear in the size of the input tree, other hand, both the work done and the space required are proportional to the n set, which can be co~siderable,and we would like to remove this ques from thelexicalanalysissceneproveagainvaluable;morein Ute allpossiblematches at code ge~erator ge~erationtime, chniques as in the generation of lexical analyzers. set of pattern trees (which is supdescriptio~),there is only a finite ,given the operator of Node in Label set operan~s,we can precompute the resulting 1 algo~thmfor lexicalanalyzers in sup~liedby thefunctions Label set f o r constant ieldconstantresults.(Real-world an operand, etc.) Usin bel sets as operands, we check all nodes in the pattern trees to of these operands. If they can, we note the relation and add label set to our setof label sets. We then repeat the process with our enlarged ets, and co~tinueuntil the process converges and no changes occur any more. the theoretically possible label sets are realized in this prolaced by numbers, the states. Rather than storing a label set e store a state; this reduces the space needed in each node ati ion to a constant and quite small amount. The result sionaltable,indexed by operator,leftoperandstate,andrightoperand s the state of the possible matches at the operator. for pattern matching ateach node to that of simple table
indexing in a transition table; the simplified code analysis, the table orithrn usesconstant and S Inanalogyto the -state auto~aton(FSA) used in lexical ~alysis the character list and c o ~ p u t e snew states from old states and input ch
PROCE~UREBottom-up pattern matching (Node) : IF Node isan operation: ottom-up pattern matching (Node .left); Bottom-up pattern matching (Node .right); SET Node .state TO Next state [Node .operand, Node left.state, Node .right.state] ; ELSE IF Node is a constant: SET Node .state TO State for constant; ELSE Node isa variable: SET Node .state TO State for variable; .70 Outline code for efficientb o t t o ~ - ~ pattern p ~ ~ t cin htrees. i ~ ~
With, say, a hundred operators and some t ~ o u s a ~states, d the an empty entry, the origi~alset of p
without slowing them down seriously.
the steps of the algorithm on larger trees, we see that we often recompute the optimal rewrites of the lower nodes in the tree. We could prevent this by doing m e ~ o i ~ a t i oon n the results obtained for the nodes, but it is easier to just precompute these results in a botto~-upscan, as follows. or each node in our bottom-up scan, we examine the possible rewrites as determined by the instruction~collectingscan, and for each rewriting instruction we establish its cost by adding the cost of the instruction to the m i n i ~ a costs l of getting the operands in the places in which the inst~ctionrequires them to be. ~e then record the best rewrite in the node, with its cost, in the form of a label with cost indication. or example, we will write the rewrite label #5+reg with cost 7 units as #5+reg@7. The ~ i n i m a costs l of the operands are known because they were preco~putedby the same algorithm, which visited the co~espondingnodes easlier, due to the ~ o t t o ~ - unature p of the scan. The only thing still needed to get the process started is knowing the ~ i n i m a costs l of the leaf nodes, but since a leaf node has no operands, its cost is equal to the cost of the instruction, if one is required to load the value, and zero otherwise. As withtheoriginal inst~ction-collecting scan(asshowninFigure4.69), this bottom-up scan starts at the b node; refer to igure 4.71. There is only one way to get the value in a register, by using instru~tion#2, and the cost is 3; leaving it in m e ~ o r ycosts 0. The situatio~at the 4: and 8 nodes is also simple (load to register by instr~ction#l, cost = 1, or leave as constant), and thatat the a node is equalto that at the b node. entries, #5+reg, 6+reg, #7.1, and
-A rewrite with pattern tree#5 (=instruction #5) requires the left operand to be placed in a register, which costs l unit; it requires its right operand to be in memory, where ;and it costs 6 unitsitself:together7units.hisresultsinthelabel
it
patte~ tree #6 again requires the left operand to be placed in a register, at ires its right operand to be placed in a register to which costs 3 units; and itself together 8 units. This results in the label .l and #8 .l do not correspond with top nod cannot get a value into a register, so no cost is attached to them. see that there are two ways to get the value of the subtree at the lower * node into a ster, one costing 7 units and the other 8. We keep only the cheaper possibility, the one with instl~ction#5, and we record its rewrite pattern and its cost in the node. We do not the rewrite possibility with instruction #6, since it can never be part of a rewrite of the input tree. A si~ilar sit~ation obtains at the higher* node: it can be rewrittenby instruction #6 at cost 1 (left operand) + 7 (right operand) + 4 (instruction) = 12, or by inst~ction#8 at cost 1 (left operand) + 3 (right operand) + 5 (instruction) = 9. The choice is obvious: we keep reject instruction #6. At the t get e again two possi~ilities: instructio~#$ and at cost (left 3 operand) + 9 (right + (instruction) 1 = 13 ) + 4 (inst~ction)= 1 at cost 3 (leftoperand) + 7 (right choice is again clear: we keep instruction#4 and reject instr~ction ow we have only one rewrite possibility for each location at each node, and we are
+
b
/\
#4->reg 013
#7.1
* #E->reg @9
->mem 00 #2->reg @3
->cst @O
8 ->cst 0 0
#l->reg 01
a ->mem 00 #2->reg @3
Bottom-up pattern matching with costs.
certain that it is the cheapest rewrite possible, given the instru have more than one instruction attached to them, and the next guity in atop-downinstruction-selectingscan,similartotheone de consider the result location required at the top of the input tree, whic donthis i n f o ~ a t i o n ,we ch turn decides the rewrite e top node is rewritten using instruction #4, which requires two register ope selects instruction a register, which decid put into registers, but remains a constant. The labels involved in the actual rewrite have been checked i ure 4 only thing that is left to do is to linearize the rewritten tree into an inst~ction sequence: the code-generation scan in our code generation scheme. As e~plainedin the su~sectionon weighted register allocation in Section 4.2.4.2, we can do this by a recursive process which for each node generates code for its heavier operand first, followed by code lowedbythe inst~ctionitself. The result is s h o in~ ~ gain over the naive code (cost 17 units) and top-down largest-fit (cost 14 units) is lies mainly in the a~ificiallysmall inst~ctionset of our example; real machines have much larger inst~ctionsets and consequently provide much more good pattern matching. algorith~has advantages over the other rewriting algorithms in that it provides optimal rewritin of any tree and that it cannot get stuck, provided the setof instruce techni~ueof finding the ‘best’ path through a graph
by scan~ingit in a fixed
Load-Mem Load-Const Mult-Scaled-Reg Loa~-Mem Add-Reg
a,R1 4,R2 8,Rl,R2 b,R1 R2 ,R1 Total
;3
units unit ; 5 units ; 3 units ; l unit = 13 units ;l
.72 Code generatedby bottom-up pattern matching.
a set of ‘best’ sub-solutions at each node is called
from the infor~ationat nodes that have already been visited. When all a very c o ~ approach ~ o to ~ all kinds of optimi~ationproblems. As already ove, it can be seen as a specific im~lementationof memoization. For a more of dynamic p r o g r a ~ i n g see , text books on ~~gorithms, for example
headditionoperator to saveanotherunit.The e it does not (yet) know about such c o ~ ~ t a t i v i t i e There s. : specify for each inst~ctionthat involves a c o ~ u t a t i v e e code generator generator, or mark c o ~ u t a t i v eoperators r generator and let it add the patterns. The it is more auto~atic,and is less work in the code generator will indeed produce theAdd” cost to 12, as shown in Figure
to find the cheapest possible rewrite for each type of hine on which the cost functions of the i n s ~ u c t i o are ~s
Load-Mem Load-Const Mult-Scaled-Reg Add-Nem
a,R1 4,R2 8,Rl,R2 b,Rl Total
;3
units unit units units = l2units ;1 ;5 ;3
igure 4.73 Code generated by bottom-up pattern matching, usingcorn~u~ativity'
erator generation time and a single state number suffices at each node. The two processes are combined by adapting the subset algorithm from Section 4.2.6.2 to generate a transition table Cost conscious n e x t s t a t e []. Thisadaptation is far fromtrivial, as we shall see.
o ~ ~ ithen ~ i~ ~ t t e~atc r n chi^^
~~~
i~~tr~ction
sel~ctio~
~l~orith~s
The first step in ~ o m ~ i n i nthe g two algorithms is easy: the cost of each label is incorporated into the state; we use almost the same format for a label as in Section 4.2.6.1: ~ ~ Z o c f f t i o ~ This ~ c oextension ~t. of the structure of a label causes two problems: l . Input trees can be arbitrarily complex and have unboundedcosts. If we include the cost inthelabel,therewill bean ~nboundednumber of labels and conse~uently an unbounded numberof states. 2. Subnodes like #7,1 and #8. l have no cost attached to them in the original algorithm, but they will need one here. We shall see below how these problems are solved. create the initial states. Initial states derive from inst~ctions S only,operandsthatareavailablewithout the intervention of further instru~tions. The mostobviousexamples of suchoperandsareconstantsand memory locations, but the program counter (inst~ctioncounter) and the stack pointer also come into this category, As we have seen above, each basic operand is the basis of an initial state. Our example inst~ctionset in Figure 4.60 contains two basic operands -constants and memory locations -and two instructions that operate on them Constants give rise to state S1and memory locations to state S2: State SI: +CSt@O #l+reg@l State S 2 : +mem@O #2+reg@3
We are now in a position to create new states from old states, by precomputing entries of our transitio~table. To find such new entries, we systematically consider all triplets of an operator and two existing states, and scan the inst~ctionset to find nodes that match the
cost of label n in the left state -Icost of label m in the right state -Icost of instruction p
and sucha form is c o l ~ p ~ to e dthe costof a rewrite cost of label q in the left state -Icost of label r in the right state + cost of instruction S
But that means that only the reZ~tivecosts of e labels in each state count: if the costs of all labels in a state are increased or reduced by the same amount the result of parisonwillremainthesame e sameappliestotheinitialstates.This ob allows us to normalize a state a consta~tamount from all costs in We shall n o ~ a l i z estates by subtracting the smallest cost it contains from each of the costs; this reduces the smallest cost to zero. Nor~alization states various the reduces g@6, reg@^, etc., to a single state #4+reg@0. Now this cost 0 no longer means that it costs 0 units to 0 c o ~ p ~ toe dother possi rewrite by pattern tree #;“c,but that that possibility has cost (of which there happen to be none). All this means that the top of the tree will no carry an indication of the total cost of the tree, as it did in igure 4.71, but we would not base any decision on the absolute valueof the total cost anyway, even i loss is not serious. It is of course possible to assess the total cost of a g scan, or even on the fly, but such action is not finite- state^ and require side the FSA. is Cost conscious next s t a t e * / , atetocompute s are #5,#6, #7.1,and #8.l ;the labels for #5 and #6 are #5+reg@1-1-0+6=7 #6”+reg@l+3-1-4=8
of whichonlylabel #5+reg@7 survives. ~ o ~ p u t i ne costs for thelabe nodes #7.1 and #8.1 involves the costs of the no es then~selves,which are u decide to localize the entire cost of an instruction in its top node, so the cost of the subnodes is zero. No cost units will be lost or gained by this decision since subnodes can in the end onlyc o ~ b i n with e their own top nodes, which then carry the cost.So the new state is #5+reg@7 #7.1@0+3+0=3 #8.1@0-1-3+0=3
which after normalization reduces to #5+reg@4 #7.1@0 #8.1@0
We continue to combine one operator and two operand states using the above techniques untilnomorenewstatesarefound.orthe inst~ctionset of Figure 4.60 this
yields l 3 states, the contents of which are shown in Figure 4.74.. The states SI, S2, S3, and S4 in our explanation correspond to S S 02, S 03, and S 5 , respectively, in the table. The state S o o is the empty state. Its presence as the value of an entry Cost cons cious next state [op, S,, S,] means that no rewrite is possible for a node with operator op and whose operands carry thestates S, and S,. If the input tree contains such a node, thecode €eneration process will get stuck, and to avoid that situation any transition table with entries S o o must be rejected at compiler €eneration time. A second table (Figure 4.75) displays the initial states for the basic locations supported by the instruction set.
Figure 4.74 States of the BURS auto~atonfor Figure 4.60.
c s t : so1 m e m : So2 Figure 4.75 Initial states for the basic operands.
The transition table Cost conscious next state E I is shown in Figure 4.76; we see that it does not contain the emptystate S oo. To print the three-dimensional tableon two-dimensionalpaper,thetables for theoperators -I- and * are disp~ayedseparat~ly. Almost all rows in the tables are identical and have already been combined in the printout, compressing the table vertically. Further possibilities for hori~ontalc o ~ p r e ~ s i oare n clear, even in this small table. This redundancy is c h ~ a c t e ~ s tof ic using the proper techniques, such tables can be compressed to
The last table, Figure 4.77, contains the actual rewrite information. It specifies, based on the state of a node, which inst~ctioncan be used to obtain the result of the expression in a given location. mpty entries meanthatno inst~ctionis required, entries with -mean that no inst~ctionis available and that the result cannot be obtained in the required location. Forexample, if a node is labeledwiththestate So2 anditsresult is to be delivered in a register, the node shouldbe rewritten using instruction #2, and if its result is required in memory, no inst~ctionis needed; it is not possible to obtainthe result as a constant.
S03
s07
S05
S03
s07
s03
sll
s03
s03
s07
S11 S07S07
s03
s03
s03
s03
s07
S07
S07
S11
F i g ~ 4.76 ~ e The transition table Cost conscious next
$11
state [1.
CSt
mern reg Figure 4.77 The code generationtable,
ode ~ ~ ~ @ r usin^ t i t~~ o ~cost-co~scio~s ~e~t-stut~ t~ble The process of ~eneratingcodefrom an inputtreenowproceedsasfollows.First all leaves are labeled with their co~espondinginitial states: those that contain constants with S o and those that contain variables in memory with So2, as specified in the table in Figure 4.75; see Figure 4.78, Next, the bottorn-up scan assigns states to the inner nodesof the tree, using the tables in Figure 4.76. Starting at the bottom-most node which has operator
#4
States and i ~ s ~ r uused ~ ~ini ~ ~ s
One of tlne most pleasant properties of the circumstances. We will give s getting the value of the expres essentially the same and can has two kinds of registers, A ppose, for example, th -registerscannot.Th tion whether it is an
rewriting with instruction#73. As a different example, suppose we want to use the size of the CO when two rewrites have the same r u n - t i ~ ecost (which happens often). i ~ ecode size). Now, when com~ari cost pair rather than a single cost value: ( ~ ~ n - tcost, costs, we first comp e the r u n - t i ~ ecost fields and if they turn out to be equal, we CO pare the code sizes. these are equal too, the two sequences are equivalent as to cost, and we can choose either. Another advantage of this method is that if we want to optimize for code size, we just compare them as ordered integer pairs with the first and the second element e~chan~ed. The run-time cost will then be used as a tie breaker when two rewrites require the same amount of code. Anadaptation in a completelydifferentdirectionagain is toinclude all mac~ine inst~ctions-flow of control, fast move and copy, conversion, etc. -in the instruction set and take the complete A of a routine (or even the entire compilation unit) as the input tree.Instructionselectiand linear~zationwouldthenbecompletely a~tomatic. Such S t~chnologyare still experi~ental. eatment of the theory behind bottom-u~tree rewriting is given by (19 9 ~ ) for ; the full theory see Aho and Johnson (1976). A more recent publication on the application of d y n a ~ i c ~ m i to n tree ~ rewriti Proebsting (1995). An interesting variation on the al~orithmis described Ganapathi, and Tjiang(1989). In thissectionwehaveassumedthatenoughregistersilable for anyrewritewe choose. For a way to include register allocation into the 4.25. o ode generation by c o ~ ~ b i n bottomin~ Thisconcludesourdiscussion matching and dynamic pro have e seen that it yields optimal code when applicable, but thatit isap~licableto pure trees only.
In the subsection on procedure-wide
registe~allocation in Section 42.42, we have seen
that naive register allocation for the entire routine ignores the fact that variables only need registers when they are live. On the other hand, when two variables are live at the same position in the routine, they need two different registers. We can therefore say that two variables that are both live at a given position in the program ‘interfere’ with each other when register allocation is concerned. It will turn out that this inte~erenceinformation is important €or doing high-quality register allocation. Without live analysis, we can only conclude that all variables have values at all positions in the program and tbeyall interfere with each other. So it is essential for good register allocation to have done live analysis on the variables. We will de~~onstrate the techof Fi nique of register allocation by graph coloring, using the program segment
a := read0; b := read0; G := read(); a := a + b + c; <: 10) { d := C + 8 ;
if (a
print (c);
} else if (a . :20) { e := 10; d := e -I- a; print (e) ; } else { f := 12; d := f + a; ; print (f)
1
print (dl ; Figure 4.79 A program segmentfor live analysis.
This program segment contains 6 variables, a through f ;the r () calls symbolize u~o~timizable expressionsandthe print ( ) calls unopti~zabl ableuse.Itsflow graph is shown in Figure4.80. In addition, the diagram shows the live ranges of the six vari as heavy lines along starts‘half-way’ thecode. A smallbut impo~antdetail is thatthe live range of a le through its first assign~ent,and stops ‘half-way’ through the a s s i g ~ m ein ~ twhich it is last used. This is of course because an assignment computes the valueof the source express” completely before assigning it to the destination variable. In other words, in y :=x*x live ranges of x and y do not overlap if this is the end of the live r a g e of x and the start of the live range of y.
b := reado/ . c := read0 a := a+b+c:
e := IO; d := e+a; print (e) ;
print (dl ;
0 Live ranges of the variables from Figure 4.79.
ouslyatanypointinthe allows usconstruct to ure 4.8 1.
tly which variables are live sirnultanec e, andthuswhichvariables interfere with each other. This a I" h of the variables, shown Figasin
f-
l
d-e
a-b
l\l
C
1 Register interference graph for the variables of Figure 4.79.
The nodes of this (non-directe~)graph are labeled with the variables, and arcs are drawn between each pairof variables that interfere with each other. Note that this inte~er-
FUNCTION Color graph (Graph) RETU~NINGthe Colors used set: IF Graph = the Empty graph: RETURN the Empty set;
// Find the Least connected node: SET Least connected node TO No node; FOR EACH NodeIN Graph .nodes: SET Degree TO0; FOR EACH Arc IN Graph .arcs: IF Arc contains Node: Increment Degree; IF Least connected node = No node OR Degree .= ~inimumdegree: SET Least connected node TO Node; SET ~inimumdegree TO Degree;
// Remove Least connected node from Graph: SET Least connected node arc set TO the Empty set; FOR EACH Arc IN Graph .arcs: IF Arc contains Least connected node: Remove Arc from Graph .arcs; Insert Arc in Least connected node arc set; Remove Least connected node from Graph .nodes; // Color the reduced Graph recursively: SET Colors used set TO Color graph (Graph); // Color the Least connected node: SET Left over colors set TO Colors used set; FOR EACH Arc IN Least connected node arc set: FOR EACH End point IN Arc: IF End point/= Least connected node: Remove End point .color from Left over colors set; IF Leftover colors set= Empty: SET Color TO New color; Insert Color in Colors used set; Insert Color in Left over colors set; SET Least connected node .color TO Arbitrary choice from Left over colors set; // Reattach the Least connected node: Insert Least connected node in Graph .nodes; FOR EACH Arc IN Least connected node arc set: Insert Arc in Graph .arcs; RETURN Colors used set; Outline of a graph coloring algorithm.
r e t u ~ i n gfromtherecursion.Therearetwoplacesinwhichthealgorithm is nondeterministic: in choosing whichof several nodes of lowest degree to detach and inchoosing a free color from C when kc I C 1 --l. In principle, the exact choice may influence the further path of the process and affect the total numberof registers used, but it usually does not. Figure 4.83 was constructed using the assumption that the alphabetically first among the nodes of lowest degree is chosen ateach disconnection step and that thefree color with the lowest number is chosen at each reconnection step. We see that the algorithm can allocate the six variables in three registers; two registers will not suffice since the values of a, b and c have to be kept separately,so this is optimal.
Graph: f - a
f - af - a " - - b
d - e
Stack:
f
f
d - e
C
e a c b
dd - e" - - e
C
b a c b
f
c b
f
"+%
d
l
"
d e fa dc eb d a ec a b c b
f
l
l
d2 c b a ce ba c b
f
l
d2-
el
fl-
a2
fl-
a2
fl-
d2-
el
d2- c l
el
d2"-
b
U
a2-
b,
.83 Coloring the interference graph for thevariables of Figure 4.79.
The above algorithm willfind the heuristically ~ i n i m anumber l of colors required for any graph, plus the way to use them, but gives no hint about what to do when we do not have that many registers. An easy approach is to let the algorithm run to completion and view the colors as pseudo-registers (of which there are an infinite number). These pseudo-registers are then sortedaccording to theusagecounts of thevariablestheyhold,andrealregisters are assigned to them in that order, highest usage count first; the remaining pseudo-re allocatedinmemorylocations.Ifour code hadtobecompiledwithtworegisters,we would find that pseudo-register 3 has the lowest usage count, and consequentl~b would not get a register and its value would be stored in memory.
Inthissection we havedescribedonlythesimplestforms of registerallocation throughgraphcoloringandregisterspilling.Muchmoresophisticatedalgorithmshave been designed, for example, by Briggs et al. (1989), who describe m algorithm that is linear in the numberof variables to be allocated in registers. Extensive descriptions can be found in the booksof Morgan (1998) and Muchnick (1997). The idea of using graph coloring in memory allocation problems was first p~blished by Yershov (1971), who applied it to the superposition of global variables in a machine with very little memory, Its application to register allocation was pioneered by Chaitin et al. (198 l).
The compilation methodsof Sections 4.2.5 to 4.2.7 are applied at code generation time and are c o n c e ~ e dwith rewriting the AST using appropriate templates. Supercompilation, on the other hand, is applied at compiler writing time and is concerned with obtaining better templates. It is still an expe~mentaltechnique at the moment, and it is treated here briefly because it is an exampleof original thinking and becauseit yields su~risingresults. At the beginning of Section 4.2, we explained that optimal code generation requires exhaustive search in the general case, alt~oughlinear-time optimal code generation techniques are available for expression trees. One way to make exhaustive searchfeasible is to reduce the size of the problem, and this is exactly what su ~ ~ c o ~ ~ does: i l a optimal ~ i o ~ code sequences for some very simple but useful functions are found off-line, during compiler construction, by doing exhaustive search. The idea is to focus on a very simple ~ t h m e t i cor Boolean function F, for which optimal code is to be obtained. A suitable set S of machine instructions is then selected by hand; S is restricted to those inst~ctionsthat do arithmetic and logical operations on registers; jumps andmemoryaccessare explicitly excluded.Next, all combinations of two instructions from S are tried to seeif they perform the required function F. If no combination works, all combinations of three instructions are tried, and so on, until we find a combination that works or run out of patience. Each prospective solution is tested by writing the instruction sequence to memory on the proper machine and trying the resulting small program with a list of say 1000 wellchosen test cases. Almost all proposed solutions fail on one of the first few test cases, so testing is very efficient. Ifthe solution survives these tests,it is checked manually; in practice the solution is then always found to be correct. To speed up the process, the search tree canbepruned by recognizingrepeatingorzero-result i n s ~ c t i o ncombinations. Optimal code sequences consisting of a few instructions can be found in a few hours; with some luck, optimal code sequences consisting of a dozen or so inst~ctionscan be found in a few weeks. A good example is the function sign (n),which yields +l for n > 0 , 0 for n = 0 , and -1 for n c 0. Figure 4.84 shows the optimal code sequence found by supercompilation on the Intel 80x86; the sequenceis surprising, to say the least. The cwd inst~ctionextends the sign bitof the %ax register, which is assumed to contain the value of n,into the %dx register. Negw negates its register and sets the carry flag cf to 0 if the register is 0 and to 1 otherwise. Adcw adds the second register plus the carry
;n
in register %ax cwd ; convert to double word: I (%dx,%ax) = (extend-s negw %ax ; negate: (%ax,cf) := ( - % a x adcw %dxl%dx ; add with. carry: %dx := %d
O ~ ~ i code ~ a for l the function S ign (n ) .
ase n > 0
%dx %ax initially:
cf
-
n
-
0
n
-
0
-n
1
-n
ase %dx
n = 0 %ax cf 0
-
0
0
-
l
0
0
l
0
0
_ .
ase
n
< 0
%
-
n
-
-1
n
-
0
-1.
-n
1
0
-l
-n
1.
cwd negw
%ax
adcw %dxl%dx
Actions of the 80x86 code from Figure 4.84.
e
Techni~ue
roblem
uality
xpression trees, usin re~ister-re~ister or ~ e ~ o r y - r e ~ iinstrucste~ tions ptimal ptimal euristic
lnst~ctionswith cost funcptimal euristic egister allocation when all
euristic
~ o ~ ~ ~ofisome s o code n eneration t~c~lliques.
Tbe desc~ptionof code ~ e n e r a t i o ~ t e c h n i ~in~ this ~ e sbookaints a rel~tively oder rate view of code ~ptimization. enerators often are much more a and use tens and sometimes h ues and tricks, each of w ~ i c hcan ple interfere with each of the other optimi~ations.Also, such code ~ e n e r ~ t ooften r s distinguish large numbers of special these special cases and the tricks c o ~ b i ~ a t i owith n s any of the 0th
optimizations. When the generated code then fails, it can be very hard to find out where the error lies. Worse, the effect of an incorrect source program can depend on how it was compiled. Figure 4.87 shows an example: a C program that fills an array A of 10 elements in a loop but, due to an off-by-one error in the loop bound, also assigns to the non-existent element A[10].

Figure 4.87 Incorrect C program with compilation-dependent effect.
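The figure's code is reconstructed here from the surrounding discussion rather than reproduced; a minimal program with the behavior described below would look something like this (the routine name is invented):

/* Reconstructed illustration, not the original figure: the loop bound is
   off by one, so the final iteration writes to the non-existent A[10]. */
int i, A[10];

void fill_array(void) {
    for (i = 0; i <= 10; i++) {
        A[i] = 2*i;
    }
}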
In one non-optimizing compilation, the variable i happened to be allocated just after the array A[10]. When during execution i reached the value 10, the assignment A[10] = 2*10 was performed, which updated i to 20, since i was located at the position where A[10] would be if it existed. So the loop terminated after having filled the array, as expected. In another, more optimizing compilation, the variable i was allocated in a register, the loop body was performed 20 times, and information outside A[] and i was overwritten. Also, an uninitialized variable in the program may by chance be allocated in a zeroed location in one form of compilation, with predictable results, and in a used register in another, with unpredictable results for the running program.
All this leads to a lot of confusion and arguments about the demarcation of responsibilities between compiler writers and compiler users, and compiler writers have sometimes gone to great lengths to isolate optimization errors. When introducing an optimization, it is important to keep the non-optimizing code present in the code generator and to have a simple flag allowing the optimization to be performed or skipped. This allows selective testing of the optimizations and any of their combinations, and tends to keep the optimizations relatively clean and independent, as far as possible. It also allows the following drastic technique, invented by Boyd and Whalley (1993). A counter is kept which counts the number of optimizations applied in the compilation of a program; at the end of the compilation the compiler reports something like 'This compilation involved N optimizations'. Now, if the code generated for a program P malfunctions, P is first compiled with all optimizations off and run again. If the error persists, P itself is at fault; otherwise it is likely, though not certain, that the error is with the optimizations. Now P is compiled again, this time allowing only the first N/2 optimizations; since each optimization can be applied or skipped at will, this is easily implemented. If the error still occurs, the fault depended on the first N/2 optimizations; otherwise it depended on the last N-N/2 optimizations. Continued binary search will thus lead us to the precise optimization that caused the error to appear. Of course, this optimization need not itself be wrong; its malfunctioning could have been triggered by an error in a previous optimization. But such are the joys of debugging...
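A minimal sketch of the bisection scheme just described, assuming the error is absent with no optimizations and present with all of them. The helper compile_and_run_with is hypothetical: it stands for recompiling the program with only the first 'allowed' optimizations enabled, running it, and reporting whether the error still occurs; it is stubbed here so the sketch runs on its own.

#include <stdio.h>

/* Hypothetical helper, stubbed for testing: pretend optimization 37 is
   the culprit and report whether the error occurs with the first
   'allowed' optimizations enabled. */
static int compile_and_run_with(int allowed) {
    const int culprit = 37;
    return allowed >= culprit;
}

/* Binary search for the first optimization whose activation makes the
   generated code malfunction; 'total' is the number N reported by the
   compiler. */
static int find_culprit(int total) {
    int lo = 0;       /* error known to be absent with lo optimizations  */
    int hi = total;   /* error known to be present with hi optimizations */
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (compile_and_run_with(mid)) hi = mid; else lo = mid;
    }
    return hi;
}

int main(void) {
    printf("suspect optimization: %d\n", find_culprit(100));
    return 0;
}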
As we have seen at the beginning of Section 4.2, it may be useful to do preprocessing on the intermediate code. We have seen that the intermediate code originates from source-language-dependent intermediate code generation, which removes most source-language-specific features and performs the specific optimizations required by them: loops and case statements have been removed from imperative programs, the corresponding specific constructs from functional programs, and unification from logic programs, and the optimizations involving them have been done. Basically, only expressions, if-statements, routine calls, and jumps remain. So preprocessing of the intermediate code concentrates on these.
The simplest preprocessing of expressions is constant folding, the traditional term for compile-time expression evaluation. For example, most C compilers will compile the routine

char lower_case_from_capital(char ch) {
    return ch + ('a' - 'A');
}

as

char lower_case_from_capital(char ch) {
    return ch + 32;
}
since 'a' has the integer value 97 and 'A' is 65. Some compilers will apply commutativity and associativity rules to expressions in order to find constant subexpressions; such compilers will even fold the constants in

char lower_case_from_capital(char ch) {
    return ch + 'a' - 'A';
}
in spite of the fact that the two constants do not share a node in the AST.
Constant folding is one of the simplest and most effective optimizations. Although programmers usually will not write constant expressions directly, constant expressions arise easily from character constants, macro processing, symbolic constants, intermediate code generation, and code generated from specifications. Some care has to be taken that the semantics of the source language, for example with respect to arithmetic overflow, is preserved in the process. In principle, constant folding is an extreme case of arithmetic simplification; some transformations for arithmetic simplification are shown in Figure 4.88.
E + 0   =>  E
E * 1   =>  E
E ** 1  =>  E
1 ** E  =>  1
E ** 2  =>  E * E

Figure 4.88 Some transformations for arithmetic simplification.
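To make the mechanics concrete, here is a small sketch of a constant-folding and simplification step on a binary expression node. The node representation and the restriction to + and * are invented for this example, and, as noted above, a real implementation would also have to respect the source language's overflow semantics.

#include <stddef.h>

typedef struct Expr {
    char op;                   /* '+' or '*', or 'C' for an integer constant */
    int value;                 /* valid only when op == 'C'                  */
    struct Expr *left, *right;
} Expr;

static int is_const(const Expr *e, int v) { return e->op == 'C' && e->value == v; }

/* Fold a node whose operands are both constants, and apply the
   E + 0 => E and E * 1 => E rules of Figure 4.88.  The children are
   assumed to have been simplified already. */
Expr *simplify(Expr *e) {
    if (e->op == 'C') return e;
    Expr *l = e->left, *r = e->right;
    if (l->op == 'C' && r->op == 'C') {                 /* constant folding */
        int v = (e->op == '+') ? l->value + r->value : l->value * r->value;
        e->op = 'C'; e->value = v; e->left = e->right = NULL;
        return e;
    }
    if (e->op == '+' && is_const(r, 0)) return l;       /* E + 0  =>  E */
    if (e->op == '*' && is_const(r, 1)) return l;       /* E * 1  =>  E */
    return e;
}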
Algorithms for the reduction of multiplication to addition are supplied by Cocke and Kennedy (1977); generalized operator strength reduction is discussed in depth in the literature.
In-lining, also called procedure integration, replaces a call to a routine R by the body of R. To this end, a copy is made of the body of R and inserted into the calling routine S in the place of the call. Since the semantics of the call must be preserved, we cannot simply substitute the actual parameters in-line, in macro-substitution fashion; it is necessary to implement the parameter transfer as prescribed by the semantics of the source language. So the call node in S is replaced by a block that results from copying the body of R, extended with code that sets up the parameters properly and that handles the return value, if applicable. As an example, the C routine print_square of Figure 4.89 is in-lined in Figure 4.90. Note that the naive macro substitution printf("square = %d\n", i++ * i++) would be incorrect, since it would increment i twice. If it can be shown that there are no other calls to print_square, its code can be
eliminated, but this may not be easy to determine, especially not in the presence of separate compilation.

void S(void) {
    ...
    print_square(i++);
    ...
}

void print_square(int n) {
    printf("square = %d\n", n*n);
}

Figure 4.89 C code with a routine to be in-lined.
void S(void) {
    ...
    {int n = i++; printf("square = %d\n", n*n);}
    ...
}

void print_square(int n) {
    printf("square = %d\n", n*n);
}

Figure 4.90 C code with the routine in-lined.
The obvious advantage of in-lining is that it eliminates the routine call mechanism, which may be expensive on some machines, but its greatest gain lies in the fact that it opens the door to many new optimizations, especially the more advanced ones. For example, the call print_square(3) is in-lined to

{int n = 3; printf("square = %d\n", n*n);}
which is transformed by constant propagation into

{int n = 3; printf("square = %d\n", 3*3);}
Constant folding then turns this into

{int n = 3; printf("square = %d\n", 9);}
and code generation for basic blocks finds that the variable n is not needed and generates something like

Load_par   "square = %d\n"
Load_par   9
Call       printf
In practice, in-lining does not always live up to the expectations of the implementers; see, for example, Cooper, Hall and Torczon (1992). The reason is that in-lining can complicate the program text to such an extent that some otherwise effective optimizations fail; also, information needed for optimization can be lost in the process. Extensive in-lining can, for example, create very large expressions, which may require more registers than are available, with a consequent degradation in performance. Also, duplicating the code of a routine may increase the load on the instruction cache. These are examples of optimizations conflicting with each other. The complications of in-lining function calls in C are discussed by Hwu and Chang.
Cloning is similar to in-lining in that a copy of a routine is made, but rather than using the copy to replace a call, it is used to create a new routine in which one or more parameters have been replaced by constants. The cloning of a routine R is useful when static analysis shows that R is often called with the same constant parameter or parameters. Suppose, for example, that the routine

double power_series(int n, double a[], double x) {
    double result = 0.0;
    int p;
    for (p = 0; p <= n; p++) result += a[p] * (x ** p);
    return result;
}
(in which ** is used as an exponentiation operator for the sake of the example), which computes the sum of a[p] * x**p for p = 0 to n, is called with x set to 1.0. Cloning it for this parameter yields the
new routine

double power_series_x_1(int n, double a[]) {
    double result = 0.0;
    int p;
    for (p = 0; p <= n; p++) result += a[p] * (1.0 ** p);
    return result;
}
Arithmetic simplification reduces this to

double power_series_x_1(int n, double a[]) {
    double result = 0.0;
    int p;
    for (p = 0; p <= n; p++) result += a[p];
    return result;
}
Each call of the form power_series(n, a, 1.0) is then replaced by a call power_series_x_1(n, a), and a more efficient program results. The transformation is useful even if there is only one such call. Note that the constant parameter can be substituted in macro fashion, since a constant cannot have side effects. A large proportion of the calls with constant parameters are calls to library routines, so cloning is most effective when the complete program, including the library routines, is available for analysis.
As we have seen above (Figures 4.54 and 4.55), even moderately sophisticated code generation techniques can produce stupid instruction sequences like

Load_Reg  R1,R2
Load_Reg  R2,R1

or

Store_Reg R1,n
Load_Mem  n,R1

One way of remedying this situation is to do postprocessing in the form of peephole optimization, which replaces sequences of symbolic machine instructions in its input by better sequences. This raises two questions: which instruction sequences do we replace, and how do we find the instructions to be replaced? The two questions can be answered independently.
A replacement pattern consists of three components: a list of pattern instructions, a condition on the parameters of those instructions, and a list of replacement instructions. A replacement pattern matches a sequence of instructions in the generated code if the instructions fit the pattern instruction list with parameters that fulfill the condition. Its application consists of replacing the matched instructions by the instructions in the replacement list, with the parameters substituted; the replacement list will normally be shorter. An example in some ad-hoc notation is

Load_Reg Ra,Rb; Load_Reg Rc,Rd | Ra=Rd, Rb=Rc  =>  Load_Reg Ra,Rb

which says that if we find two adjacent Load_Reg instructions in the generated code and they refer to the same but reversed register pair, we should replace them (=>) by the single Load_Reg instruction shown. It is tempting to construct a full set of replacement patterns for a given machine,
which can be applied to any sequence of symbolic machine instructions to obtain a more efficient sequence, but there are several problems with this idea. The first is that instruction sequences that can be replaced by better sequences are rarer than one might think. For example, before accepting the increment instruction Increment Rc, which increments the contents of register Rc by one, as a replacement for Add_Const 1,Rc, we have to make sure that both instructions affect the condition registers of the machine in the same way and react to integer overflow in the same way. If there is any difference, the replacement cannot be accepted in a general-purpose peephole optimizer. If, however, the peephole optimizer is special-purpose, is used after a code generator that ignores the condition registers, and is used for a language that leaves the effect of integer overflow undefined, the replacement can be accepted without problems.
The second problem is that we would also like to accept replacements that patently do not do the same thing as the original. For example, we would like to replace the sequence Load_Const 1,Ra; Add_Reg Ra,Rb by Increment Rb, but this is incorrect since the first instruction sequence leaves Ra set to 1 and the second does not affect that register. If, however, the code generator is kind enough to tell us that this is the last use of Ra, the replacement is correct. This leads to the replacement pattern

Load_Const 1,Ra; Add_Reg Rb,Rc | Ra=Rb, is_last_use(Rb)  =>  Increment Rc
Last-use information may be readily obtainable when the code is being generated, but it will not be available to a general-purpose peephole optimizer. The third problem is that code generators usually have a very limited repertoire of instruction sequences, and a general-purpose peephole optimizer contains many patterns that will just never match anything that is generated.
Replacement patterns can be created by hand or generated automatically. For ordinary postprocessing, a handwritten replacement pattern set suffices. Such a set is constructed by somebody with a good knowledge of the machine instructions, who examines pages of generated code; good replacement patterns then suggest themselves easily. Experience shows (Tanenbaum et al., 1982) that about a hundred patterns take care of almost all correctable inefficiencies left by a relatively simple code generator. Experience has also shown (Davidson and Whalley, 1989) that searching for clever peephole optimizations is entertaining but of limited use: the most useful optimizations are generally obvious.
Replacement patterns can also be derived automatically, by a process similar to code generation by bottom-up tree rewriting: two, three, or more instruction trees are combined into one tree, and code is generated for the combination. If this rewrite has a lower total cost than the original instructions, it is accepted as a replacement pattern. This automatic process is especially useful for large-scale peephole optimization. An example is the use of peephole optimization to replace the entire code generation phase from intermediate code to machine instructions (1990). In this process, the instructions of the intermediate code and the target
instructions together are considered instructions of a single imaginary machine, with the proviso that any intermediate code instruction is more expensive than any sequence of machine instructions. A peephole optimizer is then used to optimize the intermediate code instructions away. The peephole optimizer is generated automatically from descriptions of both the intermediate and the machine instructions. This combines code generation and peephole optimization, and it works because any rewrite of any intermediate instructions to machine instructions is already an improvement. It also shows the interchangeability of some compiler construction techniques.
We will now turn to techniques for locating instruction sequences in the target instruction list that match any of a list of replacement patterns; once found, the sequence must be replaced by the indicated replacement. A point of consideration is that this replacement may cause a new pattern to appear that starts somewhat earlier in the target instruction list, and the algorithm must be capable of catching this new pattern as well. Some peephole optimizers allow labels and jumps inside replacement patterns:

GOTO La; Lb:  |  La = Lb  =>  Lb:
but most peephole optimizers restrict the left-hand side of a replacement pattern to a sequence of instructions with the property that the flow of control is guaranteed to enter at the first instruction and to leave at the end of the last instruction. These are exactly the requirements for a basic block, and most peephole optimization is done on the code produced for basic blocks.
The linearized code from the basic block is scanned to find left-hand sides of patterns. When a left-hand side is found, its applicability is checked using the conditions attached to the replacement pattern, and if it applies, the matched instructions are replaced by those in the right-hand side. The process is then repeated to see if more left-hand sides of patterns can be found. The total result of all replacements depends on the order in which left-hand sides are identified, but as usual, finding the least-cost result is an NP-complete problem. A simple heuristic ordering is to find the first place in a left-to-right scan at which a matching left-hand side is found and then replace the longest possible match. The scanner must then back up a few instructions, to allow for the possibility that the replacement together with the preceding instructions matches another left-hand side.
We have already met a technique that will do multiple pattern matching efficiently, choose the longest match, and avoid backing up: using an FSA; and that is what most peephole optimizers do. Since we have already discussed several pattern matching algorithms, we will describe this one only briefly here. The dotted items involved in the matching operation consist of the pattern instruction lists of the replacement patterns, without the attached parameters; the dot may be positioned between two pattern instructions or at the end. We denote an item by P1...Pi•Pi+1...Pk, in which Pj is the j-th instruction in the pattern, and we denote the input by I1...IN. The item set between the two input instructions In and In+1 contains all dotted items P1...Pi•Pi+1...Pk for which the instructions P1...Pi match the input instructions In-i+1...In. To move this set over the instruction In+1, we keep, with the dot advanced, only those items whose Pi+1 matches In+1, and we add all new items P1•P2...Pk for which P1
matches In+1. When we find an item with the dot at the end, we have found a matching pattern, and only then do we check the condition attached to it. If more than one pattern matches, conditions included, we choose the longest. Having replaced the pattern instructions by the replacement instructions, we can restart our scan at the first replacing instruction, since the item set just before it summarizes all partly matching patterns at that point. No backing up over previous instructions is required.
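A much-simplified sketch of the scanning-and-replacing loop over the linearized code of a basic block. The instruction representation and the single hard-wired pattern (Store_Reg R,x; Load_Mem x,R => Store_Reg R,x) are invented for this example, and the item-set/FSA machinery described above is omitted, which is why this version has to back up after a replacement.

#include <string.h>

typedef struct {
    char opcode[16];           /* e.g. "Store_Reg", "Load_Mem" */
    char op1[16], op2[16];
} Instr;

/* Apply one hard-wired replacement pattern to the instruction list:
       Store_Reg R,x ; Load_Mem x,R   =>   Store_Reg R,x
   Returns the new number of instructions.  A real peephole optimizer
   would drive this from a table of patterns. */
int peephole(Instr code[], int n) {
    int i = 0;
    while (i + 1 < n) {
        if (strcmp(code[i].opcode,   "Store_Reg") == 0 &&
            strcmp(code[i+1].opcode, "Load_Mem")  == 0 &&
            strcmp(code[i].op2, code[i+1].op1) == 0 &&   /* same memory location */
            strcmp(code[i].op1, code[i+1].op2) == 0) {    /* same register        */
            /* delete the Load_Mem: shift the rest of the list one place up */
            memmove(&code[i+1], &code[i+2], (size_t)(n - i - 2) * sizeof(Instr));
            n--;
            if (i > 0) i--;    /* back up: the replacement may enable a new match */
        } else {
            i++;
        }
    }
    return n;
}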
The importance of a peephole optimizer is inversely proportional to the quality of the code yielded by the code generation phase. A good code generator requires little peephole optimization, but a naive code generator can benefit greatly from a good peephole optimizer. Some compiler writers (Davidson and Whalley, 1989; Davidson and Fraser, 1984b) construct good-quality compilers from naive code generation followed by aggressive peephole optimization.
The result of the above compilation efforts is that our source program has been transformed into a linearized list of target machine instructions in some symbolic format. The usual representation is an array or a linked list of records, each describing a machine instruction in a format that was decided by the compiler writer; this format has nothing to do with the actual bit patterns of the real machine instructions. The purpose of compilation, however, is an executable object file with semantics corresponding to that of the source program. Such an object file contains the bit patterns of the machine instructions that are the output of the code generation process, embedded in binary-encoded information that is partly program-dependent and partly operating-system-dependent. For example, the details of how calls to library routines are recorded are OS-dependent, and the format in which this information is specified is again OS-dependent. The last step of machine code generation is therefore the conversion of the machine code in symbolic format into an object file. Since code selection, register allocation, and instruction ordering have already been done, this conversion is straightforward in principle. Still, it is a lot of work, and since it involves specifying hundreds of bit patterns, error-prone work at that. In short, it should be avoided; fortunately that is easy to do.
Almost all systems feature at least one assembler, a program that accepts lists of machine code instructions and surrounding information in character code format and generates object files from them. These human-readable lists of symbolic machine instructions are called assembly code; the machine instructions we have seen above were in some ad-hoc symbolic notation. So, by generating assembly code as the last stage of our code generation process we can avoid writing the target machine code generation part of the compiler and profit from the work of the people who wrote the assembler. In addition to reducing the work involved in the construction of our compiler we also gain a useful interface for checking and debugging the generated code: its output in readable
assembly code. It is true that writing the assembly output to a file and calling the assembler to finish the job slows down the compilation process, but the costs are often outweighed by the software-engineering benefits. Even if no assembler is available, as may be the case for an experimental machine, it is probably worth while to first write one and use it as the final step in the compilation process. Doing so partitions the problem, yields an interface that is useful in constructing the compiler, and yields an assembler, which can also be used for other compilers.
If a C or C++ compiler is available on the target machine, it may be attractive to take this idea a step further, by changing the compilation process: rather than generating intermediate code from the annotated AST, we generate C or C++ code from it, and leave all optimization and target machine code generation to the C or C++ compiler. We mention C and C++ here since these are probably the best-known and most widely available compilers; C has even earned itself a name as 'the platform-independent assembler'. Code generation into a higher-level language is especially attractive for compilers for non-imperative languages: many compilers for functional, logical, distributed, and special-purpose languages produce C code. But the approach can also be useful for imperative and object-oriented languages: the first C++ compilers produced C code, and even a heavily checking compiler for C itself could generate C code in which all checking is made explicit. In each of these situations the savings in compiler construction effort are enormous. On the down side, compilers using C as a target language may produce programs that are up to a factor of two slower than those that generate machine code directly. A 1988 case study discusses C as a target language for the language SAIL, and Tarditi, Lee and Acharya describe translating Standard ML to C.
If, for some reason, the compiler should do its own object file generation, the same techniques can be applied as those used in an assembler. The construction of assemblers is discussed below.
An assembler, like a compiler, is a converter from source code to target code, so the usual compiler construction techniques are applicable in assembler construction; they include lexical analysis, symbol table management, and backpatching. There are differences too, though, resulting from the relative simplicity of the source format and the relative complexity of the target format. Assemblers are best understood by starting from the run-time situation: the output of an assembler is still several steps away from a program executing in memory. We will start from this situation and work backwards, to establish the context in which the output of an assembler is used and the tasks of an assembler.
A running program consists of four components as far as we are concerned here: a code segment, a data segment, a stack, and a set of registers. The contents of the code segment originate in the source code and are usually immutable; they are, however, often extendible to allow dynamic linking. The contents of the stack are mutable and start off empty. Those of the data segment are also mutable and are filled initially from the literals in the source program. The contents of the registers usually start off uninitialized or zeroed. The code and the data relate to each other through addresses of locations in the segments. These addresses are stored in the machine instructions and in the data segment. Most operating systems will set the memory management unit of the machine in such a way that the address spaces of the code and data segments start at zero for each running program, regardless of where these segments are located in real memory.
Program execution is initiated by loading the contents of an executable code file into memory. The loader is usually an integrated part of the operating system, hardly visible, and its activation is implicit in calling a program, but we should not forget that it is there. As part of the operating system, it has special privileges. All initialized parts of the program derive from the executable code file, in which all addresses should be based on segments starting at zero. The loader reads these segments from the executable code file, copies them to suitable memory segments, allocates a stack segment, and jumps to a predetermined location in the code segment, to start the program. So the executable code file must contain a code segment and a data segment; it may also contain other indications, for example the initial stack size and the execution start address.
The executable code file derives from combining one or more object files and probably some library object files, and is constructed by a linker, a normal user program without any privileges. All operating systems provide at least one linker, and most traditional compilers use this standard linker, but an increasing number of compiling systems come with their own linker. The reason is that a specialized linker can check that the proper versions of the object files are combined, something a standard linker, usually designed for FORTRAN or C, cannot do.
Each object file carries its own code and data segment contents, and it is the task of the linker to combine these into the one code segment and one data segment of the executable code file. The linker does this in the obvious way, by making copies of the segments, concatenating them, and writing them to the executable code file, but there are two complications here. (Needless to say, the object file generator and the linker have to agree on the format of the object files.)
The first complication concerns the addresses inside code and data segments. The code and data in the object files relate to each other through addresses, just as those in the executable code file do, but since the object files were created without knowing how they will be linked into an executable code file, the address space of each segment of each object file starts at zero. As a result, all object files except the first one have to be relocated when segments from different object files are linked together. Suppose, for example, that the length of the code segment in the first object file a.o is 1000 bytes. Then the second code segment, from the object file b.o, will start at the location with machine address 1000. All its internal addresses were originally
computed with 0 as the start address, however, so all its internal addresses will now have to be increased by 1000. To do this, the linker must know which positions in the object segments contain addresses, and whether the addresses refer to the code segment or to the data segment. This information is called relocation information. There are basically two formats in which relocation information can be provided in an object file: in the form of bit maps, in which some bits correspond to each position in the object code and data segments at which an address may be located, and in the form of a linked list. Bit maps are more usual for this purpose.
The second complication is that code and data segments in object files may contain addresses of locations in other program object files or in library object files. A location L in an object file whose address can be used in other object files is marked with an external symbol; an external symbol looks like an identifier. Object files can refer to L by using an external reference. Object files contain information about the external symbols they refer to and the external symbols for which they provide entry points. This information is stored in an external symbol table. For example, if an object file a.o calls the routine printf at location 500, the file contains the explicit information in the external symbol table that it refers to the external symbol _printf at location 500. And if the library object file printf.o has the body of printf starting at location 100, that file contains the explicit information in the external symbol table that it features the external entry point _printf at address 100. It is the task of the linker to combine these two pieces of information and to update the address at location 500 in the copy of the code segment of file a.o to the address of location 100 in the copy of printf.o, once the position of this copy with respect to the other copies has been established. The prepended underscore in the external name _printf is a convention to mark external symbols.
The linking process for three code segments is depicted in Figure 4.91; the segments derive from the object files a.o, b.o, and printf.o mentioned above. The length of the code segment of b.o is assumed to be 3000 bytes and that of printf.o 500 bytes. The code segment of b.o contains three internal addresses, which refer to locations 1600, 250, and 400, relative to the beginning of the segment; this is indicated in the diagram by relocation bit maps along the code and data segments, in which the bits corresponding to locations 1600, 250, and 400 are marked with a C for 'Code'. Segments usually contain a high percentage of internal addresses, much higher than shown in the diagram, which is why relocation bit maps are more efficient than linked lists. The code segment of a.o contains one external address, of the external symbol _printf, as described above. The code segment of printf.o contains one external entry point, the location of _printf. The code segments of a.o and printf.o will probably also contain many internal addresses, but these have been ignored here.
The linking process first concatenates the segments. It then updates the internal addresses in the copies of a.o, b.o, and printf.o by adding the positions of those segments to them; it finds the positions of the addresses by scanning the relocation bit maps, which also indicate whether an address refers to the code segment or the data segment.
Finally it stores the external address of _printf, which computes to 4100 (= 1000 + 3000 + 100), at location 500, as shown in Figure 4.91.
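A sketch of how a linker might apply a relocation bit map once the copied code segment has been placed at a given base address. The data structures are invented for this example, it handles only addresses that are 'relocatable to the code segment', and it assumes one relocation bit per 4-byte position; the real bit map described above also distinguishes code-segment from data-segment addresses.

#include <string.h>
#include <stdint.h>

/* A set bit in reloc_bitmap means: the 4 bytes at that word position hold
   an address that was computed relative to segment start 0. */
void relocate_code_segment(uint8_t *segment, long size_in_words,
                           const uint8_t *reloc_bitmap, uint32_t code_base) {
    for (long w = 0; w < size_in_words; w++) {
        if (reloc_bitmap[w / 8] & (1u << (w % 8))) {
            uint32_t addr;
            memcpy(&addr, segment + 4*w, 4);    /* read the stored address */
            addr += code_base;                  /* rebase it               */
            memcpy(segment + 4*w, &addr, 4);    /* write it back           */
        }
    }
}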
Figure 4.91 Linking three code segments.

We see that an object file will contain at least four components: the code segment, the data segment, the relocation bit map, and the external symbol table. We are now in a position to discuss issues in the construction of assemblers and linkers. We will not go into the construction of loaders, since they hardly require any special techniques and are almost universally supplied with the operating system.
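One possible in-memory layout of these four components is sketched below; the field names and types are invented for this example, and real object file formats are considerably richer.

#include <stdint.h>
#include <stddef.h>

typedef struct {
    const char *symbol;      /* e.g. "_printf"                         */
    int is_entry_point;      /* entry point, or external reference     */
    uint32_t address;        /* value, or location to be patched       */
    int in_data_segment;     /* address refers to data (else code)     */
} ExtSymEntry;

typedef struct {
    uint8_t *code;             size_t code_size;
    uint8_t *data;             size_t data_size;
    uint8_t *code_reloc_map;   /* one bit per possible address position */
    uint8_t *data_reloc_map;
    ExtSymEntry *ext_symbols;  size_t n_ext_symbols;
} ObjectFile;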
An assembler converts from symbolic machine code to binary machine code, and from symbolic data to binary data. In principle the conversion is one to one; for example, the assembler instruction

addl %edx,%ecx

which does a 32-bit addition of the contents of the %edx register to the %ecx register, is converted to the binary data

00000001 11010001 (binary) = 01 D1 (hexadecimal)

The byte 00000001 is the operation code of the operation addl, the next two bits 11 mark the instruction as register-to-register, and the trailing two groups of three bits, 010 and 001, are the translations of %edx and %ecx. It is more usual to show the translation in hexadecimal; as shown above, the instruction is 01D1. The translations can be looked up in tables built into the assembler. For some assembly languages there are minor complications due to the overloading of instruction names, which have to be resolved by considering the types of the operands. Usually the translated instructions are packed closely together, with no-op instructions inserted where alignment requirements would otherwise leave gaps; a no-op is an instruction that does nothing (except perhaps waste a cycle).
The conversion of symbolic data to binary data involves converting, for example, the two-byte integer 666 to hexadecimal 9A02 (again on an 80x86, which is a little-endian machine), a double-length (8-byte) floating point number to its 8-byte binary representation, and the two-byte string "PC" to hexadecimal 5043. Note that a string in assembly code is not extended with a null byte; the null-byte convention is a C convention, and language-specific conventions have no place in the assembler. The C string "PC" must be translated by the code generator to "PC\0" in the assembly code; the assembler will then translate this to hexadecimal 504300.
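For the register-to-register form shown above, the second byte can be assembled by packing the two register numbers into a 'mod-reg-r/m' byte. A tiny sketch, not taken from the book, using the 80x86 register numbering (%ecx = 1, %edx = 2):

/* Assemble the mod-reg-r/m byte of a register-to-register instruction:
   mod = 11 marks register-to-register; reg and r/m are 3-bit register codes. */
unsigned char modrm(unsigned reg_src, unsigned reg_dst) {
    return (unsigned char)((0x3u << 6) | (reg_src << 3) | reg_dst);
}
/* addl %edx,%ecx: opcode 0x01, modrm(2, 1) == 0xD1  =>  01 D1 */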
Figure 4.92 shows an assembly code fragment with internal symbols: it contains a data label varl, defined in the data segment and used in the code segment, and a code label labell, which is used in the instruction jmp labell before the definition of the label labell. The assembler reads the assembly code and assembles the code and data segments into two different arrays. When the assembler reads the data segment, it translates the source material for it, stores the result in the data array, and records the addresses of the locations allocated; in particular, the assembler records the value of the label varl as its offset in the data segment. This value must later be used in a memory-reading instruction in the code segment.
        .data
        ...
        .align 8
varl:   .long 666

        .code
        ...
        addl varl,%eax
        ...
        jmp labell
        ...
labell:
        ...

Figure 4.92 Assembly code fragment with internal symbols.
While translating the code segment, the assembler encounters the instruction addl varl,%eax, for which it assembles the proper binary pattern and inserts the value of the data segment label varl at the position in the code segment being assembled, marking that position as 'relocatable to the data segment'. When the assembler encounters the instruction jmp labell, however, it cannot do this, since the value of labell is not yet known. There are two solutions to this problem: backpatching and two-scan assembly.
When using backpatching, the assembler keeps a backpatch list for each label L whose value is not yet known. The backpatch list for L records the locations in the code and data segments being assembled into which the value of L must be stored when that value becomes available. When an applied occurrence of L is encountered and the assembler decides that the value of L must be assembled into location Ai, the address Ai is inserted in the backpatch list for L and the location itself is left open. The resulting arrangement is shown in Figure 4.93, which depicts the assembly code, the assembled binary code, and one backpatch list, for the label labell. When finally the defining occurrence of L is found, the address of the position it labels is determined and assigned to L as its value. Next the backpatch list is processed, and for each entry Ak the value of L is stored in the location addressed by Ak.
When using two-scan assembly, the assembler processes its input file twice; the first scan determines the values of all labels and the second does the actual translation. During the first scan the assembler does not generate code; it only keeps track of where everything will go and records the values of the defining occurrences of all labels. At the start of the second scan the values of all labels are known, and the actual translation can take place without problems.
Figure 4.93 A backpatch list for labels: the assembly code contains several occurrences of jmp labell before the definition labell:, the corresponding address fields in the assembled binary are left open, and the backpatch list for labell records their positions.
Some additional complications may occur if the assembler language supports features like macro processing, multiple segments, labels in expressions, etc., but these are mostly of an administrative nature.
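A compact sketch of the backpatching administration described above; the structures are invented for this example, handle a single code array of word-sized entries, and use a fixed-size list where a real assembler would grow one dynamically.

#define MAX_PATCHES 1024

typedef struct {
    int value;                  /* the label's value, once known          */
    int defined;                /* 0 while only applied occurrences seen  */
    int patch_at[MAX_PATCHES];  /* backpatch list: locations to fill in   */
    int n_patches;
} Label;

/* An applied occurrence of the label at code position 'loc'. */
void use_label(Label *l, int code[], int loc) {
    if (l->defined) code[loc] = l->value;
    else l->patch_at[l->n_patches++] = loc;   /* leave open, remember loc */
}

/* The defining occurrence of the label: its value is 'here'. */
void define_label(Label *l, int code[], int here) {
    l->value = here;
    l->defined = 1;
    for (int k = 0; k < l->n_patches; k++)    /* process the backpatch list */
        code[l->patch_at[k]] = here;
    l->n_patches = 0;
}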
The external symbol and address information of an object file is summarized in its external symbol table, an example of which is shown in Figure 4.94. The table specifies, among other things, that the data segment has an entry point named _options at location 50, the code segment has an entry point named _main at location 100, the code segment refers to an external entry point _printf at location 500, etc. Also there is a reference to an external entry point named _file_list at location 4 in the data segment. Note that the meaning of the numbers in the address column is completely different for entry points and references: for entry points, the number is the value of the entry point symbol; for references, the number is the address where the value of the referred entry point must be stored. The external symbol table can be constructed easily while the rest of the translation is being done. The assembler then produces a binary version of it and places it in the proper position in the object file, together with the code and data segments, the relocation bit maps, and possibly further header and trailer material.
The basic operation of a linker is simple: it reads each object file and appends each of the four components to the proper one of four lists. This yields one code segment, one data
External symbol      Type          Address
_options             entry point     50  data
_main                entry point    100  code
_printf              reference      500  code
_atoi                reference      600  code
_printf              reference      650  code
_exit                reference      700  code
_msg_list            entry point    300  data
_Out_Of_Memory       entry point    800  code
_fprintf             reference      900  code
_exit                reference      950  code
_file_list           reference        4  data

Figure 4.94 Example of an external symbol table.
segment, one relocation bit map, and one external symbol table, each consisting of the concatenation of the corresponding components of the object files. In addition the linker retains information about the lengths and positions of the various components. It is now straightforward to do the relocation of the internal addresses and the linking of the external addresses; this resolves all addresses. The linker then writes the code and data segments to the executable code file; optionally it can append the external symbol table, which is useful for debugging the translated program. This finishes the translation process started in the first line of Chapter 2!
Real-world linkers are often more complicated than described above, and constructing one is not a particularly simple task. There are several reasons for this. One is that the actual situation around object modules is much hairier than shown here: many object file formats have features for repeated initialized data, special arithmetic operations on relocatable addresses, conditional external symbol resolution, etc. Another is that linkers often have to wade through large libraries to find the required external entry points, and advanced symbol table techniques are used to speed up the process. A third is that users tend to think that linking, like garbage collection, should not take time, so there is pressure on the linker writer to produce a blindingly fast linker.
One obvious source of inefficiency is the processing of the external symbol table. For each entry point in it, the entire table must be scanned to find entries with the same symbol, which can then be processed. This leads to a process that requires a time O(n²), where n is the number of entries in the combined external symbol table. Scanning the symbol table for each symbol can be avoided by sorting it first; this brings all entries concerning the same symbol together, so they can be processed efficiently.
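A sketch of the resolution step along these lines; as a simplifying variant of the sorting idea above, references and entry points are kept in separate arrays, the entry points are sorted once, and each reference is then looked up by binary search. The record layout is invented for this example and patch positions are assumed to be word-aligned.

#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef struct {
    const char *symbol;
    uint32_t address;      /* entry point: its value; reference: patch address */
} ExtSym;

static int cmp_sym(const void *a, const void *b) {
    return strcmp(((const ExtSym *)a)->symbol, ((const ExtSym *)b)->symbol);
}

/* Patch every external reference in the concatenated code segment with the
   value of the matching external entry point. */
void resolve(ExtSym refs[], int n_refs,
             ExtSym entries[], int n_entries, uint32_t code[]) {
    qsort(entries, n_entries, sizeof(ExtSym), cmp_sym);
    for (int i = 0; i < n_refs; i++) {
        ExtSym *e = bsearch(&refs[i], entries, n_entries, sizeof(ExtSym), cmp_sym);
        if (e != NULL)
            code[refs[i].address / sizeof(uint32_t)] = e->address;
        /* else: unresolved external reference -- report an error */
    }
}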
This concludes our discussion of the last step in compiler construction, the transformation of the fully annotated AST to an executable binary file. The basic process is that of tree rewriting: nodes or sets of nodes are replaced by nodes or sets of nodes that embody the same semantics and that are closer to the hardware. The most powerful tree rewriting process, BURS, is based on a pattern matching technique that is a tree extension of that used for lexical analysis in the front-end of a compiler. In an extremely high-level view of compiler construction, one can say that textual analysis is done by pattern matching, context handling by a data-flow machine, and object code synthesis (code generation) again by pattern matching. Many of the algorithms used both in compiler building and in compilation can conveniently be expressed as closure algorithms.
- The annotated AST produced by context handling is converted to intermediate code in a paradigm- and language-specific process. Next, the intermediate code is converted into symbolic machine instructions by a paradigm-independent, language-independent, and largely machine-independent process: code generation. The symbolic machine instructions are then converted to some suitable low-level code: C code, assembly code, or machine code.
- Another option is to interpret the intermediate code.
- Intermediate code usually consists of expressions, routine administration and calls, and jumps; it may include special-purpose language-specific operations, which can be in-lined or hidden in a library routine.
- An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language. An interpreter performs essentially the same actions as the CPU of the computer, except that it works on AST nodes rather than on machine instructions.
- Interpreters come in two varieties: recursive and iterative. A recursive interpreter has an interpreting routine for each node type in the AST; it follows the control flow graph. An iterative interpreter consists of a flat loop over a case statement which contains a code segment for each node type; it keeps an active-node pointer similar to the instruction pointer of a CPU.
- The routine in a recursive interpreter for the non-terminal N performs the semantics of the non-terminal N. It normally follows the control flow graph, except when a status indicator indicates otherwise.
- Unless the source language specifies the data allocation explicitly, run-time data in a
recursive interpreter is usually kept in an extensive symbol table. This allows ample debugging information to be kept.
- A recursive interpreter can be written relatively quickly, and is useful for rapid prototyping; it is not the architecture of choice for a heavy-duty interpreter.
- The run-time data in an iterative interpreter are kept in arrays that represent the global data area and the activation records of the routines, in a form that is close to that of a compiled program.
- Additional information about the run-time data in an iterative interpreter can be kept in shadow arrays that parallel the data arrays. These shadow arrays can be of assistance in detecting the use of uninitialized data, the improper use of data, alignment errors, attempts to overwrite protected or system area data, etc.
- Using an iterative interpreter can be expected to be between 30 and 100 times slower than running a compiled program, but an interpreter optimized for speed can reduce the loss to perhaps a factor of 10.
- The basis of code generation is the systematic replacement of nodes and subtrees of the AST by target code segments, in such a way that the semantics is preserved. It is followed by a linearization phase, which produces a linear sequence of instructions from the rewritten AST.
- The replacement process is called tree rewriting. The linearization is controlled by the data-flow and flow-of-control requirements of the target code segments.
- The three main issues in code generation are code selection, register allocation, and instruction ordering.
- Finding the optimal combination is NP-complete in the general case. There are ways to simplify the code generation problem: 1. consider only small parts of the AST at a time; 2. simplify the target machine; 3. restrict the interfaces between code segments.
- Code generation is performed in three phases: 1. preprocessing, in which some AST node patterns are replaced by other ('better') AST node patterns, using program transformations; 2. code generation proper, in which all AST node patterns are replaced by target code sequences, using tree rewriting; 3. postprocessing, in which some target code sequences are replaced by other ('better') target code sequences, using peephole optimization.
- Preprocessing and postprocessing may be performed repeatedly.
- The simplest way to obtain code is to generate for each node of the AST the code segment an iterative interpreter would execute for it. If the target code is C or C++, all optimizations can be left to the C or C++ compiler. This process turns an interpreter into a compiler with a minimum of investment.
- Rather than repeating a code segment many times, a single copy can be placed in a library, and routine calls to the library can be generated, reducing the size of the object code considerably. This technique is called threaded code. The object size reduction may be important for embedded systems.
- An even larger reduction in object size can be achieved by numbering the library routines and storing the program as a list of these numbers. All target machine dependency is now concentrated in the library routines.
- Going in the other direction, the repeated code segments may each be partially evaluated in their contexts, leading to more efficient code.
- In simple code generation, a fixed translation to the target code is chosen for each possible node type. These translations are based on mutual interface conventions.
- Simple code generation requires local decisions only, and is therefore especially suitable for narrow compilers.
- Simple code generation for a register machine rewrites each expression node by a single machine instruction; this takes care of code selection. The interface convention is that the output register of one instruction must be used immediately as an input register of the parent instruction.
- Code for expressions on a register machine can be generated by a depth-first recursive visit; this takes care of instruction ordering. The recursive routines carry two additional parameters: the register in which the result must be delivered and the set of free registers; this takes care of register allocation.
- Since each operand that is not processed immediately ties up one register, it is advantageous to compile code first for the operand that needs the most registers. This need, called the weight of the node, or its Sethi-Ullman number, can be computed in a depth-first visit.
- When an expression needs more registers than are available, we need to spill one or more registers to memory. There is no best register spilling technique, except for exhaustive search, which is usually not feasible. So we resort to heuristics.
- In one heuristic, we isolate maximal subexpressions that can be compiled with the available registers, compile them, and store the results in temporary variables. This reduces the original tree, and we repeat the process on the reduced tree.
- The machine registers are divided into four groups by the compiler designer: those needed for administration purposes, those reserved for parameter transfer, those reserved for expression evaluation, and those used to store local variables. Usually, the size of each set is fixed, and some of these sets may be empty.
- Often, the set of registers reserved for local variables is smaller than the set of candidate variables. Heuristics for choosing which variables get a register include first come first served, register hints from the programmer, and usage counts obtained by static or dynamic profiling. A more advanced heuristic uses graph coloring.
- Compilation by symbolic interpretation is somewhat similar to symbolic interpretation. As in the latter, we keep a symbolic representation, but now it includes both the stack and the registers; more importantly, this time the information in the representation must be exact. The representation is called a regvar descriptor.
- If the effect of a node can be stored exactly in the regvar descriptor, we do so. No code is generated for the node, but its semantics is preserved in the regvar descriptor: old regvar + node = new regvar.
- If the effect of a node is such that we cannot keep the information in the regvar descriptor exact, we generate code to achieve the effect, and record the result in the regvar descriptor; in this way
the semantics is preserved in the rewriting of the node: old regvar + node = code + new regvar.
- If we have the information from live analysis available, we can delete all information about a variable from the regvar descriptor the moment we leave its live range.
- A basic block is a maximal part of the control flow graph that contains no splits (jumps) or combines (labels). A basic block starts at a label or at the beginning of the routine and ends just before a jump or jump-like node or label or the end of the routine. It contains expressions and assignments only.
- The concept of a basic block separates the concerns of code generation for straight sequences of expressions and assignments from those of flow of control. The separation is especially useful for narrow compilers, since it allows them to do optimized code generation for expression sequences.
- Code generation for basic blocks proceeds in two steps. First the control flow graph is converted into a dependency graph, which is a 'dag', a directed acyclic graph. Then the dependency graph is rewritten to code. The gain lies in the fact that the dependency graph is much less restrictive to instruction ordering than the control flow graph.
- The dependency graph of a basic block is formed by two kinds of dependencies: data dependencies inside expressions through operands, and data dependencies through variables that obtain their value in an assignment and whose value is used in an expression further on. The ultimate data dependencies are the values that are still needed after the basic block; these are called the roots of the basic block.
- Emphasizing these data dependencies and removing other flow-of-control dependencies yields a rough data dependency graph that can be simplified by shunting out the assignments and keeping only those nodes that are reachable from the roots. The resulting graph is a dag, a directed acyclic graph.
- The dag of a basic block can be reduced further by recognizing common subexpressions. The reduction takes place by repeatedly merging nodes that have the same operands, operator, and dependencies. Traditionally, dags of basic blocks are implemented as arrays of triples.
- The nodes in the dag of a basic block are rewritten to the corresponding machine instructions. The dag is then linearized based on late evaluation of the operands.
- The specific form of late evaluation used for linearizing dags identifies ladder sequences; such ladder sequences match sequences of register-memory instructions that all have the register in common. Such sequences are very efficient.
- To find the linearization, first the available ladder sequences are isolated and then code is generated from them in last-to-first order, starting with a ladder on which no data from other ladders depend. The ladder is then removed from the dag and the process is repeated.
- Pointers in expressions in basic blocks can be handled by two simple rules: 1. an assignment under a pointer makes any variable used in a subsequent expression dependent on that assignment; 2. retrieving a value from under a pointer is dependent on all preceding assignments. Extended analysis may allow some of these dependencies to be canceled.
- Optimal rewriting of expression trees (but not of dags!) can be obtained through BURS code generation; BURS stands for Bottom-Up Rewriting System.
- The BURS technique allows one to decompose a given input tree of arbitrary complexity into a number of subtrees, each of which is a member of a given set of trees, the pattern trees; the pattern trees may again be of arbitrary complexity.
- In applying BURS to code generation, we equate the input tree with the expression AST and the pattern trees with the ASTs of the machine instructions.
- BURS operates in two scans over the input tree, one bottom-up and one top-down. The bottom-up scan annotates each node of the input tree with references to nodes in the pattern trees. The presence at a node I in the input tree of a reference to a node N in a pattern tree means that the tree with I at its top can be rewritten by rewriting its top section by the subtree that has N as its top. This implies that after the rewrite of the top section, all other parts of the tree below I can also be rewritten. The top-down scan can then rewrite the entire tree.
- The bottom-up scan combines sets of fragments of pattern trees that are applicable at a certain node in the input tree much in the same way as a lexical analyzer combines sets of items of regular expressions that are applicable in a certain position in the input stream.
- As with the lexical analyzer, the speed of the BURS pattern matcher can be improved by implementing it as an FSA, a tree automaton in this case, rather than interpretatively.
- Unlike the case of the lexical analyzer, the various patterns have different costs, and we want a minimum-cost rewrite. In the interpreting implementation, cost-based decisions can be handled by using a dynamic programming technique: at each node, only the cheapest way to get the result in a given register type is retained. In the tree automaton implementation, constant costs can be incorporated into the automaton. The resulting transition tables are often huge but can be compressed considerably.
- BURS code generation is adapted relatively easily to additional requirements. Examples are code generation for machines with several types of registers and the extension of the method to flow-of-control instructions.
- Two variables which are both live at a given position in the program 'interfere' with each other where register allocation is concerned. If we know the live ranges of all variables, we can construct the register interference graph of the variables, in which each node represents a variable and each arc between two nodes N1 and N2 represents overlap of the live ranges of the variables represented by the nodes N1 and N2.
- We can find a possible assignment of registers to variables by coloring the graph such that no two nodes of the same color are connected by an arc; each color then represents a register. The optimal register assignment corresponds to a graph coloring with the lowest possible number of colors.
- The problem of optimal graph coloring is NP-complete, but good heuristics exist; for example, temporarily remove a node of minimal degree from the graph, color the remaining graph recursively using the same algorithm, then reconnect the removed node and color it.
- In supercompilation, a small but frequently used intermediate code fragment is taken and the best possible code for it is generated using exhaustive search. The resulting code is then used as a template in a compiler. Surprising code sequences have been found in this way.
- Before converting the intermediate code to target code it may be preprocessed to improve efficiency. Examples of simple preprocessing are constant folding and arithmetic simplification. Care has to be taken that arithmetic overflow conditions are translated faithfully by preprocessing, if the source language semantics requires so.
- More extensive preprocessing can be done on routines: they can be in-lined or cloned.
- In in-lining, a call to a routine is replaced by the body of the routine called. This saves the calling and return sequences and opens the way for further optimizations. Care has to be taken to preserve the semantics of the parameter transfer.
- In cloning, a copy C of a routine R is made, in which the value of a parameter P is fixed to a value V; all calls to R in which the parameter P has the value V are replaced by calls to the copy C. Often a much better translation can be produced for the copy C than for the original routine R.
- Some sub-optimal symbolic machine code sequences produced by the code generation process can be removed by peephole optimization, in which fixed parameterized sequences are replaced by other, better, fixed parameterized sequences. About a hundred replacement patterns are sufficient to take care of almost all correctable inefficiencies left by a relatively simple code generator.
- Replaceable sequences in the instruction stream are recognized using an FSA based on the replacement patterns in the peephole optimizer. The FSA recognizer identifies the longest possible sequence, as it does in a lexical analyzer. The sequence is then replaced and scanning resumes.
- Code generation yields a list of symbolic machine instructions, which is still several steps away from an executable binary program. In most compilers, these steps are delegated to the local assembler.
- The assembler translates the symbolic instructions generated for a source code module into a relocatable binary object file. The linker combines some relocatable binary files and probably some library object files into an executable binary program file. The loader loads the contents of the executable binary program file into memory and starts the execution of the program.
- The code and data segments of a relocatable object file consist of binary code derived directly from the symbolic instructions. Since some machine instructions require special alignment, it may be necessary to insert no-ops in the relocatable object code.
- Relocatable binary object files contain code segments, data segments, relocation information, and external linkage information.
- The memory addresses in a relocatable binary object file are computed as if the file were loaded at position 0 in memory. The relocation information lists the positions of the addresses that have to be updated when the file is loaded in a different position, as it usually will be.
- Obtaining the relocation information is in principle a two-scan process. The second scan can be avoided by backpatching the relocatable addresses as soon as their values are determined. The relocation information is usually implemented as a bit map.
- An external entry point marks a given location in a relocatable binary file as available from other relocatable binary files. An external entry point in one module can be
accessed by an external reference in a different, or even the same, module.
- The external linkage information is usually implemented as an array of records.
- The linker combines the code segments and the data segments of its input files, converts relative addresses to absolute addresses using the relocation and external linkage information, and links in library modules to satisfy left-over external references.
- Linking results in an executable code file, consisting of one code segment and one data segment. The relocation bit maps and external symbol tables are gone, having served their purpose. This finishes the translation process.
- In an extremely high-level view of compiler construction, one can say that textual analysis is done by pattern matching, context handling by a data-flow machine, and object code synthesis (code generation) again by pattern matching.
Books on and general discussions about interpreter design are rare, unfortunately. The most prominent examples are by Griswold and Griswold (1986), who describe an interpreter in detail, and by Klint (1981), who describes a variety of interpreter types. Much valuable information can be found in the Proceedings of the SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques (1987).
The most important recent books on code generation are by Muchnick (1997) and Morgan (1998). A continuous source of information can be found in the annual ACM SIGPLAN Conferences on Programming Language Design and Implementation. The retargetable C compiler lcc is described by Fraser and Hanson (1995).
As with interpreters, reading material on assembler design is not abundant; we mention Salomon (1992) as one of the few books. Linkers and loaders have long lived in the undergrowth of compilers and operating systems, yet they are getting more important with each new programming language and more complicated with each new operating system. Levine's book (1999) is the first book in 20 years to give serious attention to them and the first ever to be dedicated exclusively to them.
Exercises

This is an exercise in memoization, which is not properly a compiler construction subject, but the exercise is still instructive. Given a directed acyclic graph G and a node N in it, design and implement an algorithm for finding the shortest distance from N to a leaf of G by recursive descent, where a leaf is a node with no outgoing arcs. Test your implementation on a large graph of the structure shown in Figure 4.95.

Extend the iterative interpreter in Figure 4.5 with code for operators.
Figure 4.95 Test graph for recursive descent marking.
Iterative interpreters are much faster than recursive interpreters, but yield less extensive run-time diagnostics. Explain. Compiled code gives even poorer error messages. Explain.

Explain how a self-extracting archive works (a self-extracting archive is a program that, when executed, extracts the contents of the archive that it represents).
Section 4.2.3.1 outlines how the threaded code of Figure 4.14 can be reduced by numbering the routines and coding the list of calls as an array of routine numbers. Show such a coding scheme and the corresponding interpreter.

Generating threaded code as discussed in Section 4.2.3.1 reduces the possibilities for partial evaluation as discussed in Section 4.2.3.2, because the switch is in the Expression_P routine. Find a way to prevent this problem.
The weight of a tree, as discussed in Section 4.2.4.2, can also be used to reduce the maximum stack height when generating code for the stack machine of Section 4.2.4.1. (a) How? (b) Give the resulting code sequence for the AST of Figure 4.23.

The subsection on machines with register-memory operations in Section 4.2.4.2 explains informally how the weight function must be revised in the presence of instructions for combining the contents of a register with that of a memory location. Give the revised version of the weight function of Figure 4.34.
The code of the C routine of Figure 4.96 co~esponds to the flow graph of Figure 4.97. The weights for static profiling havebeenmarkedbythe letters a to g. Set LIP the traffic flow equations for this flow graph, underthe following assump-
tions. At an if-node 70% of the traffic goes to the then-part and 30% goes to the else-part; a loop body is (re)entered 9 out of 10 times; in a switch statement, all cases get the same traffic, except the default case, which gets half.
void
Routine (void) { if (...) { while (.)..) { A;
1
1
else { switch { case: .,: B; break; case: ...: C; break; (
1
1
.
S
.
)
1
igure 4.96 Routine code for static profiling.
Using the same techniques as in Exercise 4.9, draw the flow graph loop
for the nested
while (..,) { A; while (.".) { B;
1
1
et up the traffic equations and solve them. In Section 4.2.5. l, an a l g o ~ t is h ~presented to convert theAST of a basic block to a data dependency graph. Step 3 of that a ~ g o r i t hinserts ~ dependencies to reflect the fact that an assi~nmentto a variable replaces its old value. Which other r e ~ u i r e ~ e n t does it also express? Given the code fragment x := a*a + 2*a*b + b*b; y := a*a - 2*a*b c b*b;
draw the depe~dencygraph before and after common subexpression e~imination. (a>Rraw the dependency graphof the C expression *p+ +. (b)In thebasicblock { a = *p++ ; b = *p++},theexpression *p++is not a
common subexpression, appearances ~otwithstan~ing. How does the c o ~ m subo~
end if ~
7 Flow graph for static profilingof Figure 4.96,
expression elimination technique described in Section 4.2.5.1 discover that *p++ is not a c o ~ ~ sube~pression? o n In Section 1.2 we chose ASTs over lists of instructions for the intermediate code representation, Redo the demo compiler from that section with triples rather than te ASTs as i n t e ~ e ~ i acode. Give the data dependency graph after common sube~pression elil~ination of Exercise 4.12 in triple rep~esentation. tep 2 of the algorithm in the subsection on the lineari%ationof the data dependency raph in Section 4.2.5.2 features two nodes, S and N. Is the algorithm still correct when S and N happen to be the same node? Show that there isalways at least one available ladder sequence whenl i n e ~ i % ithe n~ data dependency graph. efer to Figure 4.49 concern in^ the rewriting and ordering of a ladder sequence. Giventhe com~utativityof the 1- operator, it is temptin to addthevalue of b
directly to I1 without loading it into register R1 first, thus saving one register and one instruction. Explain why this is not allowed. Use the ladder-sequence algo Answers.8. graph of Figure ladder sequence.
9generate to code for the dependency d dependency graph after each coded
equence algorithm on page 329 always uses R1 as the ladder er code can be obtain by using the appropriate' register an to-registertransfers.scusswaystochoosean 'appro~riate' Given a ~ a c h i n ewith 3 machine instrlIctio~s:
where ~e~ denotes the contentsof a memory location. (a) Show the sets that the S code ge~eratorbuilds at the nodes of the input tree co~espondingto the expression a + b, and explain why it does so. (b) Show the tree or trees that result from the rewriting process. Refer to Section 4.2.6.3 on instruction selection by dyna~ic ~rogra~lmillg. we indicate associativity of operators to the code generator generator, in addition to commutativity. Would this be useful? Would this cause the code generator generator to add more patterns? Which ones'? edesign the lexical analysis algorithm to yield the lar est number of tokens for a given input string, rather than the longest matches, using dynamicp r o g r a m ~ i nin~a way similar to the tree pattern~ a t c h i n galgorithm. Why is it not good enoughto just yield the shortest token all the time'? edesignthelexicalanalysis algorith~toyieldthesmallestnumber rather than the longest matches.
of matches,
The dynamic progra~mingalgorithm sketched in Section 4.2.6.3 assigns cost zero to ~ e m o r y rather ' than the label rnern, since the label means 'value already resides in 'value can be brought to memory at cost C'. Add an ins~ructionStore-Reg R , x with cost 3 units to the instruction set of Figure 4.60. This illst~~ction allows any operand that resides in a register to be yielded in memory, at a price. Also, in addition to the cost of a rewrite in a label, we can record the ~ a x i number ~ u ~of registers used. edo the c o ~ p u t a t i o that ~ s led to igure 4.7 1, while including the ing operands in memory and recording the l l l a ~ i number ~ u ~ of the resulting tree to derive an i~structionsequence when only one register is available. What happens when two registers are available?
Given the register interference graphfor the five variables a,b, c,d,and e: a
b-C d
e
show the steps the graph coloring algorithm goes through when coloring this graph, and produce the resulting register allocation. 7 Given the code sequence
{
int tmp-2ab = 2*a*b; int tmp-aa = a*a; int tmp-bb = b*b; = tmp-aa + tmp-2ab +tmp-bb; y = tmp-aa - tmp-2ab +tmp-bb; X
1
and given that a and b are live on entry and dead on exit, and that x and y are live on exit: (a) C o n s t ~ cthe t register interference graph. (b) Color the graph. How many registers do we need? Refer to Section 4.2.8 on supercompilation. Suppose we have a modest set of 20 machineinstructions, each operating on 2 registers,and 3 machineregisters.We want to find a pattern for a given function by trying all combinations of at most N i ~ s t ~ c t i o n sUnder . theassumptionsthattryingasinglecombinationtakesten microseconds, that all but one or two combinations are rejected, and that rejecting takes place on the average after two tests, what would abereasonable value ofN for the search to take a weekend? On some processors, multiplication is extremely expensive, and it is worthwhile to replace all multiplications with a constant by a combination of left-shifts, additions, and/or subtractions. Assume that our register machine of Figure 4.22 has an additional inst~ction: Actions
Instruction
Shif t-Lef t-Reg c, R,R,:
=R,<
which shifts the contents of R, over I c I bits, to the right if c c 0, and to the left otherwise. Write a routine that generates code for this machine to multiply R0 with mu1 t iplor given i as cat a amet et er, withoutusingthe apositivevalue Mu1 t-Reg inst~ction.The routine should leave the result in R1. Hint: scoop up sequencesof all Is,then all Os, in the binary rep~esentationof mu1 tiplicator,starting from the right.
What is the result of in-lining the call P ( 0 ) to the C routine void P(int i) { if (i c 1) return; else Q O
;
1 (a) im~ediatelyafter the substitution? (b) after constant propagation? (c) after constant folding? (d) after dead code elimination? (e) What other optimization (not covered in the book) would be needed to e l i ~ i n a t e the sequence entirely? How could the required i n f o ~ a t i o nbe obtained? For a processor of your choice, find out the exact semantics of the Add-Gonst 1,R, and Increment R, inst~ctions,find out where they differ and write a complete replacement pattern in the style shown in Section 4.2.12.2 for Increment Rc* Given a simple, one-register processor, with, among others, an instruction Add-Constant c , which adds a constant c to the only, implicit, register. obvious peephole optimization patterns are Add-Constant c; Add-Constant d Add-Constant 0 = >
=>
Add-Constant cl-d
Show how the FSA recognizer and replacer described in Section 4.2.12.2 co~pletely removes the instruction sequence Add-Cons tant 1; Add-Cons tant 2 ; Add-Cons tant - 3. Show all states of the recognizer during the transfo~ation. Learn to use the local assembler, for example by writing, assembling and running a program that prints the tiables of ~ u l t i ~ l i c a t i ofrom n 1 to 10. any processors have modes and/or instructions. They may, for example, have a jump instruction that adds a constant to the program counter. What is the advantage of such inst~ctionsand addressi~gmodes? any processors have conditional jump inst~ctionsonly for conditional jumps with a linited range, For example, the target of the jump may not be further than 228 bytes away from the current program counter. ~ o ~ e t i m ean s , assembler for such a processor still allows unlimited conditional jumps. How can such an un~imitedconditional jump be i~plemented? Find and study docu~entationon the object file format of a compiler system that you use regularly, In pa~icular,read the sections on the symbol table format and the ~elocationi n f o ~ a t i o n . istory of code ~ e ~ e r ~: Study t i o ~Anderson’s two-page paper (1964), which introduces a ~ u d i m e n t aform ~ of bottom-up tree-rewriting for code generation, and identify and s u m ~ ~the i ~techniques e used. Hint: the s u m m ~ ywill be much longer than the paper.
All compilers and many run-time systems use dynamically sized data. The size of such data is not known in advance and room for it must be found at run time. Examples inside thecompileraresymboltables,stringsfromthesourceprogram, ASTs, register interferencegraphs for graphcoloring,andmanyothers. The examples in run-time systems derive from the nature of the source language: strings, dynamically sized arrays in il~perativelanguages,closures in functionallanguages,tentativeunificationsinlogic languages, and incoming messages in distributed languages are a few that come to mind. of There was a time when compiler writers could afford to allocate fixed amounts memory for such data, for example 32 bytes for an identifier, and give an error messageor even truncate the data, with or without warning, when the limit was exceeded, With the s o l ~ e t i ~ e s a wmemory ~ w ~ dst~cturesand the l i ~ i t e dmemory man~gement techni~ues of thosedays,suchrestrictionshelped to simplifycompilerwriting.Also,available memoriesweresosmallthatnobodyexpectedtobeabletouselongidentifiersorto compile large programs. With present memory sizes and better p r o g r ~ m i n glanguages, techni~L~es, andalgorithms,both ~ g u m e n t shavebecomevoid,anddynamicmemory ~ a n a g e ~ise an ~ tintegral partof both compilers and~ n ~ t i m systems, e When a program is started,mostoperatingsystemsallocateatleastthefollowing three memory segmentsfor it:
-the ‘code segment’, which contains the program code and which
is usually read-only or execute-only;it is addressedbytheprogramcounterand is nexttoinvisible to the program code itself; -the ‘stack segment’, which contains the stack and which may come with an overflow and underflow detection mechanism; itis addressed by one or more stack pointers, which are most of the time manipulated automatically by machine inst~ctions; -the ‘data segment’, which is a single contiguous stretch of memory locations, totally at the disposition of the program for the purpose of storing data; its start location and size
are accessible to the program in some way and its contents are addressable by machine inst~ctions;itsuse is uptotheprogrammer.Whentheemphasisisonmemory mana~ement,the data segmentis also called the Some operating systems allow more co~plicated d so~etimes baroque r main interest is in the ~chitectures,but we will restrict ourselves to the simple case. rnent. ~ s u a lthere l ~ is an o~erating data segment; we will use it for all our memory m ent; if the s e g ~ e nhas t to be moved system c o ~ a n or d system call to resize the data as a result of this, the operating system makes sure this move is transparent to the program. It is the task of m e ~ o r y m a n a g e ~ eton hand t out and collect subseg~entsof the data segment such that all su~segmentsfit inside the data segment and no memory location is ever part of more than one subsegment. These subse~mentsare usually called ‘blocks’ or ‘chunks’; for a more precise definition of blocks and chunks see Section 5.1.1. Additional important objectives are thata request for memory should not be refused when thereis still a possibility to fulfill it and that the amortized cost of obtaining and freeing~ e ~ oshould r y or all~ostconstant. See below for an e x ~ l a n a t i oof~the notion of ‘amortized ory allocation methods in the 1960s and’70s used to require a time proportional tothenumber of blocks han~edout;thisresultedinquadratictime require~entsand limited the methods to some several thousands of blocks. This was no problem with the limitedcomputermemories in thosedaysbut is ~~na~ceptable now,andmoreefficient methods are required.
cost average 1s the of one The ence step,averagedover a longenoughperiod.If, for example, each step normally costs 1 unit but one in five costs l0 units, the amortized cost is 2.8 units, since thatis the avera e per step in the long run. This way of measuring costs is useful for situations in which we cannot prevent occasional steps from being much more expensive than usual. This is the case in algo~ithmsthat require occasional reorganization of their data; it also applies to buyinga house.
Allocating memory in the heap is easy enough: we keep a pointer to the first free location in the heap, allocate the requested block from there, and bump the pointer to the next free location. The problem with this schemeis that sooner or later? we run outof heap space; if memory blocks are requested continually and no blocks are released, even the largest memory will fill up eventually. f we decide we want blocks to be released, a new question arises: who is responsible for the release? The naive answer is of course’,butthatanswer un~eresti~~ates thecomplexity of thetas systems often allocate data which have unpredictable lifetimes and of whose existence the p r o g r a ~ e is r not even aware. Efforts to solve these problems have resultedin techni~ues for automatically releasing unuseddata,
En the following sections we will discuss techniques for data allocation with explicit deallocation bytheprogrammer,and for data allocationwithimplicitdeallocation. Emplicit deallocation is usuallycalled'garbage collection'. Generallyspeaking,most compilers deallocate their internal data explicitly (or not at all) and most run-time systems leave the deallocation of user- allocate^ data to the user or the garbage collector, and that of their own data to the garbage collector. More in p ~ i c u l a ralmost , all ~ n - t i m esystems for programsinthefunctional,logical,andmoreadvancedparadigms feature garbage collectors.Garbage collectors Me also usedinmany app~icationsoutsidecompiler construction. Examples are text editors, word processors, photo editors, and web browsers. We will therefore discuss the tec~niqueswith explicit deallocation in a general and compiler setting and those with implicit deallocation ainrun-time system setting. Roadmap e m ~ a~ g e ~ eo n t ~ S. l Data allocation with explicit deallocation 5.1.1 Basic memory allocation 5.1.2 Linked lists S. 1.3 Extensible arrays 5.2 Data allocation with implicit deallocation 5.2.1 Basic garbage collection algorithms 5.2.2 Preparing the ground 5.2.3 Reference counting 5.2.4 Mark and scan 5.2.5 Two-space copying 5.2.6 Compaction 5.2.7 Generational garbage collection
5~
In most systems, basic memory allocation comes in the formof a routine that finds a block of unused memory of the requested size, marks it as used, and returns a pointer to the block. If no such block is available, the result varies: a null pointer may be returned, an error routine rnay be called, or the program rnay be aborted. The requested size is given as a p~ameterto the routine; the marking prevents the block from bein handed out more than once, A second routine can be used to return the block to the system. This routine simply marks the block as not in use.The implementation of this m ~ ~ varies; n g a bit map with one marking bit for each byte suggests itself, but other imp~ementationsare possible and perhaps equally usual. This basic mechanism is available in C as the routines void *malloc (siz e-t size) and free (void * p t r ) . The void * const~ctionis a tricktofake p o l y m o ~ ~pointers ic in C, and need not worry us here. We will discuss the ~ o r ~ n of gs malloc () and free ( ) below. Although compilers are essentially just n o ~ a pro~rams, l they have enough in common to consider the allocation and deallocation problems of two classes of data types that ~ eand ~ e ~ t ~ ~ sarrays; i~le are in great demand in compilers. These data types are Z i ~ lists
both are used to imple~entsurte~or ~ ~ ~ osets r or t ehags. ~ It WO mentation l a n ~ u a had ~ e type constructors for them built in, but h A correct and efficient implementati~nof these data structures is for the co~ectnessand the efficiency of the compiler, which is to them here. It is true that some modernl Java) have automatic data alloc all compilers are still written in a traditional tion and deallocation in these shown that it is very advanta~ properly and systematical~y.
A memory allocation requestof N bytes supplies the user witha p a block of N free bytes. The ~ e m allocation o ~ process requires the block to
in a more complicateci data s t ~ c t u r esince some additional ad~inistrationis
manyplaces, includin~ pro~ramvariables, ma chin^ re~isters, Chunk pointers occur only in chunk ad~inis~ations, if at all. The offset between chunk pointer and block pointer needs s is that the address must be a ~ u l ~ p ofl eZN, with N depend in^ on t machines, for example, inte~erscan only be stored in and re addresses divisible by 4.
st or in^ any data type in the common multi~leof all d means that the nu~ericalvalue of the block pointer rem ~ u ~ t i pofl e8 or 16, and chunk addresses must beat least 8 or We will now consider the relation between ch a chunk is known, one can find its end and thus the start of the next reside inside a block are under user control and will point to the start pointers inside the allocation administration, if any, are un S. A possible layout of chunks is shown in tothechunk si weneed a fewbits in each chunk for acimini of the bits is a ,whichindicates if a chunk is free.
the . Onmachineswithalignmentrestrictions?thesizefieldmustbealignedonan integer-align~entb o u n d a ~if; we allocate 4 bytesfor the field, it can probably include the administration bits. For the block pointer to come out on a 16-, 32-, or 64-byte boundary, the chunk must then be allocated on a 16-, 32-, or 64-byte boundary minus4.
chunk
chunk
IOW
high
marked free use marked in Memory structure used by the mallodfree mechanism.
~ewillnowdescribeanaive i~plementationof a routine Mal loc (Block but is not good si2;e ) ;it is intuitively very simple and exhibits the basic principles? enough for practical use; opti~izationswill be suggestedin Section 5.1.1.1. To allocate a block B of the required Block s iz e, Ma lloc steps through the chunks until it finds a chunk C marked ‘free’ that is l e enough to a ~ c o ~ ~ oB.d The a~e chunk C is then broken into two chunks C l and Cz, S that Clhas the proper size for the block B -unless of course the block happens to fit exactly, in which case no chunk C 2 is created. Now the size fields of Cland C 2 are set to their new values, the free bit of Clis turned off, that of C 2 is turned on, and a pointer to the block in Cl is returned to the requester. To free a block pointed to by a given pointer, the free bit of the co~espondingchunk is simply turned on. If Malloc cannot find a chunk that is large enough it first tries to coalesce adjacent free chunks into larger chunks,by scannin~the ~ ~ r looking n o for ~ such situatio~s.If two adjacent free chunks Cland C 2 are found, the size of Clis set to the sum of the sizes of Cland Cz. The free bit of Clneed not be modified since it is already set to free; neither do the administ~ationfields in Cz need adjustment since they are now out of reach of the allocator. This operation can also be performed during any other scanof m e ~ o r yo,r upon f this does not provide a suffi~ientlylarge chunk, Mal loc calls a routine Soluof memory condition ( ) in a last a t t e ~ pto t remedy the situation. ion to out elow we discuss some ways of o b t a i n i ~additional ~ memory that could be i~plemented of memory condition(). intheroutine Solution to out
SET
the polymorphic chunk pointer First-chunk pointer TO Beginning of available memory; SET the polymorphic chunk pointer One past available TO memory Beginning of available memory+ Size of available memory; of available memory; SET First-chunk pointer .size TO Size SET First-chunk pointer .free TO True; F ~ C T I O NMalloc (Block size) RETURNING a polymorphic block pointer: SET Pointer TO Pointer to free block of size (Block size); IF Pointer /= Null pointer: RETURN Pointer; Coalesce free chunks; SET Pointer TO Pointer to free block of size (Block size); IF Pointer /= Null pointer: RETURN Pointer; RETURN Solution to out of memory condition (Block size); PROCEDURE Free (Block pointer) : SET Chunk pointer TO Block pointer - Administration SET Chunk pointer .free TO True;
size;
Figure 5.2 A basic implementationof Mal loc (Block s ize ) .
Outline code for the basic Mal loc is given in Figures 5.2 and 5.3. The pointer One past avai lab1 e points memory to the first byte past the end of available memory. This is a useful pointer, but if memory extends to the last byte of addressable memory it may not be representable; in that case additional trickeryis needed. The codeassumes that data can be accessed at any alignment; if this is not supported by a given machine, the computation of RequeS ted chunk size in Figure 5.3 must be modified to take the alignment requirements into account. The i~plementationof theroutine Solution to out of memory condi() depends on the system. If the basic memo^ allocation system coexists with a t ion garbage collector that uses the same chunk structure, the routine can call it, in an attempt to find unusedchunksthathavenotbeen freed. Anotheroption is totrytoincreasethe mount of memory allotted to the program. The Malloc algorithm as stated assumes that the value of One past available memory is fixed for any one run of the program, but actually many operating systems feature requests to reset this value and thus extend or restrict the amount of memory available to the program. After all, our memory allocation system is a client to a more basic memory allocation system, that of the operating system. past available memory can be issued at any moment Such requests to bump One the routine Pointer to free block of size (Block size) fails tofind a block,and it is temptingto do so atthe first possibleoccasion,toavoidthework of coalescing blocks or doing garbage collection. The problem is, however, that some operating systems will extend the available memory almost indefinitely but implement this by supplying paged or virtual memory, which resides completely or partially on disk and is
FUNCTION Pointer to free block of size (Block size) RETURNING a polymorphic block pointer: // Note that this is not a pure function SET Chunk pointer TO First-chunk pointer; SET Requested chunk size TO Administration size + Block size; WHILE Chunk pointer/= One past available memory: IF Chunk pointer .free: IF Chunk pointer .size- Requested chunk size>= 0: // large enough chunk found: Split chunk (Chunk pointer, Requested chunk size); SET Chunk pointer .free TO False; RETURN Chunk pointer+ Administration size; // try next chunk: SET Chunk pointer TO Chunk pointer -I- Chunk pointer .size; ETURN Null pointer; ~ROCEDURESplit chunk (Chunk pointer, Requested chunk size): - Requested chunk size; SET Left-over size TO Chunk pointer .size IF Left-over size> Administration size: // there is anon-empty left-over chunk SET Chunk pointer .sizeTO Requested chunk size; SET Left-over chunk pointer TO Chunk pointer + Requested chunk size; SET Left-over chunk pointer.size TOLeft-over size; SET Left-over chunk pointer.free TOTrue; ~ROCEDURECoalesce free chunks: SET Chunk pointer TO First-chunk pointer;
WHILE Chunk pointer/= One past available memory: IF Chunk pointer .free: Coalesce with all following free chunks (Chunk pointer); SET Chunk pointer TO Chunk pointer + Chunk pointer .size; PROCE~URECoalesce with all following free chunks (Chunk pointer): SET Next-chunk pointer TO Chunk pointer -I- Chunk pointer .size; ~ H I L ENext-chunk pointer /= One past available memory D Next-chunk pointer .free: // Coalesce them: SET Chunk pointer .size TO Chunk pointer .size+ Next-chunk pointer .size; SET Next-chunk pointer TO Chunk pointer + Chunk pointer .size; igure 5.3 Auxiliary routines for the basic Nalloc (Block size).
whichareunknown:the memo^ allocator itself,the way the ~ r o g r auses ~ it, an operating system. One impo~antrule is of course that the ~ ~ e ~ allocator o r y should not give up as long as there is one possibility left to continue.
There are two efficiency problems with the list. First, finding a suitable chunk in the free list requires lin memory, which is unacceptable. Second, coalescing is done i only when the usual linear searchfails; this makes the perfor lar. The simplest approach to the first problem is to chain t the link pointers can be a c c o ~ ~ o d a t eindthe unused spac area. Chunks can still be split as explained above; chunks to the list. ~e have thus replaced linear search through all the free chunks. This is some improvement, butis likely n A more sophisticated implementation classifies the keeps a free list for each interval. ~ntervalscould for 2, and separate linked lists could be kept for all free c 2n'1--3 ,for a sensible set of values of n. Now, when determine a k such that 2"' ~ ~ 5 We 2 now ~ find . the through 2k'1-l ;if that list is empty, we try thelist for 2 we find one that is not empty. The first block in tha room, so allocation takes place in cation must of course be linked in time too. The interval boundaries tion, a different increasing sequence canbe more useful. can be done on thefly, during each callof Free if we have easyacces~to the chunk preceding the one beingfreed; the one following it is already wit~ineas using the size of the chunk beingfreed. This access can beo ~ t a i n si e~ size i n f o ~ a t i o nof each chunk atits end, where it can befound by the course, this duplicate size information costs additional space and must other operations on the chunks. By combining these two optimizations, a very effic~ent~ e ~ o ~rl lyo c ~ t can o r be constructed. The combination of the two optimizations is not entirely trivial since a cbun being freed may bemergedwiththeprecedingand/orfollowingchun which of them is free, The su~oundingfree c h ~ n kor chunks must then their respective free lists, and the c h u result ~ ~ in^ from the ~ e ~ g proper free list. We will now turn to two special a~pli~ations of ~ ~ ~llocation m oin a CO ~ linked lists and extensible arrays.
lists are used for many purposes in a compiler: identifier lists storage, synta~trees, code frag~ents,etc. The C compiler from the
t ACK (Tanenbaum et al., 1983) contains 25 different list types and the Orca compiler contains 17; for Orca, see Bal,~ a a s l ~ o and e k Tanenbaum (1992). a naive Records in these linked lists are added and removed at irregular intervals, and implementation is to request them one by one from the standard memory manager, for example mal loc ( ) ,and return them using f r e e ( ) Much efficiency can be gained by batching these records in blocks rather than allocating them one by one. A separate set of blocks is allocated and maintainedfor each record type T. Each block is an arrayof a fixed number of records and is obtained from the standard memory ~ a n a g e rsuitable ; sizes are 16 or 32 records per block.A free list is ~aintainedlinking the free records in the blocks. Initially, the system starts with zero blocks and an emptyfree list, The first allocation request for a record of type T finds the free list empty, allocates a block of type ARRAY O F T, creates a free list linking the free records using the space in the same free records., andhandsoutthefirstrecord.Subsequentallocationrequestsnormallyobtaintheir records directly from the free list, which speeds up the memory management considerably. ecords can be r e t u ~ e dto this system and are then reattached to thefree list. Blocks are never returned to the standard memory manager. In principle they could be, if an entire block came to consist of free records only, but detecting such an event would require much more ad~inistrationand the total gain would be limited, as observed in Exercise 5.5. Figure 5.4 showsoutlinecode for theroutines New T () and Free T (Old record) . The global variable Free l i s t f o r T is used to hold the beginning of the linked list of free records. When a new record is requested, Free 1i s t f o r T is tested to see if a free record is available. If it is not, a new block is requested from Nalloc ( ) and the records in the newly obtained block are linked into the free list, which is then no er empty. The requested record can now be unlinked from the head of the free list, Free T () ibly zeroed, and a pointer to it be delivered to the caller. The routine links the record to be freed back into the free list; the record is not marked or cleared in any way and if the program uses the pointer to it afterwards, chaos will ensue. The struce free list,as it winds through the allocated blocks,is sketched in e free listisconstructedusing link fields in therecords; si present only in free records., they do not need allocation space of their own and can be stored in the records, in the same space where user data resides when the record is in use. This techni~uerequires the records to be at least as large as a link pointer, a condition that is almost certainly fulfilled for records used in a compiler. It will be clear that this code can easily be generated automatic all^ for all record types 7' used in dynamic allocation in the compiler or in the compiled program.
~~~y is an array to which elements can be added at the high-inde~end. Extensible mays are used in several places ina compiler: in hash tables, while reading and storing input the length of which is not known in advance, while constructing executable program images in the assembler, etc. Their use in run-time systems is perhaps less common but not exceptional; one application is the event queue in simul~tionprograms, when this queue is l~aintainedby heap sort. Extensible arrays have advantages over linked lists w ~ e nthe elements are small, never change their order,or are never freed again. For exam-
SET Free list for T TO No T; FUNCTION NewT ( ) RETURNING a pointer to T: an IF Free list for T = No T: // Acquire a new block of records: SET New block [l .. Block factor forTI TO Malloc (Size ofT * Block factor forT ) ; // Construct a free list in New block: SET Free list for T TO address of New block [l]; FOR Record count IN[l .. Block factor forT - 11 : SET New block [Record count] .link TO address of New block [Record count + 11; SET New block [Block factor for TI .link TONo T ;
// Extract a new record from the free list: SET New record TO Free list T;for SET Free list for T TO New record .link; // Zero the New record here, if required; RETURN New record; PROCEDURE FreeT (Old record) : // Prepend Old record to free list: SET Old record .link TO Free list T; for SET Free list for T TO addressof Old record; ~
i 5.4 ~Outline ~ code r for blockwise ~ allocation of records of type T
ple, storing the characters of a string in an extensible array is much more attractive than storing them ina linked list, The simple-minded approachto extensible arrays would be tofirst allocate an array of reasonable size, and then, when the array turns out to be too small, to extendit with a reasonable increment, and to repeat this process as needed. Doing so, however, yields a quadratic algorithm, as can be seen as follows. We write the cost of allocating an array of size n as C(n), When we try to increase thesize of the array, say withp bytes, we will find that there is not always room for this at the end of the array. In that case we will have to allocate a new array, at a cost a,which we suppose is independent of a. We then have to copy the contents of the old array; this will have a cost proportional to n, say yn. In C, the function realloc ( ) does all this for us; other systems usually have similar facilities. This leads us to the following recurrence relationfor the cost of obtaining a new array of length n +p:
C(n+p)=C(n)+a+yn
The recurrence relation is solved by
a
‘I
free-list-Elem
block l
block k
F i ~ 5.5 ~ rList ~ of free records in allocated blocks.
w h i c ~can easily be verified by substitution, and which shows that the processis quadratic ~n n. o ~ u n a t e ~ity ,is possible to extend mays in linear time. The trick is to increase the size of the array by a constant factor rather than by a constant amount. When needs to be extended, we allocate a new array that is P times as big, with p> l . A allocation has constant cost a. We then copy the contents of the old array into the new array, at cost yn. This yields the recurrence relation
c(Pn)=c(~)+a+Yn which has the solution
his can again be verified by substitution, and shows that the process indeed requires time is negligible. Intuitively speaking, each reallocation step mear in n; the logarith~ic tern takes p times as much time as the previous one,but the results lastp times as long. A disadvantage of almost any scheme for extensible arrays is that sooner or later the array bas to be moved. ~onsequently,no pointers can be kept to entries in the array, and elements of the array using an index involves loading the address of the array first rather than using a fixed starting location. Fortu~ately,the C syntax allows the same construction €or both situations: itis ‘referentially transparent’ in this case since the index-
.2. I
ing construct A [i] will work both if .& i s a fixed allocated array o pointer to a dyna~icallyallocated array of type 7'. The basic allocation scheme embo~iedby malloc ( ) and f allocation, and the extensible array t e c ~ n i ~ all u e have one thin explicitly indicated by the user. Explicit deallocation is a proble will now turn to auto~atic~ e t h o ~ s mer and compiler alike. explicit deallocation. ~~
blocks in handwritt~ncode gram. Correctly freeing ~npredictablelifeti~esof Many p r o g r a ~ e r shave experienced the conse~uencesof freei by ~ s t a and, ~ e later o possibly reused - data.
- the dangling pointer e~oneousl y; - after ~ereferencingthe dangling pointer, the p r o ~ r amay ~ proceed with the incorre~t data for some time before an inco~sistencyis ~ ~ s e r v e d ; - the error may be hard to reproduce, because the exact same sequence of memory alloeations and deallocations may be required to cause the observed problem in interactive systems. erefore, garbage collection is co~sideredto be an i ~ p o ~ afeature nt of ~ o d e r n m n g s~stemsthat reduces p r o ~ ~ a ~ ~ ne ififno g~ considerably. s rning systems that offer ~ ~ r b colle a ~ e ~ m a l ~ t afunctional l~, la~g~~ages scripting languages like awk and
fied in this chapter by two techniques, mark and scan and ~ o - s p copying. ~ c ~ The thee techniques are all very different: eference counting directly identifies gqrbage chunks. It is simple and reasonably efficient but requires all pointer actions tobe monitored during program execution and may cover all garbage chunks. andscanidentifiesreachablechunksandconcludesthattherest is garbage. It is reasonably efficient and does not require pointer monitoring, butis quite complicated. It is the only algorithm that will recover all available memory, -Two-space copying is not concerned with garbage. It copies the reachable chunks from a memory region called ‘from-space’ to a memory region called ‘to-space’; the rernainspace in to-space is a single free chunk. It is very efficient, does not require pointer monitoring andis moderately complicated but wastes half of the memory. ~ n c e g ~ b chunks a g e have been identi~edby these techniques they must be turned into free memory space to be useful. Garbage chunks found by reference counting or mark and scan must be returned to the free list by the a~gorithm.Two-space copying automatically creates a fresh free list consisting of a single large chunk that includes all of free memory. ~ocatingall free chunksandaddingthem to thefreelist is notalwaysenough: it allows us to get more chunks in the free list, and we may even succeed in coalescing adjacent free chunks, but it leaves the free memory fragmented in a number of free chunks separated by chunks in use. This phenomenon is called user requests a chunk with a size that is larger than the l m e ~ o r yallocator is unable to supply that chunk, even though the total size of free ~emory may be larger or even much larger than the request. To obtain the free memory in its most valuableform,asinglefreechunk,weneedto do c ~ ~ ~ ~ omp c pact t ~iono moves ~ . the reachable chunks to one side, thereby moving the free chunks to the other side; the latter can then easily be coalesced into a single free chunk; it is more complicated and timecon sum in^ than just freeing the garbage chunks, butit avoids fragmentation and is the best recover all unused memory. hemainproblemwithcompactionisthatitinvolvesmovingreachablechunks, m a y contain pointers to other chunks which are also moved. Needless to say, this has to be orchestrated carefully; a technique for doing so is explained in Section 5.2.6. eference counting and mark and scan can optionallybe followed by a compaction phase; wo-space copying does compaction automatically. abage collection a l ~ o r i t h come ~ s in three v~ieties: :the garbage collector is started, runs to comp~etionwhile in full control of all chunks, and then returns, leaving behind a hopefully improved situation. Since the gare collector is in full control when it runs, it can be fairly simple, but its unexpected activation can be disruptive. This is not much of a problem inside compilers, butit may ‘ . e applicatio~l~rogram ): some garbage collector actions are performed at each call of Mall OG andlor Free. These actions make some local modificationsto the chunk structure to increase the probability of finding a free chunk when needed. On-
the-fly garbage collectors are usually much more difficult to construct than one-shot garbage collectors, but are smoother and less disruptive in their operation. Also, they may still need a one-shot garbage collector as back-up for situations in which they cannot cope with the demand. -~ o n c ~ r r e nthe t : garbage collector runs on a second processor, different from the one that runs the program. It runs continuously and concurrently, and tries to keep memory garbage-free. ~nfo~unately, concu~ent garbagecollection is sometimesalsocalled on-the-fly, in spiteof the fact that this term suggest one agent rather than two. Referencecountinggarbagecollection is an on-the-flyalgorithm.On-the-flyandconcurrent variants of mark and scan garbage collection have been known since the late 1970s (Dijkstra and Lamport, 1978; Ben-Ari, 1984), but have not seen wide application. We will restrict ourselves here to reference counting and the one-shot variants of mark and scan and two-space copying. Garbage collection needs considerable preparation and support in the compiler, which will be treated in the next section. We will then discuss the three garbage collection techniques mentioned above: reference counting, mark and scan, and two-space copying (Sections 5.2.3 to 5.2.5), followed by an explanation of compaction (Section 5.2.6). It has been foundadvantageoustorestrictmost of thegarbagecollectionactivationstothemost recently allocated set of chunks, the newest generation. A discussion of this o p t i m i ~ i n ~ approach, called ‘generational garbage collection’ (Section 5.2.7), concludes this section and this chapter.
A chunk is only reachable by the program if the program directly has a pointer to it or can use a pointer to reach the chunk indirectly. The pointers that are directly available to the program can be located in various places, depending on the implementation. These places may include the global variables, local variables, routine para~eters,registers, and perhaps others. We will use the term r o ~ r data a ~ area for the non-heap memory that is directly accessible totheprogramcode. The set of pointersin the program data areawillbe referred to as the root set. It should be noted that the rootset is a conceptual notion rather than a data structure; it is the set of all pointers in the program data area, nota list of their values. The root set is usually not imple~enteddirectly but is conceptually present in the program code of the garbage collector. The pointers in the root set may point to chunks in the heap, which is under control of the garbage collector; such chunks are then reachable. Reachable chunks in the heap can, of course, contain pointers that point to other chunks in the heap, which are then reachable as well. This decomposes the problemof finding all reachable chucks -and so the problem of garbage collection -into three subproblems: 1. determining the rootset by finding all pointers in the program data area, with their types; 2. finding all pointers in a given chunk, with their types; 3. finding all reachable chunks using thei n f o ~ a t i o nof 1 and 2,
ubproblems l and 2 require knowledge of the pointer layout of the program data area and of each chunk type. The garbage collector needs compiler support for this; techniques to supply the required i~formationare covered in the next section. Solutions to subproblem 3 ~onstitutethe garbage collection algorithms proper, and are usually im~lementedas runtime system routines. A constellation of a root set and a heap with reachable and unreachable chunks is shown in Figure 5.6; chunks a and c are reachable from the root set, d and fr' are reachable fromtheheapthrough a and c respectively,andchunks b and e are ~nreachable.The first pointer from c points into the program data area.
heap f " " " " " " " " " " " "
I I
I
I I
I l I I I
I I
I I l I I I
I
1""""""""""""~
" I
I I I
I
ure 5.6 A root set and a heap with reachable and u n r ~ a c ~ a chunks. b~e
nce counting, which is not based on the concept of reachable chunks, does not require a root set, but it muststill be able to find all pointers ina given chunk. Since the garbage collector will, in principle, follow any pointer, there is still another re uireme~tthatthecodegeneratedbythecompiler has toobey:allpointersmustbe hich means that they must oint to locations the garbage collector can again interinter validity is also called ointer co~sist~ncy.
compiler he rmation pro to has about pointer thelayout of each ~ ~garbage the to u ~ k c t of a chunk C describes position the of each pointer Pthe in c of chunk thethat P points The to. compiler also has to make sure that all reachable pointers, both in the p r o ~ r data a ~ area and in the heap, are valid when the garbage collector is activated. Providing the root set requires e of the poi~terlayout of the program data area, and it was already clear that we also need the pointer layoutof each chunk type. Ac~ievingpointer vali~ityis relatively simple. A pointer can only be invalid when it
is uninitialized or when it has been ruined by assi compiler m ~ e sure s that each newly allocated po correctly, no bad pointer valuesof the type pointer type-secure, no valu S of other types can be assi validity is assured, f thesourcelanguage is not achieved and garbage collection can only be based servative garbage collection; conservative garbage this section. ~ e t e ~ i n i nthe g layout of chu for specifying the layout to theg layout of the p r o ~ r data a ~ area the next two sections. The compiler is in full control of the layout of chunks, so the only transfertheknowledge of thepointerlayouttothe answer to an important questi e if having a pointer to it is sufficient to ob of the chunk pointed to. There are two ways for chunks to be se1 carry their pointer layout i n f o ~ a t i o nin each copy, either direct pointer to a shared desc~ptor,or they can all have the same layo arises for examplein Lisp inte~reters,inwhich each chunkconsists pointers, one to thecar field and one to thecdr field of the Lisp cell. There are several ways to make the pointer layout of chunks avail collector.
of exactlytwo
-The compiler can g~neratea bit map for each chunk type, speci.fyin a chunk of that type are pointers to other chunks. With this me self-descriptive, sincejust having the pointer must be sufficientfo to continue. So each chunk must either contain its bit map or a This method requires the collector to i n t e ~ r e the t bit map, whic sive. -The compiler can generate a specific routine for each chunk type, which calls a collector routine passed as a parameter for each pointer inside the eh avoids the run-timei ~ t e ~ r e t a t i oofn bit maps and the need forself-de code can pass the type of the chunk tog er with the pointer to th routine. It also makes handling any chu f a type that does not CO operation,probably a usefuloptimizatiproblemwiththisme tines tend to be recursive, in which case they require an u n ~ e t e ~ i ~ a e ~d o of~stac ~ t space, which may or may not be available when the garbage colle is one of the rare cases where a significant portion of the run-ti erated automatical~y. -The compiler can org~nizethe chunks to start off with an followed by the other data types. With this organization, t the location of the pointer array and the totaln u ~ of~pointers ~ r pointer array is the first field and the total n u ~ b e rof pointers
administration, the chunkis self-desc~ptiveat low cost. Collecting the pointers inside a consecutive region is the most efficient method since it requires the fewest cross calls between compiled code and collector. The language definition may, however, not allow thereorgani~ationof the data items inside a chunk. In addition to the pointer layout, the collector has to know the size of each chunk. Similar considerations as with the pointer layout apply. The technique of specifying the pointer layout by routine is described extensively by (1991), for a strongly typed source language.
e
~
ithe~~
~ i ~ila ~ yo^^ ~ t~~t~~ e ~~
r
o data ~ ~r~~ r ~
~
The root set is usually supplied by running a library routine that scans the program data area and calls a specific garbage collection routine for each pointer the program data area contains. It is thenuptothespecificgarbagecollectionroutinetosee if the pointer is interesting and perform the proper actions, To perfom its task, the l i b r ~ yroutine must be able to find all pointers in the programdata area, with their types, and be sureeach pointer is valid; in short, it needs the pointer layout of the p r o g r a ~data area. The problem is that the pointer layout of the program data area, unlike that of chunks, is complicated and d~namicallyvariable. The program data area usually consistsof the lobal data area and a stack holding one or more stack frames or activation records. The pointer layout of the global data area is known and constant, although it may be distributed over several source program modules. To know the pointer layout of thestack, the garbage collector has to know which activation records it contains, and what the pointer layout of each activation record is. i~formationare dynamic, so activation records must be self~desc~bing. This is, however, easier said than done: the contentsof the activation records changedynamica~ly. Themainobstaclesto a pointer-validpr m data area of known layout arethe stacksandparameter lists underconsionwhichmayoccur ~ o r ~ stacks n g maycontainpointersin dyna~icallydifferin dyna~~ically differing types depending on the progress of the program, and parameter lists do not correspond toa data type in the language. Also,para~eterslists are sometimes constructed on the working stack, which combines the problems. In both cases the solution lies in defining anonym s data types for all st~cturesthat can occur, and keeping track of which one is current. r example, the parameter list of a routine with three formal parameters P R O C E ~ ~ RThree E parameters ( an integer I, a tree pointer Tp, a graph pointer Gp 1
can be described bya record type
TYPE
-anonymous-0001: an integer I; a tree pointerTp; a graph pointerGp;
To get guaranteed pointer validity, the record must be allocated in its entirety before the actual para~etersare evaluated, and the pointers in it must be set to null; they are then updated as the actual parameters are obtained. Adding this feature to an existing compiler may require considerable work. The same technique must be used for the working stack but there the situation is worse: whereas parameter lists are fixed and appear in the program in the form of formal parameter lists, thedatastructuresthatappearontheworkingstackaretheresults of pushes and pops generated by the code generator, after it has optimized the target code. One source of relief in this awkward situation is that pointer validity is only required at points where the garbage collector can indeed be activated; this is at an explicit use of an allocator and at a routine call. Fortunately, these are exactly the points at which a code generator tends to make sure that all values from registers are stored safely in memory.So we only haveto define anonymous data typesfor the working stack con~gurationsat these points. Still, for a large routine with many expressions that manipulate pointers, this can amount to a considerable number of anonymous data types. Also, the actual data type of the working stack has tobe updated at run time. Similar techniques can be used for data structures with dynamically changing types, for example unions. Another approach to pointers in the working stack can be followed in implementations in which the working stack is used only to impl~mentexpressions. The technique is simple: make sure the working stack is always empty when a garbage collector activation can occur. In fact, it has to be empty when any call occurs, since any call might in the end cause a call to the garbage collector to be made. To achieve this situation, all calls, and all other actions that can have sideeffects are moved to positions before the expression, using temporary variables. For example, an expression a := (b :=
c(3))
+ d(new Tree);
in which the assignment (:=) inside the expression yields the value assigned and new is the record allocator,is transformed into b := ~ ( 3 ) ; tl := new Tree;
t2 := d(t1); a := (b) -c t2;
In effect, the ‘difficult’ partof the working stack has been moved to the local variable stack. Since the data type of the local variable stack is constant for a given routine, only one anonymous data type needs to be generated. Some languages and compilers just cannot supply the pointer layout infor~ationand consistency required by garbage collection. An example of such a language is G, in which in principle anypiece of data can be stored in any typeof variable by using the proper cast.
o a c c o ~ o d a t esuch languages and such compilers, another approach to the pointer layout and consistency problem has been devised. In this approach any sequence of bytes in ram data area and theheap that contains a value that could representa valid pointer is taken to be a valid pointer, nce this drastic idea hassunk in, two things will become clear. Thefirst is that some ed pointers may actually be fake. This has two consequences: we have to be very careful in following pointers; and we may occasionally reach a chunk by accident that could not have been reached by following correct pointers. This may cause subsequent orithms to retain chunks that could be freed, The second is that if all pointers to nonfree chunks indeed reside in the program data area and the heap, we will find all non-free This prevents the algorithm from freeing chunks that should be retained. e approach described here is known as ~ ~ ~ s e ~ ~ ~ t i ~ e ~ ~ n s e r v at e sleast as many chunks as it should and probably more; the idea was introduced m and Weiser (1988). Its implementation requires considerable care; pitfalls are (1990) and Boehm (1993). Although it has been applied successdiscussed by ~ e n t w o ~ h fully in some systems, it is still regarded as experimental, and we will not discuss it any further here.
ss e ~ollectionis anoldandmaturesubjectandmanytechniques,rangingfrom st~aightfor~ard to inge~lioustodownrightsneaky,havebeeninventedinthecourse of time. A full treatment of garbage collectors could easilyfill a book (at least one such book exists, Jones and Lins (1996)), so we will make a number of simplifying assumptions for the benefit of the following discussion. We assume that the garbage collector operates on the same memory s t ~ c t u r eas and in cooperation with the simple malloc/free mechanism described in Section5.1-1.We also assume that it is possible both to free blocks explicitly and to have them freed automatically by a garbage collector. This is useful in implementations in which the ~ n - t i m esystself uses allocated blocks and knows when tofree them safely; it also allows the garcollector to free blocks through this mechanism, thus decreasing the comple~ityof Even with our simplified malloc/free model, the following constitutes an outline only; t under the rug. For one thing we assume that all pointers point to the b e g i n n i ~of~chunks -or at least to a fixed position inside of them. In some languages r a ~ e can r construct a pointer to a field inside a record; it must then be possible to r e c o ~ s t ~from c t such a pointer the pointer to the all~catedchunk into which it points. Actual garbage collectors are full of details required by the peculiarities of the data types of the implemented language, including mays, unions, and sets of types that may contain pointers. And last, and in this case probably least, in addition to data pointers the admi~istrativepointers that connect activation records must be followed. These pointers are essential especially in functional languages and in imperative languages that feature coroutin~s,functions as first-class citizens, or tasks. We also assume that all chunks are self-descriptive, either because they have e ~ o u g h information in them to retrieve the pointer layout,or because they all have the same structure. The algorit~msbelowcanusuallybereworkedeasilytooperateonnon-self-
descriptive data, but some benefits of the algorithm may be lost in the process. As said before, the discussion below is concerned with the basic algo~thmsonly, which are relatively simple; actual garbage collectors tend to be complicated and hairy, and, which is worse, error prone. The reasons for this additional complexity are twofold. The first is that the basic algorithms assume the user data to consist of chunks of known size and type, with all pointers initialized. Worse, many memory management publications in the computer science literature use a model in which chunks are all of the same type and in which the heap and the program data area have the same uniform structure. Reality is different. Actual languages have uninitialized pointers, arrays of dynamically alterable size, unions which have dynamic types some of which may contain pointers, exception handlers which may ~nvalidateassumptions about what is on the w o r ~ n gstack, tasks that may update data structures concurrently and other surprises. Also the structures of the program data area and the heap are usually vastly different. There is no fundamental reason why all these complications could not be inco~oratedcorrectly in an existing garbage collection algorithm, but doing so requires considerable care and increases the complexity of the code. The second reason is that garbage collection is intuitively considered a waste of valuable computer cycles by many users, and there is pressure on the compiler writer to optimize the garbage collector heavily. Given the baroque structure of the data types in most languages, there is indeed often room for spectacular optimi~ation;for example, no wing that an array consists of elements that do not contain pointers and so does not need to be scanned will improve the speed of the garbage collector. Also, the target code optimizer may optimize away code that is essential for the garbage collector. It may, for example, decide in a late stage that it can hold a variable in a register all the time and delete all memory accesses to the location reserved for it. This leaves an uninitialized pointer location, which wiI I derail the garbage collector, unless the pointer layout is also updated. Again, this can all be ~ r o g r a ~ correctly, ed but tends to heap complication on complicatio~. Each optimization, both of the garbage collector and of the generated code, is a threat to garbage collection correctness, and has to be designed and i ~ p l e m e ~ t ewith d the utmost care. Garbage collectors are notoriously difficult to debug, since they are doing surgery on the nervous system of the running program - the pointers in the activation records and in user data structures in the heap. For the same reason, a bug in the garbage collector will cause unpredictable and incompreh~~sible failure of the user program. Conversely, once a compiler has a name of having a buggy garbage collector, any failure of any user program using the compiler will first be blamed on the garbage collector, before any other source of error is considered. Even more than is the case with other parts of a compiler, it is important to get a garbage collector right before making it fast. We will now turn to the actual garbage collection algorithms.
Reference counting is an intuitive garbage collection algorithm that records in each chunk the number of pointers that point to it; when the number drops to zero the chunk can be declared garbage; in a literal sense, reference counting collects garbage, unlike the other
garbage collection algorithms, which actually collect reachable chunks. In line with the name 'reference counting', we will call pointers 'references' in this section. The reference count is updated in several actions on the chunk. When a chunk is allocated from the heap, its reference count is initialized to one. Whenever a reference to the chunk is duplicated, its reference count is increased by one ('incremented'). Likewise, whenever a reference to the chunk is deleted, its reference count is decreased by one ('decremented'). If the reference count drops to 0, the chunk can be freed because it is no longer reachable. Figure 5.7 shows a heap with a number of chunks with reference counts and the references connecting them; some references come from the global data area. Note that there are no chunks with reference count 0.
Figure 5.7 Chunks with reference counts in a heap.
Simply returning the chunk with a zero reference count to the free list is not enough to reclaim all garbage, since it may contain references to other chunks that may now become garbage as well. For example, by deleting the reference to chunk b in Figure 5.7, chunk e also becomes garbage, but f remains in use, since there is still one reference left and its reference count does not drop to 0. The resulting constellation is shown in Figure 5.8. The two main issues in implementing reference counting are keeping track of all reference manipulations and recursively freeing chunks with zero reference count. The compiler plays an important role in keeping track of references; the recursive freeing is delegated to a run-time routine. We will now consider both.
The compiler inserts special code for all reference manipulations: incrementing the reference count when a reference to a chunk is duplicated and decrementing it when such a reference is deleted. References are typically duplicated as an effect of some assignment in the source language. It does not matter what the target of the assignment is: it may be a variable in the program data area, a field in a dynamically allocated data structure, etc.
Figure 5.8 Result of removing the reference to chunk b in Figure 5.7.
In each case the reference to the chunk is duplicated, so its reference count must be incremented. Besides assignment statements, the compiler also has to add reference increasing code to parameter transfers, since a reference that is passed as a parameter is effectively assigned to a local variable of the called routine. Note that not all references in the running program are references to chunks on the heap; many of them point to blocks in the program data area, and all reference-counting code must make sure it does not follow such references.
References to chunks are typically deleted implicitly by assignment statements. An assignment to a reference variable overwrites the current reference with a new value, so before installing the new reference the reference count of the chunk addressed by the current reference should be decremented. Figure 5.9 shows an outline of the code that must be generated for the pointer assignment p := q when performing reference counting.
IF Points into the heap (q):
    Increment q .reference count;
IF Points into the heap (p):
    Decrement p .reference count;
    IF p .reference count = 0:
        Free recursively depending on reference counts (p);
SET p TO q;

Figure 5.9 Code to be generated for the pointer assignment p := q.
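To make the shape of this generated code concrete, the following C fragment is a minimal sketch of what a compiler might emit for such an assignment. The chunk layout, the helper names points_into_heap() and free_recursively(), and the heap-boundary test are assumptions for illustration only, not part of the interface described in the text.

/* Sketch of compiler-emitted reference-counting code; layout and helper
   names are assumed for illustration. */
#include <stddef.h>

struct chunk {
    size_t size;
    unsigned int reference_count;
    /* ... pointer layout information and user data follow ... */
};

extern char *heap_low, *heap_high;              /* assumed heap boundaries */
extern void free_recursively(struct chunk *);   /* compare Figure 5.10 */

static int points_into_heap(void *p) {
    return (char *)p >= heap_low && (char *)p < heap_high;
}

/* Code the compiler could emit for the pointer assignment p := q. */
static void assign_pointer(struct chunk **p, struct chunk *q) {
    if (points_into_heap(q))
        q->reference_count++;            /* the reference is duplicated */
    if (points_into_heap(*p)) {
        (*p)->reference_count--;         /* the old reference is deleted */
        if ((*p)->reference_count == 0)
            free_recursively(*p);
    }
    *p = q;
}

In effect this is a write barrier: every pointer store in the program is replaced by a call like assign_pointer(&var, value).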
The other source of reference deletions is returning from a routine call. When a routine returns, all its local variables are deleted. A local variable holding a reference to a chunk should be processed to decrement the associated reference count, and the same applies to parameters that hold references.
We have seen that the proper way to reclaim the memory allocated to a chunk is to first decrement recursively the reference counts of all references contained in the chunk, and then return it to the free list. Recursion, however, is an unwelcome feature in a garbage collector since it requires an unpredictable amount of stack space. Depending on the run-time system and the operating system, this stack space may be part of the program data area, part of the heap, or a separate memory segment; in any case, its size is limited and may not suffice for the operation of the garbage collector. Having a garbage collector fail for lack of memory is kind of embarrassing, though, and several techniques have been invented to avoid the problem. The best solution is using pointer reversal, which will be explained in Section 5.2.4.3. A simpler but still adequate solution is suggested in Exercise 5.10(c). We will discuss here briefly a very simple trick that does not solve the problem completely, but usually alleviates it sufficiently: tail recursion elimination. The trick is to use repetition rather than recursion for the last pointer in the chunk to be freed. This avoids using the stack for freeing chunks that form a linear list. An outline of the code is given in Figure 5.10, in which Pointer references the chunk to be freed recursively.
PROCEDURE Free recursively depending on reference counts (Pointer):
    WHILE Pointer /= No chunk:
        IF NOT Points into the heap (Pointer): RETURN;
        IF NOT Pointer .reference count = 0: RETURN;

        FOR EACH Index IN 1 .. Pointer .number of pointers - 1:
            Free recursively depending on reference counts (Pointer .pointer [Index]);
        SET Aux pointer TO Pointer;
        IF Pointer .number of pointers = 0:
            SET Pointer TO No chunk;
        ELSE Pointer .number of pointers > 0:
            SET Pointer TO Pointer .pointer [Pointer .number of pointers];
        Free chunk (Aux pointer);    // the actual freeing operation

Figure 5.10 Recursively freeing chunks while avoiding tail recursion.
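A C rendering of the same trick might look as follows. Following the earlier remark that reclaiming a chunk means deleting its outgoing references, this sketch also decrements the reference counts of the children, which the outline above leaves to the surrounding machinery; the chunk layout and the helpers points_into_heap() and free_chunk() are assumptions for illustration.

/* Sketch of reference-count freeing with the tail call turned into a loop;
   chunk layout and helpers are assumed. */
#include <stddef.h>

struct chunk {
    unsigned int reference_count;
    unsigned int number_of_pointers;
    struct chunk **pointer;            /* the pointer fields inside this chunk */
};

extern int points_into_heap(struct chunk *p);
extern void free_chunk(struct chunk *p);     /* the actual freeing operation */

/* Called with a chunk whose reference count has just dropped to zero. */
void free_recursively(struct chunk *p) {
    while (p != NULL) {
        if (!points_into_heap(p) || p->reference_count != 0) return;

        /* Deleting this chunk deletes its outgoing references: decrement the
           counts of the children, freeing those that drop to zero. All but
           the last child are handled recursively... */
        struct chunk *last = NULL;
        for (unsigned int i = 0; i < p->number_of_pointers; i++) {
            struct chunk *c = p->pointer[i];
            if (c == NULL || !points_into_heap(c)) continue;
            if (--c->reference_count != 0) continue;
            if (i + 1 == p->number_of_pointers) last = c;  /* defer the last one */
            else free_recursively(c);
        }
        free_chunk(p);
        p = last;          /* ...and the last one by repetition, not recursion */
    }
}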
Reference counting is a simple technique that can be implemented easily by modifying the compiler to perform reference manipulation as outlined above. Unfortunately, reference counting has some serious drawbacks that limit its applicability. First, and foremost, reference counting cannot reclaim cyclic data structures. Consider the example in Figure 5.8, in which the reference count of chunk a is 2. If the reference from the root
set to chunk a is deleted, the reference count drops to 1, so the chunk is not reclaimed, as shown in Figure 5.11. By definition, however, chunk a has become garbage since it is no longer reachable from the root set. Reference counting cannot reclaim chunk a because chunk d, which has also become garbage, holds a reference to it.
Figure 5.11 Reference counting fails to identify circular garbage.
The problem with reference counting is that it takes its decisions by considering only one node in the graph at a time, and in order to reclaim a cyclic data structure all nodes in the data structure should be considered as garbage together. Once reference counting has failed to reclaim a cyclic data structure, the chunks involved will never be reclaimed. This has the unfortunate effect that free space leaks away, which might even cause the program to run out of free space when other garbage collectors would be able to reclaim the cyclic structures and allow the program to continue.
The second problem with reference counting is efficiency. The compiled code has to monitor all reference manipulations, and each and every reference manipulation requires the adjustment of the associated reference counts. This is a considerable overhead in comparison to other garbage collection techniques that do not monitor any pointer action and reclaim garbage chunks only when needed (see Sections 5.2.4 and 5.2.5).
The final problem with reference counting is memory fragmentation. The free list is augmented with the reclaimed chunks, but it remains fragmented. In principle, doing a compaction phase during a reference counting allocation request is possible, but few reference counting garbage collectors go to such lengths.
Despite its problems, reference counting is a popular technique for managing relatively small numbers of dynamically allocated data structures, usually in handwritten software. For example, a UNIX kernel typically uses reference counts to handle the recovery of file descriptors. Its use in generated code is less widespread.
The mark and scan garbage collection algorithm described in this section is the most effective one in that it frees all memory that can be freed (reference counting fails to free circular structures, two-space copying leaves half of the memory unavailable). It is often combined with compaction, and then also provides the largest possible chunk of memory available. The mark and scan algorithm is also sometimes called mark and sweep.
The mark and scan garbage collection algorithm consists of two phases. The first, the marking phase, marks all chunks that are still reachable; the second, the scan phase, scans the allocated memory, considers as free all chunks that are not marked reachable, and makes them available again. We will now consider the marking and scanning phases in more detail.
Marking is based on two principles: chunks reachable through the root set are
reachable, and any chunk reachable from a pointer in a reachable chunk is itself reachable. We assume that the root set resides in a program data area or the topmost activation record, and that a data type description for it has been constructed and made available by the compiler. Now, in its simplest form, marking marks the program data area as reachable, finds the pointers in it using its data type description, and recursively marks all chunks pointed to by these pointers, in a simple depth-first scan. If this recursive process finds a chunk without pointers or a chunk that has already been marked, it backtracks and continues with the next pointer. Since the number of reachable chunks is finite and no chunk is processed more than once, this depth-first scan terminates and takes a time linear in the number of reachable chunks. Marking requires a second bit in the administration header of the chunk, the marked bit, in addition to the free bit. This bit starts off as 'cleared'.
The main problem with this recursive process is that it needs a stack of unknown size, and the question is where to find room for this stack at a time when memory is in short supply, witness the fact that the garbage collector was activated. It has been suggested to us that in this day and age, in which computers easily have stacks of 1 megabyte or more, the question is no longer relevant. It is our experience that each generation of designers again suggests that 'X is certainly enough', and that suggestions like these lead to machines with built-in 640 kB boundaries.
The simplest answer to the question of where to put the stack is: reserve room for the marking stack in each chunk. One pointer and a small counter are sufficient: the pointer points back to the parent chunk which contains the pointer that caused the present chunk to be processed, and the counter counts how many pointers have already been processed in the present chunk. See Figure 5.12, in which the third child is being processed.
This technique costs room for one pointer, one counter, and one bit per allocated chunk. Whether this is a problem or not depends on the average size of a chunk and on the ease with which the counter and bit can be accommodated. In a Lisp interpreter, in which chunks (called 'cells' there) contain two pointers only, the overhead is more than 33%, even though the counter can only have the values 0 and 1; in modern systems, in which the minimum allocation size is perhaps 16 or 32 bytes anyway, the overhead is often considerably less.
Figure 5.12 Marking the third child of a chunk.
For a way to mark the directed graph of all reachable chunks without using space for the extra pointer, see Section 5.2.4.3.
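For concreteness, a recursive depth-first marking routine might look like the following C sketch; the chunk layout, the helper points_into_heap(), and the root-set representation are assumptions, and the unbounded recursion is exactly the problem that the parent-pointer technique above and the pointer reversal of Section 5.2.4.3 are designed to avoid.

/* Minimal depth-first marking sketch; chunk layout and helpers are assumed. */
struct chunk {
    unsigned int free_bit : 1;
    unsigned int marked_bit : 1;
    unsigned int number_of_pointers;
    struct chunk **pointer;          /* pointers contained in this chunk */
};

extern int points_into_heap(struct chunk *p);

/* Marks the chunk p and everything reachable from it. */
void mark(struct chunk *p) {
    if (p == NULL || !points_into_heap(p) || p->marked_bit) return;
    p->marked_bit = 1;
    for (unsigned int i = 0; i < p->number_of_pointers; i++)
        mark(p->pointer[i]);         /* implicit stack: one frame per chunk */
}

/* The marking phase: mark everything reachable from the root set. */
void mark_phase(struct chunk *root_set[], unsigned int root_count) {
    for (unsigned int i = 0; i < root_count; i++)
        mark(root_set[i]);
}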
Just freeing the unreachable chunks is now easy: using the lengths noted in the chunks we step through memory, from chunk to chunk. For each chunk C we check if it has been marked reachable; if so, we clear the marked bit of C, and if not, we turn on the free bit of C. We can also exploit this left-to-right scan to combine adjacent free chunks. To this end, we keep a pointer F to the first free chunk we find and note its size. As long as we keep on meeting free blocks, we just add up their sizes until we run into a chunk in use or into the end of memory. We then update the administration of the chunk pointed to by F to the total size of the free chunks, thus creating a single larger free chunk, and continue our scan. We repeat this process as soon as we meet another free chunk, and so on.
The result of a mark and scan operation is a heap in which all chunks marked in use are reachable and each pair of free chunks is separated by chunks in use. This is the best one can achieve without moving the chunks. A subsequent compaction phase can combine all the free chunks into one large free chunk, thereby improving the performance even further.
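The following C sketch shows one way to code this left-to-right scan with coalescing; the header fields and the heap boundaries are assumptions loosely modeled on the malloc/free administration of Section 5.1.1, not a literal rendering of it.

/* Sweep sketch: free unmarked chunks and coalesce adjacent free chunks. */
#include <stddef.h>

struct chunk {
    size_t size;                      /* total size of the chunk in bytes */
    unsigned int free_bit : 1;
    unsigned int marked_bit : 1;
};

extern char *heap_low, *heap_high;    /* assumed boundaries of the heap */

void scan_phase(void) {
    struct chunk *first_free = NULL;  /* first chunk of the current free run */

    for (char *cp = heap_low; cp < heap_high; ) {
        struct chunk *c = (struct chunk *)cp;
        if (c->marked_bit) {          /* reachable: keep it */
            c->marked_bit = 0;
            first_free = NULL;        /* the free run, if any, ends here */
        } else {                      /* unreachable or already free */
            c->free_bit = 1;
            if (first_free == NULL)
                first_free = c;       /* start a new run of free chunks */
            else
                first_free->size += c->size;  /* coalesce into the run */
        }
        cp += c->size;
    }
}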
The above marking technique requires an overhead of one pointer plus a few bits in each chunk at all times, even when the garbage collector is not running. This overhead can be avoided almost completely by pointer reversal. This ingenious technique has applications outside of the field of garbage collection, since it allows one to visit all nodes of a directed graph without using additional space for a stack. The marking algorithm based on pointer reversal is also called the Schorr and Waite algorithm, after its inventors (Schorr and Waite, 1967).
The marking algorithm described above kept a pointer in each chunk C; this pointer pointed to C's parent P in the visiting process and would be followed when the marking algorithm was finished with chunk C. Now, when the marking algorithm is working on chunk C, it finds the pointers in C, which point to children of C, one by one. Imagine that the marking algorithm has gone off visiting the n-th child of C, say D; then after a while it will return from this visit. Upon returning from D, the marking algorithm can retain a pointer to D, the chunk it just visited; but a copy of that pointer already resides in C, in the n-th pointer field! This leads to the observation that while the algorithm is visiting the n-th child of C, the contents of the n-th pointer field in C are redundant.
Figure 5.13 The Schorr and Waite algorithm, arriving at C.
The Schorr and Waite graph marking algorithm utilizes this redundancy to store the parent pointer which would otherwise go on the stack, as follows. Figure 5.13 shows the situation when processing the chunk C starts. The algorithm maintains two pointers, Parent pointer and Chunk pointer; Chunk pointer points to the chunk being processed, Parent pointer to its parent. Moreover, each chunk contains a counter field which records the number of the pointer in the chunk that is being processed; the counter starts off at 0 and when it has reached the total number of pointers in the chunk, the chunk has been visited completely. We assume that the processing of pointers in C proceeds until we reach the n-th pointer, which points to child D.
Figure 5.15 About to return from D.
In order to move to D, we shift the contents of Parent pointer, Chunk pointer, and the n-th pointer field in C circularly, using a temporary variable Old parent pointer. Some care is required, since the pointer that addresses C changes in the middle of the code:
Figure 5.16 About to return from C.
// C is pointed to by Chunk pointer.
SET Old parent pointer TO Parent pointer;
SET Parent pointer TO Chunk pointer;
// C is pointed to by Parent pointer.
SET Chunk pointer TO n-th pointer field in C;
SET n-th pointer field in C TO Old parent pointer;

This results in the return pointer to the parent P of C being stored in the n-th pointer field in C, which normally points to D, as shown in Figure 5.14. As far as Parent pointer and Chunk pointer are concerned, the situation is now equivalent to that in Figure 5.13, when we first arrived at C, except that C is now the parent and D is the chunk going to be processed. Figure 5.15 shows the situation when we are about to return from visiting D; the only difference with Figure 5.14 is that the counter in D has now reached the total number of pointers in D. In order to return from D to C, we circularly shift back the pointers:
// C is pointed to by Parent pointer.
SET Old parent pointer TO Parent pointer;
// C is pointed to by Old parent pointer.
SET Parent pointer TO n-th pointer field in C;
SET n-th pointer field in C TO Chunk pointer;
SET Chunk pointer TO Old parent pointer;
// C is pointed to by Chunk pointer.

and increment the counter in C. The whole fancy footwork is then repeated for the n+1-th pointer in C, and so on, until all children of C have been visited. We are then ready to return from C, as shown in Figure 5.16. Note that the return pointer to P has by now been stored in and subsequently retrieved from each pointer position in C. Returning to P from C, the marking algorithm is in a position similar to where it was when returning from D to C. Again, the marking algorithm makes sure it brings along the
pointer to its point of departure, C, to restore the pointer in P to C, which for the duration has been replaced by P's return pointer to its parent.
It should be pointed out that the above only describes a clever technique for avoiding a stack while visiting all nodes in a graph. To prevent looping on cycles in the graph, nodes must be marked at the beginning of a visit, and already marked nodes must not be visited again. The same marked bit can then be used during freeing, to indicate that a chunk is reachable.
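Putting the two circular shifts together, a complete Schorr and Waite marking routine might look like the following C sketch; the chunk layout, the per-chunk counter field, and the helper points_into_heap() are assumptions for illustration.

/* Schorr-Waite marking sketch: visits all reachable chunks without a stack
   by temporarily reversing the pointer being followed. */
struct chunk {
    unsigned int marked_bit : 1;
    unsigned int counter;            /* number of the pointer being processed */
    unsigned int number_of_pointers;
    struct chunk **pointer;          /* the pointer fields of this chunk */
};

extern int points_into_heap(struct chunk *p);

void mark_pointer_reversal(struct chunk *root) {
    struct chunk *parent = 0;        /* Parent pointer */
    struct chunk *chunk = root;      /* Chunk pointer */

    if (chunk == 0 || !points_into_heap(chunk) || chunk->marked_bit) return;
    chunk->marked_bit = 1;
    chunk->counter = 0;

    for (;;) {
        if (chunk->counter < chunk->number_of_pointers) {
            unsigned int n = chunk->counter;
            struct chunk *child = chunk->pointer[n];
            if (child == 0 || !points_into_heap(child) || child->marked_bit) {
                chunk->counter++;            /* nothing to visit; next pointer */
            } else {
                /* Move down to the child: rotate Parent pointer, Chunk pointer
                   and the n-th pointer field (the first code fragment above). */
                chunk->pointer[n] = parent;  /* leave the return pointer behind */
                parent = chunk;
                chunk = child;
                chunk->marked_bit = 1;
                chunk->counter = 0;
            }
        } else {
            /* All pointers in this chunk done: return to the parent by rotating
               back (the second code fragment above). */
            if (parent == 0) return;         /* back at the root: finished */
            unsigned int n = parent->counter;
            struct chunk *grandparent = parent->pointer[n];
            parent->pointer[n] = chunk;      /* restore the pointer to us */
            chunk = parent;
            parent = grandparent;
            chunk->counter++;                /* continue with the next pointer */
        }
    }
}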
The mark phase in a mark and scan garbage collector touches the reachable chunks only, but the scan phase touches all chunks. This makes it expensive, since most chunks have a short lifetime and a very large fraction of the heap consists of garbage chunks when the garbage collector is invoked. Two-space copying exploits this important observation by avoiding scanning all chunks and only processing the reachable chunks; it manages to do so at the expense of doubling the memory requirements. With memory becoming cheaper every day, however, it has become reasonable to optimize for time rather than for memory.
The basic two-space copying collector by Cheney (1970) works as follows. The available heap space is divided into two equal parts: the from-space and the to-space, as shown in Figure 5.17. During normal computation new chunks are allocated in from-space by simply advancing a pointer through the from-space. When the heap space in the from-space has been consumed, all reachable chunks are copied to the empty to-space by the garbage collector.
Figure 5.17 Memory layout for two-space copying.
The two-space copying operation starts with copying the chunks in from-space that are referenced by pointers in the root set. The copies are placed in the to-space in consecutive positions, starting right at the beginning. The original of a chunk in the from-space is marked 'copied' and a forwarding pointer pointing to the copy in the to-space is stored in it; since the chunk has just been copied, its contents can be destroyed. No pointers are updated in the copies, so the pointers in them still point to chunks in the from-space.
Then the chunks in to-space are scanned from left to right for pointers to chunks in from-space, using a 'scan pointer'. Suppose a pointer in a chunk P under the scan pointer
points to a chunk Q in from-space. Now there are two possibilities: Q is marked 'copied', in which case it contains a forwarding pointer which is used to update the pointer in P; or Q is not marked 'copied', in which case it must be copied now. After the copy has been made, the original Q is marked 'copied' and its contents are replaced by a forwarding pointer to the copy. This process is repeated until the to-space contains no more pointers to chunks in from-space. Then all reachable chunks in the from-space have been copied to to-space and all pointers have been updated to point into to-space. Next, the roles of the two semi-spaces are 'flipped', and ordinary computation is resumed.
When scanning, we may also find that the pointer in a chunk P points to the program data area; such pointers need no attention and are ignored. The strict separation of program data and the heap allows the garbage collector to detect efficiently with one compare instruction whether the pointer refers to a chunk in from-space or in program data. The pointer cannot point to a chunk in to-space, since P has just been copied from from-space, which contains no pointers to to-space, and no pointers in it have been updated yet.
Figure 5.18 Initial situation in two-space copying.
This algorithm does not need a stack to keep track of which chunks contain pointers that still must be traced. Instead it uses a simple scan pointer in the to-space that shows which of the copied chunks have been scanned and which have not. Figure 5.18 shows the from- and to-spaces at the start of a garbage collection operation;
Figure 5.19 Snapshot after copying the first level.
Figure 5.19 shows a snapshot after the three chunks directly accessible from the root set have been copied and scanning has started, and Figure 5.20 shows the situation after four chunks have been copied and one has been scanned. The forwarding pointers have been shown as dotted lines with hollow arrow heads, to distinguish them from the 'real' pointers; in an implementation, the 'copied' marker, not shown in the diagrams, serves to distinguish between the two. We see that the collector copies the graph in breadth-first order, and compacts the chunks automatically (inevitably it also compacts the arrows in the diagram!). It is also easy to see that after all chunks have been copied and scanned, the from-space contains no interesting data any more and can be considered empty.
A copying garbage collector addresses the fragmentation overhead and time complexity of the mark and scan garbage collector. The key insight is to move chunks so free space will always be consecutive, which makes memory allocation very cheap, since it can be performed by advancing a global pointer. The main problems with two-space copying garbage collection are that it wastes half of the heap and that its performance is poor if the heap is nearly full. Two-space copying is a very effective but also very demanding garbage collection algorithm. It is a good choice when run-time speed is important and memory is not scarce.
Figure 5.20 Snapshot after having scanned one chunk.
This is frequently the case with advanced programming paradigms on modern machines, which is where we find two-space copying most often.
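A compact C sketch of this copying collector is given below; the chunk layout, the space administration, and the root-set handling are simplifying assumptions, and the flip of the two semi-spaces is left out.

/* Cheney-style two-space copying sketch; layout and helpers are assumed. */
#include <stddef.h>
#include <string.h>

struct chunk {
    size_t size;                     /* total size in bytes, header included */
    struct chunk *forward;           /* non-NULL once copied: the 'copied' mark */
    unsigned int number_of_pointers; /* the pointer fields follow the header */
};

static struct chunk **fields(struct chunk *c) {
    return (struct chunk **)(c + 1); /* assumed layout: pointers right after it */
}

extern int in_from_space(struct chunk *p);   /* assumed space test */
static char *to_space, *free_ptr;            /* allocation point in to-space */

/* Copy p to to-space if this has not happened yet; return its new address. */
static struct chunk *forward_chunk(struct chunk *p) {
    if (p == NULL || !in_from_space(p)) return p;  /* e.g. program data area */
    if (p->forward == NULL) {
        struct chunk *copy = (struct chunk *)free_ptr;
        memcpy(copy, p, p->size);
        free_ptr += p->size;
        p->forward = copy;           /* leave a forwarding pointer behind */
    }
    return p->forward;
}

void two_space_collect(struct chunk *root_set[], unsigned int root_count) {
    char *scan_ptr = free_ptr = to_space;

    for (unsigned int i = 0; i < root_count; i++)   /* copy the first level */
        root_set[i] = forward_chunk(root_set[i]);

    while (scan_ptr < free_ptr) {                   /* breadth-first scan */
        struct chunk *c = (struct chunk *)scan_ptr;
        for (unsigned int i = 0; i < c->number_of_pointers; i++)
            fields(c)[i] = forward_chunk(fields(c)[i]);
        scan_ptr += c->size;
    }
    /* Here from-space and to-space would be flipped and allocation resumed. */
}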
Since the size of the largest free chunk may be considerably smaller than the sum of the sizes of all free chunks, it is useful to move all used chunks together. Doing so moves all free chunks together also, so they can be combined into one maximum-size free chunk. The compaction algorithm described below can be performed in any situation in which there are only two kinds of chunks, chunks marked in use and chunks marked free, and the pointers in the chunks marked in use are all consistent. Since this is true even if some of the chunks marked in use are actually unreachable, compaction can, technically speaking, be done at almost any moment: it is largely independent of garbage collection and is just a free-list improving technique. Needless to say, it is most effective just after the garbage collector has run, and in practice it is often integrated with the garbage collector. It is useless after two-space copying, though.
Compaction is most simply done in three left-to-right sweeps through memory, using one additional pointer per chunk. We assume that the used chunks will be moved to the
lower end of the heap. For clarity, Figures 5.21 through 5.23, which depict the compaction process, are drawn as if compaction moves the chunks from 'old' memory to 'new' memory, but actually the 'old' and 'new' in the diagrams designate the same memory. The first sweep calculates the addresses of the new positions of the chunks, the second updates existing pointers to point to the new positions, and the third actually moves the chunks:
- Address calculation, see Figure 5.21. The chunks are scanned from low to high and for each used chunk C its new position after compaction is computed; the address corresponding to this position is stored in C's administration. Since the new position of the first used chunk is known (the lower edge of memory) and the sizes of the chunks are also known, the address calculation is trivial.
- Pointer update, see Figure 5.22. The program data area and the chunks are scanned for pointers that point into the heap; each such pointer to a chunk C is updated to the new address of C found in the administration of C.
- Move chunks, see Figure 5.23. The chunks are scanned from low to high through memory and each used chunk is moved to its new position, as found in its administration. Since chunks will only move to the left (or stay in place), this can be done with a single left-to-right scan. All pointers within the chunks now again point to the chunks they pointed to before the compaction started. All of the memory after the last used chunk now forms a single free chunk.
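The three sweeps translate into three loops over the heap, as in the following C sketch; the chunk header, the heap boundaries, and the treatment of the root set are assumptions chosen to keep the example short.

/* Three-sweep compaction sketch; layout and helpers are assumed. */
#include <stddef.h>
#include <string.h>

struct chunk {
    size_t size;                      /* total size in bytes, header included */
    unsigned int free_bit;
    struct chunk *new_address;        /* the one additional pointer per chunk */
    unsigned int number_of_pointers;  /* pointer fields follow the header */
};

extern char *heap_low, *heap_high;
extern int points_into_heap(struct chunk *p);

static struct chunk **fields(struct chunk *c) { return (struct chunk **)(c + 1); }
static struct chunk *next_chunk(struct chunk *c) {
    return (struct chunk *)((char *)c + c->size);
}

void compact(struct chunk *root_set[], unsigned int root_count) {
    char *new_pos = heap_low;
    struct chunk *c;

    /* Sweep 1: address calculation. */
    for (c = (struct chunk *)heap_low; (char *)c < heap_high; c = next_chunk(c))
        if (!c->free_bit) {
            c->new_address = (struct chunk *)new_pos;
            new_pos += c->size;
        }

    /* Sweep 2: pointer update, in the root set and in the used chunks. */
    for (unsigned int i = 0; i < root_count; i++)
        if (points_into_heap(root_set[i]))
            root_set[i] = root_set[i]->new_address;
    for (c = (struct chunk *)heap_low; (char *)c < heap_high; c = next_chunk(c))
        if (!c->free_bit)
            for (unsigned int i = 0; i < c->number_of_pointers; i++)
                if (points_into_heap(fields(c)[i]))
                    fields(c)[i] = fields(c)[i]->new_address;

    /* Sweep 3: move the chunks; they only move towards lower addresses. */
    for (char *cp = heap_low; cp < heap_high; ) {
        c = (struct chunk *)cp;
        size_t sz = c->size;          /* save before the move may overwrite it */
        if (!c->free_bit && (char *)c->new_address != cp)
            memmove(c->new_address, c, sz);
        cp += sz;
    }
    /* Everything from new_pos up to heap_high now forms one free chunk
       (the free-chunk administration is omitted here). */
}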
Figure 5.21 Address calculation during compaction.
The statistics of chunk allocations and frees are utilized by an optimization called generational garbage collection. Like two-space copying, it is based on the phenomenon that most allocated blocks fall out of use almost immediately; conversely, if a block has been around for a while already, chances are that it will stay in business for some time to come. This suggests restricting the expensive action of the garbage collector to the most recently allocated chunks only.
Figure 5.22 Pointer update during compaction.
Figure 5.23 Moving the chunks during compaction.
Especially in a compacting garbage collector these are easy to find: they all lie at addresses past the end of the last compaction. Only if garbage collection on this newest generation fails to free enough memory or to meet some efficiency criterion is a full garbage collection performed.
Many variants of generational garbage collection exist, some of great sophistication and efficiency, but in its basic form it divides the heap dynamically into two regions, one for an older generation of chunks and one for a younger generation; the routine malloc() hands out chunks from the younger region. As usual, a third region also exists, the program data area. Now a certain large percentage of the garbage collector activations restrict themselves to the younger generation region, using a simple trick. The trick is to consider the older generation region to be part of the program data area, and collect only the chunks in the younger generation region, using any garbage collection algorithm desired. Since the program data area can be seen as an input parameter to any garbage collection process and since it can in principle have any shape and content, it is obvious that this unusual view of what constitutes a program data area still yields correct results. The complementary smaller percentage of the garbage collector activations are full garbage collections, in which the program data area and the heap have their traditional roles. These serve to reclaim any chunks in the older generation region that have become unreachable.
Algorithms differ in the answers they give to the questions of what the exact difference between older and younger is, whether more than two generations are distinguished, exactly when and how often a full garbage collection is done, and what garbage collection algorithm is used. They are often combined with other optimizing techniques, most of which again exploit statistical properties of chunk usage. For example, it is known that very few pointers to young chunks are stored in old chunks and the program data area, so some algorithms keep a list of such pointers to be used as a root set for garbage collecting the young region. This avoids scanning the possibly large program data area and old region upon such occasions, but requires monitoring all pointer assignments.
Generational garbage collection algorithms are the fastest and most efficient garbage collection algorithms known. For an example of a mature, high-performance garbage collection system using generational garbage collection, see Ungar (1984).
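The following C sketch illustrates this 'list of such pointers' idea: a write barrier that the compiler could emit for pointer stores, recording locations in the old region that come to hold pointers to young chunks. The region tests, the set representation, and the overflow policy are assumptions for illustration, not a description of any particular collector.

/* Remembered-set write barrier sketch; representation and policy are assumed. */
#include <stddef.h>

struct chunk;
extern int in_young_region(struct chunk *p);    /* assumed region tests */
extern int in_old_region(void *location);

#define MAX_REMEMBERED 4096
static struct chunk **remembered_set[MAX_REMEMBERED]; /* locations, not values */
static unsigned int remembered_count;
static int remembered_overflow;   /* if set, the next collection must be full */

/* The compiler emits a call to this barrier for every pointer store
   '*location = value' (compare the assignment code of Figure 5.9). */
void write_barrier(struct chunk **location, struct chunk *value) {
    if (in_old_region(location) && value != NULL && in_young_region(value)) {
        if (remembered_count < MAX_REMEMBERED)
            remembered_set[remembered_count++] = location;
        else
            remembered_overflow = 1;  /* fall back to a full collection later */
    }
    *location = value;
}

A collection of the young region then uses the remembered locations as extra roots, instead of scanning the old region and the program data area.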
This concludes our treatment of memory management with explicit and implicit deallocation. In summary, explicit deallocation is implemented using linked lists of free chunks, and implicit deallocation is implemented by computing the transitive closure of the set of directly accessible chunks and freeing all chunks not in this set.
- All compilers and many compiled programs need dynamic memory allocation - access to memory chunks the sizes of which are not known statically.
- Usually, all dynamic memory is allocated from a single contiguous segment of memory, called the data segment or the heap.
- Allocating requested memory chunks is an administrative matter; freeing memory chunks that are no longer needed is both an administrative matter and a programming paradigm issue.
- Data allocation with explicit deallocation by the programmer usually uses a malloc()/free()-like interface. Data allocation with implicit deallocation requires a form of garbage collection.
- Memory fragments are called 'blocks' if they are in the hands of the user, and 'chunks' if they are handled by the memory allocator. A chunk contains a block plus some administration.
- Memory is allocated in a region called the 'heap'. It consists of a contiguous sequence of chunks, each chunk marked with its size. A bit tells whether a chunk is in use or free.
- A call of malloc() finds a free chunk of sufficient size and returns a pointer to it, after updating some administration. A call of free() frees the chunk supplied to it, and may coalesce adjacent free chunks into a larger one.
- For increased efficiency, the free chunks can be linked into multiple free lists, one for each size interval.
- Compiler data structures that require dynamic allocation are linked lists and extensible arrays. Linked lists of records of a given size can be implemented efficiently by batching the records in larger blocks and maintaining a free list of them. Extensible arrays can be implemented efficiently by increasing the array size by a constant factor.
- Data allocation with implicit deallocation is very useful since explicit deallocation is very difficult for the programmer to do; explicit deallocation in generated code is often so difficult as to be infeasible. Also, memory allocation errors are hard to find and correct.
- Garbage collection should free all chunks that the program will not use any more, but can only free the chunks that are unreachable from the program data area.
- Reference counting garbage collection detects garbage chunks directly; mark and scan garbage collection identifies all unreachable chunks. Two-space copying copies all reachable chunks to a different space. Returning the garbage chunks to the free list leaves free memory fragmented; compaction is needed to upgrade the fragmented free list into a single free chunk.
- Garbage collection algorithms can be one-shot, on-the-fly, or concurrent.
- A garbage collector needs a root set and the pointer layout of all reachable chunks, and all reachable pointers must be valid, point to something; compiler support is needed for all three of these features. The root set can be collected from the program data area; the pointer layout follows from the data types; the compiler must provide both to the garbage collector in the compiled program. Pointer validity can be achieved in a type-secure language by properly initializing all pointers.
- The pointer layout of chunks is known to the compiler; it can be supplied to the garbage collector in the form of a bit map, in the form of a routine, or it can be fixed.
- Chunks can be self-descriptive, in which case the pointer layout can be obtained from the chunk itself; otherwise the pointer layout must be supplied by the context.
- The pointer layout of the program data area is complicated and dynamically variable, since it includes the stack with arbitrary activation records and the working stacks under construction in these activation records. In principle, a pointer layout must be created by the compiler for every possible activation record that can occur during program execution, and code must be generated that keeps track of which layout is current.
- The number of required pointer layouts for activation records may be reduced by using the working stack only for simple expressions. It is then empty when the garbage collector can be activated.
- If the language does not allow the root set and/or the pointer layout to be provided, or pointer validity to be achieved, a conservative estimate can be made of the root set and of the pointers in each chunk: any value that looks like a pointer is taken to be one. Usually, this works correctly; it may still leave some garbage undetected.
- Actual garbage collectors are complicated by baroque language features such as arrays of dynamically alterable size, unions with dynamic types which contain pointers, and exception handlers. The effects of aggressive code optimization can also be a complicating factor.
- Reference counting garbage collection records in each chunk the number of pointers that point to it; when the number drops to zero the chunk is garbage and can be freed.
- To keep the reference count of a chunk current, it must be incremented when a reference to the chunk is copied and decremented when such a reference is destroyed.
- When reference counting frees a chunk, the reference counts of all chunks it refers to must be decremented, and so on, recursively. A non-recursive implementation can be achieved using pointer reversal.
- Reference counting cannot free unreachable circular data structures. Also, monitoring all pointer manipulation may be expensive.
- Mark and scan garbage collection is the only algorithm that frees all memory that can be freed. It is often combined with compaction, and then also provides the largest possible chunk of memory available. The marking phase marks all chunks that are still reachable; the scan phase scans the allocated memory and frees all chunks that are not marked reachable.
- Marking needs a stack of unknown size. This stack can be distributed over the chunks, with each chunk having room for a parent pointer in the marking process. The scan phase can combine adjacent free chunks to improve the quality of the free list.
- The overhead of the room for the parent pointer can be eliminated by using pointer reversal, also called the Schorr and Waite algorithm. In pointer reversal, while visiting a child D of a chunk C, the parent pointer of C is stored in the location in C which held the pointer to D. The visit to D returns a pointer to D, which is used to restore the pointer in C by swapping it with the parent pointer. This restores the parent pointer, which can then be swapped with the location of the next pointer in C.
- In two-space copying, the available heap space is divided into two equal parts: the from-space and the to-space. New chunks are allocated in from-space by advancing a pointer. Upon garbage collection, all reachable chunks are copied from the from-space to the empty to-space, while updating all pointers, including those in the program data area.
- Two-space copying copies the graph of reachable chunks in breadth-first fashion: first the chunks that are reachable directly from the program data area, then those reachable in two steps from the program data area, then those reachable in three steps, and so on.
- When a chunk is reached for the first time, through a pointer P, the chunk is copied to the first free position in the to-space, and a forwarding pointer is left in the original, pointing to the copy; the pointer to the copy replaces P. When a chunk is reached again, through a pointer Q, no copying takes place and the forwarding pointer is used to replace Q.
- When all reachable chunks in the from-space have been copied to to-space and all pointers have been updated, the roles of from-space and to-space are flipped, and ordinary computation is resumed.
- Two-space copying does not need a stack; it avoids the fragmentation overhead and time complexity of the mark and scan garbage collector. Its main problem is that it wastes half of the heap. It is a good choice when run-time speed is important and memory is not scarce.
- The size of the largest free chunk may be considerably smaller than the sum of the sizes of all free chunks; this is called fragmentation. Compaction moves all used chunks together; this moves all free chunks together also, so they can be combined into one maximum-size free chunk.
- Compaction is a free-list improving technique rather than a garbage collection technique and is largely independent of garbage collection. It can, technically speaking, be done at almost any moment, but is most effective right after a garbage collection.
- Compaction is done in three left-to-right sweeps through memory: the first sweep calculates the addresses of the new positions of the chunks in the heap; the second updates existing pointers in the program data area and the heap chunks to point to the new positions; and the third actually moves the chunks.
- Generational garbage collection restricts the expensive action of the garbage collector to the most recently allocated chunks only, since these are the most short-lived. In its basic form, generational garbage collection divides the heap dynamically into one region for an older generation of chunks and one for a younger generation. Generational garbage collector activations restrict themselves to the younger generation region by considering the older generation region to be part of the program data area. The algorithm then automatically collects only the chunks in the younger generation region.
- Generational garbage collection algorithms are the fastest and most efficient garbage collection algorithms known.
- In summary, explicit deallocation is implemented using linked lists of free chunks, and implicit deallocation is implemented by computing the transitive closure of the set of directly accessible chunks and freeing all chunks not in this set.
Detailed information about the garbage collection algorithms described here and many others can be found in the book Garbage Collection - Algorithms for Automatic Dynamic Memory Management by Jones and Lins (1996). An extensive bibliography on garbage collection and related topics is supplied by Sankaran (1994), and an interesting survey is given by Cohen (1981). Journals to consult are, for example, ACM SIGPLAN Notices, ACM Transactions on Programming Languages and Systems, and Software - Practice and Experience.
Garbage collection has a long history, starting in the 1960s with the development of LISP. The original paper on LISP (McCarthy, 1960) describes mark and scan garbage collection, called 'reclamation' there. The early garbage collection algorithms, such as reference counting, imposed a considerable overhead on the execution time of an application, which led to the (mis)conception that garbage collection is more expensive than manual memory management. Recent studies, however, have shown that advanced algorithms such as generational copying garbage collection perform roughly as well as, and sometimes even better than, manual memory management (Appel, 1987; Barrett and Zorn, 1995).
Recent advances have been reported in the Conference Proceedings of the ACM SIGPLAN International Symposium on Memory Management ISMM '98, published as ACM SIGPLAN Notices, Vol. 34, No. 3, March 1999, and in the more algorithmically oriented computer science journals. Much material can also be found in the Proceedings of the International Workshop on Memory Management 1992, published as Springer's Lecture Notes in Computer Science No. 637.
5.1
Modify the outline code for the basic Malloc given in Figures 5.2 and 5.3 for the situation that the least common multiple of the alignment restrictions of the machine is 32 and the Administration size is 4.
5.2
In Section 5.1.1 we suggest calling the garbage collector, if there is one, whenever Malloc() runs out of space, hoping that it will free some unused chunks the user has forgotten to free. It would seem, however, that this is incorrect, since the user could free some of these chunks later on in the program, and these chunks would then be freed twice. Why is this not a problem?
5.3 In languages with explicit memory deallocation, memory leaks can be a major prob-
lem. Memory leaks occur when the programmer forgets to deallocate a block of memory that is no longer used. What can be done to address this problem, apart from using a garbage collector? Answers are supplied by Hastings and Joyce (1992), Barach, Taenzer and Wells (1982) and in the answers section of this book.

(a) Using a loop of calls to malloc(), each requesting say 1 Mb, find out how much memory one can obtain on your system. (b) By timing the access times of random bytes in the chunks thus obtained, find out if the access times of the chunks are all equal. If they are not, what could be the cause?

In Section 5.1.2 it is claimed that returning blocks to the standard memory manager would require much administration. Why is it not enough to have a single counter per block, which holds the number of busy records for that block, and to return the block when it reaches 0?
Where, in the construction of assemblers as explained in Section 4.3, can one make good use of extensible arrays?
5.7
Name some properties of garbage collection algorithms that make concurrent garbage collection very hard.
A C programmer (hacker) decides to exploit the fact that addresses on the target machine are always less than 2**31, and uses the most significant (32nd) bit of some pointer variables as a flag bit, to store some useful information, using code like:
int x, *p;
/* set flag bit thus: */
p = (int *) ((int) p | 0x80000000);
/* dereference the pointer thus: */
x = *(int *) ((int) p & 0x7fffffff);
What will this do to the garbage collection algorithms? Will conservative garbage collection still work?

In Figure 5.9, the reference count of the chunk under q is incremented before that of the chunk under p is decremented. What could conceivably go wrong if the order were inverted? Hint: consider the assignment p := p.
Implement the recursive descent marking algorithm for the structure shown in the text. What is the complexity of the algorithm? Test your implementation on a large graph. What about memoization? Implement the Schorr and Waite marking algorithm and answer the same questions.
Here is another algorithm for marking a graph without using an unbounded stack: use a bounded stack, as explained by Knuth (1973) in his Art of Computer Programming, Algorithm G on page 415. It works as follows. Say the stack is limited to 100 entries; when the stack overflows, the oldest entry is discarded, leaving the 100 most recent return addresses. When the stack becomes exhausted (underflows), another run of the algorithm is started; the marking now consists of repeated runs, until a run finds no more nodes to mark. Why will not all nodes be marked by a single run of the algorithm? Complete the algorithm; hint: you need one more marking bit than for the unbounded stack implementation. What is its complexity?

A possibility to limit the stop-and-go behavior of the mark and scan algorithm is to do the scan phase incrementally. After the mark phase, we no longer perform a scan of the complete memory, but rather modify the Malloc() code in Figures 5.2 and 5.3 to scan until it finds a free chunk of suitable size. Give outline code that implements this incremental scheme.
Consider the pointer reversal algorithm of Section 5.2.4.3 for binary trees stored in read-only memory, for example on a CD-ROM. Since we cannot overwrite a pointer at a node, we cannot store the parent pointer in read-only memory. Suppose now that the tree stores the 'exclusive or' (⊕) of the left and right pointers and the parent pointer, rather than the left and right pointers themselves. When returning for the last time to a node, we can then find the parent pointer as follows. The right pointer entry holds right pointer ⊕ parent pointer, and since we know the right pointer (we just came from right there!), we can compute the parent pointer:

right pointer ⊕ (right pointer ⊕ parent pointer) = parent pointer

using the equalities A ⊕ A = 0 and 0 ⊕ A = A. Turn this idea into a complete
marking algorithm.

Compaction has occasionally been characterized as 'one-space copying', in analogy to two-space copying. Describe the similarities and differences.
History of memory management: Study Collins' paper (1960), which introduces reference counting, and write a summary of it in modern terminology.
In the previous chapters we have discussed general methods for performing lexical and syntax analysis, context handling, code generation, and memory management, while disregarding the programming paradigm from which the source program originated. In doing so, we have exploited the fact that much of compiler construction is independent of the source code paradigm. Still, a Java compiler differs considerably from a Prolog compiler: both require paradigm-specific techniques, in addition to general compiler construction techniques. We will explore these paradigm-specific techniques in the next four chapters. Figure 6.1 shows the numbers of the chapters and sections that contain material on the subjects of lexical and syntactic analysis, context handling, and code generation for programs in the four paradigms covered in this book. For each of the three subjects we have already considered the general methods; the paradigm-specific methods are covered as shown in the table.
There are hardly any paradigm-specific lexical and syntactic issues, so the general treatment in Chapter 2 suffices. One exception is the 'offside rule' in some functional languages, in which the text layout plays a role in parsing; it is described in Section 7.1.1. Two aspects of context handling are almost universal: identification and type checking; rather than dedicating a separate chapter to these, we cover them at the beginning of the present chapter, in Section 6.1. The paradigm-specific context manipulations are described where needed in Chapters 6 through 9.
The main differences in compilers for different paradigms lie in the kind of code they generate, and this is the primary reason for having a separate chapter for each of them. Most (but not all) compilers for imperative and object-oriented languages produce code at the level of an assembler or lower, whereas many (but by no means all) compilers for
Figure 6.1 Roadmap to paradigm-specific issues.
functional, logic, and parallel and distributed languages generate C or C++ code as their target code. The general code generating techniques have been discussed in Chapter 4, and the specific forms of code to be generated for the four paradigms are treated in Chapters 6 through 9.
Returning to the structure of the present chapter, we first discuss identifier/operator identification and type checking; operator identification requires special attention since in most languages operators are overloaded, and the context is needed to make the final identification. The rest of the chapter concerns code generation for the imperative and object-oriented languages. Since programs in these languages work by specifying direct instructions for the manipulation of explicit data, and since both the instructions and the data are relatively close to those of the hardware, the main issues in code generation for these languages are the mappings of source language statements to low-level instructions and of source language data onto bytes. These issues are covered in Sections 6.4 and 6.2, respectively. An intermediate position is occupied by the activation records and closures (Section 6.3). Activation records are data structures used by the run-time system to represent active ('running' or 'suspended') routines; closures are a representation of routines that allow some advanced operations on them, most prominently currying. Both combine data and flow of control. Although functional, logic, and parallel/distributed programs often require data structures that are not direct mappings of programmer data and instruction sequences that do not derive directly from the program, much of the following discussion is still relevant for the more advanced paradigms.
Roadmap
6 Imperative and object-oriented programs
6.1 Context handling
6.1.1 Identification
6.1.2 Type checking
6.3.2 Routines
6.3.3 Operations on routines
6.3.4 Non-nested routines
6.3.5 Nested routines
6.3.6 Lambda lifting
6.3.7 Iterators and coroutines
6.4.1 Local flow of control
6.5.2 Module initialization
6.5.3 Code generation for generics
parser and to issue the error message ‘No fields found in record type declaration’ during context handling than to exclude them syntactically and issue the message ‘~nexpected closing parenthesis’, or something similar, durin Context handling is sometimes also called which emphasizes the for correct meaning se. more that than Of course, correct m e ~ i n gcannotbecheckedinany fo a1 senseoftheword,but often useful w ~ i n g s about dubious ~ e a n i n g scan be given. amples are the detection of unused variables and routines, infinite loops, non-trivial exp ions that have a trivial result (for example EXPR >= 0 , where EXPR is anexpression of type unsigne~integer),andmanyothers.The problem with thisis that much of this infor~ationbecomes available more or less autom ically during code optimization. Issuing the w a i n S in that phase, however, has the a wardpropertythatthe ~ e s s a g e sfail toappearwhentheprogram is compiledwithout opti~izations,but duplicating part of the code opti~zationeffort in the context handler ivethepropermessages is notattractiveeither. The reasonablecompromise doing the analysis and issuing the messages in the context phase and saving the resultsfor the code generator -complicates the interfaces of all the phases between the context phase and the consumerof the i n f o ~ a t i o n ,Such are the worries of the compiler designer, We will now first turn to two context handling issues that play a direct role in static correctness c h e c ~ n gin most languages: identification and type checking.
At a conceptual level, identification is the process of finding the defining occurrence of a givenappliedoccurrence of m identifieroroperator.The o c c ~ r r e n cof~an identifier is the place of its main, and usually only, introduction. This introduction supplies information about the identifier: its kind (whether it is a constant, a variable, a module, etc.), its type, possibly an initial or fixed value, possible allocation properties, etc. The other occurrences of an identifier are its pro lie o ~ c ~ r r e n and c ~ are s the consumers of this information. For example, in month: integerRANGE [l..l21 ; *.. month := 1; ~ H I month ~ E c = 12 DO print-string (month-n~me[month]) ; month := month c 1; DONE ;
the month in the first line is the defining occurrence, the others are applied occurrence This formulation of the problem does not cover all cases. Some languages all tions, which result in identifiers having more than one introduction. Other languages do not require ide~tifiersto be introduced or declared at all; those la have the information about the identifiers built in or the informationis distributed over all the applied occu~ences:type and other properties ofan identifier follow from its form andlor use. At the compiler constructionlevel, these differences disappear: thereis an i n f o ~ a t i o n entry for each named item in the program, in which all information about this item is col-
collected. It is the task of the identification process to connect all occurrences of an identifier to its proper entry. Information can then be inserted in and retrieved from this entry as desired. As explained in Section 2.1.11, the information base in which all the entries are kept is called the symbol table or name list.
Not all identifiers must be looked up in the same set of entries: for example, variables must be looked up among the local and global identifiers, field selectors must be looked up among the field selectors of a given type, etc. Each of these sets of entries defines a name space, and the syntactic position of an identifier in the program determines which name space it belongs in. The precise rules follow from the language manual, but usually identifiers in the syntactic positions of variable names, routine names, and some others live in one name space, the general name space, and field selectors live in special name spaces belonging to record types. This implies that one can have a variable name i next to a field name i, without the two getting in each other's way, as shown in the C code fragment

struct one_int { int i; } i;
...
i.i = 3;
The first i in i.i is looked up in the general name space, leading to its identification as a variable of type struct one_int. The second i is then looked up in the special name space of the field selectors of the type struct one_int.
Specific questions about the name spaces in a particular language must be answered by the language manual. Examples of such questions are whether labels live in the general name space or in a special label name space, and whether module names have a name space of their own. In principle any position that can be distinguished syntactically or contextually can have its own name space. C has three main name spaces, one containing the names of enums, structs, and unions, one containing the labels, and one containing all the rest; the latter includes variable identifiers, routine identifiers, type identifiers, and enumeration value identifiers. In addition, C has a name space for each struct and each union; these name spaces contain only the field selectors of the corresponding structs and unions.
Some name spaces, especially those in block-structured languages, are scope-structured. These scopes work in stack fashion: there is a stack of scope elements, one for each scope entered, and all actions are performed on the top element(s). The rules are simple:
- a new empty scope element is stacked upon scope entry;
- declared identifiers are entered in the top scope element;
- applied identifiers are looked up in the scope elements from top to bottom; and
- the top element is removed upon scope exit, thus removing all declarations from that scope.
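These four rules translate almost directly into code. The following C sketch is a naive rendering of them, essentially the organization of Figure 6.2 discussed next; the record layouts, the linear-search lookup, and the function names are illustrative assumptions, and name spaces and hashed access are ignored here.

/* Naive scope-stack sketch following the four rules above. */
#include <stdlib.h>
#include <string.h>

struct declaration {
    const char *name;
    void *properties;             /* the 'P' of the diagrams */
    struct declaration *next;     /* next declaration in the same scope */
};

struct scope {
    struct declaration *decls;
    struct scope *outer;          /* the scope element below this one */
};

static struct scope *top;         /* top of the scope stack */

void enter_scope(void) {                      /* rule 1 */
    struct scope *s = calloc(1, sizeof *s);
    s->outer = top; top = s;
}

void declare(const char *name, void *props) { /* rule 2 */
    struct declaration *d = malloc(sizeof *d);
    d->name = name; d->properties = props;
    d->next = top->decls; top->decls = d;
}

void *identify(const char *name) {            /* rule 3: search top to bottom */
    for (struct scope *s = top; s != NULL; s = s->outer)
        for (struct declaration *d = s->decls; d != NULL; d = d->next)
            if (strcmp(d->name, name) == 0) return d->properties;
    return NULL;                              /* not declared */
}

void leave_scope(void) {                      /* rule 4 */
    struct scope *s = top; top = s->outer;
    while (s->decls) {
        struct declaration *d = s->decls;
        s->decls = d->next;
        free(d);
    }
    free(s);
}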
A naive implementation of a scope-structured name space is shown in Figure 6.2; on
the left is the scope stack, on the right the linked lists of declaration information records, one for each name declared in each scope. The capital P in a record for a name stands for the properties attached to the name in the declaration.
Figure 6.2 A scope-structured symbol table.
Five levels of scope have been shown; they represent a possible constellation for a program: level 0 is the library level, level 1 the program (routine declaration) level, level 2 the formal parameter level, level 3 the local variable level, and level 4 a subblock. This symbol table might result from the C code in Figure 6.3. This set-up allows easy insertion of new names and easy removal of entire scopes; identifiers can be found by performing a simple sequential search.
The organization shown in Figure 6.2 is simple and intuitive, but lacks two features essential for use in a practical compiler: it has no provisions for name spaces and it does not use the fast symbol table access discussed in Section 2.1.11. We have seen there that some identifier identification must already be done between the lexical scan and the parser, long before anything about scope and name spaces is known. This identification is based on the routine Identify (Name), which yields a pointer to a record of type Identification info, which gives access to all information about the identifier Name. The speed of access is obtained using a hash table. There are several ways to combine hash table identification, name spaces, and scopes in one implementation. We will discuss one possibility here, suitable for C-like languages; for a partially alternative implementation, see Exercise 6.2.
void rotate(double angle) {
    ...
}
void paint(int left, int right) {
    Shade matt, signal;
    ...
    {   Counter right, wrong;
        ...
    }
}

Figure 6.3 C code leading to the symbol table in Figure 6.2.
Figures 6.4 and 6.5 together show an implementation that utilizes the easy access provided by the hash table explained in Section 2.1.11.1, provides a fixed number of name spaces at the cost of a single field selection, and allows efficient scope stack operations. To avoid clutter, the diagrams depict a subset of the symbol table from Figure 6.2. Figure 6.4 shows the first part: the Identification info records are accessed through the hash table. The name spaces are implemented as fields of the Identification info record. One field points to a possible macro definition of I; in C, such a macro definition takes precedence over all other definitions. The decl field points to a stack of declarations of I in the general name space, implemented as a linked list of declaration information records. Each declaration information record is marked with its scope level, and the linked list is kept sorted on this level, so its top record is the identification sought: the declaration information record of the most recent declaration.
Figure 6.5 implements the rest of the scope stack: its structure mirrors the original scope stack of Figure 6.2, but the records it points to are the Identification info records of the identifiers declared in each scope. The primary use of this structure is scope removal: when the scope on top of the stack is to be removed, its list of identifiers is traversed and each identifier loses its most recent declaration, as shown by the outline code in Figure 6.6; removing the scope stack entry and decrementing the stack pointer conclude the operation. These data structures allow fast and easy addition of identifier declarations and the
Figure 6.4 A hash-table based symbol table.
removal of entire scope information sets. Without these data structures, scope removal would involve scanning the entire symbol table, which may contain hundreds or thousands of identifiers, to find the ones that are declared on the level to be removed.
As we have seen, record and module declarations create named subscopes which themselves live in scopes. In idf.sel, idf is first looked up as an identifier in the identification process; this yields a pointer to a definition of idf, which among others holds a pointer to a record T describing the type of idf. This record T, which may be defined in a different (older) scope than idf itself, is then tested to see if it is a structured type. If it is not, the expression idf.sel tries to select from a type that has no selectors and an error message must be given. In one possible implementation, shown in Figure 6.7, T provides a pointer to a list of records describing selectors, in which the selector sel can be looked up by sequential search. This will then lead to the type of the field. In Figure 6.7, it would be contained in property part P of the second record.
Figure 6.5 Scope table for the hash-table based symbol table.
PROCEDURE Remove topmost scope ():
    SET Link pointer TO Scope stack [Top level];
    WHILE Link pointer /= No link:
        // Get the next Identification info record
        SET Idf pointer TO Link pointer .idf-info;
        SET Link pointer TO Link pointer .next;
        // Get its first Declaration info record
        SET Declaration pointer TO Idf pointer .decl;
        // Now Declaration pointer .level = Top level
        // Detach the first Declaration info record
        SET Idf pointer .decl TO Declaration pointer .next;
        Free the record pointed at by Declaration pointer;
    Free Scope stack [Top level];
    SET Top level TO Top level - 1;

Figure 6.6 Outline code for removing declarations at scope exit.
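To make the data structures of Figures 6.4 to 6.6 more concrete, here is a minimal C sketch of the record types involved and of the scope removal of Figure 6.6. The field names follow the figures where possible; everything else is an assumption made for this illustration, and it replaces the naive scope stack sketched earlier in this section.

#include <stdlib.h>

struct Declaration_info {
    int level;                          /* scope level of this declaration */
    void *properties;                   /* the P of Figure 6.2 */
    struct Declaration_info *next;      /* older declaration of the same identifier */
};

struct Identification_info {
    const char *name;                   /* the identifier, reached through the hash table */
    void *macro;                        /* possible macro definition, a separate name space */
    struct Declaration_info *decl;      /* stack of declarations in the general name space */
};

struct Scope_element {                  /* one link of a scope list as in Figure 6.5 */
    struct Identification_info *idf_info;
    struct Scope_element *next;
};

struct Scope_element *scope_stack[100];
int top_level;

void remove_topmost_scope(void) {       /* corresponds to the outline code of Figure 6.6 */
    struct Scope_element *link = scope_stack[top_level];
    while (link != NULL) {
        struct Identification_info *idf = link->idf_info;
        struct Scope_element *next = link->next;
        /* detach and free the most recent declaration of this identifier */
        struct Declaration_info *decl = idf->decl;
        idf->decl = decl->next;
        free(decl);
        free(link);                     /* frees the scope list as we go */
        link = next;
    }
    scope_stack[top_level] = NULL;
    top_level--;
}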
Removing a level that holds definitions of structured types entails removing both the type definitions and their fields. This is not particularly difficult, but requires some care.
Figure 6.7 Finding the type of a selector.
If the source language manual states that all identifiers in a scope hide the identifiers with the same names in older (more outer) scopes, identification is relatively simple: the definition record pointed at by the decl field in Figure 6.4 provides the definition sought. In a language with overloading, however, identifiers do not hide all identifiers with the same name; for example, PUT(s: STRING) does not hide PUT(i: INTEGER) in Ada. Also, in many languages the operators are overloaded: the + in 3 + 5 is different from the one in 3.1 + 5.1, since the first has two integers as operands and the second operates on two floating point numbers. The ambiguity caused by the overloading is resolved by considering the context in which the name to be identified occurs. There are two issues here. The first is that in the presence of overloading the identifier identification process comes up with a set of definitions, rather than with a single identification. These definitions are selected from the list of definitions supplied by the decl field in Figure 6.4. The rules for selecting such a set of definitions from the list of all definitions depend on the source language. The second issue then concerns the reduction of this set to a single definition. This reduction process is again based on language-specific rules that consider the context of the identifier. If, as a result of the reduction process, no applicable definition remains, the identifier is undefined; if more than one definition remains, the identifier is ambiguous; and if exactly one definition remains, it is the definition of the identifier in the given context. Suppose, for example, that the compiler needs to identify the PUT in the Ada statement PUT("Hello"). In Ada, routine identifiers can be overloaded and the overloading has to be resolved by considering the types of the parameters. The compiler then calls Identify("PUT") to obtain a list of definitions of PUT(). Now assume the scope stack contains the following:
level 1:  procedure PUT (STRING)  = P64;
          procedure PUT (MATRIX)  = P57;

level 0:  procedure PUT (INTEGER) = P48;
          procedure PUT (STRING)  = P33;
          procedure PUT (FILE_TYPE, INTEGER) = P27;
          procedure PUT (FILE_TYPE, STRING)  = P14;
Here PN, with N a number, represents the properties of the declaration in the symbol table; also, the other identifiers in the scope stack are not shown. The call Identify("PUT") will then return the list

procedure PUT (STRING)  = P64;
procedure PUT (MATRIX)  = P57;
procedure PUT (INTEGER) = P48;
procedure PUT (FILE_TYPE, INTEGER) = P27;
procedure PUT (FILE_TYPE, STRING)  = P14;
Note that the definition procedure PUT (STRING) = P33 is not included: it is hidden by the definition procedure PUT (STRING) = P64, since the routine name and the parameter types are identical. Going through this list with the parameter of type STRING of the call, the compiler rejects the last four entries because of differences in number and types of parameters, but keeps the first entry, P64. So only one definition remains and the call is correct. A call PUT(MY_OUTPUT, 3) will identify P27, and the PUT in a third call PUT(MY_OUTPUT, 3.14) would fail to be identified. A considerable additional complication is that the types of the actual parameters that help determine the definition of a routine name can themselves be the result of overload resolution. The problem is even more severe when identification is combined with coercions; that situation is discussed in Section 6.1.2.3. Now that we have seen how we can still identify names in the presence of overloading, we consider its consequences for removing the top scope upon block exit. The main change is that there can now be more than one definition with the same level in the definition record chains in Figure 6.4. It is easy to remove them all rather than just the first one, when visiting the Identifier info record in the code in Figure 6.6.
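The filtering step described above can be pictured in a few lines of C. The sketch below is not from the text: the types Definition and Type and the helper types_match are assumptions made for the illustration. It reduces the list returned by the identification process to the definitions whose parameter types match the actual parameter types of the call.

struct Type;                              /* opaque type descriptor */

struct Definition {
    struct Type **param_types;            /* parameter types of this definition */
    int num_params;
    void *properties;                     /* P64, P57, ... */
    struct Definition *next;
};

/* assumed helper: does an actual parameter type match a formal one? */
int types_match(struct Type *formal, struct Type *actual);

/* keep only the definitions applicable to the given actual parameter types */
struct Definition *reduce_definitions(
        struct Definition *defs, struct Type **actuals, int num_actuals) {
    struct Definition *result = 0;
    while (defs != 0) {
        struct Definition *next = defs->next;
        int ok = (defs->num_params == num_actuals);       /* reject wrong arity */
        for (int i = 0; ok && i < num_actuals; i++)
            ok = types_match(defs->param_types[i], actuals[i]);
        if (ok) { defs->next = result; result = defs; }   /* keep this definition */
        defs = next;
    }
    return result;  /* empty: undefined; more than one: ambiguous; exactly one: identified */
}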
Some programming languages feature constructs that introduce copies of scopes from other places. Examples are the scope resolution operator in C++, the with-statement in Pascal and Modula-2, and the import declaration in Modula-2, Ada, and other languages with modules or packages. The details of these scope importers differ from language to language. In C++, a scope resolution operator x:: preceding a function definition introduces the field selectors of the class x in a scope around that function definition. The Pascal and Modula-2 construct WITH x DO ... END is similar: it starts a new scope in which the field selectors of the record x are available as identifiers in the general name space. The FROM module IMPORT declaration of Modula-2 and the use declaration of Ada, however, do not introduce a new scope but rather merge the new names into the
present scope. Name clashes must be prevented in Modula-2 by explicitly selecting the imported names, and are avoided automatically in Ada by the overloading mechanism and the visibility rules associated with the use declaration. In all cases the result is that, with some identifiers, the programmer can just write sel, instead of idf.sel, or more importantly PUT("Text") instead of TEXT_IO.PUT("Text"). The obvious implementation of this feature is of course to retrieve the scope S to be imported, and to collect all names in it. Next, this collection is filtered as required, and a new scope is created and stacked if the semantics of the importing construct prescribes so. Each name N is then defined in the top-level scope in the usual fashion, with its properties derived from its entry in S. The time required by this technique is proportional to the number of names imported.
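A minimal sketch of this import scheme in C, reusing the enter_scope and declare routines of the naive scope-stack sketch given earlier; the names Import_entry and import_scope, and the filter parameter, are assumptions made for the illustration.

void enter_scope(void);                            /* from the earlier sketch */
void declare(const char *name, void *properties);

struct Import_entry {                    /* one (name, properties) pair of scope S */
    const char *name;
    void *properties;
    struct Import_entry *next;
};

/* import the names of scope S into the current scope; 'filter' selects the names,
   'opens_new_scope' tells whether the construct (e.g. WITH) starts a scope of its own */
void import_scope(struct Import_entry *s,
                  int (*filter)(const char *name),
                  int opens_new_scope) {
    if (opens_new_scope) enter_scope();
    for (struct Import_entry *e = s; e != 0; e = e->next)
        if (filter == 0 || filter(e->name))
            declare(e->name, e->properties);   /* time proportional to #names imported */
}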
The previous section has shown us that looking up an identifier leads to a declaration information record, a container for the properties of the object that the identifier identifies. The most basic and obvious, but therefore somewhat inconspicuous, property is the kind of the object - whether it is a constant, a variable, a routine, a type, a module, etc.; a first step in checking the proper use of the identifier is checking this kind. Kind checking in itself is almost - but not entirely - trivial: for example, when the context requires a module identifier, it is simple to test if a module identifier has been supplied, using the declaration information. Some of the kinds, however, specify objects that carry values, for example constants, variables, and routines. These values belong to types - sets of values and operations - and their use is subject to often complicated rules and has to be checked: type checking is not at all trivial. We will therefore first concentrate on type checking, and address one issue in kind checking at the end of this section. Type checking is involved in large parts of the annotated syntax tree. For example, the language rules usually specify which types can be combined with a certain operator; there are rules for formal and actual parameter types in a routine call; and assigning an expression value to a variable restricts the allowed types for the expression. Type information in a compiler has to be implemented in such a way that all these and many other checks can be performed conveniently. Types are usually introduced by name through type declarations, like
TYPE Int_Array = ARRAY [Integer 1..10] OF Integer;
which defines the type Int_Array, but they may also be introduced anonymously, as in the following variable declaration:

VAR a: ARRAY [Integer 1..10] OF Real;
When a type is introduced anonymously, it does not have a name in the source program, but for uniformity it needs one. Therefore, the compiler must use its imagination to invent a unique name, for example producing such internal names as #type01_in_line_35. So the above declaration of a is seen as an abbreviation of
TYPE #type01_in_line_35 = ARRAY [Integer 1..10] OF Real;
VAR a: #type01_in_line_35;
Type declarations often refer to other type identifiers. In some languages they are allowed to refer to identifiers that have not been declared yet; such references are called forward references. Forward references enable the user to define mutually recursive types, for example:

TYPE Ptr_List_Entry = POINTER TO List_Entry;
TYPE List_Entry = RECORD
    Element: Integer;
    Next: Ptr_List_Entry;
END RECORD;
However, forward references also add some complications to the work of the compiler writer:
- The forward references must be resolved. When, during processing, a forward reference is met, it is added to the symbol table, marked as a forward reference. Next, when a type declaration for this forward reference is met, its symbol table entry is modified to represent the actual type instead of a forward reference.
- A check must be added for loose ends. At the end of a scope in which forward references occur, the forward references all must have been resolved if the language manual says so (and it usually does). This check can be implemented by checking all symbol table entries of the scope for forward references. If a forward reference is met, it is a loose end, and must be reported.
- A check must be added for circularity. The consequence of allowing forward references is that the user can now write a circular type definition:

TYPE X = y;
TYPE y = X;
which probably is illegal. We will see below how to deal with this problem.
All types in a compilation unit are collected in a type table, with a single entry for each type. For each type, the type table entry might, among others, contain the following:
- its type constructor ('basic', 'record', 'array', 'pointer', and others);
- the size and alignment requirements of a variable of the type;
-the types of the components, if applicable.
Various information is being recorded for types:
- for a basic type: its precise type (integer, real, etc.);
- for a record type: the list of the record fields, with their names and types;
- for an array type: the number of dimensions, the index type(s), and the element type;
- for a pointer type: the referenced type;
- for other type constructors: the appropriate information.

The type table entry must contain all the type information required to perform the type checking and code generation. The exact content therefore depends on the source language at hand. The representation in the compiler, on the other hand, depends on the implementation language, the language in which the compiler is written. In an imperative implementation language, the representation of a type usually is a record with a variant part (or a union) for those items that depend on the type constructor. In an object-oriented implementation language, a type would be an object class with all fields and methods that all type constructors share, and each type constructor would have its own subclass, with extensions specific for the type constructor at hand.
To demonstrate how the type table is built, let us now consider a simple language with record and pointer as the only type constructors, and one built-in type: integer. Also, for the sake of simplicity, let us assume that there is only one, global, scope for type identifiers. In this language, the built-in type integer is a predefined identifier indicating an integer type. The compiler places the identifier in the symbol table and its type in the type table, before it starts processing the input. This results in the following type table and symbol table:
type table:                    symbol table:
TYPE 0: INTEGER                "integer": TYPE 0
where the type table is indexed by values of the form TYPE n (implemented as integers or pointers) and the symbol table is indexed by strings, as explained in Section 2.1.11.1. Because of possible forward references, it is not possible to just build the type table as input is processed. One possible solution is to add identifier references to the type table, which must be resolved when all input has been processed. For example, let us process the following type declarations:

TYPE a = b;
TYPE b = POINTER TO a;
TYPE c = d;
TYPE d = c;
Processing the first type declaration, TYPE a = b;, might result in the following type table and symbol table:

type table:                    symbol table:
TYPE 0: INTEGER                "integer": TYPE 0
TYPE 1: ID_REF "b"             "a": TYPE 1
                               "b": UNDEFINED_TYPE
Then, processing TYPE b = POINTER TO a; results in another type table entry for a (because references to identifiers have not been resolved yet). So, we add TYPE 2: ID_REF "a" to the type table and process TYPE b = POINTER TO TYPE 2. The UNDEFINED_TYPE symbol table entry for b is now resolved, and we obtain:
type table:                    symbol table:
TYPE 0: INTEGER                "integer": TYPE 0
TYPE 1: ID_REF "b"             "a": TYPE 1
TYPE 2: ID_REF "a"             "b": TYPE 3
TYPE 3: POINTER TO TYPE 2
The last two lines of the input are processed in the same way, and our type table and symbol table now look as follows:

type table:                    symbol table:
TYPE 0: INTEGER                "integer": TYPE 0
TYPE 1: ID_REF "b"             "a": TYPE 1
TYPE 2: ID_REF "a"             "b": TYPE 3
TYPE 3: POINTER TO TYPE 2      "c": TYPE 4
TYPE 4: ID_REF "d"             "d": TYPE 5
TYPE 5: ID_REF "c"
Now that the input has been processed, all identifier references must be resolved. There are two reasons for doing this, the more important one being cycle detection. The other reason is that it is convenient to have each type referred to by a single, unique, index in the type table. The first step in resolving the identifier references is to replace the identifier references in the type table by type references: a reference to an entry in the symbol table which refers to an entry in the type table is replaced by a direct reference to this type table entry. Our modified type table now has the following contents:

type table:
TYPE 0: INTEGER
TYPE 1: TYPE 3
TYPE 2: TYPE 1
TYPE 3: POINTER TO TYPE 2
TYPE 4: TYPE 5
TYPE 5: TYPE 4
Now, cycles can be detected by the closure algorithm presented in Figure 6.8. The algorithm also resolves type references to type references: when it has finished, a type table entry can still be a TYPE, but the entry it then refers to is no longer a TYPE - unless it is a cyclic definition. The algorithm constructs a set Cyclic containing all type table entries describing a type with a cycle. After executing this algorithm, our type table looks as follows:
type table:
TYPE 0: INTEGER
TYPE 1: TYPE 3
TYPE 2: TYPE 3
TYPE 3: POINTER TO TYPE 2
TYPE 4: TYPE 4
TYPE 5: TYPE 4
and the Cyclic set contains both TYPE 4 and TYPE 5. For all members in the Cyclic set, an error message must be produced. Also, these type table entries can be replaced with a special ERRONEOUS_TYPE entry.
Data definitions:
1. Let T be a type table that has entries containing either a type description or a reference (TYPE) to a type table entry.
2. Let Cyclic be a set of type table entries.

Initializations:
Initialize the Cyclic set to empty.

Inference rules:
If there is a TYPE type table entry t1 in T and t1 is not a member of Cyclic, let t2 be the type table entry referred to by t1.
1. If t2 is t1 then add t1 to Cyclic.
2. If t2 is a member of Cyclic then add t1 to Cyclic.
3. If t2 is again a TYPE type table entry, replace, in t1, the reference to t2 with the reference referred to by t2.

Figure 6.8 Closure algorithm for detecting cycles in type definitions.
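A compact C rendering of this closure algorithm, under the assumption of a type table stored as an array of tagged entries; the entry layout and the names are invented for the illustration.

#include <stdio.h>

enum Kind { BASIC, POINTER, RECORD, TYPE_REF };   /* TYPE_REF is the TYPE entry of Figure 6.8 */

struct Type_entry {
    enum Kind kind;
    int referred;          /* for TYPE_REF: index of the entry referred to */
    int cyclic;            /* membership of the Cyclic set */
};

void detect_cycles(struct Type_entry *table, int size) {
    int changed = 1;
    while (changed) {                       /* repeat until no inference rule applies */
        changed = 0;
        for (int t1 = 0; t1 < size; t1++) {
            if (table[t1].kind != TYPE_REF || table[t1].cyclic) continue;
            int t2 = table[t1].referred;
            if (t2 == t1 || table[t2].cyclic) {          /* rules 1 and 2 */
                table[t1].cyclic = 1; changed = 1;
            } else if (table[t2].kind == TYPE_REF) {     /* rule 3 */
                table[t1].referred = table[t2].referred; changed = 1;
            }
        }
    }
    for (int t = 0; t < size; t++)
        if (table[t].cyclic) printf("type %d is cyclically defined\n", t);
}

Running this on the six-entry table above marks entries 4 and 5 as cyclic and leaves entries 1 and 2 referring to TYPE 3, matching the result shown in the text.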
A good next step is to replace any type table reference TYPE n, where the corresponding type table entry n contains TYPE m, with a type table reference TYPE m. Note that there may be many of these references, in the symbol table, in the type table itself, as well as in other data structures in the compiler. This action converts our type table and symbol table to
type table:                    symbol table:
TYPE 0: INTEGER                "integer": TYPE 0
TYPE 1: TYPE 3                 "a": TYPE 3
TYPE 2: TYPE 3                 "b": TYPE 3
TYPE 3: POINTER TO TYPE 3      "c": TYPE 4
TYPE 4: ERRONEOUS_TYPE         "d": TYPE 5
TYPE 5: ERRONEOUS_TYPE
and may make similar changes to other data structures in the compiler. Another option is to have a function Actual type () in the compiler, which is used whenever a type table reference is required and which resolves a TYPE, for example:

FUNCTION Actual type (Type index) RETURNING a type index:
    IF Type table [Type index] is a TYPE reference:
        RETURN Type table [Type index] .referred index;
    ELSE Type table [Type index] is a direct type:
        RETURN Type index;
The first approach is less error prone, but requires an extra pass over the internal compiler data structures. For a different approach to scope table management see Assmann (1992).
When type checking an expression or the formal and actual parameters of a routine call, often two types must be compared to see if they match in the given context. For example, a routine may be declared to have a parameter of a floating point type, and a call to this routine may have an integer parameter. The compiler has to detect this situation, and produce an error message if the language rules do not allow this. Which combinations are allowed in which context is specified by the language manual. For example, the routine call of the example above is allowed in (ANSI) C, but not in Modula-2. An important part of comparing two types concerns the notion of type equivalence. When two types are equivalent, values of these types have the same representations: one can be used where the other is required, and vice versa. The language manual, again, specifies when two types are equivalent. There are two kinds of type equivalence, structural equivalence and name equivalence. Virtually all modern imperative and object-oriented languages use name equivalence.

Name equivalence
Two types are name-equivalent when they have the same name. Note that this requires every type to have a name (either user-declared or anonymously assigned by the compiler). In a language with name equivalence, the two types t1 and t2 in

TYPE t1 = ARRAY [Integer] OF Integer;
TYPE t2 = ARRAY [Integer] OF Integer;
are not equivalent since they have different generated names, but the following two types are:

TYPE t3 = ARRAY [Integer] OF Integer;
TYPE t4 = t3;
Implementing a name equivalence check is easy: name-equivalent types have the same index in the type table, once the ID_REF entries as described above have been resolved.
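In code this check amounts to a single comparison; a minimal sketch, assuming an actual_type function along the lines of the Actual type routine shown above:

int actual_type(int type_index);   /* resolves a TYPE reference, see above */

/* two types are name-equivalent when they resolve to the same type table index */
int name_equivalent(int type_index1, int type_index2) {
    return actual_type(type_index1) == actual_type(type_index2);
}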
Structural equivalence
Two types are structurally equivalent when variables of these types can assume the same values (when the two types have the same structure) and allow the same operations. This makes the types

TYPE t5 = RECORD
    c: Integer;
    p: POINTER TO t5;
END RECORD;

TYPE t6 = RECORD
    c: Integer;
    p: POINTER TO t6;
END RECORD;

TYPE t7 = RECORD
    c: Integer;
    p: POINTER TO RECORD
        c: Integer;
        p: POINTER TO t5;
    END RECORD;
END RECORD;

all equivalent.
Testing for structural equivalence is difficult. As it is hardly used anymore, we will only discuss it very briefly here. The algorithm basically works as follows: first, all types are placed into a single equivalence class. Then, repeatedly, attempts are made to split each equivalence class E into two equivalence classes E1 and E2, by selecting a single type T as a pivot from E and putting all types that can be shown to be not equivalent to T in E1 and all types that cannot be shown to be not equivalent to T in E2, according to certain rules.
Type equivalence is only a building block in, not the complete answer to, type checking. In type checking, we want to answer the following question: if we expect a value of type T1 at a particular position in the program, but we find a value of type T2, is that acceptable? The type expected is sometimes called the a priori type, but the terms 'type expected' and 'type found' are more intuitive and less error-prone. If T1 and T2 are equivalent, the type check certainly succeeds; if they are not equivalent the rules are language-dependent. For example, if a is a variable of type real, we expect a value of type real in the right-hand side of an assignment to a. However, the assignment a := 5 may have to be dealt with as well. Some languages do not allow any type mismatch, others require the compiler to insert a data and type conversion, for example from integer to real. Such an implicit data and type conversion is called a coercion. Exactly which coercions the compiler can insert depends on the language. In general, more than one coercion may have to be applied. For example, in the assignment xx := 5 where xx is of type complex, a coercion from integer to real and then a coercion from real to complex may be required. In many languages, the possible coercions also depend on the context. For example, in C, the context 'operand of a binary operator' differs from the context 'right-hand side of an assignment expression'. In the latter context, a coercion from real to integer is allowed, in the former it is not.
The presence of coercions complicates operator and routine identification, because operand and result types may need to be coerced to another type, before a matching identification can be found. Also, unrestricted application of coercions may lead to ambiguities. For example, the + in the expression 2 + 3 indicates integer addition, but, allowing a coercion from integer to floating point for both operands, could also indicate floating point addition. Finding the proper coercions in an AST, for a language with arbitrary sets of types, contexts, and coercions, is a very difficult problem, for which no solution has yet been found. The approach presented here is similar to the one used in Section 6.1.1.2 to handle overloading of identifiers, and works for moderately complicated coercion rules. It is based on two closure algorithms, to be applied in succession. The first finds all types in each node that might play a role, the second crosses out all inapplicable types; the details depend on the source language rules. If in the end any type set in a node is empty, no identification could be made, and if any of the type sets contains more than one element, an ambiguity has been detected. The closure algorithms are given in Figures 6.9 and 6.10. The sets with which the type sets of the nodes are initialized can be built in (as for a leaf 3.14) or supplied by the identification mechanism. Note that even the leaf nodes get type sets, since they too may be overloaded (for example enumeration values in Ada), and/or coerced. The inference rule in Figure 6.9 adds all types reachable by coercions; usually there are very few of these. The inference rules in Figure 6.10 then remove all types that are upward or downward incompatible. Note that the algorithm only determines the types; we have to record the coercions used separately.
Let each node in the expression have a variable type set S attached to it. Also, each node is associated with a non-variable context C. The type set S of each operator node contains the result types of all identifications of the operator in it; the type set S of each leaf node contains the types of all identifications of the node. The context C of a node derives from the language manual.

For each node N with context C, if its type set S contains a type T1 which the context C allows to be coerced to a type T2, T2 must also be present in S.

Figure 6.9 The closure algorithm for identification in the presence of overloading and coercions, phase 1.
As said before, the algorithm presented here is not the answer to all identification problems in the presence of overloading and coercions. In fact, as it stands, it is not even capable of handling operator identification in C. The basic problem is mainly one of
Let each node in the expression have a type set S attached to it. Let the type set S of each node be filled by the algorithm of Figure 6.9.

1. For each operator node N with type set S, if S contains a type T such that there is no operator identified in N that results in T and has operands T1 and T2 such that T1 is in the type set of the left operand of N and T2 is in the type set of the right operand of N, T is removed from S.
2. For each operand node N with type set S, if S contains a type T that is not compatible with at least one type of the operator that works on N, T is removed from S.

Figure 6.10 The closure algorithm for identification in the presence of overloading and coercions, phase 2.
language design and can be sketched as follows. Suppose a language has two types, int and real, and two +-operators, with types (int, int) -> int and (real, real) -> real. To accommodate expressions like 3.14 + 5 the language has to allow a coercion from int to real in operand context, to reach the expression 3.14 + (real) 5, which identifies the second +. But having such a coercion in operand context makes the expression 3 + 5 ambiguous, since it allows both 3 + 5 to identify the first + and (real) 3 + (real) 5 to identify the second +. This problem in language design is solved by having rules like 'In operand context, a value can only be coerced to real if the other operand is real without using a coercion'. Such rules make the coercions allowed in one syntactic position dependent on what is found in another syntactic position. This kind of longer-range relationship is not supported by the algorithm presented here and has to be coded separately, either as a third scan over the expression or as ad-hoc code in the inference rule in Figure 6.10. Another potential problem with the algorithm is that phase 1 will not terminate for certain coercion rules. This happens, for example, with the coercion rules of Algol 68, where the inference rule in Figure 6.9 will continue forever to require new types. Fortunately very few programming languages have such a complicated coercion mechanism as Algol 68, so this phenomenon is usually not a problem.
Some languages allow the use of casts. A cast specifies the required type explicitly. For the compiler, it just introduces a different context, with usually different and stronger coercion rules. A cast differs from an explicit conversion in that it still uses the coercion system of the language. In contrast, an explicit conversion is a function which transforms data of one type to that of another type, and the types of its operands and result are subject to
the normal source language type rules. Of course, the function could be a built-in function of the language, in which case the compiler must know about it. On the other hand, it could just be a library function.
Kind checking is almost trivial, as explained at the start of this section. The exception concerns constants and variables, and the complication arises from the fact that the actual kinds we are concerned with are locations and values rather than constants and variables. We will now examine the relationships between these notions. In the assignment destination := source, we expect a location on the left and a value on the right. If we adhere to the idea that the assignment operator, like all operators, requires values as input, we expect a value which is the address of a location on the left and a (normal) value on the right. Based on the topology of the assignment statement, the first is called an lvalue (pronounced 'el-value') and the second is called an rvalue. In the assignment p := q;, in which p and q are variables, we expect an lvalue for p and an rvalue for q. Since p is a variable, it has a location and its address is the lvalue; one even says that 'a variable is an lvalue'. On the other hand, q is an lvalue too, but in its position an rvalue is required. The kind checking system solves this by inserting a coercion which retrieves the contents of the location addressed by an lvalue. This coercion is similar to the dereferencing explained below, but is less conspicuous because it is usually incorporated in the machine instructions. Although the AST for the assignment p := q;, as amended by kind checking, contains a dereference node (Figure 6.11, in which the arrows show the dependencies), this node is not reflected explicitly in the corresponding code sequence:

Load_Mem   q, R1
Store_Reg  R1, p
It is hidden inside the machine instruction Load_Mem, as explained in the introduction to register machines at the beginning of Section 4.2.4.
Figure 6.11 AST for p := q with explicit deref.
We are now in a position to formulate the kind checking rules for lvalues and rvalues. The table in Figure 6.12 shows the basic rules; a - indicates that no action is required. The lvalue/rvalue attribute propagates bottom-up in the AST of an expression, according to language-dependent, but usually obvious, rules. Some of the rules are stated in the table in Figure 6.13, for C or a similar language; V stands for lvalue or rvalue. The combined rules state, for example, that a[1] is an lvalue if a is a variable; type checking then tells us that a must be an array. This means that a[1] can be used as the destination in an assignment, since an lvalue is expected in that position. Suppose, however, that the expression a[1] is used as an index, for example in a[a[1]]; here an rvalue is required, so the lvalue needs to be dereferenced. On the other hand, 3 := 4 is erroneous, since 3 is an rvalue, and an lvalue is required; and so is &x := 7, for the same reason.
                        expected
                 lvalue        rvalue
found  lvalue      -           deref
       rvalue    error           -

Figure 6.12 Basic checking rules for lvalues and rvalues.
expression construct        resulting kind

constant                    rvalue
identifier (variable)       lvalue
identifier (otherwise)      rvalue
&lvalue                     rvalue
*rvalue                     lvalue
V[rvalue]                   V
V.selector                  V
rvalue+rvalue               rvalue
lvalue:=rvalue              rvalue

Figure 6.13 lvalue/rvalue requirements and results of some expression constructs.
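As an illustration of how these rules might be applied, here is a brief C sketch of a kind check on a node in a given context; the node layout and the helpers insert_deref and report_error are assumptions made for the illustration, not part of the text.

enum Kind { LVALUE, RVALUE };

struct Node {
    enum Kind kind;            /* set bottom-up by the rules of Figure 6.13 */
    /* ... operator, children, type, etc. ... */
};

/* assumed helper: wrap the node in a dereference node, making it an rvalue */
struct Node *insert_deref(struct Node *n);
void report_error(const char *message);      /* assumed error reporting routine */

/* enforce the basic rules of Figure 6.12 for a node in a given context */
struct Node *check_kind(struct Node *found, enum Kind expected) {
    if (found->kind == LVALUE && expected == RVALUE)
        return insert_deref(found);           /* lvalue found, rvalue expected: deref */
    if (found->kind == RVALUE && expected == LVALUE)
        report_error("lvalue required");      /* rvalue found, lvalue expected: e.g. 3 := 4 */
    return found;                             /* otherwise no action is required */
}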
This concludes our discussion of context handling for imperative and object-oriented languages. The main problem is the identification of types, operators, and identifiers. The problem is complicated by forward references, type equivalence rules, routine and operator identification rules, overloading of identifiers and operators, and context-dependent coercions. No hard-and-fast general algorithm is available, but for almost all practical languages the problem can be solved by some form of inference rules working on type sets in the nodes of the expressions, and for many languages much simpler approaches involving only one type per node are possible.
In this section we discuss some of the data structures that represent source language data at run time, and the run-time manipulation needed to deal with the source language operations on these data. In the source language, a data item has a type, which may be a basic, built-in type of the language, or a constructed type, built using one of the type constructors of the language. The target language data types are usually limited to single bytes, integers of various sizes, address representations, and floating point numbers of several sizes. Every source language data type is mapped to a particular combination of target language data types, and the run-time representation of a source language data item is the result of the application of this mapping. It is the task of the compiler writer to create such a mapping. We assume here that the target language has the common arithmetic, comparison, and copy operations, and has a byte-addressable memory; this assumption is almost universally justified.
The basic types in a source language are characters, integers of several sizes, and floating point numbers of several sizes. The source language operations on these typically are arithmetic operations, assignment, and comparison. The arithmetic operations include addition, subtraction, multiplication, division, and remainder. All these can be mapped directly to the target language data types and operations. Often, the same arithmetic operations can be used for unsigned arithmetic, except for comparison; for these, special instructions usually exist. Characters usually are mapped to single bytes, although some modern source languages require two or four bytes to represent all possible character values. Some source languages also have a void type, corresponding to no value at all. In some other languages a void type is present, but only implicitly. A target language representation for a void type is not needed.
An enumeration type defines a set of names to be the values of the new data type. The run-time representation of an enumeration value is an integer, with values usually ranging from 0 to the number of enumeration values minus 1, although some languages allow the programmer to specify the integer values corresponding to the names of the enumeration type. In any case, the range of values is known at compile time, so an integer type of suitable size can be chosen to represent the values. Operations allowed on enumerations usually are limited to copying, comparison for equality, comparison for greater/smaller, and sometimes increment/decrement, all of which are readily available in the target language for the integer representation chosen. An enumeration type which is available in virtually all languages, including some that do not have explicit enumeration types, is the Boolean type, with false and true as enumeration literals. An implementation with 0 for false and 1 for true suggests itself, but in many cases representing the Boolean value can be avoided, as shown in Section 6.4.1.1.
Most imperative and object-oriented languages support a pointer type. Pointers represent the addresses of source language data structures. The run-time representation of a pointer is an unsigned integer of a size large enough to represent an address. The integer operations for copying, assignment, comparison, etc. are available for pointers as well, and some target machines have special instructions or addressing modes for dealing with pointers. The one operation that is particular to pointers is dereferencing, which consists of obtaining the value of the data structure that the pointer refers to. If the value to be obtained is small enough to fit in a register, dereferencing can usually be implemented as a single machine instruction. If the value is larger, though, as in dereferencing a pointer to a record or array, the situation is more complicated. Theoretically, dereferencing delivers the value on top of the working stack, but it is more efficient to find out what the final destination of the value is and to copy it to that place directly. For example, the assignment

q = *p;
in which q is the name of a record variable of type T and p is a pointer to a record of the same type, may be translated to a call of a library routine byte_copy():

byte_copy(&q, p, sizeof (T));
can first compute the pointer to the field and then dereference that pointer: ( * p t r ) .f i e l d is translated as * ( & ( p t r - > fi e l d ) ) . The above applies when the context requires an rvalue. When the context requires an lvalue for a field selection, for instance in an assignment to p t r -> f i e l d , the required address can be obtained by adding the offset of f i e Id within the record to the pointer ptr.
Mostlanguagesthatsupportpointersactuallysupport ty inters, for example POINTER TO INTEGER. In addition, there is often a ~ ~ ~ ~ r i becoercedtoanyotherpointertype.Forexample, C has a ‘pointertovoid’type, Modula-2 has a ‘pointertobyte’type.Coercingsuch a generic pointer type to another pointer type isa compile-time action, for which no run-time codeis required,
In the above we have assumed that a pointer indeed points to a valid value present in memory. In an incorrect program this need not be the case. Although the compiler can usually make sure that if the pointer refers to a value, that value will be of the right type, there are a number of situations in which the pointer does not referato value: l. the pointer was never initialized; 2. the pointer is a null pointer, normally used to indicate the absence of a value, but the p r o g r a ~ e forgot r to check; 3. the pointer once referred to a value, but that value was located on the heap and has been removed since by a f r e e () operation, leaving the pointer dangling; 4. the pointer once referred toa value, but that value was located in the activation record of a routine that has since been terminated, leaving the pointer dangling. as anyone with programming experience inC can These dangers make pointers hard to use, confim. ~anguagesdiffer in their approach to bad pointers. In C, the actions of an incorrect program are undefined, and the best one can hope of any of the above errorsis that the programcrashesbefore it producesincorrectresults.Thememorymanagementunits ( M ~ I J s of ) most processors will generate an exception or trapif the pointer to be dereferenced is null, thus catching error2 above, but their behavior on uninitialized pointers (error l) will be erratic, and since errors 3 and 4 involve perfectly good pointers whose referents have gone away,no MMIJ assistance canbe expected for them. ~anguagesdesigners sometimes take measures to tame the pointer. Several approaches suggest themselves. Avoiding pointers at all is a good way to solve pointer problems (as done, for example, in Java and in functional and logic languages) but requires a lot of alternative programming support. Automatic initialization of pointerseliminatesuninitializedpointers(error l). The derefere~cingof null pointers (error 2) can be eliminated in the language design by having two kinds of pointer-like type constructors, ‘pointers’, which may point to a value or be null, and ‘references’, which are guaranteed to point toa value, References can be dereferenced safely. Pointers cannot bedereferenced and can onlybe used in a test for null; if the of course, be dereferenced pointer is not null, a reference is obtained, which can then, safely. Having a garbage collector and disallowing calls to f r e e () eliminates dangling
pointers to the heap (error 3). And disallowing pointers to local variables eliminates other dangling pointers (error4). Although these possibilities are language design issues rather than compiler design issues, they are still relevant to compiler design. I~plementingautomatic initiali%ationof pointers is easy, and symbolic interpretation can often establish that a pointer cannot be null. This is especially i m ~ o ~ t ain n ttranslating generated code, in which care has already been taken never to dereference null pointers. Also, there is a technique that avoids dan~ i t disallowing ~ ~ ~ pointers t to local varigling pointer errors due to returning routines ables. Aspects of this technique are importantfor the ~ n - t i m esystems of some i~perative and many functional and logic languages, which is why we will discuss these aspects
Of the four types of errors, error 4 is the most problematic, sincea good pointer turnsinto dangling pointer through a routine exit, an action that is only remotely associated with the pointer. Following the saying that anounce of prevention is worth a pound of cure, a set of rules has been developed to prevent dangling pointers from arising in the first place, the so-called ‘scope rules’. Although no language except Algol 68 has put these rules in th hands of the programmer, they playa considerable role in the more advanced forms of routine const~ctionand calling, and in the imple~entationof functional and logic la Because the possibilities and restrictions of passing routines as p~ametersand r them as values are difficult to understand without these scope rules, it seems ad treat them here, in their more easily understandable data structure version. Values located in activation records have limited ‘lifetimes’, where the entity is the time in the runof the program during whichthe entity exists. Th value is the same as that of the activation record in which it resides. Therefore, pointers to vali of a pointer is equaltothe localvalueshavelimited‘validityspans’;the time of thevalue it pointsto.Forhistoricreasons tyspan of a pointer is calle and the rules for avoiding dangling pointers are called these is unfo~unate,since it inevitably causes confusion with identifier’ and ‘scope rules’, as discussed in Section 6.1.l,but we will conform to it to keep in line with the literature. Where necessary, a distinction can be made by u s i ~ gthe terms ‘identifier scope’ and ‘pointer scope’. The values that pointers refer to can be located in activation records or on the he for the purpose of this discussion, the global data area can be considered as the oldest activation record. The lifetime of a value is equal to that of the container it is located in. For values on the heap, the lifetime is infinite; this does not mean that the data will be kept infinitely long, but rather thatit is impossible to ever find out that its lifetime may not beinfinite. The lifetime of an activation record is governed by the routine calling mechanis~;if the call is a simplesubroutine call, the lifetime of theresultingactivationreco enclosed in the lifetime of the activation record of the caller. If we call the lifetime of an activation record its ‘scope’, as we did before with pointers, we see that one S completely inside another scope. We will call a scope P smaller than a scope P lies entirely inside Q. This imposes an ordering on scopes, but we will s scopes, for example those in different threads, are incommensura~le,so the
partial one in principle. In summary, the lifetirneof a value is the time span during which the valueexists, the scope of a value is the time span during which the valueis valid. The purpose of the scope e sure that any value will be valid during its entire lifetime. We are now in a position to formulate the scope rules (van~ i j n g a ~ d et e nal., 1975): he scope of a location on the heap is infinite; the scope of a location in an activation record is that of the activation record. -The scope of the activation record of the program is infinite; the scopes of the other activation records depend on the callin ~echanismand the i ~ p l e ~ e n t a t i o n . -The scope of a value is the smallest scope of any pointer it contains or infinite if it does not contain pointers. -The scopeof a pointer is that of the location into which it points. -A value with scope V may be stored only in a location with scope L if V>L; in other words, p := q requires ‘scope of p 2 scope of q’. It is incorrect to express the last rule as‘A value with scope V may not be stored in a location with scope L if VcL’. The assignment is also forbidden if V and L cannot be cornpared, since that would imply the possibility that part of the lifetime of L falls outside the lifetime of V. Together these rules ensure that values remain valid as long as they exist, so no danglingpointerscanoriginatefrom ro~tinesreturning. If activationrecordsareputon a stack, the scope of the activation record of a called routine is smaller than that of the activation record of the caller. An i ~ e d i a t coilsequence e of that is that data on the heap cannot contain pointersto the stack, except perhaps tothe global activation record. Another consequenceis that a local pointer variable ina routine R cannot point to data localto a routine called by R. This effectively prevents the notorious dangling pointer gning the address of a local variable to a pointer passed as a p ~ ~ e t e r , e 6.14. Here the scope of &actual-buf erfis that of the activation in-buf fer ( ) , andthat of *buf fer-pointeris that of do-buf f( ) ,which er is larger. This assignment constitutesa scope rule violation;if it is C, the following happens. When allowed pass tounchecked, as it inis obtain-buf f ( ) er returns, actual-buffer [l disappears,but a pointerto it remains in buffer,which is passed to use-buf f( ) . When er use-buf erf( ) then uses its now invalid parameter, the scoperule violation takes its toll. In princi~le,the scopes of pointers, locations, and values could be m~ntainedand arises, as it does checkedat mn time, butdoing so is awkwar~andinefficient.need ymbolic i n t e ~ r e t a t i can o ~ beusedtoavoidgenecheckswheretheyare n practicethescoperulesareused at compiler d me, to designefficient il~p~ementations of advanced routine operations in all paradig~swith the property that no dangling pointers will ever be generated. These imple~entationsare designed so that they never store a short-lived value in a lon er-lived container. We will see several examples in ection 6.3, A run-ti~e a~proach to pointer safety is taken by Austin et al. (1994), who discuss a ~ e t ~ for o dthe detection of all pointer and may access errors by replacin so called ‘safe pointers’.
void do_buffer(void) {
    char *buffer;
    obtain_buffer(&buffer);
    use_buffer(buffer);
}

void obtain_buffer(char **buffer_pointer) {
    char actual_buffer[256];
    *buffer_pointer = &actual_buffer;
    /* this is a scope-violating assignment: */
    /* scope of *buffer_pointer > scope of &actual_buffer */
}

Figure 6.14 Example of a scope violation in C.
A record, or structure, is a data item in which a fixed number of members of possibly different types are grouped together. In the target language, these members are represented consecutively in a memory area that is large enough to contain all members. For example, a variable of the C structure
struct example {
    int member1;
    double member2;
};
is -in principle -represented as follows:
| member1 | member2 |
However, some processors impose address alignment requirements on certain data accesses. For example, the SPARC processor requires an int (a 4-byte quantity) to have an address that is a multiple of 4 (is 4-byte aligned), and a double (an 8-byte quantity) to be 8-byte aligned. Member1 of the example above thus must be 4-byte aligned, which is easily accomplished if the structure itself is 4-byte aligned, but member2 must be 8-byte aligned, and this can be accomplished by first making sure that it is 8-byte aligned within the structure by inserting a gap of 4 bytes between member1 and member2, and then making sure that the structure itself is 8-byte aligned. This way, member2 will also always be 8-byte aligned. In general, gaps must be inserted between the structure members to make sure that each member is aligned properly within the structure, and then the size and the alignment requirement for the structure itself must be computed. This alignment requirement is the lowest common multiple (LCM) of the member alignment requirements. Often this is just
the largest member alignment requirement, because alignment requirements usually are small powers of 2 (and thus one is either a multiple or a divisor of the other). Also, a gap can be inserted at the end of the structure to make sure that the size of the structure is a multiple of its alignment requirement. This is convenient when the structure is used to build other data types, such as an array of these structures. To get back to our example structure, on a SPARC it will be represented as follows:
| member1 | gap | member2 |
and it must be aligned on 8-byte boundaries. If x is a variable of this type, it has size 16, x.member1 lies at offset 0 from the start of x, and x.member2 lies at offset 8 from the start of x.
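The offset and size computation just described is mechanical; a minimal C illustration, in which the align_up helper and the parameter arrays are assumptions made for the example:

#include <stddef.h>

/* round an offset up to the next multiple of the alignment requirement */
size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

/* lay out a structure: compute member offsets, total size, and alignment */
void layout_struct(const size_t sizes[], const size_t aligns[], int n,
                   size_t offsets[], size_t *size, size_t *alignment) {
    size_t offset = 0, max_align = 1;
    for (int i = 0; i < n; i++) {
        offset = align_up(offset, aligns[i]);   /* insert a gap if needed */
        offsets[i] = offset;
        offset += sizes[i];
        if (aligns[i] > max_align) max_align = aligns[i];
    }
    *alignment = max_align;                     /* assumes power-of-2 alignments */
    *size = align_up(offset, max_align);        /* gap at the end of the structure */
}

For the example structure, sizes {4, 8} and alignments {4, 8} yield offsets 0 and 8, size 16, and alignment 8, matching the layout shown above.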
The mandatory operations on record types are copying and field selection; comparison for equality is not always required. The selection of a field from a record is accomplished in the target language by computing the address of the field, and then accessing the field through that address. This is done by adding the offset of the field within the record, which is known to the compiler, to the address of the record. The resulting address is the lvalue of the field (if it has one in the context at hand). To obtain the rvalue of the field, this address must be dereferenced. Record copying can be accomplished either by copying the fields individually or by copying the whole record, including the contents of the gaps; the latter may be more efficient if there is an efficient memory block copy routine available in the target language.
The possible presence of gaps makes record comparison more complicated than record copying. The contents of a gap usually are undefined, so they must be disregarded when comparing records. Therefore, record comparison has to be done field by field. For the example record type above, the compiler could generate a routine, returning 1 if the records are equal, and 0 if they are not, according to the following C scheme:

int compare_example(struct example *s1, struct example *s2) {
    if (s1->member1 != s2->member1) return 0;
    if (s1->member2 != s2->member2) return 0;
    return 1;
}

The compiler can then generate code that calls this routine whenever two records of this type must be compared. It could also generate in-line code to perform the comparison.
A union is a data type of which the values are of one of a set of types. For example, a variable a of type
union {
    int i;
    float f;
};
in C can either hold a value of type int or a value of type float. To access the value of type int, the programmer uses a.i; to access the value of type float the programmer uses a.f. A union cannot hold both values at the same time. In C, the programmer is responsible for keeping track of which of the union fields is present: the union is not 'discriminated'. Some other languages have a special field, called the union tag, which is always present in unions and indicates which variant the union currently holds; such unions are 'discriminated'. Each variant of the union has its own (constant) value of the union tag. The type of the union tag usually is an enumeration type, and each variant is associated with a particular enumeration value.
The run-time representation of an undiscriminated union is very simple: it is like a record, except that all fields overlap. Therefore, its size is equal to the aligned size of the largest variant, and its alignment requirement is equal to the lowest common multiple of the alignment requirements of its variants. Field selection of an undiscriminated union is simple: the field always resides at offset 0 from the start of the union. Copying consists of a memory copy of the size of the union. Comparison is not available for undiscriminated unions.
The representation of a discriminated union is that of a record containing the union tag and the - undiscriminated - union of the variants, as above. The generated code can access the union tag to check which variant is current; whether the programmer can also do so is source-language dependent. In principle a check must be generated for each union-field access to make sure that the selector matches the current variant of the union, and if it does not, a run-time exception must be produced. In an optimizing compiler this check can often be avoided by doing static analysis, for example using symbolic interpretation as explained in Section 3.2.2.
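A discriminated union can be pictured in C as follows; the field names and the checked access routine are assumptions made for the illustration.

#include <stdio.h>
#include <stdlib.h>

enum Variant { HOLDS_INT, HOLDS_FLOAT };    /* the union tag, an enumeration type */

struct Discriminated {
    enum Variant tag;                       /* which variant is currently present */
    union { int i; float f; } u;            /* the undiscriminated union of the variants */
};

/* checked access to the int variant; the check can often be optimized away */
int get_int(struct Discriminated *d) {
    if (d->tag != HOLDS_INT) {
        fprintf(stderr, "run-time error: wrong union variant\n");
        exit(1);
    }
    return d->u.i;
}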
An array type describes data structures which consist of a series of items of the same type. An array can have one dimension (a vector), two dimensions (a matrix), or more. An array element is indicated through one or more index expressions, one for each dimension of the array. The run-time representation of the array consists of a consecutive sequence of the representations of the elements; the address of the first element is the address of the array. For example, a one-dimensional array

A: ARRAY [1..3] OF Integer;
will be stored as the three consecutive elements A[1], A[2], A[3],
and a two-dimensional array

B: ARRAY [1..2, 1..3] OF Integer;
can be either stored in row-major order (Figure 6.15a), which means that the elements are stored row after row, or in column-major order, which means that the elements are stored column after column (Figure 6.15b). These schemes can easily be extended to more than two dimensions. Below, we will assume row-major order, as this is the order most often used.
Figure 6.15 Array B in row-major (a) and column-major (b) order.
Note that, as long as the compiler makes sure that the element size is a multiple of the alignment restriction of the element, there will be no gaps between the array elements. This is important for the implementation of array comparison, as well as the implementation of element selection, as discussed below. As with records, array copying can be done either element by element or through a memory block copy. Array comparison, if the source language allows it, can be done through a memory block comparison if the array elements do not have any gaps, and element by element otherwise.
All languages that support arrays also support element selection, which is usually called indexing. Assume an n-dimensional array A with base address base(A), where each dimension k has lower bound LBk and upper bound UBk. The number of elements along dimension k is then LENk = UBk - LBk + 1. Now suppose that the location of element A[i1, i2, ..., in] must be computed. Given the base address base(A) and the size el_size of the array elements in bytes, the location of the required element is obtained by multiplying el_size with the number of elements in front of the required element, and adding the result to base(A). For a one-dimensional array this location is base(A) + (i1 - LB1) x el_size. In
general, for an n-dimensional array, the location of the element A[i1, i2, ..., in] is:

base(A) + ((i1 - LB1) x LEN2 x LEN3 x ... x LENn
         + (i2 - LB2) x LEN3 x ... x LENn
         + ...
         + (in - LBn)) x el_size
This is a lot of computation just to determine the location of one element. Fortunately, a large part of it can be precomputed. Reorganizing the expression results in

base(A)
  - (LB1 x LEN2 x LEN3 x ... x LENn
   + LB2 x LEN3 x ... x LENn
   + ...
   + LBn) x el_size
  + (i1 x LEN2 x LEN3 x ... x LENn + i2 x LEN3 x ... x LENn + ... + in) x el_size

All lines of this expression except the last are independent of the indices in the element selection and depend only on the array A. In fact, the first lines of the expression contain the location of the element A[0, 0, ..., 0], the zeroth element. Of course, it is quite possible that the zeroth element does not exist inside the array, because 0 may not be a member of the interval LBk..UBk for some k; this does not prevent us, however, from using its location, zeroth_element, in address computations. The LEN products in the last line are also independent of the indices, and we can compute them in advance. If we define LEN_PRODUCTk as the product LENk+1 x ... x LENn x el_size, we have:
LEN_PRODUCT_n = el_size
LEN_PRODUCT_k = LEN_PRODUCT_(k+1) × LEN_(k+1)    for n > k ≥ 1
With these values precomputed, the location of the element A[i1, i2, ..., in] can be computed by the formula

zeroth_element(A) + i1×LEN_PRODUCT_1 + ... + in×LEN_PRODUCT_n
All these precomputed values for the array can be stored in an array descriptor. The descriptor should also contain the array bounds (or the lower bounds and the lengths) themselves, so array bound checks can be generated. An example of an array descriptor is shown in Figure 6.16. The computation described above results in the location of an array element. When an lvalue is required in the context at hand, this location serves as one. When the rvalue of the
ii"
ii"
Figure 6.16 An array descriptor.
array element is required, the location (a pointer) must be dereferenced. Again, this dereferencing may be implicit in the selected machine instructions. When the array does not change size during its lifetime and its bounds are known at compile time, it is called a static array. In contrast, when the array may change size during its lifetime, or its bounds are determined at run time, it is called a dynamic array. For static arrays the array descriptor does not actually have to exist at run time. Instead, the compiler can compute all the values the array descriptor would contain and use them as constants in index expressions, with the exception of zeroth_element. The compiler can, however, compute the offset of zeroth_element from the first element of the array itself, and use that to compute the value of zeroth_element when needed.
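The following sketch puts the pieces together for the two-dimensional array B from above: an array descriptor holding the zeroth-element address, the bounds, and the LEN_PRODUCT values, and an element-location routine that performs the bound checks. It is illustrative only; all names and the fixed number of dimensions are assumptions made for the example.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NDIM 2

struct array_descriptor {
    uintptr_t zeroth_element;     /* address of the (possibly fictitious) A[0,...,0] */
    int       lower_bound[NDIM];  /* LB_k, for bound checks  */
    int       length[NDIM];       /* LEN_k, for bound checks */
    size_t    len_product[NDIM];  /* LEN_PRODUCT_k, in bytes */
};

/* location of A[i_1,...,i_n] = zeroth_element + sum over k of i_k * LEN_PRODUCT_k */
static void *element_location(const struct array_descriptor *d, const int index[NDIM]) {
    uintptr_t loc = d->zeroth_element;
    for (int k = 0; k < NDIM; k++) {
        if (index[k] < d->lower_bound[k] || index[k] >= d->lower_bound[k] + d->length[k]) {
            fprintf(stderr, "array bound error in dimension %d\n", k + 1);
            exit(1);
        }
        loc += (uintptr_t)index[k] * d->len_product[k];
    }
    return (void *)loc;
}

int main(void) {
    /* B: ARRAY [1..2, 1..3] OF Integer, stored in row-major order */
    int data[2][3] = { {11, 12, 13}, {21, 22, 23} };
    struct array_descriptor d = {
        /* zeroth_element = base(B) - (LB_1*LEN_PRODUCT_1 + LB_2*LEN_PRODUCT_2) */
        (uintptr_t)data - (1 * 3 * sizeof(int) + 1 * sizeof(int)),
        { 1, 1 }, { 2, 3 },
        { 3 * sizeof(int), sizeof(int) }   /* LEN_PRODUCT_1, LEN_PRODUCT_2 */
    };
    int index[NDIM] = { 2, 3 };
    printf("B[2,3] = %d\n", *(int *)element_location(&d, index));   /* prints 23 */
    return 0;
}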
Some languages have set types, which are usually limited to sets of a small subrange of the integers. These are probably best implemented as bitsets. In a bitset, each value in the subrange is represented by a single bit: if the bit is set, the value is a member of the set; if the bit is not set, the value is not a member of the set. Bitsets can be stored conveniently in machine words. For example, a set of the integers ranging from 0 to 31 can be represented by a single 32-bit word. The usual set operations are implemented by means of bit operations: set union is implemented with a bitwise OR, set intersection is implemented with a bitwise AND, symmetric set difference is implemented with a bitwise EXCLUSIVE OR, and set difference is implemented with a bitwise NOT followed by a bitwise AND. Some languages allow sets of any type for which comparison for equality exists. Possible representations of such sets are linked lists (Tarjan, 1975), trees, and hash tables (Sassa and Goto, 1976).
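A small sketch of a bitset over 0..31 and the bit-operation implementations of the set operations mentioned above is given below; the type and function names are invented for the example.

#include <stdint.h>
#include <stdio.h>

typedef uint32_t set32;                       /* set of integers 0..31 */

static set32 singleton(int v)                     { return (set32)1 << v; }
static int   member(set32 s, int v)               { return (s >> v) & 1; }
static set32 set_union(set32 a, set32 b)          { return a | b; }    /* OR  */
static set32 set_intersection(set32 a, set32 b)   { return a & b; }    /* AND */
static set32 set_symmetric_diff(set32 a, set32 b) { return a ^ b; }    /* XOR */
static set32 set_difference(set32 a, set32 b)     { return a & ~b; }   /* NOT, then AND */

int main(void) {
    set32 a = singleton(1) | singleton(4) | singleton(9);
    set32 b = singleton(4) | singleton(20);
    printf("4 in intersection: %d\n", member(set_intersection(a, b), 4));   /* 1 */
    printf("9 in difference:   %d\n", member(set_difference(a, b), 9));     /* 1 */
    printf("4 in sym. diff.:   %d\n", member(set_symmetric_diff(a, b), 4)); /* 0 */
    printf("20 in union:       %d\n", member(set_union(a, b), 20));         /* 1 */
    return 0;
}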
Routines are not considered data in many programming languages, C among them, and in these languages the question of how to represent routines as a data type does not arise. In C, a routine as data can be implemented simply as a pointer to its code; an indirect routine call is then used to activate it. The best implementation of routines as data in languages that allow more advanced operations on routines (nesting, routines as first-class data, partial parameterization, etc.) depends in complicated ways on the exact operations allowed and on choices made in the implementation as to the allocation of the activation records. This issue will be covered integrally in Section 6.3.
built-in methods, with some additional features. Its type is usually called a class. In some languages, an object is similar to an abstract data type in which only the methods are visible from outside and the object fields can only be accessed from within the method bodies. In other languages, the object fields can be accessed like record fields. The basic operations on objects are 'field selection', 'copying', and 'method invocation'. Although many object-oriented languages do not allow direct access to the fields of an object, field selection is needed for the access from within the method bodies. Copying is trivial and will not concern us here. For method invocation, the method must first be identified, and then the call must be made, in the same way as a routine call but with one difference: as the object fields are directly accessible from within the method body, an extra, implicit, parameter must be supplied, the pointer to the object. The method can then access the object fields through this pointer. In this section we will focus on method selection; the actual routine or method invocation is discussed in Section 6.4.2. In most object-oriented languages, objects also have constructors and destructors, which are methods that are to be invoked on object creation and object removal, respectively. As far as the following discussion is concerned, these are methods which are invoked like any other method. If this was all there was to it, the implementation of objects would be quite simple. Suppose that we have an object class A with methods m1 and m2 and fields a1 and a2. The run-time representation of an object of class A then consists of a record containing the fields a1 and a2:

    a1
    a2
In addition, the compiler maintains a compile-time table of methods for class A:
    m1_A
    m2_A
where we have appended '_A' to the method names, to indicate that they operate on an object of class A, rather than on a different class that also defines methods with names m1 or m2. In this simple model, field selection is implemented as record field selection, and object copying is implemented as record copying; method selection is done by the identification phase in the compiler. Methods are implemented as routines with one additional parameter, a pointer to the object. So the method m2_A could be translated to the C routine

void m2_A(Class_A *this, int i) {
    /* body of method m2_A, accessing any object field x as this->x */
}
assuming that m2_A() has one integer parameter and returns no value, and where Class_A is the C type name for class A. The method invocation a.m2(3); is then translated to m2_A(&a, 3);. The problem is, however, that all object-oriented languages have at least some of the features discussed below. They make objects much more useful to the programmer, but also complicate the task of the compiler writer. These are the features that distinguish objects from abstract data types.
ted languages, allows the programmer to base a class B on a class A, so that B inherits all of A's fields and methods, in addition to its own. This feature is also called 'class extension': class B extends class A with zero or more fields and methods. A is the parent class of class B, and class B is a subclass of class A. Now suppose that class B extends class A by adding a method m3 and a field b1. The run-time representation of an object of class B is then:

    a1
    a2
    b1
In addition, the compile-time table of methods for class B becomes:

    m1_A
    m2_A
    m3_B
This can still be implemented using the means for abstract data types described above, but the next feature cannot.
A class B which extends a class A may redefine one or more of A's methods; this feature is called 'method overriding'. Overriding a method m declared in a parent class P implies that all classes based directly or indirectly on P will have that method, but the implementations of these methods in these subclasses may differ when overridden. Put more precisely, the method is declared in the class P, and then defined
in class P and possibly redefined in any of its subclasses. We use the phrase 'the declaration of X' here as a statement saying that X exists, and 'the definition of X' as a statement telling exactly what X is; this usage is in accordance with but somewhat stricter than the traditional usage in programming languages, where one speaks for example of 'forward declarations' rather than of 'forward definitions'. Now assume that class B in the example above redefines method m2, which was already defined for objects of class A. Then the definition of method m2 in A is both its only declaration and its first definition; that in class B is a redefinition. Some languages allow the method declaration to occur without a definition; the method is then called virtual, and a class in which at least one virtual method occurs is an abstract class. The virtual methods must then be defined in classes that extend the abstract class. We will rename the methods so that the name reflects both the class in which the method is declared and the class in which it is defined. The names consist of three parts, the method name, the class it is declared in, and the class it is defined in; the parts are separated by underscores (_). So a name m2_A_B is used to designate a method m2 declared in class A and defined in class B. Method overriding affects the compile-time table of methods. Under the above assumption that class B redefines method m2, which was already declared and defined in class A, the method table of class A now becomes:

    m1_A_A
    m2_A_A
and the method table of class B becomes:

    m1_A_A
    m2_A_B
    m3_B_B
Now suppose a is an object of class A, and b is an object of class B. A method call a.m2(...) will then be translated with a call to m2_A_A, whereas a method call b.m2(...) will be translated to a call to m2_A_B. This differentiates clearly between m2_A_A, which was declared in class A and defined in class A, and m2_A_B, which was also declared in class A but defined in class B. If inheritance is the only other object-oriented feature in the language, the type of the translation of m2_A_A is
void m2_A_A(Class_A *this, int i);
and that of m2_A_B is

void m2_A_B(Class_B *this, int i);
When a class B extends a class A and the language allows a pointer of type 'pointer to class B' to be assigned to a variable of type 'pointer to class A', the language supports polymorphism: a variable of type 'pointer to class A' may actually refer to an object of class A or of any of its extensions. The implementation of this feature requires a new operation, pointer supertyping: converting a pointer to an object of subclass B to a pointer to an object of its parent class A. This operation is used in assignments, for example:

class B *b = ...;
class A *a = b;
in which the second line is translated into

class A *a = convert_ptr_to_B_to_ptr_to_A(b);
The routine convert_ptr_to_B_to_ptr_to_A() is a compile-time type conversion only. Because an object of class B starts with the fields of class A, the value of the pointer need not be changed, and the only effect consists of changing the type of the pointer from 'pointer to B' to 'pointer to A' inside the compiler. But note that we now have identical pointers to objects of different classes: the B object and the A object embedded in it.
Now that a pointer p of type class A * can actually refer to an object of class B, a question arises about the methods that apply. If the source program applies method m2 to the object referred to by p, should the translation invoke m2_A_A or m2_A_B? There are two possible answers to this question: static binding, which maintains that statically p refers to an object of class A, so m2_A_A should be called; and dynamic binding, which maintains that if the object is actually of class B we should apply m2_A_B, and if the object is actually of class A, we should apply m2_A_A. Static binding is trivial to implement, but most object-oriented languages use dynamic binding, for various important programming language design reasons. Dynamic binding has two significant consequences:
- There are two kinds of As out there, 'genuine' As which use m2_A_A and As 'embedded' in Bs, which use m2_A_B, and the two cannot be distinguished statically. It follows that from now on, the object representation must include dynamic type information, telling if it is an A or a B.
- The redefined methods require a pointer to a B to obtain access to all of B's fields. As m2 may, however, be called through a pointer of type 'pointer to A' that refers to an object of class B, we need yet another operation, pointer subtyping, which constructs a pointer to B from the pointer to A.
The method invocation p->m2(3), where p is statically a pointer to an object of class A, could then be translated to
switch (dynamic_type_of(p)) {
case Dynamic_class_A: m2_A_A(p, 3); break;
case Dynamic_class_B: m2_A_B(convert_ptr_to_A_to_ptr_to_B(p), 3); break;
}
where the dynamic type information is an enumeration type with the values Dynamic_class_A and Dynamic_class_B. When p is statically a pointer to B, we could translate the invocation p->m2(3) immediately to

m2_A_B(p, 3);
Note that this code is consistent with the declarations void m2_A_A(Class_A *this, int i) and void m2_A_B(Class_B *this, int i). We will see, however, that a better translation is possible. For now, pointer subtyping remains a compile-time operation. The switch statement used to find out which method routine to call is a function that works on a small domain, and lends itself to precomputation. To this end we incorporate the pointer conversion from A to B in the routine for m2_A_B, which now accepts a pointer to A:

void m2_A_B(Class_A *this_A, int i) {
    Class_B *this = convert_ptr_to_A_to_ptr_to_B(this_A);
    /* body of method m2_A_B, accessing any object field x as this->x */
}
More in general, every method translation m_X_Y gets as its first parameter a pointer to Class_X, which is then immediately converted to a pointer to Class_Y by convert_ptr_to_X_to_ptr_to_Y(). If X and Y are the same, the conversion can be omitted. With this modification to m2_A_B(), the method invocation p->m2(3), where p is statically a pointer to an object of class A, can be translated as

(dynamic_type_of(p) == Dynamic_class_A ? m2_A_A : m2_A_B)(p, 3);
which features a computed function which is called with the parameter list (p, 3). Rather than computing the function from the dynamic type information of p each time an operation on p is performed, we can incorporate the resulting function address in the dynamic type information. The type information for an object of class B is then a record with three selectors, m1_A, m2_A, and m3_B, containing the addresses of the routines to be called for methods m1, m2, and m3. These are m1_A_A(), m2_A_B(), and m3_B_B(); each of these routines has as its first parameter a pointer to an A. Such a record with addresses of method routines is called a dispatch table; the type information in each object is implemented as a pointer to its dispatch table, as shown in Figure 6.17. The type information for an object of class A is a two-field dispatch table containing the addresses of the routines m1_A_A() and m2_A_A(), both of which have a pointer to an object of class A as their first parameter. So in both dispatch tables the selector m1_A
Figure 6.17 The representation of an object of class B.
selects routines of the same type; the same applies to m2_A. This in turn means that the selection p->dispatch_table->m2_A, where p is of type class A *, yields the proper routine with the proper type, regardless of whether p points to an object of class A or of class B. The method invocation p->m2(3) can now be translated very efficiently as

(p->dispatch_table->m2_A)(p, 3);
Now that we have modified the routine m2_A_B(), the translation given above for the invocation p->m2(3), when p is statically a pointer to B, is no longer correct. The routine now expects a pointer to A, and must be given one:

m2_A_B(convert_ptr_to_B_to_ptr_to_A(p), 3);
Until now, the effect of the conversion routines has been that of type conversion only; no actual code was involved. We will now turn to another object-oriented feature that requires the conversion routines to have substance.
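To summarize the single-inheritance scheme, the sketch below models classes A and B, their dispatch tables, and a dynamically bound call in plain C. It is a simplified illustration (m3 is omitted and the helper names are invented), not the book's actual generated code.

#include <stdio.h>

struct Class_A;
struct Dispatch_A {                                 /* dispatch table type for A (and B) */
    void (*m1_A)(struct Class_A *this, int i);
    void (*m2_A)(struct Class_A *this, int i);
};
struct Class_A { const struct Dispatch_A *dispatch_table; int a1, a2; };
struct Class_B { struct Class_A A; int b1; };       /* B starts with the fields of A */

static void m1_A_A(struct Class_A *this, int i) { printf("m1_A_A: a1=%d i=%d\n", this->a1, i); }
static void m2_A_A(struct Class_A *this, int i) { printf("m2_A_A: a2=%d i=%d\n", this->a2, i); }
static void m2_A_B(struct Class_A *this_A, int i) {
    struct Class_B *this = (struct Class_B *)this_A; /* convert_ptr_to_A_to_ptr_to_B */
    printf("m2_A_B: b1=%d i=%d\n", this->b1, i);
}

static const struct Dispatch_A dispatch_A = { m1_A_A, m2_A_A };
static const struct Dispatch_A dispatch_B = { m1_A_A, m2_A_B };

int main(void) {
    struct Class_B b = { { &dispatch_B, 1, 2 }, 3 };
    struct Class_A *p = &b.A;                        /* pointer supertyping: no code needed */
    (p->dispatch_table->m2_A)(p, 3);                 /* dynamic binding: calls m2_A_B */
    (void)dispatch_A;
    return 0;
}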
e, in which an object class may only inherit from a single parent class. In this section, we will discuss the consequences of allowing objects to extend more than one parent class. This feature is called multiple inheritance, and is supported by several important object-oriented programming languages. Suppose, for example, that we have an object class C with fields c1 and c2 and methods m1 and m2, an object class D with field d1 and methods m3 and m4, and an object class E which extends both C and D, adds a field e1, redefines methods m2 and m4, and adds a method m5, as shown in Figure 6.18. Unlike the situation with single inheritance, it is no longer possible to represent an object as a pointer to a dispatch table followed by all object fields. In particular, the 'D inside E' object must start with a pointer to its dispatch table, followed by its object fields. It is still possible to combine the dispatch tables for E, 'C inside E', and 'D inside E'. However, they are no longer all indicated by the same pointer. The dispatch table for E becomes:
class C {
    field c1;
    field c2;
    method m1();
    method m2();
};

class D {
    field d1;
    method m3();
    method m4();
};

class E extends C, D {
    field e1;
    method m2();
    method m4();
    method m5();
};

Figure 6.18 An example of multiple inheritance.
    m1_C_C
    m2_C_E
    m3_D_D
    m4_D_E
    m5_E_E
so the dispatch tables for E and 'C inside E' still have the same address. The dispatch table for 'D inside E' starts right behind the dispatch table for 'C inside E'. The representation is shown in Figure 6.19. Note that the pointer supertyping and pointer subtyping operations, which up to this point did not require code, now in some instances get substance:

supertyping:
    convert_ptr_to_E_to_ptr_to_C(e) = e
    convert_ptr_to_E_to_ptr_to_D(e) = e + sizeof(class C)
subtyping:
    convert_ptr_to_C_to_ptr_to_E(c) = c
    convert_ptr_to_D_to_ptr_to_E(d) = d - sizeof(class C)
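The sketch below illustrates how these conversions could look in C for the layout just described: the 'D inside E' part starts a fixed distance after the start of the E object, and the conversions add or subtract that distance. The struct layout and all names are assumptions made for the example.

#include <stddef.h>
#include <stdio.h>

struct C { const void *dispatch_C; int c1, c2; };
struct D { const void *dispatch_D; int d1; };
struct E {                         /* C part first, then D part, then E's own fields */
    struct C C_part;
    struct D D_part;
    int e1;
};

/* offset of the 'D inside E' part; corresponds to sizeof(class C) above */
#define D_PART_OFFSET offsetof(struct E, D_part)

static struct C *convert_ptr_to_E_to_ptr_to_C(struct E *e) { return &e->C_part; }
static struct D *convert_ptr_to_E_to_ptr_to_D(struct E *e) {
    return (struct D *)((char *)e + D_PART_OFFSET);
}
static struct E *convert_ptr_to_D_to_ptr_to_E(struct D *d) {
    return (struct E *)((char *)d - D_PART_OFFSET);
}

int main(void) {
    struct E e = { { NULL, 1, 2 }, { NULL, 3 }, 4 };
    struct D *d = convert_ptr_to_E_to_ptr_to_D(&e);
    printf("d1 seen through the D pointer: %d\n", d->d1);                    /* 3 */
    printf("e1 through the recovered E pointer: %d\n",
           convert_ptr_to_D_to_ptr_to_E(d)->e1);                             /* 4 */
    printf("c1 through the C pointer: %d\n", convert_ptr_to_E_to_ptr_to_C(&e)->c1);
    return 0;
}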
When an object class E inherits from both class C and class D, an ambiguity may arise. For example, C and D may both contain a method with the same name. Class E inherits both these methods, but when applying a method with this name to an object of class E, only one method can be applied, and it may not be clear which version is intended. The language rules should indicate when a conflict or ambiguity arises, and the compiler has to detect these situations.
Figure 6.19 A representation of an object of class E.
Often the compiler task is complicated enough to warrant specific publications, for example Ramalingam and Srinivasan (1997) or Boyland and Castagna (1997). These are very language-specific problems, however, and we will not discuss them here.
ant issue that multiple inheritance raises, one which we have ignored in the discussion above, is dependent versus independent inheritance. For example, if both class C and class D of the example above are extensions of a class A, then what does 'A inside E' mean? Depending on the language, this can mean one of two things: either an object of class E contains two objects of class A, or it contains one object of class A. In the former case we speak of independent multiple inheritance, in the latter we speak of dependent multiple inheritance. Some languages allow even a mixture of the two: some parent classes may be inherited dependently, others independently. Independent multiple inheritance is implemented exactly as described above. The only complication is in the identification, and the language rules should prescribe when and how the programmer should use qualification, exactly as if the fields and/or methods of class A resulted from different classes, but with the same names. Now let us turn to dependent inheritance. The added complexity here is not in the method selection, but in the representation of the object data. We can no longer use the 'parent objects first' scheme, because, in the example above, we would get two copies of the A object. We will have to arrange the object data so that it only has one copy of the fields of A. We can accomplish that by placing the components of the object data in the object in the following order: the first entry is the pointer to the dispatch table of E (and of 'C inside E'); the next entries are the fields of A; then follow all fields of C not inherited from A; the entry after that is the pointer to the dispatch table of 'D inside E' (which points to within E's dispatch table); then follow all fields of D not inherited from A; and, finally, all fields of E not inherited from C or D.
This order is correct for the E object and the 'C inside E' object, but what about the 'D inside E' object? The compiler has to decide what a D object looks like when compiling the D object class, at a time when it does not know about either C or E. Assuming that a D consists of a pointer to its dispatch table followed by its fields will not work, since when a D resides inside an object of type E, the fields it inherits from A are some distance ahead of the dispatch table pointer and D's own fields follow it. So when producing code to access the object fields, the compiler has no idea where they are. As usual, the answer to this kind of problem is a run-time descriptor. This descriptor should allow a method to find the object fields, given a pointer to the object itself. For each field, the descriptor must contain the offset of the field from the object pointer. We enumerate the object fields, so that we can use the enumeration index as an index in an offset table. In addition to a pointer to the dispatch table, the object representation now must also contain a pointer to the offset table. Since we do not know beforehand which object classes will be involved in multiple inheritance, we must follow this two-pointer scheme for all objects. Now let us return to our example of Figure 6.18, and extend it by assuming that both object class C and object class D extend an object class A, which has fields a1 and a2 and methods m1 and m3. So, object class C redefines method m1, and object class D redefines method m3; see Figure 6.20. An object of class E has the representation depicted in Figure 6.21. Assuming that all pointers and fields have size 1, field a1 has offset 2 from the E pointer, field a2 has offset 3, etc., and the index table of class E contains

    2 3 4 5 8 9
The index table of the 'D inside E' class contains

    -4 -3 2
Note that for an object of class E, there is an ambiguity on m1 as well as on m3. The language rules or the programmer will have to specify which m1 and m3 are intended when applying them to an object of class E. By now it may be clear why some languages (notably Java) do not allow multiple inheritance: it causes many complications and adds some method invocation overhead. On the other hand, the benefit is added flexibility.
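As an illustration of field access through an offset table, the sketch below hard-codes the two index tables derived above and uses them to reach fields from an E pointer and from a 'D inside E' pointer. The slot-based layout and all names are assumptions for the example.

#include <stdio.h>

typedef long slot;                       /* one 'slot' = one pointer or one field */

/* index table of class E: offsets of a1, a2, c1, c2, d1, e1 from the E pointer */
static const int offset_table_E[]          = {  2,  3, 4, 5, 8, 9 };
/* index table of 'D inside E': offsets of a1, a2, d1 from the 'D inside E' pointer */
static const int offset_table_D_inside_E[] = { -4, -3, 2 };

/* field access, given an object pointer, its offset table and a field index */
static slot get_field(const slot *object, const int *offset_table, int field_index) {
    return object[offset_table[field_index]];
}

int main(void) {
    /* hand-built E object:
       [dispatch E][offset tbl E][a1][a2][c1][c2][dispatch D][offset tbl D][d1][e1] */
    slot e_object[10] = { 0, 0, 11, 12, 21, 22, 0, 0, 31, 41 };
    const slot *d_inside_e = &e_object[6];   /* pointer to the 'D inside E' part */

    printf("a1 via E pointer: %ld\n", get_field(e_object, offset_table_E, 0));        /* 11 */
    printf("d1 via D-inside-E pointer: %ld\n",
           get_field(d_inside_e, offset_table_D_inside_E, 2));                        /* 31 */
    return 0;
}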
All the manipulations discussed above make a method invocation more expensive than a routine call: the method is called through a dispatch table, and the method itself may have to adjust the object pointer that is passed to it. Therefore, some languages have a mechanism for indicating that a method may not be redefined by a subclass. Such an indication allows the compiler to identify the method directly, as it would identify any routine; the compiler can then use the routine call mechanism instead of the dispatch table. In Java, a method can be marked 'final', indicating that it may not be redefined. In C++, methods that may be redefined must be marked 'virtual'. Static analysis can also be of assistance in allowing the compiler to find out exactly to which subclass a method is applied. It may then be able to determine the method called, and generate a direct routine call. More information about the implementation of inheritance can be found in papers by
class A {
    field a1;
    field a2;
    method m1();
    method m3();
};

class C extends A {
    field c1;
    field c2;
    method m1();
    method m2();
};

class D extends A {
    field d1;
    method m3();
    method m4();
};

class E extends C, D {
    field e1;
    method m2();
    method m4();
    method m5();
};

Figure 6.20 An example of dependent multiple inheritance.
Templ (1993), which covers independent multiple inheritance only, and Vitek and Horspool (1996), which covers the implementation of dynamic binding in dynamically typed object-oriented languages.
Java has incorporated an extension that relieves the limitations of single inheritance somewhat, without adding the complexity (and the power) of multiple inheritance. The extension consists of so-called interfaces. An interface is like an object class in that it consists of a number of method specifications. In contrast to an object class, however, it cannot have object fields. An interface may extend a single parent interface. An interface is not instantiated like an object, but one can declare Java variables of an interface type, and invoke methods from the interface specification on them; Java variables are actually pointers to objects. The trick is that an object class may specify that it implements one or more of those interfaces, and that an interface type is compatible with any object type that implements this interface. So, for example, given an interface
Figure 6.21 An object of class E, with dependent inheritance.
public interface Comparable {
    public int compare(Comparable o);
}
it is possible to define an object class that implements this interface, while it still can extend another object class. The compiler must generate a separate dispatch table for each interface that an object class implements. This separate interface dispatch table only contains entries for methods that are specified in the interface specification, but the entries refer to methods of the object type. A variable of an interface type can be represented by a record containing two pointers, one to the interface dispatch table, and one to the object. Method invocation on an interface then goes through the pointer to the interface dispatch table. Conversion from an interface value to an object class value requires a check that the object class actually corresponds to the type of the object referred to by the interface value, or is a parent class of it. The reverse conversion consists of a check that the object class actually implements the interface type, and of locating the proper dispatch table within the object class.
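A possible C rendering of this two-pointer representation is sketched below: an interface value pairs a pointer to an interface dispatch table with a pointer to the object, and a method invocation goes through the table. The example simplifies the compare() signature and invents all names; it is not Java's actual implementation.

#include <stdio.h>

struct Obj;                                        /* some object class */
struct Comparable_itable {                         /* interface dispatch table */
    int (*compare)(struct Obj *this, struct Obj *other);
};
struct Comparable_value {                          /* a variable of interface type */
    const struct Comparable_itable *itable;
    struct Obj *object;
};

struct Obj { int key; };

static int compare_Obj(struct Obj *this, struct Obj *other) {
    return (this->key > other->key) - (this->key < other->key);
}
/* the interface dispatch table generated for class Obj */
static const struct Comparable_itable Comparable_for_Obj = { compare_Obj };

int main(void) {
    struct Obj a = { 3 }, b = { 5 };
    struct Comparable_value ca = { &Comparable_for_Obj, &a };
    /* method invocation on an interface value goes through the interface table */
    printf("compare: %d\n", ca.itable->compare(ca.object, &b));   /* prints -1 */
    return 0;
}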
Routines have been with us since the first programs were written and will probably continue to serve us for a very long time. It is therefore amazing to see how complicated these seemingly basic entities are. A routine call is a combination, a successful combination one
must admit, of at least four loosely related features:
1. supplying a new computing environment containing at least some temporary memory, the local variables;
2. passing information to the new environment, the parameters;
3. transfer of the flow of control to the new environment, with, in principle, guaranteed return;
4. returning information from the new environment, the return value(s).
Some of these features are available in isolation in some languages. Creating a new environment (1) is available in C and many other languages as block entrance. Code that can be transferred to with guaranteed return (3) without creating a new environment is known as a 'refinement' (Bal and Grune, 1994). But the package deal of the routine call has been far more influential.
The list of the four features above shows that the basic ingredient of a routine activation is the new environment. The data structure supporting the new environment is the activation record, also called frame. An activation record holds the data pertinent to an invocation of a routine or object method; it represents an activated and not yet terminated routine. In particular, it contains user data (local variables, parameters, return values, register contents) and administration data (code addresses, pointers to other activation records). Managing the user data is discussed in Section 6.4.2.2. In non-concurrent code, only one of the activation records represents a running routine; all the others are suspended: no instructions of those routine activations are being executed. The instruction being executed is located in the code of the running routine and the program counter, PC, points at it, or, more usually, just after it, depending on hardware conventions. The activation record of the running routine is indicated by a frame pointer, FP. The frame pointer, which usually resides in a dedicated register, is used at run time to access the contents of the activation record of the running routine; its use is explained in more detail in Section 6.4.2.2. In this section we concern ourselves with allocating, deallocating, and otherwise organizing the activation records in efficient ways. In many languages, routines are activated in a strictly last-in-first-out order: when a routine A invokes a routine B, A cannot continue until B has finished. For such languages a stack is the preferred allocation scheme for activation records. In other languages complications exist, due to features like nested routines, iterators, coroutines, routines passed as parameters, routines returned as routine results, partially parameterized calls, non-local gotos, and continuations. Such features complicate the use of a stack for the activation records. In our implementations, we will try to use stack allocation as long as possible, for efficiency reasons. We will first discuss the contents of activation records. Next, we consider several forms of routines and make an inventory of the operations available on routines. Finally we consider implementations of these operations on the various forms of routines.
Although the contents of an activation record depend on the source language, the compiler, and the target machine, the dependency is not a strong one, and all types of activation records have much in common. They almost invariably include the following components:
- the local variables of the routine;
- the parameters of the routine;
- the working stack; and
- an administration part.
The first three components correspond closely to the local variables, parameters, and intermediate results in the source program. These components will be examined in more detail in Section 6.4.2.2, where we discuss the actual routine call. Some aspects of the administration part are discussed further on in this subsection. Figure 6.22 shows a possible structure for an activation record. The peculiar order of the components is helpful in the creation of activation records in routine calls and in addressing the user values they contain. It is the most natural order on most machines and it is the order used in our treatment of routine calls in Section 6.4.2.2. Other orders are possible, though, and may have their advantages occasionally. An activation record is accessed through its frame pointer. In our treatment this frame pointer points to the last byte of the administration part, just before the first local variable. This is convenient since it allows addressing local variable k by FP + offset(k) and parameter p by FP + sizeof(administration part) + offset(p), regardless of how many parameters or local variables there are. The offsets of local variables are negative, those of the parameters positive. The direction of low to high addresses is in accordance with that on most machines. The exact contents of the administration part of an activation record A resulting from the invocation of a routine R are machine- and implementation-dependent. They always include either return or continuation information, and a 'dynamic link'; they may include a 'lexical pointer'. In some implementations, the administration part also contains copies of values kept in registers, but since these again represent local variables, parameters, and intermediate results from the source program, their treatment is also deferred to Section 6.4.2.2. The return information consists of a return address, which is the code address in the caller to which the flow of control will return when R terminates. The continuation information consists of a continuation address, which is the code address in R at which execution will continue when R is resumed. Which of the two is present depends on the implementation, as explained in the next few paragraphs. In both cases the dynamic link is the frame pointer of the caller of R. The issue of storing return addresses in the activation records of callees or continuation addresses in the activation records of callers may be confusing, and a few words are in order. In principle, any operation that suspends a routine R for whatever reason can store the continuation (resumption) information in the activation record of R; the information can then serve to continue R when that is required. Such operations include calling another routine, suspending in an iterator, transferring to another coroutine, and even a thread
    high addresses
        parameter k
        ...
        parameter 1
        lexical pointer
        return information
        dynamic link
        registers & misc.    <-- frame pointer FP
        local variables
        working stack        <-- stack pointer SP
    low addresses

Figure 6.22 Possible structure of an activation record.
switch (Section 9.12). Storing continuation information about R in the activation record of R has the advantage of keeping R as a conceptual unity. Storing return information in the activation record of the callee has a much more limited applicability: it supports only routine calling. The reason is simple: only the callee can get at the information and the only thing it can do with it is to use it to return to the caller. Hardware support is, however, much stronger for storing return information than for storing continuation information, and in virtually all implementations the call to a routine S stores return information in the activation record of S. The administration part may contain another important entry: the 'lexical pointer', also called 'static link'; this is the frame pointer of the enclosing visible scope. The lexical pointer allows access to the local variables of the lexically enclosing routine and to its lexical pointer, as will be explained in Section 6.3.5. For some languages, no lexical pointer is needed. In C, for example, routines can only be defined on the top level, so a routine has only one enclosing visible scope, the global scope, in addition to its own local scope. The frame pointer of the global activation record is constant, though, and need not be stored; it can be incorporated into the machine addresses of the global items. The frame pointer of the local activation record is available as the current frame pointer FP. Activation records can be allocated on a stack or on the heap. Stack allocation is cheaper but limits the usability of the activation records. As said before, we will try to use stack allocation as long as possible in our implementation.
utine is just a piece of code, reachable through a pointer to its first instruction, its code address. When a routine R is called (invoked, activated), an activation record is created for it and the flow of control is transferred to its first instruction by setting the program counter to the routine code address; the routine is now running. When R calls a routine S, an activation record for S is created and the flow of control is transferred to the first instruction of S; R is the parent of S, and S is the child of R. S is now running and R is suspended. When S finishes, it is terminated and R is resumed. The ancestors of a routine R are defined as the parent of R and the ancestors of the parent of R. Routines that are running or suspended have activation records; they are called active. A routine can be active more than once simultaneously, in which case it is recursive; only one invocation can be running at any given time. When a routine returns, its activation record is removed. When the last activation record of a routine has been removed, the routine becomes inactive again. The above is the behavior of the classical subroutine; there are several other kinds of routines, which exhibit additional features. The simplest is the iterator, a routine that can suspend itself temporarily and return to its parent without losing its activation record. This allows the iterator to continue where it suspended itself, when it is called again. As a result, the iterator can yield a succession of values; hence its name. The temporary return statement is usually called 'suspend' or 'yield'. An example application of an iterator in C notation is given in Figure 6.23. The iterator get_next_int() is started by the first call to it. After having yielded an 8 upon the first call and a 3 upon the second, it yields a 0, after which it terminates, due to the implicit return at the end. The returned value 0 stops the while loop. In languages that lack iterators, programmers usually implement them by using global variables or by introducing additional parameters, as shown for example in Section 1.4.2, but the general implementation requires the retention of the activation record. A well-known example of an iterator is the UNIX routine getchar(). A third variety of routines is the coroutine. Like the iterator it can suspend itself, but unlike the iterator suspending does not imply a return to the parent but rather transfers control to a named coroutine, which is then resumed. This form of flow of control is often used in simulation programs, and was introduced by Simula 67 (Birtwistle et al., 1975). There the coroutine transfer statement is called 'resume'. The statement resume(C) in a coroutine X leaves X temporarily and resumes the coroutine C at the same point the last resume statement in C left C temporarily. A simplified example application in a C-like notation concerns producer/consumer communication, as shown in Figure 6.24. The example shown there is overly simple, since in practice the resume(Consumer) statements may be hidden in routines called directly or indirectly by Producer(), and a subsequent resume(Producer) must continue inside those calls, with the complete environments intact. This requires retaining the activation records of those calls and all calls that lead to them. Independently of these variants, routines can be global or nested. The code in a global routine only has access to global entities and to its own local entities. The code in a nested routine, declared on the same level as local variables, has access to these same
void use_iterator(void) {
    int i;

    while ((i = get_next_int()) > 0) {
        printf("%d\n", i);
    }
}

int get_next_int(void) {
    yield 8;
    yield 3;
    yield 0;
}

Figure 6.23 An example application of an iterator in C notation.
char buffer[100];

void Producer(void) {
    while (produce_buffer()) {
        resume(Consumer);
    }
    empty_buffer();            /* signal end of stream */
    resume(Consumer);
}

void Consumer(void) {
    resume(Producer);
    while (!empty_buffer_received()) {
        consume_buffer();
        resume(Producer);
    }
}

Figure 6.24 Simplified producer/consumer communication using coroutines.
global and strictly local environments, but also to entities declared in lexically intervening routines, as shown in Figure 6.25. Since j, k, and l reside in different activation records, providing access to them requires some thought.
In addition to declaring a routine, which specifies its name, parameter types, and return type and which is a compile-time action, several operations of increasing complexity can
int i;

void level_0(void) {
    int j;

    void level_1(void) {
        int k;

        void level_2(void) {
            int l;

            ...                   /* code has access to i, j, k, l */
            k = l;
            j = l;
        }

        ...                       /* code has access to i, j, k */
        j = k;
    }

    ...                           /* code has access to i, j */
}

Figure 6.25 Nested routines in C notation.
be performed on routines. First of all, a routine can be defined. This differs from declaring a routine in that defining it supplies the code of the routine. Also, defining may be a run-time action: when the routine level_1() in Figure 6.25 is recursive, each recursive invocation defines a different routine level_2(), each having access to a different incarnation of variable k. A routine definition results in a defined routine, possibly represented at run time as a routine value. No doubt the most important operation on a defined routine is calling it. Calling a defined routine creates an activation record for it and transfers control to its code. Details of routine calling, which include parameter passing, passing back a return value, and returning from the routine, are covered in Section 6.4.2.2. Once a defined routine is seen as a value, two more operations on it become possible: passing it as a parameter to a routine in a call, and returning it as a value from a routine call. An important difference between these two operations is that passing a routine value as a parameter introduces the value into a smaller pointer scope, whereas returning a routine as a value introduces the value into a larger scope. (Pointer scopes were treated in Section 6.2.3.2.) Consequently, returning routines as values is fundamentally more difficult to implement than passing them as parameters. It turns out that once we have implemented returning routines as values, we can also store them in arbitrary data structures, more in particular in global variables. Routines passed as parameters are important in the implementation of logic languages, and routines stored in data structures are essential for functional languages; both also occur in some imperative languages, for example Icon and
Algol 68. Routine values resulting from these two operations must allow the same operations as any other defined routine. In particular, it must be possible to call them, to pass them on as parameters, and to return them as values. A less frequent operation is jumping out of a routine. The destination of such a jump, also called a non-local goto, is a non-local label. A non-local label is a label in another routine. A variation on the code from Figure 6.25 is shown in Figure 6.26. The routine level_2() contains a non-local goto statement to label L_1 in routine level_1(). The goto L_1 statement terminates the activation of level_2() and transfers control to L_1 in routine level_1(); if level_2() is recursive, several invocations of level_2() will have to be terminated. If level_1() is recursive, each incarnation defines a different non-local label L_1. In addition to being directly visible, the non-local label can also be passed to the running routine as a parameter. When the non-local label is returned as a value or stored in a data structure with a scope not smaller than that of the label itself, it is called a continuation. Continuations allow side branches of computations to be resumed, and support a remarkable programming paradigm (Appel, 1992).

void level_0(void) {
    void level_1(void) {
        void level_2(void) {
            ...
            goto L_1;
            ...
        }
        ...
        L_1: ...
        ...
    }
    ...
}

Figure 6.26 Example of a non-local goto.
Designing a non-local goto mechanism requires finding a representation for the non-local label that supports transfer of control, being passed as a parameter, and being returned as a value. Note that this representation does not need to be the same as that of a routine value. The last operation on routines we will discuss in this section is partial parameterization. In partial parameterization, one or more actual parameters are supplied to a defined routine, but the routine is not called, even if all actual parameters have been supplied.
Instead, a new defined routine results, with n-m parameters, if the original routine had n parameters, of which m have been supplied; again, m can be equal to n. An example in C notation would be:

extern int add(int i, int j);       /* yields i + j */
int (*inc)(int i);                  /* a routine variable inc */

int main(void) {
    ...
    inc = add( , 1);                /* supply the second parameter */
    ...
    printf("%d\n", inc(5));
}
in which an external routine add(int, int) is parameterized with 1 as its second parameter, to yield a new one-parameter routine, inc(). No call is involved, just the creation of a new defined routine. The last line of the code calls inc() with one parameter, 5, to print the result 6. Partial parameterization is also called 'currying', although this term is more often used more specifically for partial parameterization in which just the first parameter is supplied. We will often prefer the term 'curried routine' over the somewhat unwieldy term 'partially parameterized routine'. Currying plays a large role in functional languages. Since a curried routine is just an ordinary routine, it is important that curried routines be implemented so that all operations available on routines are available on them. This includes calling, further currying, being passed as a parameter, being returned as a value, and perhaps others.
We will first discuss the implementation of non-nested routines, since they are simpler than nested routines. Non-nested routines can be implemented using stack allocation exclusively, except when currying is among the required operations. A non-nested routine is represented at run time simply by the start address of its code. When called, a new activation record is stacked, as described in Section 6.4.2.2. If the code address is known at compile time, a routine call instruction can be used to transfer control to the routine; if the address results from run-time computation, an indirect routine call instruction must be used. A non-nested running routine has access to two environments only: the global data area and its own activation record. The global data area is addressed directly. The routine's own activation record is accessible through the frame pointer FP, which is kept pointing to the activation record of the running routine. Direct addressing and the frame pointer together provide access to the complete environment of a running non-nested routine. A non-nested routine can be passed on as a parameter or returned as a value by just passing on or returning its code address. The same operations are possible on this passed or returned routine as on the original routine, since in both cases the code address is all that is needed. Jumping out of a non-nested routine is not a natural concept, since besides the
routine's own code there is no other syntactically visible code to jump to. Still, it is occasionally useful to terminate a running routine and transfer control to a marked code location in an ancestor routine. Two possible applications are: stopping a recursive search when an answer has been found; and handling exceptions (Section 6.4.3.2). The C programming language has the setjmp/longjmp mechanism for this. A call to the built-in routine setjmp(env) saves information about its code position and stack environment in a 'jump buffer' pointed to by the parameter env, and returns 0; it marks a possible place in the execution of the program to which control may be transferred by performing the non-local goto. A later call to the built-in routine longjmp(env, val) restores the environment saved by the last call of setjmp(env), and returns from the call to setjmp() as if it returned val. This effectively implements a non-local goto, with the jump buffer representing the non-local label. A condition for the proper functioning of this mechanism is that the routine that called setjmp() must still be active at the time the corresponding longjmp() is called. The mechanism is demonstrated in Figure 6.27. The routine find_div_7() implements a recursive search for a number divisible by 7, and is symbolic for any such search process. When a number divisible by 7 has been found, a longjmp() is performed to the label (*jmpbuf_ptr) passed as a parameter; otherwise the search continues with the next higher number. Without the longjmp() call, the search recurses into infinity.
#include <setjmp.h>
#include <stdio.h>

void find_div_7(int n, jmp_buf *jmpbuf_ptr) {
    if (n % 7 == 0) longjmp(*jmpbuf_ptr, n);
    find_div_7(n + 1, jmpbuf_ptr);
}

int main(void) {
    jmp_buf jmpbuf;              /* type defined in setjmp.h */
    int return_value;

    if ((return_value = setjmp(jmpbuf)) == 0) {
        /* setting up the label for longjmp() lands here */
        find_div_7(1, &jmpbuf);
    }
    else {
        /* returning from a call of longjmp() lands here */
        printf("Answer = %d\n", return_value);
    }
    return 0;
}

Figure 6.27 Demonstration of the setjmp/longjmp mechanism.
The driver establishes the 'non-local label' by calling setjmp(); the actual label is not textually visible and is located after the else. The driver then initiates the search, starting at 1; the non-local label is passed as a parameter. When the solution is found, find_div_7() performs the non-local goto, which lands at the else branch. Note that the traditional C programming technique of allocating the jmp_buf data structure among the global variables constitutes a violation of the pointer scope rules. The jump buffer will contain pointers to the activation record of the routine that fills it, so its scope is smaller than that of the global data area. If the jump buffer is filled in the routine main() in C the problem disappears, since there is no code on a global level that could access the jump buffer, but if the jump buffer is filled in a subroutine, a pointer scope violation can easily occur, resulting in a jump to a routine that has already been terminated. Passing the jump buffer address as a parameter to all interested routines as in Figure 6.27 solves the problem and is safe, but annoying. Now that we have seen the feature and its use, we turn to its implementation. The implementation of setjmp(env) must at least save the frame pointer of its caller and its own return address in the jump buffer env. The implementation of longjmp(env, val) retrieves the destination activation record frame pointer and the return address from the jump buffer env. It then unstacks activation records until it finds the destination activation record and transfers to the return address. The implementation must also deliver val in the function result register. Currying of non-nested routines cannot be implemented substantially more simply than that of nested ones. We will therefore postpone its discussion to the next section.
Not all operations on nested routines can be implemented using stack allocation for the activation records, but much can still be done to preserve the stack regime. In addition to the usual code address, the routine descriptor used to represent a defined nested routine R must contain enough information to provide access to the data that are visible from the point of its definition. These are the constants, variables, parameters, routines, etc., of the lexically enclosing routines of R, and they reside in activation records of these routines. The straightforward way to provide this access is to include in the descriptor the frame pointer of the invocation of the routine in which R is defined (another solution, using 'closures', is discussed in Section 6.3.5.5). This pointer is called the lexical pointer or static link. Applying this to Figure 6.25, the descriptor of the routine level_2() consists of the code address of level_2() and a lexical pointer, the frame pointer of the enclosing routine level_1(). Figure 6.28 shows such a two-pointer routine descriptor. Several points are worth noting here. The first is that it is conceptually convenient to imagine the definition of a routine to correspond to run-time code, which produces a correctly filled local routine descriptor in the activation record, just as the definition int i = 5; produces a correctly filled local integer variable. Figure 6.29 shows possible code for the construction of a routine descriptor for the routine level_2() from Figure 6.25. Further optimization may of course render the explicit construction of the routine descriptor superfluous, just as constant propagation can remove the allocation of i from
Figure 6.28 A routine descriptor for a language that requires lexical pointers.
the program if i turns out not to be modified anywhere. Also, the value level_2_as_a_value could be constructed on the fly when calling routine A() in Figure 6.29.

void level_1(void) {
    int k;

    void level_2(void) {
        int l;
        ...
    }

    routine_descriptor level_2_as_a_value = {
        FP_of_this_activation_record(),   /* FP of level_1() */
        level_2                           /* code address of level_2() */
    };

    A(level_2_as_a_value);                /* level_2() as a parameter */
}

Figure 6.29 Possible code for the construction of a routine descriptor.
A second point is that if level_1() is recursive, each incarnation has a different activation record with a different address, so the routine descriptors for the different level_2()s in them differ, as they should. Another point is that when nested routines have a two-pointer descriptor, then so must the non-nested ones, for reasons of uniformity. The top-level routine descriptors can get a null lexical pointer, since all data visible from outside a top-level routine is accessible by direct addressing and the lexical pointer will never be consulted. Since the code of routine level_2() has access not only to the data of level_1() but also to those of level_0(), it would seem that supplying a lexical pointer to just the activation record of level_1() is not enough. We will see now, however, that it is.
When a routine R defined by a two-pointer routine descriptor D is called, a new activation record is created; the present program counter PC, the frame pointer FP, and the lexical pointer from D are stored in the administration area of the new activation record; FP is made to point to the new activation record; and control is transferred to the code address from D. See Figure 6.30.
Figure 6.30 Calling a routine defined by the two-pointer routine descriptor D: (a) before the call; (b) after the call.
As said, the point of having a lexical pointer is the access it allows to all lexically enclosing environments. We will now first see how the access to l and k in the statement k = l in Figure 6.25 can be obtained. The variable l is located in the activation record of the running routine, so it can be reached through the frame pointer: l is *(FP + offset(l)), where offset(X) is the offset of entry X from the frame pointer of its activation record. The variable k is located in the activation record of the immediately enclosing routine, which can be reached through the lexical pointer, which in turn can be found through the frame pointer: k is *(*(FP + offset(lexical_pointer)) + offset(k)) in routine level_2(). (Of course, k is *(FP + offset(k)) in routine level_1().) So the assignment k = l is translated to intermediate code as shown in Figure 6.31. The translation of the statement j = l is similar, except that j must be found by following the lexical pointer twice (Figure 6.32). These translations may look imposing, but BURS techniques can often find good code for such forms, by exploiting advanced addressing modes.
*(*(FP + offset(lexical_pointer)) + offset(k)) = *(FP + offset(l))

Figure 6.31 Intermediate code for the non-local assignment k = l.
We see that storing the lexical pointer to the activation record of the lexically enclosing routine in the activation record of the running routine builds a linked list of those activation records that hold the data visible from the running routine. The length of the list is equal to the lexical nesting depth of the running routine. The number of times the lexical pointer must be followed to reach an entry in a routine R from a routine S is equal to the difference in lexical nesting depth between S and R. It is therefore convenient to represent local addresses in the compiler as pairs of nesting difference and offset; since the nesting difference cannot be negative, a value of -1 can be used to code direct addressing. So, inside routine level_2(), l is represented as [0, offset(l)], k as [1, offset(k)], j as [2, offset(j)], and i as [-1, offset(i)]. Note that the nesting difference is a compile-time constant.
*(*(*(FP + offset(lexical_pointer)) + offset(lexical_pointer)) + offset(j)) = *(FP + offset(l))

Figure 6.32 Intermediate code for the non-local assignment j = l.
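The addressing scheme above, a [nesting difference, offset] pair interpreted by following the lexical pointer chain, can be mimicked in C as sketched below. The toy frame layout and names are assumptions for the example, not the book's implementation.

#include <stdio.h>

struct frame {                      /* toy activation record */
    struct frame *lexical_pointer;  /* frame of the lexically enclosing routine */
    int locals[4];                  /* local variables, addressed by offset */
};

static int *address_of(struct frame *fp, int nesting_difference, int offset) {
    while (nesting_difference-- > 0)
        fp = fp->lexical_pointer;   /* one step up the lexical chain */
    return &fp->locals[offset];
}

int main(void) {
    struct frame level_0 = { NULL,     { 0 } };   /* holds j */
    struct frame level_1 = { &level_0, { 0 } };   /* holds k */
    struct frame level_2 = { &level_1, { 0 } };   /* holds l */

    *address_of(&level_2, 0, 0) = 7;                             /* l = 7, [0, offset(l)] */
    *address_of(&level_2, 1, 0) = *address_of(&level_2, 0, 0);   /* k = l, [1, offset(k)] */
    *address_of(&level_2, 2, 0) = *address_of(&level_2, 0, 0);   /* j = l, [2, offset(j)] */
    printf("j = %d\n", level_0.locals[0]);                       /* prints 7 */
    return 0;
}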
Passing a nested routine as a parameter is simple now: just pass the two-pointer descriptor. No matter to what static or dynamic depth the routine level_2() is passed on, when it is finally called, the above calling scheme constructs an activation record with a lexical pointer that indicates the activation record of level_1() and thus restores the proper environment for a call of level_2(); see Figure 6.33. Since the two-pointer routine descriptor contains a pointer of possibly limited scope, the lexical pointer, we have to consider the pointer scope rules. The scope of the routine descriptor is the same as that of the activation record in which the routine was declared. When passing the routine descriptor to a child routine, it is passed into an environment of smaller scope, so no scope violation can occur, regardless of whether we use a heap or a stack for the activation records.
Returning a nested routine as a value is equally simple: just return the two-pointer descriptor. Now, however, the two-pointer descriptor is passed to an environment of potentially larger scope, so there is the danger of a pointer scope violation. The violation will indeed occur under a stack regime: when routine level_1() returns routine level_2() as a value to its parent level_0(), and this parent calls the returned routine, the call will construct an activation record for level_2() whose lexical pointer refers to the activation record of level_1(), which is long gone! We see that returning a nested routine as a value is incompatible with the stack regime; it requires the activation records to be allocated on the heap. Heap allocation indeed solves the problem. Since all environments have the same infinite scope, no data is introduced into environments of larger scope, and no pointer scope violation can occur. More in detail, the activation record of the call of routine
Figure 6.33 Passing a nested routine as a parameter and calling it.
level_1() in our example will be retained automatically after the call has terminated, since it is still accessible from the program data area: FP, which is in the root set, points to the activation record of level_0(), which contains the routine value level_2(), which contains the frame pointer of level_1() as the lexical pointer. Such operations may seem weird in a C-like language, but are commonplace in functional languages, and in some imperative languages, for example Icon and some variants of Algol 68.
The main operation a non-local label in a given routine invocation I must support is the non-local goto. Performing a non-local goto to this label terminates zero or more routine invocations until routine invocation I surfaces, and then transfers control to the local label. This can be implemented by representing the non-local label as a two-pointer descriptor containing the frame pointer of the routine invocation that holds the label, and the code address of the label. The implementation of a non-local goto to a non-local label L must then travel back through the activation record chain as defined by the dynamic links, until it finds the activation record with a frame pointer that is equal to that in L. Each activation
record met on the way must be released, since the activation of the corresponding routine is terminated implicitly by the non-local goto. Once the proper activation record has been found, FP is made to point to it and execution proceeds at the code address indicated in L. If routine level_1() in Figure 6.26 is recursive, each invocation defines a different label L_1. Their representations differ by pointing to different activation records; performing non-local gotos on them will terminate different numbers of active routines.
The two-pointer non-local label descriptor can be passed as a parameter or returned as a value in the same way as a routine descriptor can, and the same pointer scope considerations hold. The two are, however, fundamentally different: a routine descriptor holds a pointer to an activation record that will be the parent of a new running routine and a code address to be jumped to by a routine call instruction; a non-local label descriptor holds a pointer to an activation record that will itself be running again and a code address to be jumped to by a jump instruction.
As with routine definitions, it is often convenient to allocate the non-local label descriptor as a constant entry in the activation record, as shown in Figure 6.34. Again, however, values like L_1_as_a_value may be constructed on the fly.

void level_1(void) {
    non_local_label_descriptor L_1_as_a_value = {
        FP_of_this_activation_record(),   /* FP of level_1() */
        L_1                               /* code address of L_1 */
    };
    void level_2(void) {
        ...
        non_local_goto(L_1_as_a_value);   /* goto L_1; */
        ...
    }
    ...
L_1: ...
}

Figure 6.34 Possible code for the construction of a label descriptor.
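To make the mechanism concrete, the following is a minimal C sketch of how a run-time routine for the non-local goto could walk the dynamic chain; the activation_record type, its field names, and the helpers release_activation_record() and jump_to() are assumptions made for this illustration only, not part of any particular run-time system.

    struct activation_record;

    typedef struct activation_record {
        struct activation_record *dynamic_link;   /* caller's activation record */
        /* ... administration part, locals, working stack ... */
    } activation_record;

    typedef struct {
        activation_record *frame_pointer;   /* record that holds the label */
        void *code_address;                 /* address of the local label */
    } non_local_label_descriptor;

    extern activation_record *FP;                               /* current frame pointer */
    extern void release_activation_record(activation_record *ar);
    extern void jump_to(void *code_address);                    /* transfers control */

    /* Sketch of the non-local goto: release intervening records until the
       record named in the descriptor surfaces, then jump to the label. */
    void non_local_goto(non_local_label_descriptor lab) {
        while (FP != lab.frame_pointer) {
            activation_record *released = FP;
            FP = FP->dynamic_link;            /* follow the dynamic chain */
            release_activation_record(released);
        }
        jump_to(lab.code_address);            /* execution resumes at the label */
    }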
The data structure representing a curried routine must be capable of holding an indeterminate number of actual parameters, so the above two-pointer descriptor does not suffice and will have to be extended. The extension consists of space for all its parameters, plus a mask indicating which parameters have already been filled in. Figure 6.35 shows the result of partially parameterizing a 5-parameter routine with its second and fourth parameter values.
Figure 6.35 A closure for a partially parameterized routine.
This representation of a partially parameterized routine is called a closure; unfortunately it is the same term as used in 'transitive closure' and 'closure algorithm', with which it has nothing to do. The closure can be allocated on the stack or on the heap, with the corresponding consequences for the pointer scope rules.
Once we have this representation, implementing the desired operations on it is simple. Currying a closure further is implemented by scanning the mask to find the locations of the free parameters and filling them in. The required type checking has already been done, since the curried routine just has a new routine type with a subset of the original parameters. If currying in the stricter sense of supplying only the first actual parameter is used, the mask can be replaced by a counter, and finding the location of the first free parameter is trivial.
Calling a curried routine implies completing the closure by supplying the last of the parameters and then invoking it. The invocation of a completely curried routine can be done remarkably efficiently: allocate room for the activation record and block copy the closure into it. If the layout is chosen properly, the parameters and the lexical pointer are already in place, and the mask can be overwritten by the dynamic link; such a layout is already shown in Figure 6.35. Note that it is possible to have a closure that represents a fully parameterized routine which has nevertheless not been invoked. Such a routine/closure acts as a parameterless routine, and can be treated as such.
Passing a curried routine as a parameter or returning it as a value can be implemented in the obvious way: pass a copy of the closure. Since the closure contains the required lexical pointer, it can be invoked in the same way as described above. This implementation of currying requires that all routines be treated as curried, and that closures be used instead of two-pointer routine descriptors everywhere.
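As an illustration, here is a minimal C sketch of such a closure for a 5-parameter routine, corresponding roughly to Figure 6.35; the struct layout, the field names, and the curry_in() helper are assumptions made for this example, not a definitive layout.

    #include <stdint.h>

    #define N_PARAMS 5

    typedef struct {
        void     *code_address;       /* routine to be called eventually */
        void     *lexical_pointer;    /* frame of the lexically enclosing routine */
        uint32_t  filled_mask;        /* bit i set: parameter i has been supplied */
        long      params[N_PARAMS];   /* space for all parameters */
    } closure;

    /* Currying a closure further: fill in one of the free parameter slots. */
    static void curry_in(closure *c, int param_index, long value) {
        c->params[param_index] = value;
        c->filled_mask |= (uint32_t)1 << param_index;
    }

With this layout, partially parameterizing the routine with its second and fourth parameters sets bits 1 and 3 of filled_mask; when the mask shows all parameters present, the closure can be block copied into a new activation record, as described in the text.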
In this subsection we have considered representations for nested routines and implementations for the operations on them. Some of these operations, notably passing them as parameters and returning them as values, give rise to new routines, and we have been careful to ensure that our implementations of the operations also work on them. Most operations on routines and non-local labels are compatible with a stack regime, except returning them as values, which requires heap-allocated activation records. Since heap allocation of the activation records slows down the calling mechanism, it is important to seek ways to avoid it and maintain a stack regime. We will now discuss such a way.
The problem with two-pointer routine descriptors is that they contain a lexical pointer, which points into the stack and which reduces their pointer scope. As a result, two-pointer routine descriptors cannot be moved around freely, which in turn limits their usability. One simple solution was given above: allocate all activation records on the heap. We will now consider another simple solution: do away with lexical pointers and make all routines global. This immediately reintroduces the original problem: how to gain access to the non-local data; but we now give a new answer: pass pointers to them as parameters.
Figure 6.36 shows the result of this transformation on the routines of Figure 6.25. At the top we see the innermost routine level_2(), which used to have access to two non-local non-global variables, j and k, and which now has two pointer parameters. When called (in level_1()), the actual parameters are supplied from j, which level_1() has received as a parameter, and the address of k. A similar explanation applies to the call of level_1() in level_0(). It is clear that the two assignments *k = l and *j = l assign the value of l to the proper locations.
The transformation shown here is called lambda lifting. The name derives from the lambda expressions in Lisp and other functional languages that are lifted to global level by it, but the technique has been in use with C programmers for ages.
Lambda lifting has effectively rid us of two-pointer routine descriptors with their limited scopes, but the moment we try to pass a lifted routine as a parameter, we run into a new problem: how do we pass on the extra parameters that result from the lambda lifting? When passing level_2() as a parameter, how do we pass the j and &k that go with it? There is a stunning answer to this question: use partial parameterization! Rather than passing the code address of level_2(), we pass a closure C containing the code address, j, and &k. Note that these closures, unlike the earlier closures, do not contain lexical pointers. When the routine parameter corresponding to C is finally called, the run-time system recognizes it as a closure and performs a closure invocation.
The scope of closure C is determined by the scope of &k, which, unfortunately, is again that of the activation record of the lexically enclosing routine, level_1(). This still prevents the closure from being returned as a value, due to pointer scope problems. This problem is solved by another drastic measure: all local data that is used non-locally is allocated on the heap, and the corresponding local entries are replaced by pointers. Figure 6.37 shows the final result. In particular, the closure passed in the call of A() has
int i;
void level_2(int *j, int *k) {
    int l;
    ...                        /* code has access to i, *j, *k, l */
    *k = l;
    *j = l;
}
void level_1(int *j) {
    int k;
    ...                        /* code has access to i, *j, k */
    level_2(j, &k);            /* was: level_2(); */
    A(level_2);                /* level_2 as a parameter: this is a problem */
}
void level_0(void) {
    int j;
    ...                        /* code has access to i, j */
    level_1(&j);               /* was: level_1(); */
}

Figure 6.36 Lambda-lifted routines in C notation.
infinite scope: it contains a code address and two heap pointers. Thus, the closure can be moved to any environment and called wherever needed. The activation record of level_1() may long have disappeared; the required 'locals' in it still exist on the heap.
We have now achieved an implementation in which the stack mechanism is used for the routine invocation administration and the strictly local variables, and in which variables that are used non-locally are allocated on the heap. Such an implementation is advantageous for languages in which efficient routine calling and free movement of routine variables are very important. The implementation is especially profitable for languages that require currying anyway; in short, for implementing functional languages and advanced imperative languages. The main property of the implementation is that routines are represented by closures that do not carry lexical pointers; this gives them the opportunity to acquire infinite scope.
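The following brief C sketch shows what such a lexical-pointer-free closure could look like for the routines of Figure 6.37; the struct, new_closure(), and call_closure() are hypothetical names used only for this illustration.

    #include <stdlib.h>

    typedef struct {
        void (*code_address)(int *j, int *k);   /* the lambda-lifted routine */
        int  *j;                                /* heap pointer for level_0's j */
        int  *k;                                /* heap pointer for level_1's k */
    } lifted_closure;

    /* Build a closure holding the code address and the two heap pointers. */
    lifted_closure *new_closure(void (*code)(int *, int *), int *j, int *k) {
        lifted_closure *c = malloc(sizeof(lifted_closure));
        c->code_address = code;
        c->j = j;
        c->k = k;
        return c;          /* infinite scope: only heap pointers inside */
    }

    /* Invoking the closure supplies the lifted parameters automatically. */
    void call_closure(lifted_closure *c) {
        c->code_address(c->j, c->k);
    }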
The above techniques give us enough material to implement both iterators and coroutines in a simple way. The two implementations are almost identical. The invocation of an iterator or coroutine creates an activation record in the usual way, and its address is stored in a variable, allocated for the purpose; the activation record will have to be allocated on the heap, unless the functionality of the iterator or coroutine is restricted severely. When
int i;
void level_2(int *j, int *k) {
    int l;
    ...                        /* code has access to i, *j, *k, l */
    *k = l;
    *j = l;
}
void level_1(int *j) {
    int *k = (int *)malloc(sizeof(int));
    ...                        /* code has access to i, *j, *k */
    level_2(j, k);             /* was: level_2(); */
    A(closure(level_2, j, k)); /* was: A(level_2); */
}
void level_0(void) {
    int *j = (int *)malloc(sizeof(int));
    ...                        /* code has access to i, *j */
    level_1(j);                /* was: level_1(); */
}

Figure 6.37 Lambda-lifted routines with additional heap allocation in C notation.
either is suspended, continuation information is stored in that activation record. The iterator returns temporarily to its caller, whose activation record can be found by following the dynamic link; the coroutine transfers to another coroutine, whose activation record can be found in its coroutine variable. When the iterator or coroutine is resumed, the continuation information is retrieved and acted upon. When the iterator or coroutine terminates, its activation record is deleted, and the iterator or coroutine variable zeroed.
This concludes our discussion of non-nested and nested routines and the implementation of several operations applicable to them. The basic data structures in all the implementations are the activation record and the routine representation. The latter exists in two variants, the two-pointer routine descriptor and the closure. The detailed allocation design decisions are based on the pointer scope rules explained in Section 6.2.3.2.
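A minimal sketch of a coroutine transfer along these lines might look as follows in C; the coroutine struct, its fields, and the helpers save_continuation() and resume_at() are hypothetical names introduced only for this illustration.

    struct activation_record;                 /* provided by the run-time system */

    typedef struct coroutine {
        struct activation_record *ar;         /* heap-allocated activation record */
        void *continuation;                   /* code address at which to resume */
    } coroutine;

    extern void save_continuation(coroutine *self, void *resume_point);
    extern void resume_at(struct activation_record *ar, void *continuation);

    /* Transfer control from the running coroutine to another one. */
    void coroutine_transfer(coroutine *self, coroutine *other, void *resume_point) {
        save_continuation(self, resume_point);      /* remember where we left off */
        resume_at(other->ar, other->continuation);  /* continue the other coroutine */
    }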
In Section 4.2, we discussed code generation for expressions and basic blocks. In this section, we will concentrate on code generation for statements that affect the flow of control, and thus demarcate the basic blocks. Three levels of flow of control can be distinguished:
-local flow of control, which determines the statement inside a routine or method to be executed next (Section 6.4.1);
-routine calls and method invocations, which perform the parameter transfer and flow-of-control manipulation needed to activate a new routine (Section 6.4.2);
-non-local jumps, which transfer the flow of control out of the currently running routine into an ancestor routine (Section 6.4.2.3).

We assume that all source code expressions, with one exception, have already been evaluated and that the results have been stored in an appropriate place, usually a register. The exception is formed by Boolean expressions used for flow control, for example the control expressions in if-statements; they are treated separately in Section 6.4.1.1. In addition, we assume the existence of a mechanism in the compiler for allocating temporary variables and labels. This description is intentionally not very precise; details always depend on the source and target languages.
To describe the code generation for the flow of control, we will use the four statement types below, written in a Pascal-like notation. Each has a straightforward equivalent on virtually all processors.
-a simple goto statement: GOTO label;
-an indirect goto statement: GOTO label register;
-a conditional goto statement, in two forms: IF condition register THEN GOTO label and IF NOT condition register THEN GOTO label;
-an assignment statement: destination := source.

We will sometimes write simple expressions where registers appear in the above statements, especially when these expressions derive from the code generation mechanism itself rather than from the source code; this increases the readability of the code samples.
The two main mechanisms for influencing the local flow of control in imperative and object-oriented languages are 'selection' and 'repetition'. Selection causes a piece of code to be selected for execution, based on the value of some expression. Repetition causes a piece of code to be executed zero or more times, based on the value of some expression or expressions. More often than not these expressions are Boolean expressions, and in many cases it is useful to translate them in special ways. We will therefore first consider code generation for Boolean expressions used in controlling program execution, and then turn to code for selection and repetition statements.
Boolean expressions are no different from other expressions: evaluating one yields a Boolean value, on the stack or in a register. Most Boolean expressions are, however, used to affect the flow of control rather than to produce a value; we will call Boolean expressions used in that way Boolean control expressions. There are two reasons to treat Boolean control expressions specially.
The first reason has to do with two properties of machine instructions. Boolean expressions often consist of comparisons, and the comparison instructions of most machines produce their results in special condition registers in a special format rather than as 0/1 integer values; so an additional conversion is required to obtain a genuine Boolean value. And the most usual way of affecting the flow of control is by using a conditional jump; the machine instructions for conditional jumps base their decisions on condition registers rather than on 0/1 integer values; so jumping on a Boolean value requires an additional conversion of the Boolean value to a value in a condition register. It will, however, be clear that the naive code sequence for Boolean control expressions
/* code for the Boolean expression: */
comparison code, yielding a condition value
conversion from condition value to Boolean
/* code for the conditional jump: */
conversion from Boolean to condition value
jump on condition value

is to be avoided.
The second reason is related to a property of some programming languages. Several programming languages (for example C, Ada, Java) feature lazy Boolean operators, operators that evaluate operands only when their value is needed. Examples are the && and || operators in C. Such operators do not fit the translation model discussed in Sections 4.2.4.1 and 4.2.4.2: expr1 && expr2 cannot be translated as

code to compute expr1 in loc1;
code to compute expr2 in loc2;
code for the && operator on loc1 and loc2
since that would result in the unconditional evaluation of both expressions. Instead, code intermingled with conditional jumps must be generated; again these conditional jumps react to values in condition registers. In short, Boolean control expressions are tightly interrelated with conditional jumping.
This relationship can be exploited conveniently when we know the labels to which control must be transferred when the Boolean expression yields true or false, before we generate the code. We can then use a code generation technique like the one shown in Figure 6.38. The procedure Generate code for Boolean control expression gets two parameters, True label and False label, in addition to the usual Node pointer. A special value No label is available for these parameters, to indicate that control must continue at the end of the expression: the control must 'fall through' to the end. We assume a single condition register here, although most machines have several of them, with assorted semantics.
If the node represents a (numeric) comparison operator, we generate code for this operator; this leaves the result in the condition register. Then, depending on the presence or absence of the true and false labels, we generate zero, one or two jump instructions.
The use of the value No label is shown in the entry for the lazy && operator. First we generate code for the expression before the && operator, such that when the expression fails control is transferred to the False label and when the expression succeeds, the
PROCEDURE Generate code for Boolean control expression (Node, True label, False label):
    SELECT Node .type:
        CASE Comparison type:                // <, >, ==, etc. in C
            Generate code for comparison expression (Node .expr);
            // The comparison result is now in the condition register
            IF True label /= No label:
                Emit ("IF condition register THEN GOTO" True label);
                IF False label /= No label:
                    Emit ("GOTO" False label);
            ELSE True label = No label:
                IF False label /= No label:
                    Emit ("IF NOT condition register THEN GOTO" False label);
        CASE Lazy and type:                  // the && in C
            Generate code for Boolean control expression
                (Node .left, No label, False label);
            Generate code for Boolean control expression
                (Node .right, True label, False label);
        ...
        CASE Negation type:                  // the ! in C
            Generate code for Boolean control expression
                (Node .expr, False label, True label);

Figure 6.38 Code generation for Boolean expressions.
control falls through, to reach the expression after the && operator. Similar entries can be constructed for the || and ?: operators of C.
The last entry in Figure 6.38 shows that the implementation of the negation operator comes free of charge: we just swap the true and false labels. As an example, the call

Generate code for Boolean control expression (
    Parse ("i > 0 && j > 0"), No label, Else label
);
in which we assume that Parse(string) produces the parse tree for string, yields the code sequence

Compare_greater i, 0
IF NOT condition register THEN GOTO Else label
Compare_greater j, 0
IF NOT condition register THEN GOTO Else label
There are also occasions when we have to construct a genuine Boolean value, for example to assign it to a variable or to pass it as a parameter. We can then use conversion
instructions if the target machine has them, or use the above scheme to produce code to jump to places where the proper values are constructed.
The two most common selection statements are the if-statement and the case statement. The if-statement selects one of two statement sequences (one of which may be absent), based on the value of a Boolean expression; the case statement (also sometimes called a 'switch statement' or a 'selection statement') selects one out of several statement sequences, based on the value of an integer or enumeration expression.
The if-statement

The general form of an if-statement is:

IF Boolean expression THEN
    true sequence
ELSE
    false sequence
END IF;

which results in an AST with an if-statement node whose children are the Boolean expression, the true sequence, and the false sequence.
Code generation for an if-statement is simple: allocate two new labels, false_label and end_label, and generate the following code:

Boolean control code (Boolean expression, 0, false_label)
    code for true sequence
    GOTO end_label;
false_label:
    code for false sequence
end_label:
Boolean control code is the code generated by the procedure Generate code for Boolean control expression with the parameters shown. If the else-part is absent (or empty), the false_label is not needed and we generate

Boolean control code (Boolean expression, 0, end_label)
    code for true sequence
end_label:
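Hand-written, a compiler routine that emits the else-present form of this scheme could be sketched as follows in C; the label allocator, the callback parameters, and the printed pseudo-instructions are assumptions made only for this illustration.

    #include <stdio.h>

    static int label_count = 0;
    static int new_label(void) { return ++label_count; }

    /* Sketch of the if-statement scheme: the callbacks stand in for the real
       code generators for the condition and the two branches. */
    void generate_if(void (*bool_control_code)(int true_lab, int false_lab),
                     void (*true_seq)(void), void (*false_seq)(void)) {
        int false_label = new_label();
        int end_label = new_label();
        bool_control_code(0, false_label);   /* 0 = No label: fall through on true */
        true_seq();                          /* code for true sequence */
        printf("GOTO L%d\n", end_label);
        printf("L%d:\n", false_label);
        false_seq();                         /* code for false sequence */
        printf("L%d:\n", end_label);
    }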
The case statement

We will consider case statements of the form:

CASE case expression IN
    I_1: statement sequence_1
    ...
    I_n: statement sequence_n
ELSE else-statement sequence
END CASE;
where I_1 ... I_n are case labels: integer or enumeration values representing compile-time constants. The expression case expression must be evaluated. If its value is equal to one of the values I_1, ..., I_n, the corresponding statement sequence is executed. If not, the statement sequence of the else-part is executed.
There are many code generation schemes for case statements, and it is the task of the compiler to select an efficient one for the statement at hand. The choice depends on the number of case entries, the range (or reach) of the case labels, and on the density of the case labels within the range.
The following is a simple scheme that works well when there are only a few case entries (n being small, say 10 or less). First, n+2 labels are allocated: label_1 through label_n, label_else, and label_next. Also, a temporary variable is allocated for the case expression. Then, the code of Figure 6.39 is generated. This scheme implements a linear search through all the case labels. Note that we allocate a temporary variable for the case expression. Usually, the language manual specifies that the case expression must be evaluated only once, and, even if the language manual does not, it may still be more efficient to do so.
tmp_case_value := case expression;
IF tmp_case_value = I_1 THEN GOTO label_1;
...
IF tmp_case_value = I_n THEN GOTO label_n;
GOTO label_else;            // or insert the code at label_else
label_1: code for statement sequence_1
    GOTO label_next;
...
label_n: code for statement sequence_n
    GOTO label_next;
label_else: code for else-statement sequence
label_next:

Figure 6.39 A simple translation scheme for case statements.
The execution time of the above scheme is linear in n, the number of cases in the case statement. Case selection in constant time is possible using a jump table, as follows. First the compiler computes the lowest case label I_low and the highest case label I_high. Then the compiler generates a table of I_high - I_low + 1 entries, to be indexed with indices ranging from 0 to I_high - I_low. The entries in this table are code labels: label_k for an entry with index I_k - I_low, for k ranging from 1 to n, and label_else for all others. Finally the following code is generated:

tmp_case_value := case expression;
IF tmp_case_value < I_low THEN GOTO label_else;
IF tmp_case_value > I_high THEN GOTO label_else;
GOTO table[tmp_case_value - I_low];
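To make the jump-table idea concrete, here is a small hand-written C sketch of the code a compiler might emit for a case statement with labels 3, 5, and 6; it uses the GCC/Clang 'labels as values' extension (&&label and goto *) purely as an illustration, not as the book's target notation.

    /* Jump-table dispatch for CASE x IN 3: ... 5: ... 6: ... ELSE ... END CASE; */
    void dispatch(int x) {
        static void *table[] = {     /* indices 0 .. I_high - I_low (here 3 .. 6) */
            &&label_3,                /* 3 */
            &&label_else,             /* 4: no case entry */
            &&label_5,                /* 5 */
            &&label_6                 /* 6 */
        };
        int tmp_case_value = x;
        if (tmp_case_value < 3 || tmp_case_value > 6) goto label_else;
        goto *table[tmp_case_value - 3];   /* constant-time selection */

    label_3:    /* statement sequence for 3 */ return;
    label_5:    /* statement sequence for 5 */ return;
    label_6:    /* statement sequence for 6 */ return;
    label_else: /* else-statement sequence */  return;
    }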
If I_high - I_low is much larger than n, many of the jump table entries contain label_else, and the table may be deemed too space-inefficient. In that case, the case labels can be organized into a balanced binary tree, in which each node of the tree represents one case label I; the right branch indicates a subtree with case labels larger than I, and the left branch indicates a subtree with case labels smaller than I. For each node node_k in the binary tree, the following code is generated:

label_k:
IF tmp_case_value < I_k THEN GOTO label of left branch of node_k;
IF tmp_case_value > I_k THEN GOTO label of right branch of node_k;
code for statement sequence_k
GOTO label_next;

If the left branch and/or the right branch does not exist, the corresponding GOTO is replaced by GOTO label_else. Many more advanced translation schemes for case statements exist. Several translation schemes for the case statement were analyzed and compared by Sale (1981). Very sophisticated techniques for producing good code for the case statement are described by Hennessy and Mendelsohn (1982), Bernstein (1985), and Kannan and Proebsting (1994).
The two most common repetition statements are the while statement and the for-statement. The while statement executes a statement sequence an indeterminate number of times (including 0 times), as long as the while expression remains fulfilled. The for-statement executes a statement sequence a fixed number of times.
The while statement

The while statement

WHILE Boolean expression DO statement sequence END WHILE;

can be processed by allocating two labels, end_label and test_label, and generating the following code:

test_label:
    Boolean control code (Boolean expression, 0, end_label);
    code for statement sequence
    GOTO test_label;
end_label:

In many cases, however, the following scheme results in more efficient code: allocate two labels, sequence_label and test_label, and generate the following code:

    GOTO test_label;
sequence_label:
    code for statement sequence
test_label:
    Boolean control code (Boolean expression, sequence_label, 0);
This scheme is usually more efficient when there are several iterations, because it only executes a single conditional jump instruction per iteration, whereas the first scheme executes a conditional jump instruction and an unconditional jump instruction per iteration. If the Boolean expression evaluates to false the first time, there will be no iterations, and the first scheme is more efficient. Which scheme is more efficient also depends on the target processor. Note that the branch in the conditional goto statement of the first scheme is actually taken only once. On many processors, in particular processors that maintain an instruction look-ahead cache, a conditional branch instruction is more expensive when the branch is taken than when it is not, because when it is, the instruction look-ahead cache must be flushed. On the other hand, some modern processors perform look-ahead at the target of the branch instruction as well, and/or have a sophisticated branch prediction mechanism.
We will first consider the following type of for-statement:

FOR
i IN lower bound .. upper bound DO
    statement sequence
END FOR;
where i is the controlled variable of the for-statement, the implicit step size is 1, and lower bound and upper bound are to be evaluated once upon starting the for-statement. Code generation for a for-statement is quite tricky, because care must be taken that the controlled variable of the for-statement does not cause overflow. An intuitive approach is to allocate a temporary variable tmp_ub for the upper bound and generate the following code:

i := lower bound;
tmp_ub := upper bound;
WHILE i <= tmp_ub DO
    code for statement sequence
    i := i + 1;
END WHILE;
where the while statement is handled as described above. Unfortunately, this scheme will not always work. In particular, the computation of upper bound may produce the largest value representable for the type of the controlled variable i. After a while i will reach the value of tmp_ub. Then, on machines that detect overflow, the increment of the controlled variable will cause an exception. On machines that do not detect overflow, the for-statement will never terminate, because i can never become larger than the largest value representable for its type, and can thus never become larger than tmp_ub.
Therefore, the loop termination test must compare the controlled variable i with tmp_ub for equality, and be executed after the statement sequence, but before the increment of the controlled variable. However, moving the loop termination test to the end of the statement sequence means that another test is required to determine if the loop is to be entered at all. This leads to the following, improved, scheme (with the temporary variable tmp_ub allocated as above, and labels loop_label and end_label):

i := lower bound;
tmp_ub := upper bound;
IF i > tmp_ub THEN GOTO end_label;
loop_label:
    code for statement sequence
    IF i = tmp_ub THEN GOTO end_label;
    i := i + 1;
    GOTO loop_label;
end_label:
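As an aside, this improved scheme corresponds to the following hand-written C pattern for iterating i over lower..upper without overflowing the controlled variable; the function and its names are illustrative only.

    #include <limits.h>

    /* Overflow-safe rendering of FOR i IN lower .. upper DO body(i) END FOR;
       the equality test runs before the increment, so i never passes upper,
       even when upper == INT_MAX. */
    void for_loop(int lower, int upper, void (*body)(int)) {
        int i = lower;
        int tmp_ub = upper;
        if (i > tmp_ub) return;        /* zero iterations */
        for (;;) {
            body(i);                   /* code for statement sequence */
            if (i == tmp_ub) break;    /* test before increment prevents overflow */
            i = i + 1;
        }
    }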
In this generation scheme, the first IF clause makes sure that the statement sequence will never be executed when lower bound is larger than upper bound. Also, the controlled variable i will never be incremented beyond upper bound, thus preventing overflow and its consequences.
An issue that we have ignored until now is what value the controlled variable i should have after the loop is finished. In some languages, the for-loop declares the controlled variable implicitly, so it no longer exists after the loop. Some other languages do not specify what value the controlled variable should have afterwards, or explicitly specify that it is not specified (so that programs that depend on it are erroneous). When the language manual does specify the value, the implementation should of course follow the manual, if necessary by adding the proper assignment at the end_label.
Many languages offer more general forms of for-statements, for example providing an explicit step size. An explicit step size causes additional complications, as is illustrated by the following for-statement:

FOR i IN 1..6 STEP 2 DO
... END FOR;
The complication here is that the controlled variable never becomes exactly equal to the upper bound of the for-statement. Therefore, the scheme described above fails miserably. With an explicit step size, we cannot compare the upper bound and the controlled variable for equality, and, as we have seen above, we cannot compare for greater/smaller either, because of possible overflow problems. A solution lies in computing first the number of times the loop will be executed. An
extra temporary variable is needed to hold this number. Note that this temporary loop count must have a range that is large enough to represent the difference between the maximum value and the minimum value representable in the type of the controlled variable. For example, if the controlled variable is of type integer, the value MAX(integer) - MIN(integer) must be representable in this temporary variable. Remarkably, this can be accomplished by making the loop count an unsigned integer of the same size as the integer: if the representable values of an integer range from -2^n to 2^n - 1, then the representable values of an unsigned integer of the same size range from 0 to 2^(n+1) - 1, which is exactly large enough. Also, if the step size expression is not a constant, the step size needs a temporary variable as well. All of this leads us to the scheme of Figure 6.40.
Note that, although we no longer use the controlled variable to detect whether the for-statement is finished, we still have to keep its value up-to-date, because it may be used in the statement sequence code. Also note that the compiler can switch to the simpler and more efficient scheme discussed earlier when it can determine that the step size is 1.
i := lower bound;
tmp_ub := upper bound;
tmp_step_size := step size;
IF tmp_step_size = 0 THEN
    ... probably illegal; cause run-time error ...
IF tmp_step_size < 0 THEN GOTO neg_step;
IF i > tmp_ub THEN GOTO end_label;
// the next statement uses tmp_ub - i
// to evaluate tmp_loop_count to its correct, unsigned value
tmp_loop_count := (tmp_ub - i) DIV tmp_step_size + 1;
GOTO loop_label;
neg_step:
IF i < tmp_ub THEN GOTO end_label;
// the next statement uses i - tmp_ub
// to evaluate tmp_loop_count to its correct, unsigned value
tmp_loop_count := (i - tmp_ub) DIV (-tmp_step_size) + 1;
loop_label:
code for statement sequence
tmp_loop_count := tmp_loop_count - 1;
IF tmp_loop_count = 0 THEN GOTO end_label;
i := i + tmp_step_size;
GOTO loop_label;
end_label:

Figure 6.40 Code generation for a general for-statement.
Sometimes, what looks like a for-statement actually is not. Consider, for example, the for-loop of Figure 6.41. Here, expr1 and expr3 may contain any expression, includ-
ing none at all. The expression expr2 may be absent (in which case we have an infinite loop), but if it is present, it must return a value of a type that is allowed in a condition context. If expr2 is present, the C for-loop is almost, but not quite, equivalent to the while loop of Figure 6.42; for a difference see Exercise 6.27.

for (expr1; expr2; expr3) {
    body;
}

Figure 6.41 A for-loop in C.

expr1;
while (expr2) {
    body;
    expr3;
}

Figure 6.42 A while loop that is almost equivalent to the for-loop of Figure 6.41.
Code generation for repetition statements is treated in depth by Baskett (1978).
Optimizations for repetition statements
As can be seen from the above, the administration overhead for for-loops can be consider-
able. An effective optimization that reduces this overhead, at the cost of increasing code size, is loop unrolling. In loop unrolling, the body of the loop is replaced by several copies of it, and the administration code is adjusted accordingly. For example, the loop
FOR i := 1 TO n DO
    sum := sum + a[i];
END FOR;

can be replaced by the two for-loops shown in Figure 6.43. In this example, we have used an unrolling factor of 4. Note that, in general, we still need a copy of the original loop, with adjusted bounds, to deal with the last couple of iterations. If the bounds are compile-time constants, it may be possible to avoid this copy by choosing the unrolling factor to be a divisor of the loop count. This optimization is particularly effective when the body is small, so the unrolled loop still fits in the instruction cache. It may also be used to increase the size of the basic block, which may improve chances for other optimizations.
FOR i := 1 TO n-3 STEP 4 DO
    // The first loop takes care of the indices 1 .. (n div 4) * 4
    sum := sum + a[i];
    sum := sum + a[i+1];
    sum := sum + a[i+2];
    sum := sum + a[i+3];
END FOR;

FOR i := (n div 4) * 4 + 1 TO n DO
    // This loop takes care of the remaining indices
    sum := sum + a[i];
END FOR;

Figure 6.43 Two for-loops resulting from unrolling a for-loop.
Until now we have discussed code generation for statements that affect the local flow of control. In this section, we will discuss routine calls and object method invocations. Execution of a routine call transfers the flow of control to the start of the called routine. The called routine will eventually return the flow of control to just after the routine call. Two important issues in calling a routine are routine identification -finding out which routine to call -and how to perform the call and return. A third operation connected with the flow of control in and around routines is the non-local goto statement, which terminates the running routine and transfers control to a labeled code location in an ancestor of the running routine. This deviates from the simple call-return scheme described above, since the call is not terminated by the expected return.
In object-oriented languages, we invoke methods on objects. The effect of a method invocation with regard to the flow of control is identical to that of a routine call. Method invocation and routine call differ in degree in that the routine to be called is almost always determined statically and the method to be invoked is often only determined at run time, by using a dispatch table. An additional difference is that a method has direct access to the fields of the object. This access is implemented by passing a pointer to the object as an additional parameter, as discussed in Section 6.2.9.
We will now first consider what to call and then how to call it.
Before we can translate a routine call, the routine must be identified. Usually this has already been done during semantic checking. A routine name may be overloaded, but if the input program is correct, the language rules allow the compiler to identify a single routine to be called.
In languages that allow routine variables, the routine to be called may be the result of an expression. In this case, the compiler will not be able to identify the routine to be called, and must produce code to evaluate the expression. This should result in a (run-time) routine value, as discussed in Section 6.2.8; the routine value can then be called through an (indirect) routine call, for which an instruction is available on all reasonable target machines.
In object-oriented languages, method identification is not so simple. As we have seen in Section 6.2.9.4, in many cases a dispatch table must be consulted at run time to find the method to be invoked. The result of this consultation is again a routine value.
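For a non-nested routine, a routine value can be as simple as a C function pointer; the following sketch shows an indirect call through such a value and through a rudimentary dispatch table. The names and the one-entry table are invented for this illustration.

    #include <stdio.h>

    typedef void (*routine_value)(void);   /* routine value for a global routine */

    static void hello(void) { printf("hello\n"); }

    /* A rudimentary dispatch table: method index -> routine value. */
    static routine_value dispatch_table[] = { hello };

    int main(void) {
        routine_value r = hello;           /* result of identifying the routine */
        r();                               /* indirect routine call */
        dispatch_table[0]();               /* method invocation via dispatch table */
        return 0;
    }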
Calling a routine is not just a transfer of control to the code of the routine; the calling sequence must also create the components of an activation record, as described in Section 6.3.1.1. Part of the calling sequence is performed at the routine call site and part is performed at the entry point of the called routine.
Before creating the components of the activation record, space for the activation record itself must be allocated. If a stack is used to store activation records, the space is allocated more or less automatically: the components are pushed onto the stack in the right order. If the allocation is explicit, the caller must allocate a chunk of memory of a suitable size, large enough to contain all components. This means that the required size must somehow be made available to the caller, for example as a run-time constant. In the following discussion we will assume that activation records are allocated on a stack; this is the most usual situation. Heap allocation of activation records can be derived easily from the discussion below; where special measures are required, these are described.
A stack is a memory area onto which items can be pushed and from which items can be popped. Associated with a stack is a stack pointer, SP, which points to the 'top' of the stack; the stack pointer resides in a dedicated register. There is some confusion about which way stacks grow in memory. In abstract descriptions of stack machines, the usual convention is that pushing an item onto the stack raises the stack pointer; we used this convention in the pure stack machine of Section 4.2.4. On almost all real-world machines, however, pushing an item onto the stack lowers the stack pointer. This implies that the stack grows from high-numbered addresses to low-numbered addresses. Since we are considering code for real machines here, we will follow the hardware convention in this section. There is an independent confusion about drawing stacks on paper: which way do they grow on paper? Although the word 'stack' would suggest otherwise, stacks traditionally grow downwards in drawings, and we will follow that convention. Together the two conventions imply that memory addresses decrease from top to bottom in drawings, as was already illustrated in Figure 6.22.
The major advantage of using a stack for the activation records is that they do not have to be explicitly allocated and deallocated, thus saving considerably on allocation and deallocation times. Another advantage is that the working stack of the caller and the parameter area of the callee can be combined, which may save some time and space, as parameter values no longer have to be copied. We will now discuss the individual components of the activation record, in the order in which they are usually created.
The parameters of the routine must be stored in or pushed to the location in which the callee expects them. The compiler must impose a rule for accomplishing this. An example of such a rule is: when parameters are passed on the stack, the last parameter is pushed first. Such a rule is suitable for languages that allow a variable number of parameters to be passed in a routine call, and results in the parameter area shown in Figure 6.22 when the activation record is constructed on the stack. The compiler could also reserve a fixed number of registers for parameter passing and push the rest of the parameters onto the stack. Whatever scheme is chosen, it is essential that the caller stores the parameters in locations where the callee can access them. If the routine returns a result, the parameter area may also contain a pointer to the location in which this result must be stored.
In some languages, a value can have a dynamic component, a component the size of which can only be determined at run time; the prime example is a dynamic array. For a parameter with a dynamic component, the dynamic component is allocated on the heap. The parameter part of the activation record then contains a pointer to the component, or a descriptor of known size with a pointer to the component.
Usually, the language manual specifies which parameter passing mechanism is to be used. The simplest and most common parameter passing mechanism is call by value: the rvalue of the actual parameter is used as the initial value of the corresponding formal parameter.
Many languages also support some form of output parameters, allowing a routine to change the values of some of its actual parameters by assigning to the corresponding formal parameters. There are two common mechanisms for this: call by reference, in which the change is effected immediately, and call by result, in which the change is effected upon return from the call. Both can be implemented by passing the lvalue of the actual parameter instead of its rvalue. The usual context condition here is that the actual parameter has an lvalue. In call by reference, an assignment to the formal parameter is implemented as an assignment to this lvalue. Call by result is implemented by allocating a local variable for the parameter, using this local variable throughout the routine, and assigning this local variable through the corresponding lvalue parameter in the return sequence. Another common parameter passing mechanism is call by value-result, a combination of call by value and call by result. It is implemented using the scheme of call by result, with the addition that the local variable allocated for the parameter is initialized with the value of the actual parameter.
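In C terms, call by reference and call by value-result can be mimicked as follows; this is merely an illustration of the two mechanisms, not generated compiler code.

    /* Call by reference: the callee assigns through the passed lvalue directly. */
    void by_reference(int *formal) {
        *formal = 42;                  /* change is effected immediately */
    }

    /* Call by value-result: work on a local copy, copy back on return. */
    void by_value_result(int *actual_lvalue) {
        int local = *actual_lvalue;    /* initialized with the actual's value */
        local = local + 1;             /* routine body uses the local copy */
        *actual_lvalue = local;        /* change is effected upon return */
    }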
The administration part includes the frame pointer of the caller, which represents the dynamic link, and the return or continuation address. In languages that require so, it may also contain the frame pointer of the lexically enclosing routine, which represents the static link. During the initialization of the administration part, control is transferred to the callee, after saving the return address in the activation record of the callee or the continuation address in the activation record of the caller. Sometime during this phase, the old frame
pointer is stored in the administration part and FP is set to indicate the new activation record.
The administration part also often contains space to save some machine registers. For example, sometimes machine registers are used for local variables and temporaries. When this is the case, these registers must be saved on routine call entry and restored on routine exit. There are two frequently used schemes for register saving and restoring: 'caller saves' and 'callee saves'. In the callee-saves scheme, the routine entry code saves all registers that may be corrupted by the callee's code. In the caller-saves scheme, the routine call code contains code which saves the registers that the caller requires to be unharmed upon continuing after the call. In this scheme, the callee is free to use and corrupt all machine registers, since the caller has saved the ones it needs. The caller-saves scheme usually requires fewer register saves and restores during run time, but may require more instruction space, because every calling sequence also contains code to save and restore registers. The callee-saves scheme only has code for this purpose at the routine entry and routine exit. Note that in the caller-saves scheme, the registers are saved in the caller's activation record, whereas in the callee-saves scheme, they are saved in the callee's activation record.
The local variable area

Once control has been transferred to the callee, the callee can start building the local variable area, the part of the activation record in which local variables and temporaries reside. Usually, the compiler can determine the size of this component by considering it as a record of which the local variables of the routine are the fields; given the sizes and alignment requirements of the local variables, the size of the local variable area follows. The compiler also knows what temporary variables are required for the code of the routine, by keeping track of their number, size and alignment requirements during the code generation phase for this routine. This information is very dependent on the details of the code generation process, since optimizations may introduce or remove temporary variables. Space for the local variable area is allocated by decreasing the stack pointer by the proper amount.
As with parameters, one or more of the local variables may have a dynamic component; this dynamic component can be allocated on the heap, but for local variables there is another option, allocating it in the dynamic allocation part of the activation record, as explained below.
The working stack

The local variable area is followed by the working stack, which is used for anonymous intermediate results from expressions. It may also be used for the bounds of for-statements, although it is more common to put these in temporaries. Going through the code of a routine, the compiler can easily keep track of the size of the working stack and record its maximum size. At the end of the scan, the maximum size of the working stack for the routine is known; it can then be incorporated as a fixed-size block in the activation record. The top of the working stack is indicated by the stack pointer as left by the allocation of the local variable area, and the working stack itself is the space below it. Machine instructions with push and pop properties can be used to access the working stack.
The dynamic allocation part

If the target language allows activation records of dynamically extensible size, the activation record may also have a separate dynamic allocation part for local variables. These dynamic parts are then stored in the dynamic allocation part instead of on the heap. Since the dynamic allocation part is the only part the size of which cannot be determined statically, it has to come at the end of the activation record, as shown in Figure 6.44. In this set-up, the stack pointer points to the 'top' of the dynamic allocation part.
Figure 6.44 An activation record with a dynamic allocation part. (The original diagram shows the parameters, the administration part, the local variables, the working stack, and the dynamic allocation part, with FP marking the frame and SP marking the 'top' of the dynamic allocation part; addresses decrease from top to bottom.)
Some processors do not allow dynamic allocation parts, because they require the size of the activation record to be specified by the compiler when it is created. For example, the SPARC processor has a save/restore mechanism for creating new activation records; the save part of the mechanism requires the size as a parameter. If the target language allows it, however, storing the dynamic part of local variables or parameters in the activation record has the advantage that it does not have to be deallocated explicitly on routine exit. Instead, its deallocation is implicit in the deallocation of the activation record. Also, allocation on the stack is much faster, since it only involves resetting the stack pointer.
A disadvantage is that the stack pointer SP is now no longer available for manipulating the working stack: a working stack pointer must be implemented in software. But it is still not necessary to check for stack overflow, since sufficient space is guaranteed to be available.
Returning function results

If the callee returns a value, the result can be made available to the caller in several ways. For a 'simple' result type, such as an integer, real, or pointer, the compiler usually uses a machine register, called the function result register. The callee simply stores the result in this register, and the caller accesses this register to obtain the result. This register may serve other purposes as well, for example as a scratch register in the evaluation of expressions. For a 'compound' result type, which means any result that does not fit naturally into a register (some data types, for example a record with just four character fields, fit unnaturally into a register; most compilers will consider such result types 'compound'), the situation is more complicated. There are three reasonable solutions; a sketch of the first follows the list.

-If the compiler knows the size of the result, it can allocate a temporary variable in the data space of the caller, and pass its address as an extra parameter to the callee. The callee then stores the result through this address.
-Space for the result can be allocated dynamically by the callee, and a pointer to the allocated space is returned through the function result register.
-The result can be left on the working stack of the callee, in the dynamic allocation part of its activation record, or in one of its local variables, with a pointer to it in the function result register. Note that when activation records are allocated on the stack, the return sequence must make sure that the memory area in which the result resides is not overwritten. Also, the caller must then copy or use the result before it can use the stack again. When activation records are allocated on the heap, the caller must copy or use the result before releasing the activation record of the callee.
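The first solution is essentially how many C compilers handle functions that return large structs; written out by hand, the transformation could be sketched like this (the type and function names are illustrative):

    typedef struct { char name[64]; int count; } big_result;

    /* Source-level view: big_result make_result(void); */

    /* After the transformation: the caller allocates the temporary and passes
       its address as an extra, hidden parameter. */
    void make_result(big_result *result_area) {
        result_area->count = 1;            /* callee stores through the address */
        result_area->name[0] = '\0';
    }

    void caller(void) {
        big_result tmp;                    /* temporary in the caller's data space */
        make_result(&tmp);                 /* extra parameter: &tmp */
        /* ... use tmp ... */
    }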
The calling and return sequences

To summarize, the calling sequence consists of the following steps:
-Create an activation record.
-Evaluate parameters and store them in the activation record.
-Fill the administration part of the activation record. Entries include the frame pointer of the caller and the return address. They may also include the lexical pointer if required, probably some machine registers, and possibly also the old stack pointer.
-Transfer control to the callee. Make the frame pointer FP point to the activation record of the callee.
-Update the stack pointer SP, allowing enough space for the local variable part. The dynamic allocation part, if present, is filled as the values of the local variables are computed.

The return sequence consists of the following steps:
-If the callee has a return value, store the result in the area designated for it. This is discussed below in more detail.
-Restore the machine registers that were saved in the administration part.
-Update the stack pointer SP so that it indicates the frame pointer FP.
-Restore the frame pointer FP from the administration part.
-Transfer control to the caller, using the saved return address. Note that
the administration part can still be accessed, because SP now indicates the activation record.
-Release the activation record of the callee.
Activation records and stack layout

A typical layout of a stack with activation records is shown in Figure 6.45; the diagram depicts two activation records, that of a caller and that of its callee. We see that the two activation records overlap partially, as a result of the following technique. The caller evaluates the actual parameters of the call one by one, in last-to-first order, using its working stack. After evaluating a parameter, the result is left on the working stack, and the next one is evaluated on top of it. In this way, when all parameters have been evaluated, their values lie on the top of the caller's working stack. The working stack of the caller can now be used as the parameter area of the callee, resulting in overlap between the two activation records.
In this set-up, local variables are addressed through the frame pointer FP using negative offsets, parameters are addressed through FP using positive offsets, and the working area is addressed through the stack pointer SP. The position of SP in Figure 6.45 indicates an empty working stack.
If the implementation places activation records on the heap instead of on the stack, some of the techniques described above are not possible. Since activation records on the heap are separate entities, they cannot overlap, and the working stack in one cannot serve as the parameter area in the other. Since records on the heap cannot be extended, activation records on the heap cannot have dynamic allocation parts.
Routine calling and parameter passing have received ample attention in the early literature; some pertinent papers are by Wichmann (1977) and Kowaltowski (1981).
Figure 6.45 Two activation records on a stack. (The original diagram shows the caller's activation record, with parameters, administration part containing the old FP, local variables, and working stack, partially overlapped by the callee's parameter area; the callee's administration part contains the dynamic link, FP points to the callee's activation record, and SP marks the top of its working stack.)

The flow-of-control aspects of non-local goto statements have already been covered in Section 6.3.5.4. We will consider here the data handling aspects. The main data issue of the implementation of a non-local goto statement is that the registers that were saved in the intervening activation records must be restored when these activation records are destroyed. In the absence of special measures there is no proper way to do this, because there is no way to find out which registers to restore from a given activation record. A special measure that can be taken to prevent this problem is to record in each activation record which saved registers it contains. Restoring the registers then becomes trivial, at the cost of some extra book-keeping at each routine call. For some target machines, it may be feasible to always, in each activation record, save and restore the same set of registers, so that it is not required to record which registers are saved. Another option is not to use register variables in routines that can be the target of a non-local goto.
These routines must then also save all registers on routine entry, and restore them on routine exit. This technique allows the implementation not to restore registers from intervening activation records. We will now turn to a form of control flow in which the non-local goto plays a significant role.
There are many ways a program can get into trouble. Some examples of problems are: integer overflow, an array bound error, memory exhaustion, and power failure. Depending on circumstances, the generated code, the run-time system, the operating system, or the hardware may detect the error, and then some level of the system will probably cause some error message to be displayed and the program to be terminated. Or the error may go unnoticed. Since this is often not the desired situation, some languages allow the programmer to specify which errors must be caught and also to specify the actions to be performed upon the occurrence of these errors or exceptional situations. So, handling of a run-time
error consists of two parts: the error must be detected and the actions for the error must be performed. Both have their problems: detecting the error may be difficult and some of the required actions are very inconvenient.
A run-time error can be caused either by the program itself or by external circumstances. Detecting a run-time error caused by external circumstances is very system-specific. For example, a memory failure -the malfunction of a memory section -may or may not be made available to the program. It is up to the run-time system to make errors caused by external circumstances available if it can. Often, the operating system makes exceptional external circumstances available to the run-time system by causing an interrupt. For example, on the UNIX operating system, the user can interrupt a running program by typing a control-C or similar keystroke. The default action on an interrupt is to terminate the program, but UNIX includes a mechanism for making the interrupt available to the program. The program is then able to detect that an interrupt has occurred, and take appropriate action.
Examples of run-time errors caused by the program itself are: integer overflow, array bound errors, division by zero, and dereferencing a null pointer. Some of these errors cause the operating system to give an interrupt, but often the compiler must generate checks to detect these run-time errors. Several error conditions can be checked for by examining certain bits in the processor status word. For example, there is usually an overflow bit, indicating that the last instruction performed caused an overflow. Most processors lack special facilities for higher-level error conditions, for example array or range bound errors. If such errors are to be detected, the compiler must generate code to perform a check, testing that a certain integer falls within a range. Such a check basically looks as follows:

IF (val < lower bound) OR (val > upper bound) THEN
    THROW range_error;
END IF;

where THROW is a basic statement to signal an error. These checks can be quite expensive, which is why many compilers have a switch that disables the generation of these checks.
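By way of illustration, for an array access a[i] with known bounds a compiler might emit code equivalent to the following C fragment; report_range_error() stands in for whatever run-time routine the system actually provides.

    extern void report_range_error(void);   /* hypothetical run-time routine */

    int checked_index(int *a, int i, int lower, int upper) {
        if (i < lower || i > upper) {        /* the generated range check */
            report_range_error();            /* corresponds to THROW range_error */
        }
        return a[i - lower];                 /* access only after the check */
    }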
When a programming language does not have facilities to allow the user to deal with run-time errors, the story ends here. The run-time system can make sure that it produces an intelligible error message, and then may terminate the program, producing some kind of memory image, usually called a 'memory dump', which can then be examined using a post-mortem debugger. Many languages, however, include a mechanism for dealing with run-time errors. We will now discuss two such mechanisms, signal routines and exceptions.
A simple mechanism for dealing with run-time errors is the signal statement. A signal statement binds a specific class of error conditions to a specific user-defined routine, the signal handler. Whenever a run-time error occurs, the corresponding signal handler is called. The signal handler might then close open files, free system resources, print a message, and terminate the program. Alternatively, it could deal in some other way with the error, and just return. In the latter case, the program will resume just after the point where the error occurred.
To implement the signal statement, the run-time system maintains a program-wide list of (error condition class, signal handler) pairs, so it can call the corresponding signal handler when a run-time error occurs.
Sometimes it is convenient to have the program continue at a point different from where the run-time error occurred. For example, a program could be in a main loop processing commands. A run-time error in one of the commands should then cause the program to print an error message and continue with the main loop. The programmer can implement this by using a non-local goto statement in the signal handler to transfer to the main loop of the program. In a non-nesting language like C, the setjmp/longjmp mechanism can be used, as shown in Figure 6.46.
    #include <stdio.h>      /* for printf() */
    #include <signal.h>     /* for signal() */
    #include <setjmp.h>

    jmp_buf jmpbuf;         /* type defined in setjmp.h */

    void handler(int signo) {
        printf("ERROR, signo = %d\n", signo);
        longjmp(jmpbuf, 1);
    }

    int main(void) {
        signal(6, handler);          /* install the handler ... */
        signal(12, handler);         /* ... for some signals */

        if (setjmp(jmpbuf) == 0) {
            /* setting up the label for longjmp() lands here */
            /* normal code: */
            ...
        } else {
            /* returning from a call of longjmp() lands here */
            /* exception code: */
            ...
        }
    }
Figure 6.46 An example program using setjmp/longjmp in a signal handler.
A more flexible feature, available in many programming languages, is the facility to specify 'exception handlers'. An exception handler specifies a particular error condition and a set of statements that will be executed should this error condition occur. This allows an exception handler to access the local variables of the block or routine in which the error condition occurred. The statements in the exception handler replace the rest of the statements in the block or routine, should the corresponding error condition, or 'exception', occur. Usually, each block of statements or each routine can have its own exception handler. It can even have several exception handlers, for different error conditions.
When an exception E occurs, the chain of activation records is followed, releasing activation records on the way as in the implementation of a non-local goto, until one is found that has a handler for the exception E. This handler is then executed, and execution resumes at the end of the block or routine that the handler is associated with, unless the exception handler terminates the program or causes another exception.
For each exception handler, the compiler generates the code corresponding to the statements in the exception handler, terminating it with a jump to the end of the block associated with the handler, and labels this code with a unique handler label. Also, for each block or routine with exception handlers it generates a table of (exception, handler label) tuples, with one entry for each exception handler. In addition, the administration part of each activation record must contain a pointer to the table of exception handlers currently installed for its routine. If the exception handlers are associated with a routine, this is a constant; if they are associated with a block, the pointer must be updated as blocks with exception handlers are entered and left. Now, when an exception E occurs, the chain of activation records is examined as follows, most recent activation record first:
1. The pointer to the table of exception handlers is extracted from the currently examined activation record.
2. The table referenced is searched for a handler for exception E. If such a handler H is found, step 3 is performed. If not, step 1 is performed on the parent activation record (the one of the caller). If there are no more activation records, the program is terminated, possibly after printing a message.
3. A non-local goto to the exception handler H is performed. This automatically releases all examined activation records that turned out not to have a handler for exception E.

Note that the generated code for the exception handler takes care of the continuation of the program execution, once the handler is finished. The discussion above leads us to the important observation that routines with exception handlers can be the target of a non-local goto. So, depending on the implementation of non-local gotos, the compiler may or may not keep variables of routines with exception handlers in registers, in accordance with the preceding discussion on non-local gotos.
A disadvantage of the method described above is that, depending on the implementation of the non-local goto, activation records may be examined twice, once to find a handler, and once to perform the non-local goto. An optimization to this approach is to restore information from an examined activation record when it turns out not to have a handler for the exception, and to then discard the activation record.
The exception handler mechanism is more flexible than the signal handler mechanism. Its increased flexibility comes, however, at the cost of a slight increase in the number of instructions required to perform a routine call or block entrance; these instructions are necessary for the construction of the exception handler table. Other exception handler implementations are feasible, of which we will discuss one example.
An alternative implementation replaces the (exception, handler label) pairs mentioned above by a program-wide list of tuples (exception, handler label, start address, end address), where 'start address' indicates the beginning of the block protected by the handler, and 'end address' indicates the end of that block. When an exception occurs, the run-time system uses this list of tuples to determine which handler should be invoked, as follows. First, the value of the program counter at the time the exception occurred is compared with the start and end addresses of blocks protected by a handler for the exception at hand. If it falls within a start address to end address range, a handler is found, and a non-local goto is performed, as discussed above. If not, the return address is extracted from the activation record at hand. This return address represents the position in the code where the caller currently is. This position may again be protected by a handler, so, again, it is compared with the start and end addresses of blocks protected by a handler for the exception at hand. Again, as above, if a handler is found, a non-local goto is performed. If not, the return address of the caller is extracted, etc. Ultimately, either a return address (and an activation record) with a handler is found, or execution terminates. This scheme is more efficient as long as no exceptions occur. However, searching for a handler may be more expensive, depending on the exact implementation of the exception handler table.
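A minimal C sketch of the first scheme, the per-routine handler tables and the walk along the chain of activation records, may help to fix the ideas. The data structures and names are ours; a real run-time system would store machine addresses rather than C function pointers.

    #include <stddef.h>

    typedef int Exception;
    typedef void (*HandlerCode)(void);       /* stands in for a handler label */

    struct handler_entry {
        Exception   exception;               /* which exception it handles */
        HandlerCode handler;                 /* where its generated code starts */
    };

    struct handler_table {
        int                   n_entries;
        struct handler_entry *entries;
    };

    struct activation_record {               /* only the fields needed here */
        struct activation_record *caller;    /* dynamic link to the caller */
        struct handler_table     *handlers;  /* handlers installed for this routine */
    };

    /* Examine the chain of activation records, most recent first, and return
       the handler for exception e; NULL means the program must be terminated. */
    HandlerCode find_handler(struct activation_record *ar, Exception e) {
        for (; ar != NULL; ar = ar->caller) {
            struct handler_table *t = ar->handlers;
            for (int i = 0; t != NULL && i < t->n_entries; i++) {
                if (t->entries[i].exception == e) return t->entries[i].handler;
            }
        }
        return NULL;
    }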
Modules (also called packages) are syntactic structures in which a number of related items are grouped together. They often restrict access to their contents by providing an explicit interface, which is then the only means of accessing the module. The related items in a module could for example be variables and routines affecting these variables. Modules are somewhat similar to objects, but there are also considerable differences. In some respects they are simpler: they usually cannot be created dynamically, and lack all the object-oriented features discussed in Section 6.2.9. On the other hand, modules are compilation units, so it must be possible to compile them separately; also, modules often require an explicit initialization. Regarding code generation, modules introduce two problems to the compiler writer:
- The target language usually has one, flat, name space. Therefore, the compiler must make sure that if two different modules export an item of the same name, they have a different name in the generated code.
- The compiler must generate code to perform the module initialization; a complication here is that at module initialization modules may use items from other modules, so these other modules must be initialized first.
Usually, the rules that the characters in a name must obey are more strict in the source language than in the target language. Often, there is a character c that is allowed in names in the target language but not in names in the source language; examples are the . (dot) and the $ sign. If so, this feature can be used to create unique names for items in a module: simply concatenate the item name to the module name, using c as a separator. Note that this assumes that module names are unique, which they usually are. If there is no such character, there may be some other rules in the source language that can be exploited in the compiler to produce a unique name. If all else fails, the compilation process could have a phase that analyzes the complete program and does name generation.
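A sketch of such name generation in C, assuming the $ sign is legal in target names but not in source names; the function name and its interface are ours.

    #include <stdio.h>     /* for snprintf() */
    #include <stddef.h>    /* for size_t */

    /* Produce the target-level name of 'item' defined in module 'module',
       e.g. unique_name("Buffer", "flush", buf, sizeof buf) yields "Buffer$flush".
       This relies on module names being unique, which they usually are. */
    void unique_name(const char *module, const char *item,
                     char *result, size_t size) {
        snprintf(result, size, "%s$%s", module, item);
    }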
Most source languages allow modules to have explicit initialization code, for example for global variables. Even if this is not the case, a language might require implicit initialization of these variables. Note that a module, and more specifically the initialization code of a module, could use items from other modules. This means that those modules should be initialized earlier. This can be accomplished by having the initialization code of each module call the initialization code of all the modules that it uses. Then, the whole initialization phase can be started by calling the initialization phase of the module that contains the main program. When adopting this solution, there are two issues one should be aware of: avoiding multiple initializations and detecting circular dependencies.
If module A uses module B and module C, and module B also uses module C, the initialization code of module A calls that of module B and that of module C. The initialization code of module B also calls that of module C, so C's initialization will get called twice. This can be prevented by having a module-specific variable This module has been initialized in each module, which is set to true once its initialization code is called. The initialization of a module then becomes:

    IF NOT This module has been initialized:
        SET This module has been initialized TO true;
        // call initialization of the modules used by this module
        // code for this module's own initializations
Circular dependencies between module specifications are usually detected by the compiler. When compiling a module specification A that imports a module specification B, the compiler usually demands that module specification B is present for examination or has already been compiled. If module specification B then requires module specification A, the compiler can easily detect that it was already compiling A, and deal with this according to what the language manual prescribes (which is probably to disallow this).
For module implementations the situation is different.
When compiling a module implementation A, which uses module specification B, the compiler can see which module specifications are used by the module specification for B, but it cannot know which module specifications are used by B's implementation, since the latter need not even be available yet. Now suppose the present or a future B implementation uses A's specification. Then we have a circular dependency, which is not detected at compile time, because when compiling B's implementation, the compiler only reads A's specification, which in our example does not use B at all.
One way to detect circular dependencies is to postpone this check to run time, during module initialization. Each module then has a module-wide Boolean variable This module is being initialized, and the initialization code then becomes:

    IF This module is being initialized:
        // circular dependency; deal with it
    IF NOT This module has been initialized:
        SET This module has been initialized TO true;
        SET This module is being initialized TO true;
        // call initialization of the modules used by this module
        SET This module is being initialized TO false;
        // code for this module's own initializations
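Spelled out in C for a hypothetical module A that uses a module B, this initialization code might look roughly as follows; the flag and routine names are ours.

    #include <stdio.h>
    #include <stdlib.h>

    static int A_has_been_initialized = 0;
    static int A_is_being_initialized = 0;

    void initialize_B(void);    /* generated initialization routine of module B */

    /* Generated initialization routine of module A. */
    void initialize_A(void) {
        if (A_is_being_initialized) {
            /* circular dependency; deal with it */
            fprintf(stderr, "circular dependency during initialization of A\n");
            exit(1);
        }
        if (!A_has_been_initialized) {
            A_has_been_initialized = 1;
            A_is_being_initialized = 1;
            initialize_B();                 /* initialize the modules used by A */
            A_is_being_initialized = 0;
            /* code for A's own initializations comes here */
        }
    }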
A disadvantage of this approach is that the error does not become apparent until the compiled program is run. A more elegant solution to this problem is to have the compiler produce a list of the modules that each module uses. A separate compilation phase, which is invoked after all module implementations have been compiled, may then read these lists and detect circular dependencies. Such a phase could also be used to determine a global initialization order, by imposing an order on the module initializations that respects the dependencies.
Many languages offer generic units. A generic unit is a recipe for generating actual versions of the unit; it is parameterized with one or more generic parameters, usually types. Generic units are usually limited to routines, modules, and object classes. A generic unit must be instantiated by supplying the actual parameters to produce an instance of the generic unit, which can then be used like any other unit in the program. Section 2.1.12.3 presented the classic example of a generic list.
When generating code for a generic unit, we are faced with the question of how to implement a generic parameter, especially when it is a type. There are two fundamental ways to deal with this problem. The first way is to not produce code for the generic unit itself, but rather to produce code for every instantiation. This can be viewed as expanding the generic unit as if it were a macro, as already suggested in Section 2.1.12.3 in the context of macro processing. Alternatively, the compiler writer can design run-time representations of the generic parameters, whatever they are, and pass them as normal parameters during run time. Given the unusual nature of generic parameters, they have unusual representations, called 'dope vectors'.
In instantiation through expansion, no code is generated for the generic unit itself; it serves as a template only. Suppose we have a generic unit G with a generic parameter type T, and suppose the generic unit is instantiated with actual parameter type tp. The instantiation then consists of the following steps:

- Create a copy of the AST of the generic unit G.
- In the copy of the AST, replace every occurrence of the identifier T which is identified as the generic parameter type T by the type indicated by the identifier tp at the instantiation point.
- Process the resulting AST as if it were an ordinary AST, resulting from an ordinary unit.
An important issue that must be treated with care is name generation, also discussed in Section 6.5.1. Usually, the target language has a single, flat, name space. Therefore, the compiler must generate a unique name for each instantiation. We cannot just use the generic unit's name, which is supposed to be unique, since a generic unit may be instantiated more than once. However, the instantiation probably also has a name, and the unit in which the instantiation takes place also has a name. A combination of all these names allows the compiler to produce a unique name for the items in the instantiated unit.
Instantiation through expansion is relatively easy to implement and there is no run-time overhead. It may, however, increase the size of the generated code significantly, especially when a generic unit contains instantiations of its own. Of course, the compiler could keep track of which instantiations it has already performed and reuse those when possible, but this will not always help, because all instantiations may be with different parameters.
With instantiation through run-time descriptors, the compiler actually produces code for the generic unit; the generated code will have to utilize run-time descriptors for the instantiation parameters. The run-time descriptor of an instantiation parameter must contain enough information to support all possible usages of a generic parameter of its kind: for a constant, it should contain its value; for a routine, it should contain its address and a lexical pointer, if applicable; and for a type, it should contain a so-called dope vector.
If a generic parameter is a type tp, the compiler will have to generate code to allocate, initialize, and deallocate variables of type tp, and to perform operations that involve tp, for example assignment, comparison, copying, and possibly arithmetic operations of various lengths. Since these are operations that very much depend on the nature of the type tp, it is convenient for the compiler to generate calls to routines which perform these operations. The addresses of these routines can then be found in the run-time descriptor of the instantiation parameter, the dope vector. This makes a dope vector very similar to a method table with entries for a constructor, a destructor, a copy routine, etc.
So, the dope vector has an entry for each operation that is allowed on a value or variable of a generic parameter type, including allocation, deallocation, and copying. Each entry indicates a routine implementing the specific operation, as depicted in Figure 6.47.
    bool compare(tp *tp1, tp *tp2) { ... }
    void assign(tp *dst, tp *src) { ... }
    void dealloc(tp *arg) { ... }
    tp *alloc(void) { ... }
    void init(tp *dst) { ... }

Figure 6.47 A dope vector for generic type tp; each entry points to a routine implementing one operation on tp.
The main advantage of the use of dope vectors is that the executable code of the generic unit is shared by all its instantiations. A disadvantage is the added overhead: often, dynamic allocation schemes must be used for types for which simple static allocation would be sufficient. Also, there is the routine call overhead for simple operations such as assignment and comparison. Note that this overhead cannot be eliminated by in-lining the call, since the routine to be called is not known until run time.
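As an illustration, a dope vector like the one in Figure 6.47 could be represented in C roughly as follows; the struct and field names are ours, and a value of the generic type is handled here through an untyped pointer.

    /* Only the routines in the dope vector know the actual layout of tp. */
    typedef void *tp;

    struct dope_vector {
        tp   (*alloc)(void);                /* allocate a tp value  */
        void (*dealloc)(tp arg);            /* deallocate it        */
        void (*init)(tp dst);               /* initialize it        */
        void (*assign)(tp dst, tp src);     /* dst := src           */
        int  (*compare)(tp v1, tp v2);      /* equality test        */
    };

    /* Inside the generated code of the generic unit, an assignment x := y on
       values of the generic type becomes a call through the dope vector that
       was passed for the instantiation: */
    void generic_assign(struct dope_vector *dv, tp x, tp y) {
        dv->assign(x, y);
    }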
This brings us to the end of our discussion of the strongly interrelated issues of intermediate code generation and code generation for run-time systems, for imperative and object-oriented languages. We have seen that the most important run-time data structure is the activation record: it allows both data and the flow of control to move from routine to routine. Code for its manipulation can be generated in-line or be contained in run-time library routines. The design of other run-time data structures, which derive from source language type constructors, is fairly straightforward. We have seen that all source language flow of control is expressed by simple unconditional and conditional jumps in the intermediate code; the generation of code for expressions has already been covered in Section 4.2. Special attention was paid to the selection of the proper method to call in an object-oriented method invocation; dispatch tables of varying complexity are used to implement specific object-oriented features. Another area in which intermediate code and run-time system are interwoven is exception handling. Finally we have discussed issues in the implementation of modules and of generics.
-Static correctness checking during context handling consists mostly of two issues: identification and type checking.
- Identification matches an applied occurrence of an identifier or operator with its defining occurrence, and with a symbol table entry, in which the information about the item is collected.
- A scope stack allows fast and easy manipulation of scope information sets.
- Some languages allow identifiers or operators to have several different definitions through overloading. Overload resolution is the process of reducing the set of possible definitions of an operator or identifier to a single one.
- Type checking is the process that determines which type combinations in the source program are legal according to the rules of the source language. It does so by determining which types are equivalent, and deciding which coercions are allowed in which context.
- Each node in the AST of an expression represents either an lvalue (a location) or an rvalue (a value). These attributes propagate bottom-up, according to language-dependent, but usually intuitive, rules.
- Characters, integers, floating point numbers, enumerations, and pointers all have an obvious corresponding data type in virtually all target languages.
- The target representation of a record type consists of the consecutive target representations of its members, possibly separated by gaps to obey alignment requirements of the target machine.
- The target representation of an array type consists of the consecutive target representations of its elements, either in row-major or column-major order. In addition, a run-time array descriptor is required for dynamic arrays.
- The run-time representation of an object consists of two parts: (1) a pointer to a table for the present implementation of the object type, and (2) a record holding the object fields. The table is a dispatch table for the methods, and possibly an index table for the fields.
- Inheritance and polymorphism make it necessary to enumerate object methods, and use the enumeration index of a method to obtain its address at run time from the dispatch table. Dependent multiple inheritance requires a similar technique to find the offset of a field within an object representation, using the field index table.
- An activation record holds the data pertinent to the invocation of a routine or object method. It includes the parameters, an administration part, local variables, and a working stack.
- If routines are activated in a strictly last-in-first-out order, a stack can be used to store the activation records.
- An activation record is accessed through a frame pointer, which usually resides in a dedicated machine register.
- A routine is a piece of code, reachable through its code address. When a routine is called, an activation record is created for it and flow of control is transferred to its first instruction.
- The above actually describes a subroutine. Other kinds of routines are iterators and coroutines.
- An iterator yields a succession of values.
- A coroutine can suspend itself, and does so by transferring control to another coroutine, which then resumes at the point where it was last suspended.
- Routines can be global or nested. A global routine can only access global entities and its own local entities. A nested routine can access those, and can in addition access entities in lexically intervening routines.
- Operations on routines include declaring a routine (specifying its name, parameter types, and return type), defining a routine (declaring it and supplying its code), calling a routine, passing a routine as a parameter, returning a routine as a value, jumping out of a routine (a non-local goto), and partial parameterization (creating a new routine with some of the parameters of the original routine already supplied).
- Nested routines require access to lexically intervening routines. This can be implemented by means of a lexical pointer, which is a copy of the frame pointer of the lexically enclosing scope.
- Passing a nested routine as a parameter or returning it as a result requires a two-pointer value: a lexical pointer and a routine address. Returning a nested routine as a result may violate pointer scope rules.
- A closure is a representation of a partially parameterized routine, with space for all parameters plus a mask indicating which parameters have been filled in.
- A nested routine can be lifted out of its parent by isolating the activation record of the parent routine as a data structure and passing it to the lifted routine. This allows all the access the nested routine had.
- Any place a lifted routine is called, the activation record of its parent must be passed to it. In particular, if a lifted routine is passed around, the activation record of its parent must be passed along with it; they form a (routine address, activation record address) pair.
- The two main mechanisms for local flow of control are selection (if- and case statements) and repetition (for- and while statements). Code is generated for them by mapping them to lower-level target language constructs, such as conditional jumps and assignments.
- Code generation for if-statements is trivial, using true and false labels. The jump can be integrated into the Boolean expression by passing the true and false labels to the routine that generates code for it.
- Case statements can be implemented as linear lists of tests, jump tables, balanced binary trees, and others.
-Code for the while statements is similar to that for the if-statement, including the incorporation of the jump into the Boolean expression.
- The for-statement poses two problems: in a naive implementation the controlled variable may overflow where it should not, and for non-unit step size, the upper bound may be overstepped. Both problems are remedied by a careful implementation; the resulting code is surprisingly complex.
- The code for repetition statements can be optimized by 'unrolling the loop', generating the code for the loop body multiple times.
- The translation of a routine call or method invocation consists of two steps: identifying the routine or method called, and performing the call.
- The routine to be called can be a named routine, a routine variable or a method. The first two identify themselves, the third is looked up in the dispatch table.
- A routine is called in three steps: an activation record is allocated, it is partly filled with information, and control is transferred to the start of the code of the routine; filling the activation record may then continue. Many detailed decisions are required to design a complete routine call protocol.
- The information stored in the activation record may include parameters, return or continuation information, a lexical pointer, a dynamic link (old frame pointer), dumped register values, local variables, working stack, dynamic allocation part, etc.
- Storing of register values may be done by the caller ('caller saves') or by the callee ('callee saves'), with subtly different effects. In both cases it is the register values of the caller that get saved.
- A dynamic allocation part may be present to hold the dynamic parts of local variables and perhaps parameters.
- In most languages activation records can be allocated on the stack. If routines can be returned as values (functional languages), or if more than one thread is present (coroutines) this is not possible and heap allocation must be used.
- A return value can be delivered in a function result register, on the top of the working stack, or under a pointer passed as an input parameter.
- While performing a non-local goto, the registers stored in the activation records that are removed must be restored. Special code is required for this.
- Run-time error handling (exception handling) consists of two parts, detecting the error and processing the error.
- Detection of run-time errors depends in complicated ways on language properties, the operating system, the run-time system, and compiler-generated checks. Two different mechanisms for dealing with run-time errors are: signal handlers and exception handlers.
- A signal statement binds a specific class of error conditions to a specific user-defined routine, the signal handler. Whenever a run-time error occurs, the corresponding signal handler is called.
- To implement the signal statement, the run-time system maintains a program-wide list of (error condition class, signal handler) pairs, so it can call the corresponding signal handler when a run-time error occurs. Signal handlers often perform a non-local goto.
- An exception handler specifies a particular error condition and a set of statements that will be executed should this error condition occur. The statements in the exception handler replace the rest of the statements in the block or routine, should the corresponding error condition, or exception, occur.
- When an exception E occurs, the chain of activation records is followed, releasing activation records as in the implementation of a non-local goto, until one is found that has a handler for the exception E. This handler is then executed, and execution resumes at the end of the block or routine that the handler is associated with.
- The administration part of each activation record contains a pointer to the table of exception handlers currently installed for its routine. If the exception handlers are associated with a block, the pointer must be updated as blocks with exception handlers are entered and left.
- The presence of modules in a source language forces the compiler to generate unique names, because the target language usually has a flat name space.
- The initialization phase must make sure that modules are initialized in the proper order, obeying module dependencies.
- The two common implementations for instantiation of generic units are expansion, in which the compiler processes the generic unit at the instantiation point as if it were an ordinary unit, and dope vectors, in which the compiler generates run-time descriptors, the so-called dope vectors, for the generic parameters of the generic unit.
For many years, the only programming languages in existence were imperative ones. Examples of imperative programming languages are: FORTRAN, Algol 60, Pascal, Algol 68, Modula-2, C, Ada, and many, many more. Some of these languages are still quite popular. Consequently, there exist many compiler construction books that discuss the implementation of imperative programming languages. Examples are the books by Louden (1997), Terry (1997), Parsons (1992), and of course Aho, Sethi and Ullman (1986), but there are many more.
The first object-oriented language was Simula. It introduced classes, inheritance, and polymorphism. It was originally designed specifically for simulation problems (hence the name), but was later developed into a complete programming language. Other, more well-known, object-oriented languages are Smalltalk, C++, and Java. There are no books that are specifically about the compilation of object-oriented languages, although both Appel (1997) and Wilhelm and Maurer (1995) pay considerable attention to the subject.
Many advances in the compilation of imperative and object-oriented languages are reported in the conference proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), the Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA), the IEEE International Conference on Computer Languages (ICCL), and the European Conference on Object-Oriented Programming (ECOOP).
Why is the result of the expression EXPR >= 0 trivial, when EXPR is an expression of type unsigned integer?
In Section 6.1.1, Figure 6.4, we first follow the hash table and then split according to the name space desired. Perhaps a more intuitive way of implementing name spaces is to have a separate hash table for each name space, and pass a pointer to it as a parameter to the routine Identify(). A call Identify(Name space, Name) would then return a pointer to the proper Declaration info. Discuss the pros and cons of this idea.
The following declarations are given for a language that uses name equivalence:

    A, B: array [1..10] of int;
    C: array [1..10] of int;
    D: array [1..10] of int;
Explain which of these four variables have the same type and which have different types.
Explain the transformations the type of a routine undergoes in partial parameterization.
Determine which nodes carry lvalues, which carry rvalues, and where dereference operators have to be added, in the AST for the expression a[a[1]] := s[0].sel; assume that a is an integer array variable and s is a variable array of structures.
Design rules for the lvalue/rvalue checking of the C conditional expression condition ? expr1 : expr2. Compare your rules with the behavior of the GNU C compiler, with and without the -ansi and -pedantic flags.
One of the scope rules in Section 6.2.3.2 says 'The scope of a value is the smallest scope of any pointer it contains, or infinite if it does not contain pointers.' What if the value contains pointers with incommensurable scopes?
In C, why is it that assignment (=) is defined for structs and equality (==) is not? (Hint: this is a compiler construction question.)
Given the C structure

    struct example2 {
        int member1;
        double member2;
        int member3;
    };

where an int is 4 bytes and a double is 8 bytes, and an int must be 4-byte aligned and a double must be 8-byte aligned. (a) What is the size and alignment of struct example2? (b) What happens if fields member2 and member3 are interchanged? (c) Could an optimizing compiler consider reorganizing record fields to achieve minimal size?
Section 6.2.6 suggests that the compiler can compute the offset of zeroth_element from the first element of the array itself. Let us call this offset zeroth_offset. Give the formula for zeroth_offset(A) and the formula for the address of A[i1, ..., in] using base(A) and zeroth_offset(A) instead of zeroth_element(A).
Refer to Section 6.2.7 on set types. Discuss the implementation of set union and set intersection when implementing a set as a linked list.
Consider the following classes (presented in a Java-like syntax):

    abstract class Shape {
        boolean IsShape() { return true; }
        boolean IsRectangle() { return false; }
        boolean IsSquare() { return false; }
        abstract double SurfaceArea();
    }

    class Rectangle extends Shape {
        double SurfaceArea() {
            ....
        }
        boolean IsRectangle() { return true; }
    }

    class Square extends Rectangle {
        boolean IsSquare() { return true; }
    }

Give the method tables of the classes Rectangle and Square.
6.13 When a language supports polymorphism, as described in Section 6.2.9.3, it is sometimes necessary to examine the actual type of the object at hand. The example of Exercise 6.12 has methods like IsSquare that enable the user to do this. Some languages have a built-in operator for this. For instance, Java has the instanceof operator. The expression A instanceof C is a Boolean expression yielding true if A indicates an object that is an instance of class C. Design an implementation for this operator.
Refer to Figures 6.18 and 6.19 concerning multiple inheritance. Given an object e of class E, give the compiled code for the calls e.m1(), e.m3(), and e.m4().
Refer to Figures 6.20 and 6.21 concerning dependent multiple inheritance. Suppose method m5 in class E is defined as

    void m5() {
        e1 = d1 + a1;
    }
where all fields have type int. Give the compiled C code for m5.
Explain why the 'caller saves' scheme usually requires fewer register saves and restores at run time than 'callee saves'.
Given the GNU C routine:

    void A(int A_par) {
        void B(void) {
            printf("B called, A_par = %d\n", A_par);
        }
        void C(int i) {
            if (i == 0) B(); else C(i-1);
        }
        C(5);
    }
(a) Draw the stack that results from the call A(3). (b) How does the final call of B() access the parameter A_par of A? (c) Repeat parts (a) and (b) for the routine

    void A(int A_par) {
        void C(int i) {
            void B(void) {
                printf("B called, A_par = %d\n", A_par);
            }
            if (i == 0) B(); else C(i-1);
        }
        C(5);
    }
(F.E.J. Kruseman Aretz, mid-1960s) Write a program that reads an array of integers terminated by a zero, and prints its middle element, without using an array, linked list, or otherwise allocated data. Hint: read the elements of the array recursively; on each level of recursion prepare a routine element(int i), which returns the i-th element, and pass this routine to the next level.
Why is it that the closure must be copied to the activation record in the invocation of a curried routine? If the closure is allocated in an extensible array on the heap, it could itself be turned into an activation record, could it not?
Study the coroutine mechanism of Simula 67 (Birtwistle et al., 1975), and design intermediate code sequences for the new, detach, and resume commands.
6.21 Routine invocation implementation project. (a) Study the routine call and routine exit instructions of a machine and assembler available to you. (b) Design an activation record layout, simple parameter passing, and routine call and routine exit sequences for recursive routines, and test your design by running the translation (manual or otherwise) of, for example, the mutually recursive routines shown in Figure 6.48. Here showSP() is an ad-hoc routine which allows you to monitor the progress by showing the value of the stack pointer in some way; you will probably have to improvise. Start by calling A(10).

    void A(int i) {
        showSP("A.SP.entry");
        if (i > 0) {B(i-1);}
        showSP("A.SP.exit");
    }

    void B(int i) {
        showSP("B.SP.entry");
        if (i > 0) {A(i-1);}
        showSP("B.SP.exit");
    }

Figure 6.48 Mutually recursive test routines.
(c) Design a format for routines as parameters and test your design with the code from Figure 6.49; start by calling A(10, B).

    void A(int i, void C()) {
        showSP("A.SP.entry");
        if (i > 0) {C(i-1, A);}
        showSP("A.SP.exit");
    }

    void B(int i, void C()) {
        showSP("B.SP.entry");
        if (i > 0) {C(i-1, B);}
        showSP("B.SP.exit");
    }

Figure 6.49 Routines for testing routines as parameters.
(d) Design a format for labels as parameters and test your design with the code from Figure 6.50.

    void A(int i, label L) {
        showSP("A.SP.entry");
        if (i > 0) {B(i-1, exit); return;}
        exit: showSP("A.SP.exit");
        goto L;
    }

    void B(int i, label L) {
        showSP("B.SP.entry");
        if (i > 0) {A(i-1, exit); return;}
        exit: showSP("B.SP.exit");
        goto L;
    }

Figure 6.50 Routines for testing labels as parameters.
What code should be generated for the Boolean assignment b := x > y, in which b is a Boolean variable with the representation 'true' = 1 and 'false' = 0, and x > y translates to Compare_greater x, y, which leaves the result in the condition register?
The discussion of case statements in Section 6.4.1.2 mentions using a balanced tree for the case labels as a possible implementation for the case statement. Why does the tree have to be balanced?
Discuss the translation of a case statement by means of a hash table.
A repeat statement allows the programmer to execute a sequence of statements an indeterminate number of times, until a Boolean expression is fulfilled. Give a translation of the statement

    REPEAT
        statement sequence
    UNTIL Boolean expression
    END REPEAT;

to intermediate code. Note that the statement sequence is executed at least once.
The C language has a break and a continue statement. The break statement terminates the closest enclosing loop, and the continue statement proceeds with the next iteration. In fact, these are both jumps. In the code generation schemes of Section 6.4.1.3, where would these go to? In other words, where should the compiler place the break-label, and where the continue-label?
Why is the C while loop of Figure 6.42 not exactly equivalent to the for-loop of Figure 6.41? Hint: consider the effect of a continue statement inside the body.
The optimization explained in the last subsection of Section 6.4.1.3 replaces a loop with a step size of 1 by a loop with a step size of 4, thus apparently introducing all the complexity of the code of Figure 6.40. Devise a simpler code scheme for such loops. Make sure it is still overflow-resistant.
In Section 6.4.2.2 we claim that pushing the last parameter first onto a stack is a technique suitable for languages that allow a variable number of parameters to be passed in a routine call. Explain.
Refer to the subsection on signal handlers in Section 6.4.3.2 on handling run-time errors. Consider the following code fragment in a language that allows an exception handler to resume at the point where the error occurred:

    X := A / B;
    Y := A / B;

An optimizing compiler wants to transform this code into:

    X := A / B;
    Y := X;
Explain why, if an exception handler is defined for the code fragment, this optimization is incorrect.
6.32 A language designer is designing a programming language that features, among other things, generic routines with generic type parameters. He/she considers adding generic routines as generic parameters (non-generic routines can already be passed as normal parameters in the language) and comes to you, a compiler designer, for advice. Evaluate the proposed addition from a compiler construction point of view.
History of imperative language implementation: study Sheridan's paper (1959) on the IBM FORTRAN compiler, and identify and summarize the techniques used.
Functional languages are based on the idea that a program is a function with one input parameter, its input, and one result, its output. Running a program is seen as the application of a function to the input, resulting in the output. This computational model of function application, simple as it is, is very expressive and builds upon the results of mathematical research into computable functions.
The mathematical worldview of functional languages shows up in a number of places. For example, syntactical constructs have been included in the language to ease the specification of sets and recursive functions. More important, however, is the idea that a programmer should not be bothered with implementation details of a certain function, but should rather concentrate on specifying its input-output relation. In particular, a programmer should specify what to compute, rather than how, where, and when. Consequently, there are no ;-operators, for-loops, or other statements that specify how the computation proceeds from one statement to the other; more specifically, there are no statements at all. Also, there is no assignment operator specifying a variable or memory location where the result of a computation must be stored.
The difference in approach to computing is illustrated by the following example that lists program sections for the factorial function fac in the functional language Haskell (Figure 7.1) and the imperative language C (Figure 7.2).
    fac 0 = 1
    fac n = n * fac (n-1)

Figure 7.1 Functional specification of factorial in Haskell.
    int fac(int n) {
        int product = 1;

        while (n > 0) {
            product *= n;
            n--;
        }
        return product;
    }

Figure 7.2 Imperative definition of factorial in C.
The functional specification of fac is a good example of how recursion is used to tackle problems. The function is defined by two equations. The first equation states that 0! = 1, while the second equation states that n! = n x (n-1)!. Unfolding the recursion then leads to n! = n x (n-1) x (n-2) x ... x 1. This definition of factorial is also the basis for the C code, but this is not directly visible because of the interleaving of control statements and computation rules. The C code uses a while loop to avoid expensive recursive function calls; the intermediate results produced by the loop are accumulated in the local variable product. Efficiency considerations often lead imperative language programmers to compromise on the readability of their code. Functional language programmers, on the other hand, put the burden of finding an efficient computation order on the compiler. Fortunately, compiler technology for functional languages has matured enough to handle cases like factorial: the techniques discussed in the remainder of this chapter are capable of generating C code from the Haskell specification that is equivalent in efficiency.
Functional languages raise the level of abstraction at which programmers should be concerned about performance. Functional programmers must still be concerned about O(n^2) versus O(n log n) algorithms, but need no longer worry about reducing the number of function calls, managing memory efficiently, and other low-level programming aspects. A consequence is that the interpreter or compiler must handle these issues and must handle them efficiently. The first functional language compilers were not up to this task, and generated very slow code; a slow-down of a factor of 100 compared to C was no exception. Fortunately, today's compilers are capable of generating code quality that matches, and sometimes even exceeds, that of plain C for a considerable class of programs.
On the one hand, compilers for functional languages are more complicated than their imperative counterparts. The front-end has to handle additional syntactic constructs capturing concise mathematical notations, whereas the back-end has to handle the higher level of abstraction. On the other hand, functional language compilers are simpler than their imperative counterparts. The simple and regular syntax leads to a small and clean front-end without the many exceptional cases typically found in imperative compilers, and the functional paradigm eases many of the advanced analyses employed by an optimizing back-end, since, for example, no memory aliasing can occur.
Before discussing the additional techniques employed in a functional language compiler, we present a short tour of the programming language Haskell (Peyton Jones and Hughes, 1999) to highlight the aspects of functional languages that require special care. Haskell was defined by an international standard committee. It is representative of the 'purely functional lazy languages' and contains most features from the functional languages developed so far.
The following sections emphasize the aspects of functional languages that raise the level of abstraction above that of imperative languages. Some aspects, such as pattern matching, have a modest impact on the compiler and are typically handled in the lexical analyzer and parser. This so-called syntactic sugar includes: the offside rule, list notation, list comprehension, and pattern matching. Other aspects of functional languages, like lazy evaluation, cannot even be handled completely at compile time and require considerable run-time support. The syntax of function application is explained below.
The syntax for function application is very concise, but possibly misleading for the unwary. Applying a function f to the two arguments 11 and 13 is written in Haskell as:

    f 11 13

Note that there are no brackets around the arguments to f, nor are the arguments separated by a comma. The reason for simply juxtaposing function and arguments is that it allows currying to be expressed naturally, as shown in Section 7.1.7. Function application binds stronger than any operator, so g n+1 is parsed as (g n) + 1 rather than g (n+1) as the layout may suggest. Also, it is left-associative, so g g n is parsed as (g g) n, rather than as g (g n), and parentheses are required to obtain the other interpretation.
The layout of a Haskell program matters to parsing. Consider the following definition of divide, which handles divisions by zero:

    divide x 0 = inf
    divide x y = x/y
The definition consists of two equations. An equation consists of a left-hand side, followed by the = token, followed by the right-hand side. Note that there is no explicit token to denote the end of each equation. In many languages a line break is treated as ordinary white space, which would allow incorrect parses such as:

    (divide x 0 = inf divide (y = x/y) x)
The offside rule can be handled conveniently in the lexical analyzer by having it insert explicit end-of-equation tokens (the ;in Haskell), as follows. The lexical analyzer maintains a stack of offside markers. When the lexical analyzer detects a token with assoc’ offside semantics such as =, it pushes the current character position onto the marker Upon detecting a line break, the lexical analyzer skips all white space and records the character position of the next token. It compares the position against the top of the marker stack. If the new position is less than the marker, the lexical analyzer pops the marker and returnstheend-of-equationtoken.Itcontinuespoppingmarkersandreturningend-ofequation tokens until the stackis empty or the position is greater than thetop marker.
The list is a powerful data structure that has been part of functional languages from day one with the in~oductionof the list processing language LISP (McCarthy, 1960). Since lists are used so frequently, Haskell includes special notation for them.Theempty list, also known as ‘Nil’, is denoted by []. Non-empty lists are written as an opening bracket f , the elements separated by c o m a s , and a closing bracket 1,as shown in the examples below:
Strings are actually just lists of characters, so redt1is a convenient notation for [11 r l l , It e I t , d It I . The last exmple shows a shorthand notation named
forthelist [l, 2 , 3 , 4 , 5 , 6, 7 , 8 , 9 , 101. Theelementsofalistcanbeof any type -integers, strings, user-defined types, etc. -but all elements of a given list must be of the same type; the polymo~hictyping of Haskell, discussed in Section 7.1.5, does not allow lists to contain elements of different types. Listscan also beconstructedelement by elementusingthe : infixoperator.This operator combines an element and a list into a new list. For example, the list E 1, 2 , 3 I is equivalent to the expression (1. : ( 2
: (3 :
[l)))
S a secondexample,considerthe sequence [n .. m] dynamically:
range function,which
r a n g e n m = i f n>m t h e n [ l e l s e( n : range
constructs thearithmetic
(n+l) m )
at closely matches mathematical set notation. For example, the setof squares of all odd numbers up to 100, defined mathematically by S = ( n 2I ne { l ,...,lOO}
A
odd n }
can be expressed in Haskellas follows: S
    s = [n^2 | n <- [1 .. 100], odd n]
where the <- symbolizes the element sign '∈' rather than an arrow. The list comprehension is read as 'n square, such that n is an element of 1 to 100 and n is odd'. List comprehension is convenient for developing programs that can be expressed in terms of sets. Note, however, that list comprehension, as the name suggests, generates lists rather than sets: ordering is important and elements may occur multiple times in list comprehensions. It is especially convenient to use when generating new lists from old ones. For example, the quicksort algorithm can be specified concisely using list comprehensions as follows:

    qsort []     = []
    qsort (x:xs) = qsort [y | y <- xs, y < x]
                   ++ [x] ++
                   qsort [y | y <- xs, y >= x]

Here the pivot of the quicksort is taken to be the head of the input list; it is isolated from the list by using pattern matching on the infix operator : (see below). The list comprehension for all elements that go left of the pivot x iterates over the tail of the input list xs, filtering out all elements less than x. The second list comprehension filters out all elements greater than or equal to x. Both are then sorted recursively and concatenated, with the pivot in between, using the concatenation operator ++.
List comprehension is considered syntactic sugar since it can be transformed relatively easily into a simpler expression. This transformation will be discussed in Section 7.4.3.
Pattern matching provides a convenient way to express recursive functions. In fact, we have already met a case of pattern matching, the factorial function in Figure 7.1, reproduced here:

    fac 0 = 1
    fac n = n * fac (n-1)
The first equation defines the base case of the recursive specification, which matches the argument with pattern 0; the match will only succeed if the actual parameter has the value 0. The second equation covers all arguments by matching against the pattern n; this match will always succeed.
+
length xs
which computes the length of a list by pattern matching on the empty list [] and on the infix list constructor :. In general a function is specified by several equations containing patternsat ~ g u m e n t positions in the left-hand side. A pattern can be a constant such as 0 and [1,a variable such as n or S , or a constructor whose elements are themselves patterns such as (x:xs ) . Function equations are matched from top to bottom; the patterns in them are matched from left to right. Function definitions based on pattern matching can be translated easilyinto equivalent definitions based on if-then-else constructs. For example, the fac function without pattern match can be definedas fac n = if (n == 0) then 1 else n * fac (n-1)
The length function requires some more work since the pattern in the second equation (x:ss) introduces two identifiers that are subsequently used inthe right-hand side of the equation. We use the let-expression of Haskell to introduce the localdefinitions we need: length list = if (list == [ l ) then 0 else let x = head list xs = tail list in 1 + length xs
Further technical detailsof translating pattern matching will be presented in Section 7.4.2.
es, is a strongly typed language. This means that for each expression or definitionwe can check at compile time whether or not it obeys the typing rules of the language. Haskell supports the usual monomorphic types consisting of basic types, for example Char and Int,and user-defined structured types, for example recordsandunions.Inaddition,Haskellsupports polymor hie types. An expression is
said to be polymorphic if it ‘has many types’. For example, the empty list [1 has many types: list of characters, list of numbers, and an infinite number of others. The exact type depends on the context in which E 1 is used. An impo~antsource of polymorphism is the class of list handling functions that operate on any type of list. Consider the length function again: = o length [ l length (x:xs) = l
+
length s s
The function length can be applied to any type of list since the individual list elements are not used; the elementx patte~-matchedin 1 ength’ssecond equation does not appear in the co~espondingright-handside.Typedeclarations,whichareoptionalinHaskell, consist of an identifier, a type declaration sign ‘::’and a type specification. The type of length is: length
::
[a]
-3
Int
where a is a type variable that stands for an arbitrary type. Thus, length takes an arbitrary list ([a]) as input and returns (->) an integer (Int) as its output. We can derive instances of a polymorphic type by replacing type variables with specific types. For example, by replacing a with Int or [Char7 we see that [Intl -> Int [Char]I -> Int
are instances of the type [a1 -> Int.These instances typically appear when length is used in larger expressions and the context imposes additional constraints of the type of length,as for example in length [l, 2, 31 + length [rrredtr, rlyellowli , llgreenll]
The main advantage of polymorphic typing is that functions and data structures can be reused for any desired type instance. The same definition of length can be used to cornpute the length of a list of integers and a list of strings. ~anguageswithout a polymorphic type system would require the specification of two length functions: one that operates on lists of integers and another that handles lists of strings. C allows the programmer to resort to pointers and write a single length function that operates on generic lists, where elementsarepresent as void * pointers.Thehandling of pointers,however, is errorprone and the useof the void * device discards valuable typeinfo~ation. Type checking in polymorphic languages like Haskell is more difficult than in traditional monomorphic type systems. In p ~ i c u l a r the , equality of types involving type vario will n , be explained in Section’7.3. ables is harder to determine and requires~ n i ~ ~ a t ias
By definition, a function in Haskell defines a fixed relation between inputs and output: whenever a function f is applied to the argument value arg it will produce the same output no matter what the overall state of the computation is. Haskell, like any other pure functional language, is said to be ‘referentially transparent’ or ‘side-effect free.’ This property does not hold for imperative languages, where assignments to global variables and
through pointers may cause two function calls f a r g to yield different results, even when the argument valuea r g is the same in both calls. The good thing about referential transparency is that it simplifies program analysis and transformation since a closed expression always denotes the same value independent of the context, and may be moved around freely. A closed expression is an expression that contains no references to external names other than global identifiers.The bad thing is that it preventstheprogrammerfromwriting space-efficient programsthatuse in-place updates. For example, adding 1 to all elements of a list in Haskell requires a complete new list to be constructed: = [l add-one I I add-one (x:xs) = x+l
:
add-one xs
Each application of the list constructor ‘:’allocates a fresh node. In an imperative language, we can update the input list in-place, in which case we require the caller to create a copy of the input beforehand if the original is still needed later. In simple cases the functional language compiler can deterrnine that the input list is indeed no longer neededafter the call to add-one, so it can generate in-place updates too. Unfortunately, whenever the input list is passed as part of a larger data structure the compiler probably cannot infer the last-use requirement, and must conservatively allocate a new list. Functional programs are notoriousfor allocating huge amountsof memory. Therefore garbage collection is a very important part of the run-time system. It allows the run-time system to present the user with an unlimited ~ o u noft memory, but requires the compiler to insert the necessary hooks in the code, as discussed in Section 5.2.2.1. Given the fast rate at which functional programs typically allocate and, implicitly, deallocate memory, garbage collection techniques that only examinethe live data, like two-space copying (Section 5.2.5), perform best.
A fundamental concept of modern functional languages is that functions are 'first-class citizens': functions may be passed as arguments, returned as results, and kept in data structures, just like ordinary data values such as integers. A function that takes a function as an argument, or delivers one as a result, is referred to as a higher-order function. Imperative languages barely support higher-order functions: functions may perhaps be passed as parameters, but it is impossible to create 'new' functions at run time. The ability to construct new functions out of existing ones provides great abstractive power to the user, and is common in mathematics. The differential operator, for example, is a higher-order function which takes a function as argument and returns another function, its derivative, as the result:

    f'(x) = lim (h -> 0) (f(x+h) - f(x)) / h
This mathematical definition can be directly expressed in Haskell as follows:

diff f = f_
  where f_ x = (f (x+h) - f x) / h
        h    = 0.0001
Note that this definition of diff crudely approximates the true derivative, since it takes the limit by fixing h to a small constant; a better definition that employs a sequence of ever decreasing values will be presented in Section 7.1.8. The important aspect of the example is that diff returns as its result a 'new' function that is composed out of already existing functions (f, +, -, and /). The expression diff square approximates the function 2x and can be used in other expressions. For example, the expression

(diff square) 0
yields an approximation to the derivative of square in point 0: 0.0001. Since diff square yields a function, it can be differentiated again: the expression

(diff (diff square)) 0
yields 2.0 as an approximation to the second derivative of square in point 0.
Another way to create new functions is to partially apply an existing function to a number of arguments that is less than the arity of the function. This technique is known as currying, after the logician H.B. Curry, and can be used to specialize functions by fixing some parameter values. As an example, we will look at the differential operator again. Instead of constructing a new function that truly operates as the derivative, we can define a function deriv that computes the derivative of f in point x:

deriv f x = (f (x+h) - f x) / h
  where h = 0.0001
It is important to note that there is no semantic difference between the definition of the unary function diff and that of the binary function deriv, since the equation

deriv f x == (diff f) x
holds for any function f and point x. Functional languages therefore consider a function with n arguments as syntactic sugar for a chain of n unary functions processing the arguments one by one. This view allows currying, since function calls consisting of functions applied to too few arguments, like deriv square, are not semantic errors, but can be usefully interpreted as shorthand notation for some new function. In short, an n-ary function applied to m arguments denotes an (n-m)-ary function. In this view, applying an n-ary function to n arguments leads to a 0-ary function, which may be passed around like any other function. As a consequence it is not necessary to compute the value of a 0-ary function when simply passing it on to another function: it may be passed unevaluated as long as the called function is prepared to carry out the evaluation when needing the value. Lazy evaluation, which is to
be discussed next, uses exactly this approach and does not evaluate an expression (a 0-ary function) until its value is needed. Currying complicates compiler construction since we need additional run-time support to construct and evaluate curried functions. This will be discussed in detail in Section 7.5.
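As a small sketch of currying in action, consider partially applying the binary deriv defined above; derivSquare and approxAt0 are illustrative names of our own:

deriv f x = (f (x+h) - f x) / h
  where h = 0.0001

square n = n * n

derivSquare = deriv square    -- a unary function, comparable to diff square
approxAt0   = derivSquare 0   -- supplying the last argument; roughly 0.0001

main = print approxAt0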
Functional languages lack explicit control flow and are based solely on expressions constructed out of values and function applications. Evaluating a non-trivial expression amounts to evaluating some subexpressions and combining the partial results into the final answer by applying a built-in operator or user-defined function. It is up to the compiler and run-time system to determine which subexpressions to evaluate in which order. Imperative languages, on the contrary, prescribe to a large extent how and when expressions should be evaluated. For example, expressions at argument positions must be evaluated before calling a function. Lazy evaluation relaxes these constraints by specifying that a subexpression will only be evaluated when its value is needed for the progress of the computation. This approach offers the programmer a simple way of specifying infinite data structures, as shown in the following example:

main = take 100 [1..]
The function take is a standard Haskell function that takes the first n elements from a list. In this case take operates on the infinite list of natural numbers generated by the [1..] expression. Lazy evaluation does not start off by evaluating [1..], as would happen in a comparable C program, but rather invokes take, since take is the function at the top of the expression tree for main. The call of take proceeds to evaluate its second argument step by step until the first 100 list elements are known.
With lazy evaluation we can improve on our earlier derivation example. Instead of approximating the derivative with a small fixed value for h, we can take the limit of a series of approximations:

deriv f x = lim [ (f (x+h) - f x)/h | h <- [ 1/2^n | n <- [1..] ] ]
  where lim (a:b:lst) = if abs(a/b - 1) < eps then b else lim (b:lst)
        eps = 1.0e-6
The local function lim takes the limit of the list of approximated derivatives by comparing the relative difference between two successive values until it becomes less than eps. Note that the number of approximations is unknown in advance and depends on the rate at which function f converges. The relaxed handling of infinite lists under lazy evaluation allows the last definition of deriv to closely resemble the mathematical definition.
Another example of the usefulness of lazy evaluation is in constructing parsers based on attribute grammars. An attribute grammar translates into a set of functions, one for each production rule, specifying how each attribute is to be computed. These functions can be used for parsing immediately: lazy evaluation automatically evaluates the attributes in the right order. See Johnsson
(1987) for a detailed description.
The expressiveness of lazy evaluation burdens the functional language implementation, which must support delaying and resuming computations. Section 7.5 discusses graph reduction, which is the basis of modern implementations of functional languages. In essence a function application translates into a heap-allocated data structure containing the function and its argument. When needed, such a suspended function can be activated by taking the code pointer and calling it with the arguments stored in the heap.
The short tour of Haskell has highlighted the functional language features that require additional work in the compiler. For reference, Figure 7.3 lists which compiler phase handles which aspect of Haskell. Note that the handling of the most difficult aspects - higher-order functions and lazy evaluation - is deferred to the run-time system. Consequently, the run-time system of a functional language is considerably larger than that of most imperative languages and includes routines for garbage collection (already discussed in Chapter 5), dynamic function creation/invocation, and suspension/resumption of lazy computations.
Compiler phase      Language aspect

Lexical analyzer    Offside rule
Parser              List notation
                    List comprehension
                    Pattern matching
Context handling    Polymorphic type checking
Run-time system     Referential transparency
                    Higher-order functions
                    Lazy evaluation

Figure 7.3 Overview of handling functional language aspects.
The general structure of a functional language compiler is given in Figure 7.4. The high-level source language is simplified through several transformations into a small functional core that contains all essential functional aspects but no syntactic sugar. The functional core requires explicit typing of functions to ease code generation, so for languages like Haskell where type declarations are not mandatory, the compiler front-end has to derive them from the source code.
After the initial translation of the high-level language into the functional core, numerous optimizations are applied to improve code quality. Many of these optimizations
Figure 7.4 Structure of a functional compiler: the high-level language (Sections 7.3 and 7.4) is simplified into the functional core, which is then optimized (Section 7.7) and passed to code generation (Section 7.6).
are also present in imperative compilers; examples are constant folding, function in-lining, and common subexpression elimination. In the final step, code generation takes the functional core and transforms it into the target code. C is usually selected as the target language, since this offers the possibility to leverage imperative compiler technology for free. ANSI C programs are portable across a wide range of hardware platforms, and C compilers are good at the complicated task of generating machine code: they include fancy algorithms for register allocation and instruction scheduling. C compilers, however, are not capable of optimizing memory references and function calls as much as required by functional programs, so code generation in a functional language compiler still requires some work. The resulting C program is then compiled with an ordinary C compiler and linked with the run-time system into an executable. The run-time system takes care of handling those aspects of a functional language that cannot be handled at compile time, like lazy evaluation and higher-order functions.
The definition of what constitutes the functional core requires some thought. On the one hand, it must be high-level enough to serve as an easy target for the front-end that compiles the syntactic sugar away. On the other hand, it must be low-level enough to allow concise descriptions of optimizations, which are often expressed as a case analysis of all core constructs. A third consideration is that when using C as the target language, the functional core does not need to be as low-level as when generating assembly code. We will use as our functional core a subset of Haskell that includes the following constructs:
- basic data types, including int, char, and bool;
- (user-defined) structured data types;
- typed non-nesting functions;
- local bindings as part of let-expressions;
- expressions consisting of identifiers, arithmetic operators, if-then-else compounds, and function applications;
- higher-order functions;
- lazy evaluation semantics.

These constructs, except for the last two, can be easily mapped onto C constructs. For example, the factorial function without pattern matching
fac :: int -> int
fac n = if (n == 0) then 1 else n * fac (n-1)
maps into:

int fac(int n) {
    return (n == 0 ? 1 : n * fac(n-1));
}
Since most of the functional core can be mapped easily to C, code generation concentrates on making higher-order functions and lazy evaluation explicit by generating calls to the run-time system.
The remaining sections of this chapter discuss the various parts of a functional compiler as outlined in Figure 7.4. First we briefly discuss type checking (Section 7.3), followed by the desugaring transformations that translate Haskell into the functional core (Section 7.4). Next, we discuss the run-time system (Section 7.5), which takes care of handling higher-order functions and lazy evaluation by means of an interpreter known as the graph reducer. This graph reducer is the target of the code generator described next (Section 7.6); when generating C, difficult functional core constructs like higher-order functions are translated into calls to the run-time system. The most important optimizations performed on the functional core, which reduce the number of calls to the run-time system by performing part of the interpretation at compile time, are covered in Section 7.7. Finally, we discuss some advanced graph manipulation techniques aimed at reducing memory consumption (Section 7.8). Such reduction also improves the speed of the compiled program, due to better cache behavior and reduced garbage collector activity.

Roadmap
7.3 Polymorphic type checking
7.4 Desugaring
7.5 Graph reduction
7.6 Code generation for functional core programs
7.7 Optimizing the functional core
7.8 Advanced graph manipulation
The polymorphic type system of Haskell makes type checking more difficult than for imperative (monomorphic) languages as discussed in Chapter 6. The simple notion of type equivalence needs to be redefined when polymorphic types like [a] and [b -> c] are compared. Another complication is that type declarations in Haskell are optional. Although good programming practice dictates that users explicitly provide the types of their functions, the Haskell type checker must be capable of deriving type information solely from the function definition. It is not immediately obvious how to do this, since it is a property of a polymorphic function that it has many types. For example, consider the higher-order function map, which applies a function f to all elements of a list:

map f []     = []
map f (x:xs) = f x : map f xs
From the [] pattern in the first equation we derive that the second argument of map is a list. Likewise the result of map is a list too. Note that both list types are not related, so the first equation constrains the type to

map :: a -> [b] -> [c]
The second equation contains the subexpression f x, which reveals that f is a function. The general type of a function is of the form a -> b. We know, however, that x is an element of the argument list with type [b] and that f x is part of map's result list with type [c]. Therefore, we conclude that the type of f is b -> c. The type of map is now constrained to

map :: (b -> c) -> [b] -> [c]
The brackets around b -> c are necessary since the -> operator is right-associative, which is consistent with currying. We cannot refine the type further without compromising the generality of map. For example, narrowing the type to (a -> a) -> [a] -> [a] excludes the use of functions like ord that transform values from one type into another (Char to Int). When inferring types, the type checker should derive from the constraints the most general instance of the type, which represents the complete set of types that the function covers.
Type checking in Haskell boils down to handling function application, which is the essential building block of expressions, and may involve a polymorphic function as well as a polymorphic argument. When encountering an anonymous function application like

fun expr

the type checker introduces type variables so it can express typing constraints:

fun  :: a -> b
expr :: c
In imperative languages function application is dealt with by checking whether or not the type of an expression at some argument position (c) is 'equal' to the type of the corresponding formal parameter (a). In Haskell it is not required that types a and c are equal in that they represent the same set of types, but only that they can be unified to a subset that is common to both. Consider, for example, the type declarations of functions map and length:

map    :: (a -> b) -> [a] -> [b]
length :: [c] -> Int
In the expression

map length

the type checker must unify the type of length, which is [c] -> Int, with the type of map's first argument, a -> b. This results in a becoming the list [c] and b becoming Int, which allows the type checker to conclude that the type instance of map in map length is

map :: ([c] -> Int) -> [[c]] -> [Int]

and so the type of map length itself is

map length :: [[c]] -> [Int]
This result can be used to determine that the expression

map length ["red", "yellow", "green"]
is well-typed: the types [[c]] and [[Char]] are unifiable, with c = Char. An expression like map length [1 .. 100], however, is invalid: the unification of [[c]] and [Int] fails, because it would require [c] and Int to be unified, which is impossible because Int is not a list.
The details of unification are deferred to Chapter 8, since unification is the basic mechanism used to implement logic programming languages. For our type-checking purpose it suffices to notice that the unification of two polymorphic types essentially results in a substitution of the type variables involved. These substitutions are easily accommodated in the type-table structure suggested in Section 6.1.2. When type checking an n-ary function f, fresh types are added, one type t_i for each argument a_i and one type t_r for the result, and the type of f is set to

f   :: t_1 -> ... -> t_n -> t_r
a_i :: t_i
A function definition may comprise multiple equations, which are to be type checked one after the other. Each equation may contain some patterns at argument positions. A pattern is handled by introducing fresh types for its components and unifying the top-level type with the corresponding argument type t_i. For example, the pattern (x:xs) requires two additional entries in the type table, one for x, say t_x, and another for xs, say t_xs. The type of (x:xs) is then [t_x], which must be unified with type t_i. Type checking then proceeds bottom-up through the AST for the right-hand side of the equation. Constants simply propagate their corresponding type, which may be
polymorphic. This is, for example, the case with the empty list []. We create a fresh entry in the type table to hold a new instance of the type [a]. It is important that each occurrence of [] gets its own type so the type checker can handle cases like

length (1:[]) + length ('a':[])
where the first occurrence of [] has type [Int] and the second (unrelated) occurrence is of type [Char]. Global identifiers like length must also be assigned a fresh type on each occurrence, since they may be used in different contexts. In the example above length has types [Int] -> Int and [Char] -> Int, respectively. For local identifiers like parameter names, on the other hand, the type checker must use a single entry in the type table to catch type errors like

f xs = length xs + xs
Type checking fails since the two types for xs - [a] and Int - cannot be unified. The intermediate nodes in the AST are essentially all function applications, and require a unification as outlined above.
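Unification itself is the topic of Chapter 8, but the substitution-based view used above can be made concrete in a short sketch. The data type and function names below (Type, Subst, unify) are ours, not the book's, and the occurs check is omitted:

import qualified Data.Map as M

data Type = TVar String            -- a type variable such as a, b, c
          | TCon String [Type]     -- a constructor: Int, [t], t1 -> t2, ...
          deriving (Eq, Show)

type Subst = M.Map String Type

-- Apply a substitution, following chains of bound variables.
subst :: Subst -> Type -> Type
subst s (TVar v)    = case M.lookup v s of
                        Just t  -> subst s t
                        Nothing -> TVar v
subst s (TCon c ts) = TCon c (map (subst s) ts)

-- Unify two types by extending the substitution, or fail with Nothing.
unify :: Subst -> Type -> Type -> Maybe Subst
unify s t1 t2 = go (subst s t1) (subst s t2)
  where
    go (TVar v) t = bind v t
    go t (TVar v) = bind v t
    go (TCon c1 ts1) (TCon c2 ts2)
      | c1 == c2 && length ts1 == length ts2 = unifyAll s ts1 ts2
      | otherwise                            = Nothing
    bind v t | t == TVar v = Just s
             | otherwise   = Just (M.insert v t s)
    unifyAll acc []     []     = Just acc
    unifyAll acc (a:as) (b:bs) = unify acc a b >>= \acc' -> unifyAll acc' as bs
    unifyAll _   _      _      = Nothing

-- Unifying [c] -> Int with a -> b yields the substitution a := [c], b := Int.
list t    = TCon "[]" [t]
arrow x y = TCon "->" [x, y]
tInt      = TCon "Int" []

main = print (unify M.empty (arrow (list (TVar "c")) tInt)
                            (arrow (TVar "a") (TVar "b")))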
The next step after type checking is to remove the syntactic sugar from a Haskell program and transform it into its functional-core equivalent. In this section we focus on translating lists, pattern matching, list comprehensions, and nested functions to core constructs.
List notation contains three forms of syntactic sugar: the ':' infix operator, the enumeration of list elements between square brackets, and the '..' range notation. The operator : constructs a node with three fields: a type tag Cons, an element, and a list. So the form x : xs is transformed to (Cons x xs), and the list [1, 2, 3] is transformed to

(Cons 1 (Cons 2 (Cons 3 [])))
Lists like [1..100] and [1..] are usually translated to calls of library functions that express these lists in terms of : and []. An example of such a function is the function range, already encountered in Section 7.1.2:

range n m = if n > m then [] else (n : range (n+1) m)
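As a sketch of this translation, the closed range maps onto range directly, and the open-ended form can be expressed with a similar helper; from is an illustrative name of our own, not a function from the text:

range n m = if n > m then [] else (n : range (n+1) m)
from n    = n : from (n+1)

ex1 = range 1 5         -- corresponds to [1 .. 5]:   [1,2,3,4,5]
ex2 = take 5 (from 1)   -- corresponds to take 5 [1 ..], relying on laziness

main = print (ex1, ex2)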
Pattern matching is a notational convenience for expressing recursive functions that contain several cases distinguished by patterns occurring at parameter positions. The general layout of such a function is shown in Figure 7.5. The n-ary function fun is specified as m equations with patterns constraining the arguments. The equations are matched from top to bottom; the patterns are matched from left to right. As exemplified by the function take
fun p1,1 ... p1,n = expr1
fun p2,1 ... p2,n = expr2
...
fun pm,1 ... pm,n = exprm

Figure 7.5 The general layout of a function using pattern matching.
take :: Int -> [a] -> [a]

take 0 xs     = []
take n []     = []
take n (x:xs) = x : take (n-1) xs
which returns the first n elements of a list, three types of patterns exist:
- constant patterns, such as 0 and [],
- variable patterns, like n and xs, which match any argument, and
- constructor patterns, like (x:xs), which is really syntactic sugar for (Cons x xs).
A constructor pattern matches a specific case of a structured type. A constructor may contain several fields, each being a pattern itself. Examples are x:(y:ys), which matches any list of length two or more, and x:(y:[]), which matches any list of exactly length two. Such nested patterns may impose additional constraints, and introduce variables that may be referenced in the expression part of the equation.
The generalized pattern-matching function of Figure 7.5 is compiled into the code of Figure 7.6. Here cond_i specifies the conditions imposed by the patterns for equation i, and defs_i defines all the variables that occur in the patterns p_i,1 ... p_i,n. In short, cond_i tests the structure and defs_i retrieves the contents if the structure matches. The last line of code is added to catch the case that all patterns fail, which signals an error in the specification of the function; the error routine prints a message and then halts execution. The translated code for the take function is shown in Figure 7.7.
Translating constant and variable patterns to functional core constructs is easy: a constant yields an equality test, and a variable imposes no constraint at all. Constructor patterns, on the other hand, require additional support to provide type information at run time. We must be able to verify that an argument (a2) matches the constructor specified in the pattern (Cons). For simplicity we require the run-time support to provide the _type_constr function that returns the constructor tag of an arbitrary structured type element. In addition, we must be able to reference the fields in the constructor type. Again we assume that the run-time support will assist us by providing the generic _type_field n function that returns the nth field of any structured type. These two functions are straightforward to implement if the run-time system uniformly encodes structured types as records with a common header and a variable number of fields.
fun a1 ... an =
    if (cond1) then let defs1 in expr1
    else if (cond2) then let defs2 in expr2
    ...
    else if (condm) then let defsm in exprm
    else error "fun: no matching pattern"

Figure 7.6 Translation of the pattern-matching function of Figure 7.5.
take a1 a2 =
    if (a1 == 0) then let xs = a2 in []
    else if (a2 == []) then let n = a1 in []
    else if (_type_constr a2 == Cons) then
        let n  = a1
            x  = _type_field 1 a2
            xs = _type_field 2 a2
        in x : take (n-1) xs
    else error "take: no matching pattern"

Figure 7.7 Translation of the function take.
The simplistic approach of translating pattern matches away by generating code for the individual equations results in non-optimal code. It can be improved in several ways, all of which require the handling of multiple equations at once. For example, recursive functions typically perform a case analysis of all constructors that might occur as a certain argument, as exemplified in Figure 7.7. Since the code has already been type-checked at compile time, any second argument in a call of take is guaranteed to be a list. So the last equation need not verify that the argument matches the constructor pattern, and consequently, the error guard can be omitted too. This simplifies take to the code shown in Figure 7.8. Another optimization concerns the occurrence of the same pattern in multiple equations. Consider the function last, which returns the last element in a list:
take a1 a2 =
    if (a1 == 0) then let xs = a2 in []
    else if (a2 == []) then let n = a1 in []
    else
        let n  = a1
            x  = _type_field 1 a2
            xs = _type_field 2 a2
        in x : take (n-1) xs

Figure 7.8 Optimized translation of the function take.
last :: [a] -> a

last [x]    = x
last (x:xs) = last xs
The first pattern, [x], is syntactic sugar for (Cons x []), whereas the second pattern, (x:xs), stands for (Cons x xs). When generating code for each equation separately, both equations will check that their argument points to a Cons constructor node. The check in the second equation can be optimized away by proper book-keeping, resulting in the translation shown in Figure 7.9. Note that the check for the constructor pattern cannot be optimized away as it could with take above, since the patterns do not cover all the patterns of the list type; the pattern [] is missing, leaving last [] undefined.
last a1 =
    if (_type_constr a1 == Cons) then
        let x  = _type_field 1 a1
            xs = _type_field 2 a1
        in if (xs == []) then x
           else last xs
    else error "last: no matching pattern"

Figure 7.9 Translation of the function last.
An alternative method of implementing pattern matching consists of generating one finite state automaton for all the patterns of a function definition and using it at run time to determine the correct right-hand-side expression (Pettersson, 1992).
List comprehensions can be transformed away just after type checking has been performed. For readability the transformation is presented as a source-to-source translation, although in reality it operates on ASTs. Our running example will be the following list comprehension, which computes all Pythagorean triangles with sides less than or equal to n:

pyth n = [ (a, b, c)
         | a <- [1 .. n], b <- [a .. n], c <- [b .. n],
           a^2 + b^2 == c^2 ]
Note that the numbers b and c are not drawn from the range [1 .. n], but from [a .. n] and [b .. n] respectively. Doing so improves the efficiency of the algorithm and prevents the generation of equivalent triples like (3,4,5) and (4,3,5). The restrictions derive from the relation a <= b <= c, which holds for Pythagorean triangles. In general, a list comprehension

[ expression | qualifier_1, ..., qualifier_n ]

consists of an expression and a number of qualifiers separated by commas, where a qualifier is either a generator or a filter. A generator is of the form var <- list_expression; it introduces a variable iterating over a list, which may be used in later qualifiers. A filter is a Boolean expression, which constrains the variables generated by earlier qualifiers. The transformation of list comprehensions works by processing the qualifiers from left to right, one at a time. This approach naturally leads to a recursive scheme as presented in Figure 7.10.

(1) T{ [expr | ] }           =>  [ expr ]
(2) T{ [expr | F, Q] }       =>  if F then T{ [expr | Q] } else []
(3) T{ [expr | e <- L, Q] }  =>  let f_Q e = T{ [expr | Q] } in mappend f_Q L

Figure 7.10 Translation scheme for list comprehensions.

Transformation rule (1) covers the base case, where no more qualifiers are present in the list comprehension. The result is a single-element list holding the expression expr.
The filter qualifier is handled in transformation rule (2), where F stands for the filter and Q for the remaining sequence of qualifiers. The filter is transformed into a conditional expression. If the condition F holds, we can compute the remainder of the list comprehension by recursively invoking the transformation scheme T; otherwise the computation is terminated by returning the empty list.
The generator qualifier e <- L is covered in rule (3). The generator produces zero or more elements e drawn from a list L. We must generate code to iterate over all elements e of L, compute the remainder Q of the list comprehension for each value of e, and concatenate the - possibly empty - result lists into a single list. The key idea is to generate a nested function (f_Q) that takes element e and produces the list of values that Q (the remainder of the list comprehension) can assume for that element e. This function f_Q is then called with each element of L as an argument and the resulting lists are concatenated, yielding the desired result. We rely on the scoping rules of Haskell to have all external references to e
present in 'T{ [expr I Q ] } ' refer to the argument e of our new function f Q . To avoid name clashes the function name f Q must be a (new) name that does not occur in the list comprehension, This leaves us with the problem of calling a given function for all elements in a list and concatenating the elements of the resulting lists into one list, We cannot use the map function for this purpose since it simply concatenates the results of function applications, and would yield a list of lists in this case. Buta small modification of map suffices: mappend :: ( a -> [bl) -> [ a ] -> [bl mappend f [] = [l mappend f (x:xs) = f x ++ mappend f xs
Each call to function f returns a list of values, which are appended (++) together by mappend into one big list, rather than Cons-ed into a list of lists. The function mappend provides exactly the functionality we need to transform the generator in rule (3). Applying the transformation to the Pythagorean list comprehension produces the following code:

pyth n = mappend f_bc2 [1 .. n]
  where f_bc2 a = mappend f_c2 [a .. n]
          where f_c2 b = mappend f_2 [b .. n]
                  where f_2 c = if (a^2 + b^2 == c^2)
                                then [(a,b,c)]
                                else []
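Note that mappend as defined above behaves like the standard Haskell function concatMap. Using it with anonymous functions gives an equivalent, if less readable, rendering of the transformed pyth; this is our own sketch, not the transformation's actual output:

pyth n = concatMap (\a ->
           concatMap (\b ->
             concatMap (\c ->
               if a^2 + b^2 == c^2 then [(a,b,c)] else [])
               [b .. n])
             [a .. n])
           [1 .. n]

main = print (pyth 20)
-- [(3,4,5),(5,12,13),(6,8,10),(8,15,17),(9,12,15),(12,16,20)]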
"he transforma~ionof list comprehensions generates 'nested functions', as do several other ~ a n s f o ~ a t i o n The s . functional core deliberately excludes nested functions, functions that are local to other functions, since most target languages of functional compilers (for example, C and assembly code) do not support nested routines. As explained in Section 6.3.5,
the standard approach in non-functional compilers is to parameterize the nested routine with a lexical pointer, the frame pointer of the lexically enclosing function. Through this pointer, the generated code can access variables in outer scopes.
Using lexical pointers to activation records in combination with higher-order functions and lazy evaluation causes dynamic scope violations, however, since a call to a nested function may escape its lexical scope at run time, rendering its lexical pointer invalid. For example, a nested function can be returned as the result of a higher-order function; lazy evaluation can delay the execution of a call to the nested function until long after the caller has returned its value, which contains a reference to the suspended call. The latter case is exemplified by the function sv_mul shown in Figure 7.11.
sv_mul scal vec =
    let s_mul x = scal * x
    in map s_mul vec

Figure 7.11 A function returning a function.
The sv_mul function defines the multiplication of a scalar and a vector, which is represented by a list. It calls the higher-order function map to apply the nested function s_mul to each element in the vector list. At run time, however, the interpreted code for sv_mul returns a graph holding the unevaluated expression map s_mul vec. Since this graph includes a pointer to s_mul, which is defined in the activation record of sv_mul, its dynamic scope (pointer scope) is that of the activation record. Returning the graph to an environment of larger scope constitutes a scope violation, as explained in Section 6.2.3.2. If we return the routine value map s_mul vec, the activation record of sv_mul will be removed before the nested function s_mul is ever applied.
A potential solution is to allocate activation records from the heap instead of allocating them on the stack. The obvious disadvantage, however, is the higher cost. Also, the overhead of allocating and initializing activation records, and of passing and dereferencing lexical pointers, is relatively large for most nested functions, which usually only reference one or two non-local entries. Therefore, modern functional language compilers use lambda lifting to unnest nested routines, as explained in Section 6.3.6. The functional core, however, directly supports currying (partial parameterization), so we do not need to generate closures and take care of allocating (part of) the activation records in the heap as in the imperative case. Translating a nested routine f to a global routine is just a matter of extending it with additional parameters p1 ... pn that capture the out-of-scope pointers; each usage of the nested function f must be replaced with a curried call: f p1 ... pn.
Returning to our scalar-vector multiply example of Figure 7.11, we lift the nested s_mul function into a global function named sv_mul_dot_s_mul as follows (we use the identifier sv_mul_dot_s_mul rather than the intended sv_mul.s_mul since Haskell does not allow dots in identifiers). We extend the function heading with an additional parameter named scal that captures the pointer to the scal parameter of the outer
sv_mul function. Next, all calls of s_mul in the body of sv_mul are replaced by the expression sv_mul_dot_s_mul scal. The result is shown in Figure 7.12, in which the function sv_mul now returns an expression in which names either refer to parameters (passed by value) or to global names.
sv_mul_dot_s_mul scal x = scal * x

Figure 7.12 The function of Figure 7.11, with s_mul lambda-lifted.
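The rewritten sv_mul described in the text, with every use of s_mul replaced by the curried call (sv_mul_dot_s_mul scal), would look like the following sketch:

sv_mul_dot_s_mul scal x = scal * x

sv_mul scal vec = map (sv_mul_dot_s_mul scal) vec

main = print (sv_mul 3 [1, 2, 3])   -- [3,6,9]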
For nested functions with a single out-of-scope pointer, lambda lifting is definitely cheaper than using heap-allocated activation records. Passing one additional parameter is as expensive as passing a lexical pointer, while no activation record has to be allocated and initialized. In most other cases lambda lifting outperforms activation records too. Activation records are only beneficial when shared between multiple nested functions so the initialization overhead can be amortized; this case rarely occurs in practice.
Now that we have seen how the front-end can remove all syntactic sugar from the source program and leave just program text in the functional core, we can turn to the problem of translating this functional core text to runnable code. The main issues are higher-order functions and lazy evaluation. The usual translation of functional core code is a hybrid one: the generated code creates data structures, graphs, that are handled further by an interpreter. The next section introduces the graphs and treats the interpreter, the graph reducer. The generation of the code that produces these graphs is explained in Section 7.6. There are a multitude of possibilities to short-circuit this two-stage process of first creating graphs and then reducing them, each leading to better code but a more complicated code generator. The most prominent of these are covered in the last few sections of this chapter.
The graph reducer is the heart of the run-time system and deals with higher-order functions and lazy evaluation. It will be used extensively by the code generator, described in the next section, to handle the applications of both standard and higher-order functions to their arguments. The graph reducer is an interpreter which operates on heap-allocated data structures representing expressions that are part of the functional program being executed. Expressions may be simple basic values such as integers and Booleans, structured types such as lists, and function applications. A function application can either represent a curried function consisting of a function and some of its arguments, or a lazy function application with all arguments present whose value may or may not be needed during the remainder of the computation. Recall that currying depends on the observation that
f e1 e2 ... en

is syntactic sugar for the chain of unary function applications written as

(..((f e1) e2) ... en)

or, more explicitly, as

(..((f @ e1) @ e2) ... @ en)

in which the function application operator has been made visible. Graph reduction takes this observation literally, by representing a function application as a linked list of application nodes.
The @-operator denotes an application node holding a function pointer and an argument pointer. With this explicit representation of function application, higher-order functions can be passed around as ordinary pointers, and can be instantiated by allocating and initializing the required application nodes.
The execution of a functional program starts with the construction of the graph representing the initial expression whose value must be computed. Next, the graph reducer repeatedly selects a part of the graph that represents an expression that can be simplified by applying a function definition. Such an expression is called a reducible expression, or redex for short. The selected redex is then reduced and the process is repeated. The graph reducer stops when it can no longer find a reducible expression. Finally, the resulting graph is returned as the result of the initial expression.
Each step of the graph reducer is commonly referred to as a reduction, because an expression is reduced to a simpler one. Note, however, that applying a function usually enlarges the graph, since the number of nodes representing the body usually exceeds the number of nodes holding the application spine. Nevertheless, applying a function is considered to yield a simpler expression since one level of abstraction - the function - is peeled off. The fully reduced expression resulting from a nested expression is just returned as its result. The result of the top-level expression is printed by the driver, as the result of the program. The driver prints the expression through a depth-first visit of all nodes, performing a case analysis on the type of each node. A basic value can be printed immediately; a function node has an associated name; a structured value is handled by recursively invoking the graph reducer on each field; and a curried function is printed as first the function name (f) and then recursively the arguments one by one. Note that finding a curried function is not an error, since functions are first-class citizens and may be the result of any expression, even of the main program.
As an example of the operation of the graph reducer, consider the reduction of the following expression:
let twice f x = f (f x)
    square n  = n*n
in twice square 3
The higher-order function twice applies the function square, passed as an argument, twice to the value 3. The graph reducer starts off by building the initial expression twice square 3, graph (1): ((twice @ square) @ 3).
The next step is to select a function application that can be reduced. The initial expression contains just one redex, the application of twice to its two parameters, square and 3. Now the declaration

twice f x = f (f x)
says that we can replace the subgraph ((twice @ f) @ x) by (f @ (f @ x)), for any substitution of f and x. By taking square for f and 3 for x, we can replace graph (1) by graph (2): (square @ (square @ 3)).
Likewise, the declaration

square n = n*n

allows us to replace the subgraph (square @ n) by ((* @ n) @ n), which, applied to graph (2) by taking (square @ 3) for n, yields graph (3): ((* @ (square @ 3)) @ (square @ 3)), in which both arguments of * point to the same shared node.
Since graphs can become arbitrarily large it is inconvenient to draw the complete graph after each reduction, especially since reductions only involve small subgraphs. Therefore a reduction is shown by clearly marking the redex and its replacement. This is also exactly in line with the underlying implementation, which does not copy the complete graph at each reduction, as the diagrams may suggest, but only replaces the small part under reduction. For example, in the diagram of the application of square that transformed graph (2) into graph (3), the checked application node indicates the top of the application spine under reduction and is called the root of the redex. The dashed arrow indicates that the root is replaced by the graph on the right-hand side, which is the body of square instantiated with pointers to the argument (square @ 3). Note that after the replacement (part of) the left-hand side may have become unreachable and falls to the garbage collector.
Continuing the reduction process, we must select the next redex. There are two candidate function applications: the multiplication (*) spine and the square spine, each having all their arguments present. The multiplication spine, however, cannot be reduced because its arguments are not in the proper format; the built-in operator can only perform the multiplication when both arguments point to a number. Therefore, the graph reducer selects the square spine for the third reduction step.
This reduces the shared subgraph (square @ 3) to ((* @ 3) @ 3). Next, the inner multiplication can be reduced, now that its arguments are in the proper format, which yields 9. Finally, the remaining multiplication, now ((* @ 9) @ 9), can be reduced to complete the computation, after which the final value 81 is output.
The above example shows that the graph reducer is basically a three-stroke engine:
- select a redex consisting of a function and its arguments,
- instantiate the right-hand side of the function, and
- update the root of the redex.
The update in the last step is essential to avoid duplication of work. If the update was omitted, each dereference of one of the pointers to the root would have to redo the instantiation in step 2. In the worst case this causes an exponential overhead. Selecting the redex (step 1) is an important issue since it steers the computation and, therefore, is the key to implementing lazy evaluation. Note that in step 2 the right-hand side of a function is not 'evaluated' in the imperative sense, since all function applications contained in it are translated to application spines in the graph. Depending on the selection criteria such function applications may or may not be evaluated in the future, whereas in imperative programs they are evaluated to a single value as soon as they are met.
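The select-and-instantiate strokes can be mimicked in a toy Haskell tree reducer; this is our own sketch, not the book's C engine, and it omits the update of the root, so shared subexpressions are recomputed rather than reused:

data Expr = Num Int
          | Fun String Int ([Expr] -> Expr)   -- descriptor: name, arity, code
          | App Expr Expr

reduce :: Expr -> Expr
reduce expr = unwind expr []
  where
    unwind (App fun arg) args = unwind fun (arg : args)     -- stack argument
    unwind f@(Fun _ arity code) args
      | length args < arity = spine f args                  -- curried function
      | otherwise           = reduce (spine (code now) later)
      where (now, later) = splitAt arity args
    unwind value args = spine value args                    -- e.g. a number
    spine = foldl App

-- Strict built-in multiplication: reduce both arguments to numbers first.
mulF :: Expr
mulF = Fun "*" 2 (\[a, b] -> case (reduce a, reduce b) of
                               (Num x, Num y) -> Num (x * y)
                               _              -> error "*: non-number")

twiceF, squareF :: Expr
twiceF  = Fun "twice"  2 (\[f, x] -> App f (App f x))
squareF = Fun "square" 1 (\[n]    -> App (App mulF n) n)

main = case reduce (App (App twiceF squareF) (Num 3)) of
         Num v -> print v                     -- prints 81
         _     -> putStrLn "result is a (curried) function"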
In general the computation graph contains many redexes that can be selected for reduction. Although the graph reducer can select any redex, it is of course advantageous to select a redex that is essential for computing the final result. The question is how to find an essential redex; to find the answer we use a data-flow argument, as follows. We know that we must reduce the initial expression to its value. Therefore we start at the root node of the initial expression. If the root node is not an application node, the graph reducer outputs the value. If the root points to an application node, we know that its value is needed so we are safe when selecting the redex with this node as its root. To determine the extent of the redex, we traverse the application spine down to the left to find the function, say f, that we must
apply; see Figure 7.13. We check whether or not the application spine contains the necessary number of arguments for the function f. If it does not, we have detected a curried function that cannot be applied and the graph reducer must output it.
Figure 7.13 Identifying an essential redex.
If all n arguments of f are present in the spine from the root we have located a function application whose value is needed. This function application may be our wanted redex, but it need not be. If function f is a user-defined function, we are able to apply f; hence, the application node in the spine pointing to the last argument of f (the node pointing to e_n in the diagram) is the root of the next redex. If f is a built-in operator, however, it depends on the state of the arguments and the nature of the operator f whether or not we can apply the operator. Built-in operators may require an argument to point to a value node rather than an application node. For example, arithmetic operators like + and * require that both their arguments point to numbers so they can perform the addition or multiplication, respectively. These so-called strict arguments force us to recursively search the computation graph at the application node appearing at a strict argument position, to find an essential redex.
Turning to the graph in Figure 7.14, taken from the previous reduction sequence, the application node at the root is not a proper redex since the first argument of the multiplication operator is strict and points to an unevaluated expression. Therefore the checked application node should be reduced to a value first.
Figure 7.14 A graph with a built-in operator with strict arguments: the root applies * to the unevaluated expression (square @ 3).
The process of developing the graph and finding the next redex to be evaluated in the computation is called unwinding. The unwind procedure outlined above always selects the 'leftmost outermost' redex. This reduction order is also known as normal-order reduction. It is the underlying mechanism to implement lazy evaluation. Imperative languages use a different strategy for evaluating expressions; they evaluate expressions at argument positions before invoking the function. This applicative-order reduction policy selects the 'leftmost innermost' redex. The following example illustrates the difference:

let const c x = c
in const 1 (2+3)
The corresponding graph is ((const @ 1) @ ((+ @ 2) @ 3)). Normal-order reduction selects the leftmost outermost redex, the root of which is the top application node in our example. This application of const reduces to 1 immediately, without ever touching the second argument of const. Applicative-order reduction, on the other hand, selects the leftmost innermost redex, which is the addition that yields 5. Another reduction is needed to reduce const 1 5 to the final result 1. This example shows that applicative-order reduction may need more reduction steps to derive the same result as normal-order reduction. In the worst case, however, applicative-order reduction may start evaluating a non-terminating computation that is not needed and, consequently, fail to produce the result obtained with normal-order reduction. Another consequence of the difference in reduction order is that the expression const 1 (1/0) will generate a run-time 'division by zero' error under applicative-order reduction, while no error occurs under normal-order reduction.
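The difference can be observed directly in Haskell, which uses lazy evaluation; the program below is a small sketch (we hide the predefined const in order to use our own definition):

import Prelude hiding (const)

const c x = c

-- Lazy (normal-order) evaluation never touches the second argument, so this
-- prints 1; an applicative-order evaluator would evaluate 1 `div` 0 first
-- and fail with a division-by-zero error.
main = print (const 1 (1 `div` 0))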
This section provides the C code for a simple reduction engine based on the previous discussion. The reduction engine operates on four types of graph nodes: numbers, lists, function applications, and function descriptors. Figure 7.15 shows the header file declaring the structure of graph nodes: a tag and a union holding one of the four node types. A function descriptor contains the arity of the function, its name for printing and debugging purposes, and a pointer to the actual C code. An integer is represented directly by its value. A list node holds one Cons constructor consisting of pointers to the head and tail of the list; the empty list contains no further information besides the NIL tag. And a (binary) application node holds pointers to the function and argument. The function pointer may point to a function descriptor or another application node.
Nodes can be created by invoking the associated constructor function, which calls the memory manager to allocate a fresh node in the heap. Figure 7.16 shows such a constructor function, for APPL nodes. Since nodes are manipulated almost exclusively through pointers, the data type Pnode is the predominant data type in the reduction engine.
typedef enum {FUNC, NUM, NIL, CONS, APPL} node_type;
typedef struct node *Pnode;
typedef Pnode (*unary)(Pnode *arg);

struct function_descriptor {
    int arity;
    const char *name;
    unary code;
};

struct node {
    node_type tag;
    union {
        struct function_descriptor func;
        int num;
        struct {Pnode hd; Pnode tl;} cons;
        struct {Pnode fun; Pnode arg;} appl;
    } nd;
};

/* Constructor functions */
extern Pnode Func(int arity, const char *name, unary code);
extern Pnode Num(int num);
extern Pnode Nil(void);
extern Pnode Cons(Pnode hd, Pnode tl);
extern Pnode Appl(Pnode fun, Pnode arg);

Figure 7.15 Declaration of node types and constructor functions.
The eval function in Figure 7.17 implements the three-stroke reduction engine. It takes a pointer to an arbitrary graph node and performs a case analysis on the tag to find out what to do next. Application nodes and function descriptors require further processing, while all other tags stop the reduction engine since a basic value can be returned as is. An application spine must be unwound to identify the function and arguments constituting a redex that needs evaluation. Since the arguments are needed when executing the function, the reduction engine pushes them onto a stack so it can refer to them directly. After unwinding the application nodes the arguments are conveniently stacked as shown in Figure 7.18. When the reduction engine detects a function descriptor it checks if all arguments are present. If this is not the case then the root points to a curried function, which cannot be
Pnode Appl(Pnode fun, Pnode arg) {
    Pnode node = (Pnode) heap_alloc(sizeof(*node));

    node->tag = APPL;
    node->nd.appl.fun = fun;
    node->nd.appl.arg = arg;
    return node;
}

Figure 7.16 The node constructor function for application (@) nodes.
reduced, so reduction stops and the arguments are unstacked before returning. The caller decides on what to do with the curried function; for example, the top-level driver will print it. If, however, all arguments are found present, the pointer to the function code is retrieved from the descriptor and the function is called. The arguments of the function are passed as an array. When the call returns the arguments are popped off the stack and the result is copied into the root node, so all future references will find the result immediately and need not repeat the call. This completes the current reduction step, and the next reduction cycle is started.
The reduction engine is accompanied by several routines implementing the semantics of the built-in arithmetic and list operators. Each built-in operator is wrapped in a function descriptor, which includes a pointer to the code performing the actual reduction. Figure 7.19 shows the code of the built-in multiplication operator mul. The basic routine mul() evaluates its arguments, extracts the numbers from them, multiplies the numbers and creates a node for the result. The basic routine is unsuitable for calling by the interpreter since the latter supplies the arguments as an array; the prelude routine __mul bridges the gap. The function descriptor _mul represents the function mul as a graph node, to be used when a graph for a call of mul must be constructed. We see that the mul routine starts by calling eval on both arguments, to ensure that these point to numbers rather than to unevaluated application nodes. This leads to a situation where eval indirectly calls itself, thus stacking the arguments of multiple function applications on top of each other. An example is shown in Figure 7.20, which presents a snapshot taken from the evaluation of the expression twice square 3. The depth of the stack of eval calls depends on the program being executed and is in principle unbounded. To avoid memory corruption, the stack can be implemented as an extensible array (Section 5.1.3).
Code generation for functional core programs is almost trivial, since the only task of the C code is to create graphs for the interpreter eval(). For each function definition in the program, we generate a function descriptor and the necessary code to build the right-hand-side graph.
#include !!node. h" #include !leva1.h" #define STACK-DEPTH
10000
static Pnode arg [STACK-DEPTH] ; static int top= STACK-DEPTH;
/* grows down */
Pnode eval (Pnode root) { Pnode node = root; int frame, arity;
APPL
frame = top; for (;;) { switch (node->tag) { case : /* unwind */ arg[--top] = node->nd.appl.arg; /* stack argument */ node = node->nd.appl.fun; /* applicationnode */ break; case FUNC: arity = node->nd.func.arity; if (frame-top c arity) { top = frame; return root;
1
node = node->nd. top += arity; *root = *node; break;
func.
1
1
Figure 7.17 Reduction engine.
For example, the higher-order function

twice f x = f (f x)

can be easily transformed into the code shown below.
Figure 7.18 Stacking the arguments while searching for a redex: the spine of twice square 3 has been unwound, and the argument stack holds pointers to square and 3.
Pnode mul(Pnode _arg0, Pnode _arg1) {
    Pnode a = eval(_arg0);
    Pnode b = eval(_arg1);

    return Num(a->nd.num * b->nd.num);
}

Pnode __mul(Pnode *arg) {
    return mul(arg[0], arg[1]);
}

struct node _mul = {FUNC, {2, "mul", __mul}};

Figure 7.19 Code and function descriptor of the built-in operator mul.
Figure 7.20 A more complicated argument stack: a snapshot from the evaluation of twice square 3, with the arguments of nested eval calls stacked on top of each other.
Pnode twice(Pnode _arg0, Pnode _arg1) {
    Pnode f = _arg0;
    Pnode x = _arg1;

    return Appl(f, (Appl(f, x)));
}

Pnode __twice(Pnode *arg) {
    return twice(arg[0], arg[1]);
}

struct node _twice = {FUNC, {2, "twice", __twice}};
It may not be immediately obvious, but the introduction of the two local variables f and x that pick up the arguments is the result of - trivial - pattern matching on the arguments f and x. Generating code from a user-defined function that instantiates the right-hand-side graph at run time is straightforward. The graph reducer (eval) knows how to handle function applications, so we will simply translate each core construct into a function call, as follows.
Expressions form the heart of the functional core and consist of identifiers, numbers, function applications, arithmetic operators, if-then-else compounds, and let-expressions. Identifiers need no further processing since we can use the same names in the C code. Numbers are allocated in the heap by wrapping them with their constructor function (for example, Num(3)). The code for a function application boils down to constructing a graph consisting of several application nodes holding pointers to the argument expressions. For example, the function application twice square 3 translates into Appl(Appl(&_twice, &_square), Num(3)), which will be evaluated by the graph reducer when needed. Expressions involving arithmetic infix operators are regarded as syntactic sugar for function applications of the corresponding built-in operators. Consequently, they too translate into code that constructs a piece of graph. For example,
n * fac(n-1)

becomes

Appl(Appl(&_mul, n),
     Appl(&_fac,
          Appl(Appl(&_sub, n), Num(1))))
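This graph-building translation can be captured in a few lines; the following is a toy sketch of our own (the Core type and toC are not part of the book's compiler), producing the C text that constructs the graph for a core expression:

data Core = Var String | Lit Int | Call String [Core]

-- Emit the C expression that builds the graph for a core expression.
toC :: Core -> String
toC (Var v)       = v
toC (Lit n)       = "Num(" ++ show n ++ ")"
toC (Call f args) = foldl apply ("&_" ++ f) (map toC args)
  where apply fun arg = "Appl(" ++ fun ++ ", " ++ arg ++ ")"

main :: IO ()
main = putStrLn (toC (Call "mul" [Var "n",
                                  Call "fac" [Call "sub" [Var "n", Lit 1]]]))
-- prints: Appl(Appl(&_mul, n), Appl(&_fac, Appl(Appl(&_sub, n), Num(1))))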
Translating the if-then-else construction in the functional core is less straightforward. It cannot be translated into the if-then-else of C, because the latter is a statement and cannot occur as part of an expression. The conditional expression in C (cond ? x : y) can be used in expressions, but it is not suitable either, because lazy evaluation requires
that nothing be evaluated before it is needed. Without further analysis we cannot be sure that the condition tested in a (nested) if-then-else compound is needed. Therefore we translate the if-then-else compound into a call to the built-in conditional primitive shown in Figure 7.21. Note that Booleans are handled as numbers by the run-time system, just as in C.
Pnode conditional(Pnode _arg0, Pnode _arg1, Pnode _arg2) {
    Pnode cond  = _arg0;
    Pnode _then = _arg1;
    Pnode _else = _arg2;

    return eval(cond)->nd.num ? _then : _else;
}

Pnode __conditional(Pnode *arg) {
    return conditional(arg[0], arg[1], arg[2]);
}

struct node _conditional = {FUNC, {3, "conditional", __conditional}};

Figure 7.21 Code and function descriptor of the built-in conditional operator.
The final construct that can be used to form an expression in the functional core is the let-expression. An example of a let-expression is shown in Figure 7.22. The main property of let-expressions is that they introduce local bindings. These bindings are translated into assignments to local variables in C. Such a local variable holds a pointer to the graph node representing the named expression, which can even be a (curried) function. The translation of the function f is shown in Figure 7.23; the usual definitions of __f and _f are not shown.
f a = let b = fac a
          c = b * b
      in c + c

Figure 7.22 Example of a let-expression.
Dependencies between local bindings are resolved by topologically sorting the bindings before emitting code; the case of bindings that mutually refer to each other is handled in Exercise 7.13. Let-expressions may occur at the top level or nested inside other expressions. Since C does not allow local variables to be declared inside expressions, such nested
Pnode f(Pnode _arg0) {
    Pnode a = _arg0;
    Pnode b = Appl(&_fac, a);
    Pnode c = Appl(Appl(&_mul, b), b);

    return Appl(Appl(&_add, c), c);
}

Figure 7.23 Translation of the let-expression of Figure 7.22.
let-expressions require some more work. We can float the nested locals to the beginning of the C function if we resolve name clashes caused by merging the nested scopes. This is easily accomplished by renaming nested identifiers to unique names.
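A small sketch of this renaming-and-floating step (g and g' are our own illustrative definitions, equal by construction):

g  y = (let t = y + 1 in t * t) + (let t = y - 1 in t * t)

-- After renaming the two nested bindings apart and floating them to the top,
-- each one can become a single C local variable:
g' y = let t_1 = y + 1
           t_2 = y - 1
       in t_1 * t_1 + t_2 * t_2

main = print (g 10, g' 10)   -- (202,202)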
The performance of the interpreted core programs is rather poor. The construction and interpretation of the graphs is the main source of overhead. For example, a simple multiplication already requires the allocation of two apply nodes making up the function call and a number node to store the result. Fortunately, a simple optimization, short-circuiting function applications, requires no further program analysis and can significantly reduce the graph handling overhead. It can be applied to top-level expressions and to the arguments of many built-in functions, as we will see. Section 7.7 covers optimizations that require more extensive compile-time program analysis.
The code generation scheme discussed above treats all function calls equally and generates code to allocate a spine of application nodes. When such an application spine is returned as the result of a function, the graph reducer immediately unwinds this spine to locate the function and arguments, followed by a call to that function. As an example consider the function average:

average a b = (a+b) / 2
the C translation of which is shown in Figure 7.24. When average() (or actually __average()) is invoked by the graph reducer we know that its value is needed under the lazy evaluation mechanism. Therefore we can short-circuit the construction of the application spine for the top-level call and call the div() routine directly, as shown in Figure 7.25. Note that the arguments to div() are still unevaluated.
More in general, if the top-level expression in a right-hand side is an application of a function F with n arguments A1 ... An, the standard code segment for this application,

Appl( ... Appl(Appl(&_F, A1), A2) ..., An)

which builds an application spine of length n, can be short-circuited to

F(A1, ..., An)
Pnode average(Pnode _arg0, Pnode _arg1) {
    Pnode a = _arg0;
    Pnode b = _arg1;

    return Appl(
               Appl(&_div,
                    Appl(Appl(&_add, a), b)),
               Num(2));
}

Figure 7.24 Translation of the function average.
Pnode average(Pnode _arg0, Pnode _arg1) {
    Pnode a = _arg0;
    Pnode b = _arg1;

    return div(Appl(Appl(&_add, a), b), Num(2));
}

Figure 7.25 Suppressing the top-level spine in the translation of the function average.
which calls F directly. This optimization is not restricted to top-level expressions and can be applied to any position in which an application spine is constructed which we know will be unwound immediately by the interpreter. One such position is in the strict arguments of built-in operators. Most built-in operators require their arguments to be evaluated before the actual operation can be performed, so they invoke eval() on their strict arguments. Thus, if a built-in operator is called directly then we know that it will invoke eval() for its strict arguments. The default code generation scheme translates each function application into an application spine to preserve lazy evaluation semantics. Constructing application spines for expressions that are strict arguments of a built-in operator, however, is unnecessary, and the application spine may be replaced by a direct call. So, short-circuiting can be applied not only to top-level calls in right-hand sides of function definitions but also to function calls that are strict arguments of built-in operators. The effect on the average code is that the lazy addition spine is replaced by a direct call to the built-in add operator, resulting in the optimized code shown in Figure 7.26. Note that all arguments are still unevaluated; what we save is constructing the spine, unwinding it to an array, extracting __add() from the node _add, and calling __add(), which retrieves the arguments from the array and calls add().

Pnode average(Pnode _arg0, Pnode _arg1) {
    Pnode a = _arg0;
    Pnode b = _arg1;

    return div(add(a, b), Num(2));
}

Figure 7.26 Suppressing the argument spine to div in Figure 7.25.
The performance effect of these optimizations depends on the program being interpreted, but improvements of a factor of two or more are quite common.
The performance of the interpreter-based approach outlined in the previous sections is way off compared to the performance of imperative codes, even if we include the optimizations that short-circuit application nodes and exploit the strictness of built-in operators, as discussed in the previous section. To achieve acceptable performance, close to that of imperative codes, state-of-the-art functional language compilers apply a host of advanced optimizations, all of which require program-wide knowledge. Typically these optimizations are performed on the AST after the source language has been simplified to an intermediate code like our functional core. We will now discuss four important optimizations that occur at the functional-core level. Low-level optimizations are left to the C compiler, which is used as a portable optimizing assembler. The first two optimizations, strictness analysis and boxing analysis, tackle the overhead of manipulating graph nodes. The other two optimizations, tail calls and accumulating arguments, address the (recursive) function call overhead introduced by the functional programming style, which structures programs as a composition of many small functions. Together these four optimizations allow us to generate code for the recursive factorial function that is as efficient as its imperative loop-based counterpart (see Figure 7.2).
By default, lazy evaluation causes expressions at argument positions in a function call to be passed unevaluated as a graph of application nodes. Building the graph is quite expensive, and becomes pure overhead when the called function will in all cases evaluate the (lazy) argument to proceed with its computation. Some functions always need to evaluate some of their arguments. If a function always needs the value of argument a_i, the function is said to be strict in argument a_i. The purpose of strictness analysis is to determine for all user-defined functions which arguments are strict. Then the code generator can exploit strictness information by evaluating expressions at strict argument positions before calling the function, to save the overhead of building the graph structure by the caller and the call to the graph reducer by the callee. The basic scenario for analyzing the strictness of a function is to propagate the strictness of built-in operators and other user-defined functions bottom-up through the AST. As an example, consider the function

    safe_div a b = if (b == 0) then 0 else a/b
The AST of this function, annotated with the flow of the strictness information, is shown in Figure 7.27.

Figure 7.27 Flow of strictness information in the AST of safe_div.
We start by constructing the basic sets of strict identifiers at the leaves of the AST. A parameter p yields the set {p}, since when the reduction engine reaches p, the value of p will be needed. Likewise, a local variable v, defined in some let-expression, yields the set {v}. Constants, data constructors, and function names yield the empty set {}. The strictness information sets are propagated up the AST according to the rules shown in Figure 7.28. Note that the italic capitals (L, R, C, etc.) represent language constructs in the left column and the sets of strict variables occurring in these language constructs in the right column. Arithmetic operators need both their arguments, so we simply take the union of the left and right sets. The if-then-else construct tests the condition and then selects the then- or else-part. All strict arguments in the condition are needed. In addition we propagate the parameters that are needed both in the then- and else-part, since, whichever way the conditional goes, these will be needed. Next, we consider the two propagation rules that handle calls to user-defined functions. The first rule covers the case that the function identifier is a global name (fun_m), so we know exactly which function is called and what the
Language construct              Set to be propagated

L operator R                    L ∪ R
if C then T else E              C ∪ (T ∩ E)
fun_m @ A_1 @ ... @ A_n         ∪ A_i  for i = 1 .. min(m, n) with strict(fun_m, i)
F @ A_1 @ ... @ A_n             F
let v = V in E                  (E \ {v}) ∪ V    if v ∈ E
                                E                otherwise

Figure 7.28 Rules for strictness information propagation.
properties of its arguments are. The second rule covers the case that the function being invoked is described by some expression (F), which could be as simple as a function parameter or as complicated as an intermediate result computed by a higher-order function. When a global function is called, the union of all argument sets occurring at strict parameter positions is propagated. The strictness predicate 'strict(fun_m, i)' is used to filter which argument sets (A_i) are part of the result set. The number of arguments supplied (n) need not be equal to the number of parameters (m), so the union is over the first 'min(m, n)' arguments only. For function expressions (the second rule) no strictness information about the arguments can be derived. We can do better than propagating the empty set, however, since we do know that the function itself will be needed, so we return the set F holding the function parameter. The final rule in Figure 7.28 covers the usage of local variables in let-expressions. If a local variable is strictly needed for the evaluation of the result expression it will be part of the set E of strict identifiers. In that case, we remove it from the set and add all strict identifiers occurring in the value expression V associated with variable v. Otherwise, we simply pass set E.

Determining the strictness of a function by floating up sets of strict arguments requires knowledge about the strictness of the functions used in the body. We can be conservative and have the strictness predicate return false for functions that we have not processed yet, but a better solution is to use a closure algorithm. Before presenting this algorithm, however, we address recursive functions, since they require special trickery.
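Applying the rules of Figure 7.28 to safe_div by hand illustrates the propagation. The condition b == 0 yields {b} ∪ {} = {b}; the then-part 0 yields {}; the else-part a/b yields {a} ∪ {b} = {a, b}. The if-then-else rule then produces

    {b} ∪ ({} ∩ {a, b}) = {b}

so the analysis finds that safe_div is strict in b but not in a, which is indeed correct: when b is 0, the value of a is never needed.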
The approach of floating up strictness information may fail to infer the strictness of certain arguments of recursive functions. The problem is that, even with a closure algorithm, the bottom-up traversal starts with conservative information about the strictness of the recursive function's arguments. This prevents the discovery of strict arguments like b in the following example:

    f a b = if (a == 0) then b else f (a-1) (b+1)
The function f is representative for a class of functions that iterate until some termination condition is met: a == 0. Each step performs an operation on some intermediate result: b+1. When the recursion stops, the result of all computation steps, b, is returned. It is clear that f is strict in both its arguments, but without any knowledge about the strictness of f beforehand, propagating the strict arguments along the AST identifies only a as being a strict argument. The solution is to optimistically assume that a recursive function is strict in all its arguments, to propagate the strict arguments under this assumption, and then to check if the derived set of strict arguments includes all arguments. This is the case for f, as Figure 7.29 shows. The optimistic assumption that f is strict in both arguments is signaled with the (+, +) annotation and holds true since the result set that floats out of the AST is {a, b}.
Figure 7.29 Optimistic strictness analysis of the function f.
In the general case, the result set (R) may not contain all arguments. In that case, we know that the initial assumption (A) was too optimistic; the best we can hope for is that the function is strict in R. Therefore, we narrow the strictness assumption to R and propagate the sets through the AST again. Narrowing the strictness assumption A continues until the result set R equals the assumption A. As an example, consider the function

    g x y z = if (z == 0) then x else g y x (z-1)

The optimistic approach in the case of g needs three steps:

    step    assumption    result
    1       {x, y, z}     {x, z}
    2       {x, z}        {z}
    3       {z}           {z}
This is the correct result, since g returns either x or y depending on whether or not z is even. In general, the optimistic approach is guaranteed to produce the correct set of strict arguments for any recursive function f that terminates. Only for recursive functions that do not terminate once invoked may the optimistic scheme report arguments to be strict that are not. An obvious but ugly example is the function

    f x = f x
The optimistic approach derives that f is strict in x in one step. Although this is strictly speaking incorrect, since the value will never be needed, the error has no serious consequences in practice. Evaluating an expression in a non-strict context has two potential consequences: (1) the evaluation may not terminate, or (2) an invalid operation is performed. Since the call to f would not terminate anyway, the only difference is that in case (2) a run-time error is reported instead of looping indefinitely.

Now that we have shown how to handle self-recursive functions, it is easy to generalize the optimistic approach to cover mutually recursive functions calling each other. The closure algorithm in Figure 7.30 does exactly that. It starts off with a set of strictness assumptions, one for each function involved, and iterates until the corresponding result sets match their assumptions.

The algorithm of Figure 7.30 is capable of identifying any strict argument, but requires all ASTs to be available at once. This is not very practical (separate compilation), nor feasible (proprietary libraries). Some compilers therefore store strictness information in associated info files, which will be consulted when a function is used in another program module. Whenever an info file is updated, all modules depending on it must be recompiled to identify as many strict arguments as possible. In the case of a library, recompiling to update the info file is usually not an option and, consequently, strict arguments may be missed. This, however, is only a problem for higher-order library functions that operate on user functions; all other functions cannot call back the program module, and therefore do not depend on external strictness information. Functional language compilers often include a library file containing the source code of common higher-order functions like map and mappend with each user file instead of linking them as a library. This results in better analysis and, hence, better code.
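The closure algorithm of Figure 7.30 translates almost directly into a fixed-point loop. The sketch below only illustrates that loop: the type struct Function, the constant MAX_ARITY, and the routine in_result_set(), which stands for one bottom-up propagation of the sets of Figure 7.28 over the AST of a function, are all invented for this example and are not part of the book's run-time system.

#define MAX_ARITY 16

struct Function {
    int arity;
    int strict[MAX_ARITY];        /* the current assumption Strict[f, i] */
    /* ... AST of the function body ... */
};

/* One bottom-up propagation over the AST of f under the current
   assumptions; returns non-zero if argument arg_index ends up in the
   result set. Hypothetical; corresponds to the rules of Figure 7.28. */
extern int in_result_set(const struct Function *f, int arg_index);

void compute_strictness(struct Function *funcs, int n_funcs) {
    int f, i, changed;

    /* Initialization: optimistically assume all arguments are strict. */
    for (f = 0; f < n_funcs; f++)
        for (i = 0; i < funcs[f].arity; i++)
            funcs[f].strict[i] = 1;

    /* Inference: falsify assumptions until a fixed point is reached. */
    do {
        changed = 0;
        for (f = 0; f < n_funcs; f++)
            for (i = 0; i < funcs[f].arity; i++)
                if (funcs[f].strict[i] && !in_result_set(&funcs[f], i)) {
                    funcs[f].strict[i] = 0;    /* narrow the assumption */
                    changed = 1;
                }
    } while (changed);
}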
Expressions occurring at strict argument positions may be evaluated immediately without constructing a graph. Section 7.6.1 already showed the advantage of exploiting the strictness of the arguments of built-in operators. Now we can also exploit the strictness of user-defined functions. For example, consider the factorial function fac:
Data definitions:
1. Let Strict be an associative mapping from (identifier, int) pairs to Booleans, where each pair describes a function name and argument index.

Initializations:
1. For each function f with arity m, set Strict[f, i] to true for arguments i ranging from 1 to m.

Inference rules:
1. If propagation of strictness information through the AST of function f shows that f is not strict in argument i, then Strict[f, i] must be false.

Figure 7.30 Closure algorithm for the strictness predicate.
    fac n = if (n == 0) then 1 else n * fac (n-1)
Without strictness information about itself, fac compiles into the code shown in Figure 7.31. Note that the strictness of the built-in multiplication mul is already used to call fac directly. Also, the conditional has been expanded in-line, and the strictness of its first argument has been used to call equal directly. The argument of fac, however, is passed lazily as a graph. Using the fact that fac is strict in its argument n allows us to remove the last spine construction, leading to the code of Figure 7.32.

Pnode fac(Pnode _arg0) {
    Pnode n = _arg0;

    return equal(n, Num(0))->nd.num ? Num(1) :
           mul(n, fac(Appl(Appl(&_sub, n), Num(1))));
}

Figure 7.31 Lazy translation of the argument of fac.
Until now we have used the strictness information to avoid constructing application spines, but we can go one step further. In addition to generating a routine fac(), which expects one unevaluated argument in which it is strict, we can generate a second routine fac_evaluated(), which expects an evaluated argument. We can then replace fac(n) by

    fac_evaluated(eval(n))

and, if we know that n has already been evaluated, leave out the call to eval(). We can do the same for the built-in operators, leading to the introduction of add_evaluated(),
Pnode fac(Pnode _arg0) {
    Pnode n = _arg0;

    return equal(n, Num(0))->nd.num ? Num(1) :
           mul(n, fac(sub(n, Num(1))));
}

Figure 7.32 The first step in exploiting the strictness of the argument of fac.
equal_evaluated(), etc. The code for fac that uses this optimization is shown in Figure 7.33; note that the code for fac() itself has been left in place, and now calls fac_evaluated() after evaluating the argument. Now, when generating a direct call of fac, the code generator has a choice between two translations: calling fac_evaluated() if fac is known to be strict and the argument is known to be evaluated, and calling fac() otherwise. (The options of calling __fac() when the arguments are in an array and of creating a node using _fac are still open as well.) This code uses at most one call of eval() for each call of fac(n); this is a great improvement over the code from Figure 7.32, which called eval() 6n times.
Pnode fac_evaluated(Pnode _arg0) {
    Pnode n = _arg0;

    return equal_evaluated(n, Num(0))->nd.num ? Num(1) :
           mul_evaluated(n, fac_evaluated(sub_evaluated(n, Num(1))));
}

Pnode fac(Pnode _arg0) {
    return fac_evaluated(eval(_arg0));
}

Figure 7.33 Exploiting the strictness of fac and the built-in operators.
With strictness analysis we avoid constructing and interpreting application spines at strict positions. The overall effect on performance is strongly program dependent, since the quality of the strictness analyzer and the laziness of the program determine which fraction of the application spines can be optimized away. The strictness analysis discussed so far only considers arguments of functions, but many expressions are part of data structures. Some progress has been made with analyzing the strictness of data structures like lists (Burn, 1991), but in general it is a hard nut to crack, so most strictness analyzers simply do not address the issue at all.

To alleviate the problem of strictness analysis failing to determine the strictness of certain expressions, compilers often have provisions for the programmer to manually indicate strictness. Haskell provides the standard function seq (for 'sequence') that can be used in seq x y to ensure that x is evaluated before running y. This function can easily be used to mark strict expressions, but it also plays an important role in programming interactive programs where actions should happen in a precise order.
For simplicity the graph reducer requires all data to be 'boxed' in a graph node containing a tag for identification, as outlined in Figure 7.15. Although this allows a uniform treatment of pointers when interpreting the computation graph, it is also a source of inefficiency. For example, the strict version of the function fac in Figure 7.33 still allocates a considerable number of graph nodes to hold the intermediate results produced by the built-in operators equal, mul, and sub, even in their _evaluated forms. These intermediate nodes are passed on to another operator or call of fac, both of which are only interested in the numerical contents, not in the box surrounding it. The function mul_evaluated()

Pnode mul_evaluated(Pnode a, Pnode b) {
    return Num(a->nd.num * b->nd.num);
}

simply dereferences the argument pointers without even inspecting the node tag, and does not use the pointers otherwise. It is obvious that we can pass the numbers directly, instead of boxed in graph nodes. In general, we can pass any basic value unboxed at strict argument positions. Once strictness analysis is performed, it takes little analysis to find out which arguments are suited for unboxing. This analysis is called boxing analysis. The effect on the fac example is shown in Figure 7.34, which features the routine fac_unboxed(). The built-in operators are no longer needed, and the corresponding C operators are invoked directly. The function fac_evaluated() was adapted to take care of unboxing the argument before calling fac_unboxed(), and boxing the result in a number node. Boxing the result is necessary since fac_evaluated() has to return a boxed value as before. After applying the boxing analysis to fac, it no longer allocates any graph node. This is, however, an exception, since most functions in a Haskell program of reasonable size do not operate on strict arguments with basic types. Nevertheless boxing analysis can have a noticeable effect on performance: the unboxed factorial runs about 10 times faster than its boxed counterpart.
The last statement in a routine in code generated for functional languages is often a function call. One reason is that repetition is expressed using recursion in functional languages. But we will see in Chapter 8 that the same phenomenon occurs in code generated for logic programs, as Figures 8.16, 8.19, and even 8.15 and 8.18 amply attest. Some of these tail calls can be avoided by in-lining the called routine, but the ones involved in recursion cannot. These in particular cause a problem, since they may lead to stack overflow. A tail-recursive routine working on a linked list of N elements will reach a stack depth of N just
int fac_unboxed(int n) {
    return (n == 0 ? 1 : n * fac_unboxed(n-1));
}

Pnode fac_evaluated(Pnode _arg0) {
    Pnode n = _arg0;
    return Num(fac_unboxed(n->nd.num));
}

Pnode fac(Pnode _arg0) {
    return fac_evaluated(eval(_arg0));
}

Figure 7.34 Strict, unboxed translation of the function fac.
before finishing; and when it finishes, it spends a time proportional to N in unstacking the activation records. An iterative routine working on the same N-element list would have a stack depth of 1 and a 1-level unstacking operation would suffice upon termination. So it is natural to look for ways of avoiding the stacking involved in tail calls.

Fortunately, much of the administration involved in a tail call is superfluous and can be eliminated. There are several reasons for this. First, there is no reason to return to the routine R that made the call (the caller) when the routine C that was called (the callee) has finished, since there is no additional code in the caller that needs to be executed. So there is no need for a return address when C exits, nor was there any need to store one when C was called, and the transfer (tail call) to C could be effected by a simple jump instruction. Second, the formal and local variables of the caller R are no longer required as soon as the parameters of the tail callee C have been computed, provided these parameters do not contain pointers to the formals or locals of R. And third, now that we have seen that the formals and locals can be abandoned and there is no need for a return address, most or all of the activation record of R is free just before the call.

When generating assembly code, these possibilities for optimization are easily exploited. The parameter values of C are written over the activation record of R, which now looks like a proper activation record for C, and a simple jump to the start of the code of C is executed. Some care must be taken in the overwriting of the variables, but on the whole the optimization is straightforward and rewarding. It prevents the stack from growing from tail recursion, thus extending the range of applicability of recursion, and reduces the routine entry/exit code to a simple jump.

When generating C, however, the start of the code of the callee is not available to jump to, except in the case of a directly tail-recursive routine. Fortunately, that is an important case in functional programs. First, many tail calls are already directly tail-recursive. Second, the remaining indirect tail calls can often be transformed away by in-lining the code of the callee. If the code of the callee itself ends in a tail call, we must in-line further if that call is not to the original caller. This approach fails if not all source code is available, for example, when calling a separately compiled (library) function. Most
mutually recursive functions, however, occur in the same module. As an example consider the following function that drops the first n elements from a list:

    drop n lst = if (n == 0) then lst else drop (n-1) (tail lst)
Using strictness information and boxing analysis we would normally generate the C code in Figure 7.35. The tail call can easily be transformed into a back jump, as shown in Figure 7.36. The tricky part of overwriting the parameters with the new values of the tail call is left to the C compiler by introducing a set of temporary variables, _tmp0 and _tmp1.

Pnode drop_unboxed(int n, Pnode lst) {
    if (n == 0) {
        return lst;
    } else {
        return drop_unboxed(n-1, tail(lst));
    }
}

Figure 7.35 Strict, unboxed translation of the function drop.
Pnode drop_unboxed(int n, Pnode lst) {
L_drop_unboxed:
    if (n == 0) {
        return lst;
    } else {
        int _tmp0 = n-1;
        Pnode _tmp1 = tail(lst);

        n = _tmp0; lst = _tmp1;
        goto L_drop_unboxed;
    }
}

Figure 7.36 Translation of drop with tail recursion eliminated.
Although we have seen above how to transform direct tail recursion into efficient C code, many self-recursive functions do not end in a tail call, but require a little postprocessing of the result of the recursive call. An example is again the function fac:

    fac n = if (n == 0) then 1 else n * fac (n-1)

The problem is that in this code the multiplication is performed after the recursive call, resulting in a stack of n recursive calls. Another transformation is required to reduce the required stack space to a single frame. If the operator using the recursive result is associative, as is the case for multiplication, we can transform the code to use an accumulating argument, as shown in Figure 7.37.
fac n = if (n == 0) then 1
        else prod n (n-1)
        where prod acc n = if (n == 0) then acc
                           else prod (acc*n) (n-1)

Figure 7.37 Tail-recursive fac with accumulating argument.
The local function prod accumulates the results of the multiplications along the way in its first argument acc: the call prod p q computes p*q*(q-1)*...*1. So the desired product n*(n-1)*...*1 is computed by the call prod n (n-1), except for n equal to 0, which has to be caught as a special case. Since prod is a tail-recursive function that is strict in both arguments, the code generator may now produce the code shown in Figure 7.38. Note that the local function prod is lifted to the top and has been renamed to fac_dot_prod. The tail call to itself has been transformed into a jump, which makes the code as efficient as the handwritten imperative version of Figure 7.2.

The transformation to an accumulating argument depends on the compiler being able to recognize the recursive pattern and to determine that the operations performed on the results are associative. In other words, the top-level node in the AST should be an associative operator and one of its children should be a self-recursive call. Figure 7.39 gives the translation scheme for a general right-recursive function: if a function definition matches the first line in the table, it can be replaced by the other two functions. In this scheme, ⊕ denotes an associative operator, F is the function name, and B, V, E, E_1...E_n denote arbitrary expressions. An additional constraint is that the ⊕ operator should be strict, to avoid building a chain of unevaluated expressions through the accumulating argument. Note that the test B in the replacement code of F is evaluated in a different environment than the B in the code for F'; so the code of B is duplicated in this transformation, but its actions are not. The function fac, given as

    fac n = if (n == 0) then 1 else n * fac (n-1)

matches the conditions in the translation scheme, with F = fac, n = 1, ⊕ = *, B = (n == 0), V = 1, E = n, and E_1 = (n-1). This results in its replacement by the functions
int fac_dot_prod_unboxed(int acc, int n) {
L_fac_dot_prod_unboxed:
    if (n == 0) {
        return acc;
    } else {
        int _tmp0 = n * acc;
        int _tmp1 = n-1;

        acc = _tmp0; n = _tmp1;
        goto L_fac_dot_prod_unboxed;
    }
}

int fac_unboxed(int n) {
    if (n == 0) {
        return 1;
    } else {
        return fac_dot_prod_unboxed(n, n-1);
    }
}

Figure 7.38 Accumulating translation of fac, with tail recursion elimination.
F x_1...x_n = if B then V else E ⊕ F E_1...E_n
    ⇒
F x_1...x_n = if B then V else F' E E_1...E_n
F' x_acc x_1...x_n = if B then x_acc ⊕ V else F' (x_acc ⊕ E) E_1...E_n

Figure 7.39 Accumulating argument translation scheme.
fac n = if (n == 0) then 1 else fac_prime n (n-1)
fac_prime x_acc n = if (n == 0) then x_acc * 1
                    else fac_prime (x_acc * n) (n-1)
which, after the arithmetic simplification x * 1 ⇒ x, matches the code in Figure 7.37. Translation of the functional core code of Figure 7.37 to C yields the code of Figure 7.38. In-lining and a few other optimizations discussed in Chapter 4 yield code that is comparable to that in Figure 7.2. In summary, the following optimizations were required to transform the definition of fac in Figure 7.1 to C code equivalent to that of Figure 7.2:
- top-level short-circuiting of the if-statement and its branches;
- short-circuiting of == and * due to the known strictness of their arguments;
- strictness analysis, which yields that fac is strict in its argument;
- short-circuiting of the - operator, which is now allowed;
- translation to C;
- unboxing the argument;
- accumulating the argument;
- and further optimizations performed by the C compiler.
The optimizations presented in this section showed that it is possible to generate efficient code for the factorial example. Likewise, many simple mathematical functions can be compiled into efficient C code. In general, however, strictness analysis breaks down on symbolic code that manipulates user-defined data structures; it is very difficult to determine whether or not certain fields inside a record are strict, especially when infinite data structures and higher-order functions are involved. Aggressive in-lining and constant propagation before carrying out strictness analysis reduces the problem somewhat, since all cases of known functions passed as an argument then reduce to ordinary, non-higher-order, functions. Nevertheless, the daily practice is that most data structures are passed lazily, which causes a significant amount of time to be spent in the run-time system. The graph reducer and garbage collector may take more execution time than the user-defined code bits in the worst case.
As mentioned several times before, the performance of functional programs is strongly dependent on the amount of graph manipulation involved. Streamlining the handling of graph nodes therefore has a noticeable impact on performance. In this section we discuss a few advanced techniques employed by several reduction engines at the lowest level.
The basic graph reducer discussed in Section 7.5.2 uses a single graph node type (Figure 7.15) to represent the different possibilities. Although this uniform coding eases the interpretation of graph nodes, it also wastes memory. For example, a simple number takes up as much space as a function descriptor holding three attributes (its arity, name, and code address). The obvious improvement is to use variable-length nodes, where each node consists of a tag followed by a number of fields depending on the node type. The impact on the graph reducer, and on the garbage collector as well, is minimal since the code in Figure 7.17 is based on a case analysis of the node tag. The performance effect is that less heap space will be allocated, resulting in fewer garbage collections. Another effect is that cache performance increases since unused parts of the uniform nodes no longer take up precious cache space.
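A sketch of what such variable-length nodes might look like in C follows; the struct layouts and field names below are illustrative only and do not reproduce the uniform node of Figure 7.15.

typedef struct node *Pnode;

struct num_node  { int tag; int value; };               /* small: tag + number     */
struct appl_node { int tag; Pnode func; Pnode arg; };   /* binary application node */
struct func_node {                                      /* descriptor: tag + 3 attributes */
    int tag;
    int arity;
    const char *name;
    Pnode (*code)(Pnode *args);
};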
Graph nodes are quite small, so the node tag field consumes relatively much memory. Note that a graph node is always identified through a pointer, or else it is garbage, so we can lift the tag from the node and put it in the pointer itself. The least significant bits in a pointer are 'free', because of alignment constraints for the nodes. Typically nodes are aligned on word (4 bytes) boundaries, allowing 2 bits for tagging. Four tags is precisely what we need to encode the node types used by the basic graph reducer; the node type defined in Figure 7.15 enumerates five values, but we can easily represent 'Nil' as a constant (null) pointer. If more tags are needed the most significant bits in a pointer can be used for tagging too. This reduces the maximal addressable memory space, but, when using 32-bit pointers, taking two bits off still leaves 1 Gbyte. In combination with the two least significant bits, this provides 16 different tags, which is enough to represent the (most common) node types in any graph reducer. Pointer tagging saves one word per graph node, which reduces the pressure on the cache and garbage collector. Another advantage is that inspecting the tag now consists of selecting a few bits from the pointer instead of dereferencing it. The selection can be accomplished with a simple binary AND operation, and is cheaper than the memory lookup required before. Nothing comes for free, of course, but the price of pointer tagging is small: before dereferencing a tagged pointer the tag bits must be zeroed to convert it into a 'clean' pointer. It typically requires one binary AND operation to zero the tag bits, so the benefits far outweigh the costs.
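A minimal sketch of pointer tagging in C, assuming the 4-byte node alignment mentioned above; the macro names and tag values are invented for this illustration and are not taken from the book's run-time system.

#include <stdint.h>

#define TAG_MASK        0x3u                    /* the two least significant bits   */
#define TAG_OF(p)       ((uintptr_t)(p) & TAG_MASK)
#define UNTAG(p)        ((void *)((uintptr_t)(p) & ~(uintptr_t)TAG_MASK))
#define WITH_TAG(p, t)  ((void *)((uintptr_t)(p) | (t)))

enum { TAG_APPL = 0, TAG_NUM = 1, TAG_FUNC = 2, TAG_CONS = 3 };
/* 'Nil' needs no tag of its own: it is represented by a null pointer. */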
The basic code generation scheme from Section 7.6 translates a function into a sequence of calls to constructor functions like Num() and Appl() to build the graph representing the body of the function. This simplistic approach causes the nodes to be allocated one by one. It is more efficient to allocate them all at once. Aggregate allocation can be arranged easily, but requires some analysis of the AST to compute the total amount of heap space needed for the complete expression. Note that the compiler can no longer generate code that calls the constructor functions of the run-time system, but must generate the proper code for filling in the specific node type itself.
The spine of binary application nodes supports the concepts of currying and higher-order functions very well, but has the drawback of generating long application spines for functions with many arguments. Calling an n-ary function in a lazy context requires n application nodes. Although aggregate allocation of the nodes at once reduces the costs of building the spine somewhat, more can be gained by representing the spine as a single vector of pointers to the arguments. A vector apply node is a variable-length graph node consisting of a function pointer, a length field, and n argument pointers. The length field is needed by both the graph reducer and the garbage collector to function properly. The vector apply node is more space efficient than a spine of binary application nodes if the number of arguments exceeds two (2+n versus 2n). Another advantage of using vector nodes is that unwinding becomes cheaper, because the pointers in the application spine need not be traversed.

Vector apply nodes frequently represent delayed function calls in which all function arguments are present; the number of arguments matches the arity of the function. In this case the vector apply node is in fact a self-contained closure. The graph reducer can improve performance by discriminating closures from curried applications. Normally the graph reducer unwinds an apply node by stacking the argument list before finally invoking the function through its code address. The arguments are passed as an array of pointers. Inside a closure the arguments already appear as an array, so the graph reducer can skip the unwind phase and invoke the function directly. This saves processing as well as stack space. It is an important optimization since most vector apply nodes hold closures and currying is used (relatively) sparingly in most functional programs.
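A sketch of such a vector apply node in C is shown below; the field names are illustrative and the flexible array member assumes C99. The function pointer refers to the descriptor node of the (possibly curried) function, in the style of the descriptors used in the earlier figures.

typedef struct node *Pnode;     /* node type of the graph reducer */

struct vector_apply {
    Pnode func;      /* pointer to the function descriptor node        */
    int   length;    /* number of argument pointers that follow        */
    Pnode args[];    /* the arguments themselves (flexible array, C99) */
};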
This concludes the chapter on compiling functional programs. We saw that the increased level of abstraction provided by functional languages over traditional imperative languages puts a burden on the compiler writer. Some features, like pattern matching and list comprehension, are considered syntactic sugar and can be handled quite easily in the front-end. Fundamental features like currying, higher-order functions, and lazy evaluation, however, require substantial effort from the compiler back-end even when C is used as the target language. Lazy evaluation implies an execution order (normal-order reduction) that is different from the strict model offered by the target machine (applicative-order reduction in C). Therefore the language run-time system centers around an interpreter, the graph reducer, that repeatedly searches for the next computation step (reduction) in a graph of unevaluated expressions. To increase performance, the functional language compiler exploits several optimizations to minimize or even circumvent interpretation overhead (for example, graph manipulation). Strictness analysis is the most important optimization since it identifies which expressions must always be evaluated, so traditional code can be generated for them. Unfortunately, strictness analysis on symbolic code involving user-defined data structures is notoriously difficult. As a consequence a large fraction of the computation goes through the interpreter. This also explains why the garbage collector, discussed in Chapter 5, is another important component of the run-time system of a functional language.
- Functional languages promote a declarative style where programmers specify only what to compute. It is up to the compiler and run-time system to decide how, where, and when to compute (sub-)results.
- Functional languages include special syntax (for example, list comprehension) and programming constructs (for example, higher-order functions) to raise the level of abstraction. This complicates the compiler and the run-time system.
- Syntactic sugar is handled by the lexical analyzer and parser. This includes the offside rule, list notation, list comprehension, and pattern matching.
- The more fundamental issues concern polymorphic typing, referential transparency, higher-order functions, and lazy evaluation.
- A functional language compiler is typically structured as a front-end that transforms the complete source language to an intermediate language that contains only the essential features: basic data types, structured data types (lists), non-nesting functions, local bindings (lets), expressions, higher-order functions, and lazy evaluation semantics.
- Numerous optimizations are performed on the intermediate code (or actually the AST) before code is generated. Most compilers build on imperative compiler technology by generating C; C is used as a portable optimizing assembler.
- Pattern matching is handled by generating a simpler function which includes a sequence of if-then-else constructs to explicitly check whether or not actual arguments match a specific constant or type constructor.
- List comprehension is handled by a translation scheme that transforms the (semi-)mathematical notation into a set of higher-order functions that iterate over lists and filter out the right elements.
- Nested functions are floated to the top by the lambda-lifting transformation. Pointers to identifiers in outer scopes are translated into pointers to additional arguments that are passed to the lifted functions explicitly.
- Type checking of functional languages is complicated by polymorphic types; functions and expressions may have multiple types. Also, types need not be specified explicitly, so the compiler must infer the type of arbitrary expressions using type inference techniques.
- Type inference first determines the constraints that are imposed by the body of the function at hand. The set of constraints is then solved by unification, which results in the most general instance of the type of the function.
- Higher-order functions and lazy evaluation are tackled by an interpreter known as the graph reducer, which explicitly manipulates application nodes representing (partial) function calls. A function call with n argument expressions is represented as a spine of n application nodes, each holding a function and argument pointer. An application spine that holds a function and all its arguments is known as a 'redex' (for 'reducible expression').
- The graph reducer is a three-stroke engine that repeatedly selects a redex, instantiates the right-hand side of the function, and updates the root of the redex. Instantiating the right-hand side of the function builds the graph representing the body and replaces the formal parameters with pointers to the actual arguments.
- The reduction order controls the selection of the next redex. Normal-order reduction always selects the leftmost outermost redex and, consequently, only reduces what is needed to progress the computation; this is lazy evaluation. Applicative-order reduction is used by imperative languages and evaluates the leftmost innermost redex first; expressions at argument positions are evaluated before calling the function.
- Normal-order reduction can handle a larger class of programs. It allows programmers to specify infinite data structures as long as they only use a finite part of them.
- The code generator emits C code, which will be invoked by the graph reducer, to instantiate the body of a user-defined function at run time.
- Expressions are translated into a sequence of application nodes. The if-then-else construct translates into a call to the conditional function supplied by the run-time system. Local bindings (let-expressions) translate into assignments to local variables in C.
- If a function f directly calls another function g (with some arbitrary argument expressions) then control can be transferred directly to g without using application nodes. This optimization short-circuits top-level function applications. Expressions occurring as operands of a built-in strict operator may be evaluated immediately instead of being transformed into a (lazy) piece of graph.
- A broad compiler, which has access to the ASTs of all functions, is needed to achieve acceptable performance. Strictness analysis and boxing analysis are program-wide analyses that avoid the manipulation of graphs in many cases.
- Strictness analysis identifies for each function which arguments will always be evaluated, so the corresponding argument expressions need not be passed as a graph of function applications, but may be evaluated safely before the call.
- Determining the strict arguments of a function proceeds bottom-up through the AST. Function application is handled by looking up the strict arguments of the called function and passing the sets of strict arguments in the AST at the corresponding positions upward. The strictness properties of a recursive function are optimistically set to true. Then the strict arguments are determined by repeatedly propagating sets along the AST, until a fixed point is reached.
- Once strictness analysis is performed it is no longer necessary to pass strict arguments as graph nodes. Boxing analysis determines which arguments may be passed by value and takes care of changing between boxed and unboxed representations when switching from interpreted to compiled code.
- In assembly code, the tail call inside a routine R can be optimized into a stack adjustment plus a jump to the new routine, C.
- In C we can only jump to the start of the current routine. This allows us to optimize the important case of directly tail-recursive functions by overwriting the parameters with the values of the tail call and jumping back to the start of function R.
- Often recursive functions do not end in a tail call, but in an operator that processes the result returned by the tail call. The accumulating argument transformation may be applied when the operator is associative and the tail call is recursive. The body of the function is reorganized to take an additional argument that accumulates partial results in a strictly tail-recursive manner; the operation is performed as input to the accumulating argument of the tail call.
- The end result is that state-of-the-art compilers for functional languages can emit code that is similar to that of traditional imperative counterparts for the class of numerical algorithms. Optimizing symbolic and inherently lazy programs is much more difficult, though.
- The last drop of performance can be gained by streamlining the graph reducer itself. We discussed four advanced graph-manipulation techniques: variable-length nodes, pointer tagging, aggregate node allocation, and vector apply nodes (or closures).
Functional programming differs considerably from imperative programming. The books by Bird (1998) and Thompson (1999) provide a gentle introduction into the world of higher-order functions, currying, and lazy evaluation. Both books use the Haskell language, which is the de facto standard functional language with lazy semantics and which succeeds early lazy languages like SASL, Miranda, and LML. Standard ML (Milner et al., 1997), on the other hand, is the prominent functional language with strict, or non-lazy, semantics. It is quite popular, and the book by Paulson (1996) is a good place to start. LISP, originating in the early sixties, is still alive, with Scheme (Dybvig, 1996) being the prominent representative.

Detailed information about the implementation of functional languages can be found in the proceedings of the annual International Conference on Functional Programming (ICFP), which combines the former biennial Functional Programming and Computer Architecture (FPCA) and Lisp and Functional Programming (LFP) conferences. The Implementation of Functional Languages workshops are also a rich source of information. Journals to consider are the Journal of Functional Programming and Higher-Order and Symbolic Computation (formerly: LISP and Symbolic Computation). A systematic study of functional language implementations is given by Douence and Fradet (1998).

This chapter discussed the modern approach of compiling functional languages using C as the universal assembler. This approach is the most recent stage in the historical development of implementing functional languages, which progressed from poorly performing interpreters (SKI combinators (Turner, 1979)) through supercombinators (G-machine (Johnsson, 1984)) to native code generation. The standard reference to the early work is The Implementation of Functional Programming Languages (Peyton Jones, 1987). The follow-up, Implementing Functional Languages (Peyton Jones and Lester, 1992), is especially useful for practitioners since it describes the spineless tagless G-machine that forms the heart of the state-of-the-art Haskell compiler from Glasgow University. The Glasgow Haskell Compiler (GHC) is available from the Internet (http://www.haskell.org/ghc/), and comes with a complete set of sources. It is structured such that 'others can plug in their own strictness analyzer, profiler, front-end, back-end, or other special pass'. This open structure, and the availability of all source code, eases experimentation with the compiler, although learning the GHC's internals is quoted to be 'a fairly substantial exercise'.
7.1 In Section 7.3, the type of map is described as map :: (b -> c) -> [b] -> [c], 'since the -> operator is right-associative, which is consistent with currying'. Explain why the right-associativity of the -> operator is consistent with currying and what would be wrong if the type of map were specified as map :: b -> c -> [b] -> [c].

7.2 Derive the type of the following function foldr, which folds a right-associative operator into a list of values:

    foldr op val []     = val
    foldr op val (x:xs) = op x (foldr op val xs)
7.3 Haskell uses the dot operator (.) to denote function composition (f.g). What is the polymorphic type of this built-in composition operator?
7.4 Give equivalent functional-core expressions for the following lists:

    [1..481]
7.5 Pattern matching is considered syntactic sugar that can easily be compiled away. Transform the function

    unique (a:b:cs) = if (a == b) then a : unique cs else a : unique (b:cs)
    unique cs       = cs

which contains a nested pattern, into an equivalent functional-core routine.

7.6 The pattern translation in Figure 7.6 contains code to catch cases in which all patterns fail. (a) Argue that the activation of this code signals an error in the specification of the function. (b) Argue that it signals an error in the use of the function.
7.7 Pattern matching by repeated variables; advanced topic: Some functional languages (for example Miranda, but not Haskell) allow another form of pattern matching, pattern matching by repeated variables. This pattern allows an argument name to occur more than once in the left-hand side of a function definition and requires these occurrences to have the same value for the match to succeed. This allows, for example, definitions like

    equal x x = 1
    equal x y = 0

(a) Find out the precise semantics of this form of pattern matching in Miranda, with an eye to, for example, f (x:(y:xs)) (y:(x:ys)).
(b) Design a translation technique for the translation of these patterns to the functional core.
(c) Give the strictness computation rules for the code resulting from these patterns.

7.8 The definition of the quicksort function from Section 7.1.3 uses list comprehensions. Use the translation scheme in Figure 7.10 to derive a version that uses higher-order functions instead.

7.9 The transformed qsort function in Exercise 7.8 uses local functions. Use the lambda-lifting transformation to float these functions out.
7.10 Applicative-order reduction may compute expressions that are not needed. Give an example where applicative-order reduction even fails to terminate, while normal-order reduction succeeds in returning a result (by being lazy).

7.11 The three-stroke reduction engine ends each reduction step by updating the root with the result. Omitting the update may lead to exponential overhead. Give an example that demonstrates this.

7.12 The reduction engine in Figure 7.17 updates the root node by copying the value into it. This approach does not work if we extend the reducer to handle user-defined data structures, which can be of arbitrary size. The usual work-around is to introduce indirection nodes that consist of a tag and a pointer to the true value. Discuss the changes to the eval() code that are required to incorporate such indirection nodes.

7.13 Recursive definitions inside let-expressions cannot be translated into assignments to C variables. Explain. A solution is to use a two-phase approach. First, generate code for the recursive definition while using a blank entry for each forward reference. Second, generate code to backpatch the blank entries. Give the translation for the function

    rep n = let lst = n : lst in lst

which generates an infinite list of ns.

7.14 Explain the optimizations in Section 7.6.1 as examples of partial evaluation.

7.15 What are the strict arguments of the following function?

    f x y z 0 = x + z
    f x y z p = f y 0 0 (p-1) + f z z 0 (p-1)

7.16 Explain the unboxing transformation of Figure 7.33 to Figure 7.34 as an example of partial evaluation.
7.17 In the last few paragraphs of Section 7.7.3 we examined the problem of filling a parameter record with values, some of which possibly derive from that same parameter record. If, for simplicity, we assume that all values have the same size, we can formalize the problem as follows. Given the set A of assignments

    P_new[1] := f1(...);
    P_new[2] := f2(...);
    ...
    P_new[N] := fN(...);

in which P_new[] is an array, and the function arguments (...) may contain any combination of the elements of an array P_old, give an algorithm for generating code for the assignments in A that works correctly even if P_new and P_old are actually the same array. These assignments do what is called in situ replacement.
7.18 The mappend function given in Section 7.4.3 looks like a suitable candidate for the accumulating argument transformation, but it is not. Explain. Hint: consider the performance consequences.

7.19 Discuss the benefits of using vector apply nodes when evaluating the following expression:

    take 100 (from 1)

    where take 0 xs     = []
          take n (x:xs) = x : take (n-1) xs
          from n        = n : from (n+1)
7.20 History of functional language implementation: Study Landin's paper (1964) on the mechanical evaluation of expressions, and identify and summarize the concepts and techniques described.
In the previous chapter we have seen that the compilation of functional programs is basically simple, once the idea of a closure is understood. The situation for logic programs is different: even implementing the basic mechanisms already poses considerable problems. Logic programming is based on the specification of 'relations' between 'terms', 'facts' using these relations, and 'rules' for 'inferring' new facts from existing facts. Some of the terms involved in these facts and rules may be or may contain logical variables. Facts and rules together are called 'clauses'. A program in a logic language consists of a 'list of clauses'. The point of logic programming is that it is possible to request specific inferred facts by doing 'queries'. Such inferred facts are obtained by the program by searching through the clauses, combining them in various ways and doing backtracking when the combinations lead nowhere, and often even when they do lead to a desired fact. During this search, the variables in the rules may be tentatively bound to various terms, and these bindings will have to be undone during backtracking. The terms tried for variables may be constants, but may also be other variables or structures; these structures may hold constants, bound or unbound variables and/or further structures. Also, the variables themselves are terms. For these reasons, the usual means of binding variables to values, the assignment, is out of the question, and the binding is performed by a process called 'unification', to be explained further on in this chapter. Note that 'a variable is bound to a value', whereas 'a value is assigned to a variable'. The reason for using the verb 'to bind' in this way is that, unlike traditional variables, logic variables can be unbound (have no binding) or bound (have a binding), and that if they have a binding, that binding leads to a value (or another variable). An example of a relation is 'is a parent of'. Facts using this relation might be 'Arne is a parent of Sachiko' and 'Sachiko is a parent of Rivka'. Another example of a relation is
'is a grandparent of', and the two relations might be linked in a rule 'X is a grandparent of Z if there is a Y such that X is a parent of Y and Y is a parent of Z', in which X, Y, and Z are logical variables. The system should then be able to infer from the above facts the new fact 'Arne is a grandparent of Rivka'.

The first logic language processors were interpreters, and in fact there has long been serious doubt if logic language programs could be compiled profitably at all. Today, for problems of a logic programming nature, logic programs compiled by high-end logic-language compilers achieve speeds that approach those obtained for the corresponding imperative implementations to within a factor of 2 or 3. Compilation of full backtracking search over the clauses and of unification of the terms requires substantial implementation effort even in their non-optimized form, but modern Prolog compilers achieve the remarkable speed of their object code by doing extensive optimizations. Fortunately, many optimizations consist of specializations of more general code and can be understood easily on the basis of that code.

We will first explain the full backtracking search mechanism by showing a very simple and inefficient interpreter for it (Section 8.2). This is followed by a discussion of the unification of two terms (Section 8.3). We then concentrate on a general compilation technique for full recursive backtracking search over the clauses combined with full recursive backtracking unification of the terms. Its main property is that it is a uniform implementation of a logic programming model that is slightly more general than is required by Prolog, and from which more specialized implementations can be derived (Section 8.4). Often, some of the search code can be performed at compile time and part of the remainder can be generated as chunks of in-line code. This often allows us to avoid calling upon the general search mechanism. This optimization is discussed in Section 8.4.3. Next, we restrict ourselves to unification as used in Prolog, which does not require backtracking; this yields a more imperative implementation of unification (Section 8.5). As with the search code, often some of the unification code can be performed at compile time and some of the remainder can be generated as chunks of in-line code. Finally, we apply one optimization to the unification code: 'Read'/'Write' mode (Section 8.5.4).

Some compilers for logic languages simplify life by generating C code rather than assembly code, and so will we in this chapter. There is a problem with this approach, however: the most natural expression of full backtracking search requires nested routine declarations, which standard C does not allow. Fortunately, one of the most important C implementations, the GNU C compiler, has an extension that allows exactly these nested routines. Also, nested routines are available in a few other imperative languages. Some compilers for advanced logic languages take the level of the target code another step higher and generate Prolog code (Debray, 1994).

Whereas few intermediate codes for imperative languages have reached more than local fame, there exists a more or less well-known intermediate code for Prolog: the Warren Abstract Machine or WAM; this may reflect on the complexity of the task of compiling Prolog.
The WAM document defines 39 instructions that together allow an efficient compiled implementation of Prolog, including the 'cut' operator but excluding assert, retract, and the numeric operators (Warren, 1983). The instructions deal with such things as selecting the proper clause to use in inferring a fact, unifying under various circumstances, and backtracking. Each instruction corresponds reasonably closely
to a node in the AST of a Prolog program; this makes intermediate code generation relatively easy. The WAM does all memory allocation, for which it uses five stacks; it can do its own deallocation, but efficiency can be increased by using an internal compacting garbage collector on some of the stacks. The machine incorporates quite a number of optimizations to the basic Prolog execution model. Many of these optimizations interact with each other, with the result that most WAM instructions cannot be understood in isolation: several readings of the defining document are required to understand the purpose and applicability of the instructions. The WAM report gives translations of the WAM instructions into actual machine code, which allows WAM-to-target-code generation almost by macro expansion. Aït-Kaci has written a tutorial on the WAM (Aït-Kaci, 1991). The full WAM is too detailed for treatment in this book. We will first concentrate on the execution model for logic languages and then derive a few WAM-like instructions from it, with optimizations. This approach avoids the complications inherent in interacting optimizations and may also be helpful in designing WAM-like instruction sets for other languages. For another derivation of some WAM instructions see Kursawe (1987).
8.2 The general implementation model, interpreted
8.3 Unification
8.4 The general implementation model, compiled
8.5 Compiled code for unification
We first briefly recall the concepts of logic programming, mainly to review the terminology and to introduce a simple working example.
The basic elements of logic programming are relations between terms, facts stating such relations, and rules for inferring new facts from established facts. A fact consists of the name of the relation and its terms, in some syntactic notation; it is an instance of the named relation. An example of a relation is parent/2, which notation means that the relation is called parent and governs 2 terms; in other words, it is a binary relation. The /2 is important since it is quite possible in a given logic language to have another relation parent/1, which would then govern 1 term and be unrelated to parent/2. A few examples of facts using the relation parent/2 are:
parent(arne, james).
parent(arne, sachiko).
parent(koos, rivka).
parent(sachiko, rivka).
parent(truitje, koos).
It is more than likely that the first fact is intended to mean that Arne is a parent of James, and that similar meanings should be attached to the other four facts, but of course such an interpretation is in no way necessary for a logic program to function. Given the above facts, a query could be

?- parent(X, rivka).
in which X is an unbound variable. Conventionally, names of variables begin with a capital letter, and names of constants begin with a small letter. The query ?- parent(X, rivka) asks the system to scan the facts, to find those which match the query, and to show the values of the variables for which the matches occur; note the plural 'matches', since more than one match may occur. The body of the query, parent(X, rivka), is our initial goal of the search. A fact matches a goal if there is a binding for the variables in the goal and the fact that makes goal and fact equal. The above goal results in two successive bindings, X=sachiko and X=koos, since the first makes the goal equal to the fact parent(sachiko, rivka) and the second to parent(koos, rivka). Queries do not need to contain variables: the query ?- parent(koos, rivka) results in yes and the query ?- parent(rivka, arne) results in no. On the other hand, facts may contain variables: the fact parent(X, rivka) means that X is a parent of Rivka for any value of X. The fact parent(X, rivka) can be considered as shorthand for an infinite number of facts: parent(a, rivka), parent(b, rivka), ... This feature is useful, for example to express the fact equal(X, X), which says that X equals X for any value of X. An example of a rule is

grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
which means that when we have established the facts parent(X, Y) and parent(Y, Z) for some values of X, Y, and Z, we may infer the fact grandparent(X, Z). Another interpretation of this rule is that we may replace the goal grandparent(X, Z) by the goal list parent(X, Y), parent(Y, Z) for any bindings of X, Y, and Z. The part before the :- token is called the head of the rule, the part after the :- token is called the body of the rule. We see that the comma in the body can be interpreted as a logical AND. Again the rule may be seen as a summary of infinitely many rules, obtained by infinitely many different bindings for X, Y, and Z:
grandparent(a, c) :- parent(a, b), parent(b, c).
        with X=a, Y=b, Z=c
grandparent(truitje, rivka) :- parent(truitje, koos), parent(koos, rivka).
        with X=truitje, Y=koos, Z=rivka
grandparent(arne, rivka) :- parent(arne, koos), parent(koos, rivka).
        with X=arne, Y=koos, Z=rivka
...
Note that the facts required by the second example have been established above and that we may therefore infer the fact grandparent(truitje, rivka); this is not the case in the first and the third example. Actually, facts and rules are not as different in nature as it seems: rules establish facts conditionally upon other facts, facts do so unconditionally. It is therefore convenient to play down the difference and combine them in the notion of a clause.
Now suppose the system is presented with the query

?- grandparent(arne, X).
which asks for an X of which arne is a grandparent, and makes grandparent(arne, X) our initial goal. The system then tries to find or infer the goal for some binding for X. There are no grandparent facts directly, but there is a rule for inferring them, the clause grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
To get an applicable instance of this clause, we make a copy of it with fresh variables in it:

grandparent(X1, Z1) :- parent(X1, Y1), parent(Y1, Z1).
and try to make the goal grandparent(arne, X) and the head of this clause equal by binding the variables in them; this is the unification process we mentioned above. It is easy to see that the bindings X1=arne and Z1=X do the job; note that we can bind a variable (Z1) to a variable (X). This unification transforms the clause copy into

grandparent(arne, X) :- parent(arne, Y1), parent(Y1, X).
We can now replace our goal grandparent(arne, X) by a new goal list: parent(arne, Y1), parent(Y1, X). Next, we try to unify the first part, parent(arne, Y1), with a fact and find that there are two possibilities, parent(arne, james) (with Y1=james) and parent(arne, sachiko) (with Y1=sachiko). The first transforms our goal list into

parent(arne, james), parent(james, X).

but no fact can be unified with parent(james, X), so this is a dead end; the second transforms our goal list into

parent(arne, sachiko), parent(sachiko, X).
and now we find that parent(sachiko, X) can be unified with the fact parent(sachiko, rivka) by using the binding X=rivka. So we have shown that our query can be reduced to a list of known facts by the binding X=rivka, and the answer to the query ?- grandparent(arne, X). is

grandparent(arne, rivka).
This corresponds to the fact 'Arne is a grandparent of Rivka' mentioned at the beginning of this chapter. The process by which it was derived is an example of an inference.

The above covers the basics of the logic paradigm. All logic languages have many additional features, usually including structures, lists, and other more advanced data types, integer arithmetic, string manipulation, ways to influence the search order, and ways to abort a search. We will only consider two of these additional features in this chapter, both from Prolog: structures and the 'cut' mechanism.
We can use a more precise version of the above inference technique to explain the workings of a very simple interpreter for logic languages. This interpreter is not intended as a serious implementation and serves as a model only; also, it implements only the bare bones of the logic paradigm, as explained above.

The main data structure of this interpreter is the goal list stack. Each of the goal lists on the stack is a separate possibility to satisfy the query, independent of the other goal lists: the semantics of the stack is the logical OR of all its elements. This set-up is depicted in Figure 8.1. The interpreter works on the top goal list on the stack only. It pops it off the stack, processes it, and pushes zero, one, or more new goal lists back onto the stack. The stack starts with one entry: the body of the query, followed by a special display goal that has the variables in the query and their names as terms. When the stack is empty, all answers to the query have been found and the processing stops.

The goal list notation is extended with the construct [P ?= Q], which signifies a request to unify P and Q, and the construct [P == Q], which signifies that P and Q have been unified successfully. These constructs are used as follows. As we have seen above, unification serves to adapt a copy of a clause to a given goal list by finding the minimal substitution that will make the head of the clause and the first goal in the goal list equal. For example, the (clause copy, goal list) pair
Figure 8.1 The goal list stack (the entries are related by OR; the last one is the bottom element).
grandparent(X1, Z1) :- parent(X1, Y1), parent(Y1, Z1).
grandparent(arne, X), <<("X", X).
is turned by the substitutions X1=arne and Z1=X into

grandparent(arne, X) :- parent(arne, Y1), parent(Y1, X).
grandparent(arne, X), <<("X", X).
Note that unification is always performed between one term from the head and one term from the goal, never between two terms from a head or from a goal. Once an appropriate substitution for the unification has been found and performed throughout the clause copy and goal list, we can replace the first goal in the goal list by the body of the clause copy:

parent(arne, Y1), parent(Y1, X), <<("X", X).
We will write the (clause copy, goal list) pair as a single goal list by provisionally inserting the clause copy after the first goal, and enclosing the first goal and the head of the clause copy in a [?= ] construct:

[grandparent(arne, X) ?= grandparent(X1, Z1)], parent(X1, Y1), parent(Y1, Z1), <<("X", X).
This can be read as 'If grandparent(arne, X) can be unified with grandparent(X1, Z1), the goal list parent(X1, Y1), parent(Y1, Z1), <<("X", X) after substitution is valid'. The concatenation of the body of the clause copy and the original goal list tail is automatic in this notation; also, the substitution required by the unification can be performed in one sweep over the extended goal list. After successful unification, the ?= is changed to ==:
[grandparent(arne, X) == grandparent(arne, X)], parent(arne, Y1), parent(Y1, X), <<("X", X).
The difference between ?= and == serves to guide the interpreter.
The interpreter is iterative and has four instructions:

- 'Attach clauses'. The top element T does not contain ?= or ==, nor is it the special display goal; it is popped off the stack. For each clause C in the program, a copy T1 of T is made, a copy C1 of C is made, all variables in C1 are replaced by fresh variables, C1 is attached to T1 using the [?= ] construct, and the result is pushed onto the stack. The process is depicted in Figure 8.2.
- 'Unify'. The top element T contains ?= and is popped off the stack. An attempt is made to unify the goal and the clause head in T in the [?= ] construct, resulting in zero or more substitutions. For each substitution S a copy of T is made, and S is applied to all elements in the copy. The zero or more resulting goal lists are stacked (in Prolog at most one goal list results). If the unification is completed in any of them, the ?= separator is replaced by the == separator.
- 'Match'. The top element T contains == and is popped off the stack. The [== ] construct, which contains the two unified goals, is removed from it and the result is pushed back onto the stack.
- 'Succeed'. The top element T is the special display goal and is popped off the stack. Its terms are displayed in answer to the query.

It is easy to see that in each situation only one interpreter instruction is applicable.
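As an illustration (not part of the original presentation), the goal list stack of Figure 8.1 could be represented by C data structures along the following lines; the type and field names below are our own assumptions, not names used later in this chapter.

struct term;                            /* terms, as defined later in Figure 8.3 */

/* A goal is an ordinary goal (a term to be satisfied), a unification
   request [P ?= Q], a successful unification [P == Q], or the special
   display goal that shows the query variables. */
typedef enum {Ordinary, Unify_Request, Unified, Display} Goal_Kind;

typedef struct goal {
    Goal_Kind kind;
    struct term *goal_term;             /* the goal, or P in [P ?= Q], [P == Q] */
    struct term *head_term;             /* Q in [P ?= Q], [P == Q]; otherwise 0 */
    struct goal *next;                  /* rest of this goal list */
} Goal;

/* The goal list stack; the semantics of the stack is the OR of its entries. */
typedef struct goal_list {
    Goal *goals;                        /* this goal list */
    struct goal_list *deeper;           /* the entry below this one */
} Goal_List;

Goal_List *goal_list_stack;             /* the top of the stack */

An interpreter loop would inspect goal_list_stack->goals to decide which of the four instructions applies, pop the entry, and push zero or more new entries.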
Figure 8.2 The action of 'Attach clauses' using one clause: the goal list 'first goal, other goals.' and the clause 'clause head :- clause body.' are combined into '[first goal ?= clause head], clause body, other goals.'
We will now show how this interpreter processes the query ?- grandparent(arne, X). To keep the example small, we reduce the program to four clauses:

pa(arne, james)
pa(arne, sachiko)
pa(sachiko, rivka)
gp(X, Z) :- pa(X, Y), pa(Y, Z)

in which we abbreviate parent and grandparent to pa and gp, respectively, and leave out the (Prolog) dot at the end of each clause. We show the stack with one entry to a line. The initial stack is

gp(arne,X),<<("X",X)
The applicable instruction is 'Attach clauses' and since there are four clauses, this results in four entries:

[gp(arne,X) ?= pa(arne,james)],<<("X",X)
[gp(arne,X) ?= pa(arne,sachiko)],<<("X",X)
[gp(arne,X) ?= pa(sachiko,rivka)],<<("X",X)
[gp(arne,X) ?= gp(X1,Z1)],pa(X1,Y1),pa(Y1,Z1),<<("X",X)

A copy of each of the four clauses has been inserted after the first goal, using the [?= ] construct. Note that the variables in the clause for gp have been renamed. Note also that the entire goal list has been copied four times. The instruction applicable to the top goal list is 'Unify', but it fails immediately since not even the names of the relations to be unified (gp and pa) match; so this goal list is rejected. The same happens to the next two goal lists. This leaves us with one goal list:

[gp(arne,X) ?= gp(X1,Z1)],pa(X1,Y1),pa(Y1,Z1),<<("X",X)
The unification attempt of gp(arne,X) and gp(X1,Z1) succeeds when we substitute arne for X1 and X for Z1:

[gp(arne,X) == gp(arne,X)],pa(arne,Y1),pa(Y1,X),<<("X",X)
The next applicable instruction is 'Match', which removes the two unified goals:

pa(arne,Y1),pa(Y1,X),<<("X",X)
Applying 'Attach clauses' results in

[pa(arne,Y1) ?= pa(arne,james)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= pa(arne,sachiko)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)

The top entry unifies by putting Y1 equal to james; 'Match' then removes the unified goals:

pa(james,X),<<("X",X)
[pa(arne,Y1) ?= pa(arne,sachiko)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)
It is important to note that the substitution of Y1 takes place in the top entry only; this keeps the entries completely independent. 'Attach clauses' yields

[pa(james,X) ?= pa(arne,james)],<<("X",X)
[pa(james,X) ?= pa(arne,sachiko)],<<("X",X)
[pa(james,X) ?= pa(sachiko,rivka)],<<("X",X)
[pa(james,X) ?= gp(X3,Z3)],pa(X3,Y3),pa(Y3,Z3),<<("X",X)
[pa(arne,Y1) ?= pa(arne,sachiko)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)
Four unification attempts fail, leaving

[pa(arne,Y1) ?= pa(arne,sachiko)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)
Again, a unification attempt succeeds, with Y1=sachiko this time, and after matching results in

pa(sachiko,X),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)
The top entry again requires an 'Attach clauses':

[pa(sachiko,X) ?= pa(arne,james)],<<("X",X)
[pa(sachiko,X) ?= pa(arne,sachiko)],<<("X",X)
[pa(sachiko,X) ?= pa(sachiko,rivka)],<<("X",X)
[pa(sachiko,X) ?= gp(X3,Z3)],pa(X3,Y3),pa(Y3,Z3),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)
The attempts at unification on the first two entries fail, removing the entries, but the third succeeds with X=rivka:

[pa(sachiko,rivka) == pa(sachiko,rivka)],<<("X",rivka)
[pa(sachiko,X) ?= gp(X3,Z3)],pa(X3,Y3),pa(Y3,Z3),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)
'Match' removes the unified goals, leaving us for the first time with a display goal on top:

<<("X",rivka)
[pa(sachiko,X) ?= gp(X3,Z3)],pa(X3,Y3),pa(Y3,Z3),<<("X",X)
[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),pa(Y1,X),<<("X",X)
The interpreter instruction 'Succeed' now displays the binding X=rivka in answer to the query, and removes the entry. The next three unifications fail, leaving an empty stack. Reaching the bottom of the stack signals the end of our search for answers to the query. It will be clear that an interpreter performing the four instructions 'Attach clauses', 'Unify', 'Match', and 'Succeed' can be written easily, but it will be equally clear that the interpreter has two glaring inefficiencies: 'Attach clauses' generates large numbers of goal lists that are obviously dead ends, and many goals in the goal lists are copied many times.
Also, our treatment of unification has been rather vague. We will now turn to these problems, not so much with an eye to improving the interpreter, which is a cardboard model anyway, but because any serious compiled code based on the above model will exhibit the same problems if we do not solve them here.
The instruction 'Attach clauses' stacks large numbers of redundant goal lists, which are subsequently processed even if it is obvious that the unification in them must fail, for example because the goal is not even an instance of the relation of the attached clause. Another example is

[pa(arne,Y1) ?= pa(sachiko,rivka)],pa(Y1,X),<<("X",X)

in which the relation names match, but the first term differs. If the names of the relations differ, the first redundancy can be diagnosed and avoided easily at compile time. The second example is less straightforward: whereas the term sachiko is known at compile time, the corresponding term arne is the result of a binding at run time. So a run-time check is required, but the possibility of this check can be determined at compile time. The C code we will generate uses the identifier identification mechanism of C to avoid the first redundancy (see the beginning of Section 8.4) and some simple run-time code to avoid the second.
The second inefficiency is that 'Attach clauses' copies the goal list tail N times for N applicable clauses. A better representation copies the first goal and the clause only, and represents the shared goal list tail once:

[pa(arne,Y1) ?= pa(arne,james)],
[pa(arne,Y1) ?= pa(arne,sachiko)],
[pa(arne,Y1) ?= pa(sachiko,rivka)],
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),
        pa(Y1,X),<<("X",X)
This solves the efficiency problem, but introduces a new problem, which becomes apparent when we do the unification on the top element, binding the variable Y1 to the constant james:

[pa(arne,james) ?= pa(arne,james)],
[pa(arne,Y1) ?= pa(arne,sachiko)],
[pa(arne,Y1) ?= pa(sachiko,rivka)],
[pa(arne,Y1) ?= gp(X2,Z2)],pa(X2,Y2),pa(Y2,Z2),
        pa(james,X),<<("X",X)
The shared data structure now suggests that pa(james,X),<<("X",X) will be the goal list tail for the other three unifications as well. But we know from the above analysis that that goal list will eventually fail, and that the goal list tail that leads to success is actually pa(sachiko,X),<<("X",X). So by the time the unification [pa(arne,Y1) ?= pa(arne,sachiko)] is being done, the old goal list tail pa(Y1,X),<<("X",X) must have been restored. The problem is solved by recording the fact that the binding was performed tentatively and then undoing it - backtracking over it - when the top element is dropped. This is the first moment in our discussion of the execution of logic programs that the idea of backtracking actually pops up: the need for backtracking originates from our wish to avoid copying the entire partial solution at each and any choice point.

How exactly tentative bindings are recorded and undone depends on the implementation details of the unification process. Also, we have left the unification process underspecified in the above example of the processing of a logic query. It is therefore time now to turn to unification in more detail.
Until now we have seen only the simplest form of unification, involving only constants and variables. Even then there are already three possibilities:
- unification between two constants: the unification succeeds if the two constants are equal and fails otherwise;
- unification between an (unbound) variable and a constant: the constant is bound to the variable and the unification succeeds;
- unification between two (unbound) variables: the variable in the head is bound to that in the goal and the unification succeeds.

We do not treat the case of a bound variable separately. A bound variable can be bound to a constant, in which case it is treated as that constant, or it can be bound to an unbound or bound variable, in which case it is treated as that unbound or bound variable. The interpreter or the generated code must contain code to find out what a variable is bound to; this code is said to 'dereference' the variable. The unification of two variables requires some thought and will be discussed in more detail in Section 8.3.3.

It is important to implement unification in such a way that it is possible to undo bindings, since this is required by backtracking. This is quite difficult in general, but we will see that in the traditional implementation it is simple to undo the most recent binding; this rule applies recursively, so bindings B1, B2, ..., Bn can be undone cheaply in the order Bn, ..., B1. Fortunately, that is all that the usual backtrack techniques require. We will return to this subject in Section 8.5.
Most logic languages allow other terms in addition to constants and variables. Examples are structures, lists, and sets. Structures are named lists of a fixed number of components; times(2, X) is an example of a structure with the name times and the components 2 and X. The name of a structure is also called its functor; the above example explains why. Note that structures look like facts, but they do not express a relation; they just represent data structures, and the structure times(2, X) above could well represent the mathematical formula 2x.
Structures can only be unified with variables or other structures. Two structures are unified by the same rules as a (head, goal) pair is: their names must be equal, the number of components contained in them must be equal, and these components must be unifiable pairwise by finding the proper bindings, recursively if necessary, in top-down fashion. So times(2, X) and times(2, times(Y, Z)) can be unified by binding X to times(Y, Z), since the substitution X=times(Y, Z) turns both structures into times(2, times(Y, Z)).

There is a rather subtle difficulty here: suppose we try to unify the two structures times(2, X) and times(2, times(Y, X)). Straightforward unification binds X to times(Y, X), but the corresponding substitution X=times(Y, X) turns the first structure into times(2, times(Y, X)) and the second into times(2, times(Y, times(Y, X)))! The reason is of course that we bound the variable X to a structure in which that same variable X occurs, causing a circularity. In this case, the unification is not possible, and the system should reject it. Diagnosing this situation is called the occur check. The occur check is perceived as expensive, since it seems to require scanning the structures involved, and many Prolog systems forgo it, accepting the incorrect binding. Remarkably, this almost never leads to incorrect results; Plaisted (1984) has tried to find an explanation of this phenomenon. Martelli and Montanari (1982) give an efficient bottom-up implementation of unification, which performs the occur check free of charge.
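The occur check itself is not implemented in the code shown in this chapter; a minimal sketch of one possible implementation, using the Term representation of Figure 8.3 and the routine deref() of Figure 8.5 below (the routine name occurs_in is ours), is:

/* Does the unbound variable var occur anywhere inside term t? */
int occurs_in(Term *var, Term *t) {
    int i;

    t = deref(t);
    switch (t->type) {
    case Is_Variable:
        return t == var;
    case Is_Constant:
        return 0;
    case Is_Structure:
        for (i = 0; i < t->term.structure.arity; i++) {
            if (occurs_in(var, t->term.structure.components[i])) return 1;
        }
        return 0;
    }
    return 0;
}

A unifier that performs the occur check would call occurs_in() before binding a variable to a structure, and reject the unification when it returns 1.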
Many logic languages feature lists as terms. An example is the Prolog list

[X, plus(1, X), times(2, X)]

Actually, a list is just a nested set of binary structures with functor "|", and the above list denotation is shorthand for

"|"(X, "|"(plus(1, X), "|"(times(2, X), [])))
in which [] is the empty list. In principle, lists can be implemented as structures using the "|" functor, but in practice systems use tailored code for them for efficiency reasons. Unification of lists follows the same rules as that of structures; for example,

[X, plus(1, X), times(2, X)]
and

[a, plus(1, Y), times(2, a)]
can be unified by X=a and Y=a.

Another type of term is the set, as occurring for example in the fact

pa({koos|sachiko}, rivka).

which summarizes

pa(koos, rivka).
pa(sachiko, rivka).
This seems a harmless abbreviation, but now the goal list

[pa(X,rivka) ?= pa({koos|sachiko},rivka)],<<("X",X)

yields two possible unification results:

[pa(koos,rivka) ?= pa({koos|sachiko},rivka)],<<("X",koos)
[pa(sachiko,rivka) ?= pa({koos|sachiko},rivka)],<<("X",sachiko)
This requires extending the 'Unify' instruction. Since this multiplication of goal lists may occur to different degrees for each of the term pairs in the head and the first goal, the idea suggests itself to perform the unification one term pair at a time, rather than for the entire (head, goal) pair. The 'Unify' instruction then turns

[pa(→X,rivka) ?= pa(→{koos|sachiko},rivka)],<<("X",X)

in which the term pair to be unified is marked with a preceding →, into

[pa(koos,→rivka) ?= pa({koos|sachiko},→rivka)],<<("X",koos)
[pa(→X,rivka) ?= pa({koos|→sachiko},rivka)],<<("X",X)
Another application of 'Unify' then turns the top goal list into

[pa(koos,rivka) == pa({koos|sachiko},rivka)],<<("X",koos)

by unifying the second term pair.
by unifying the second term. Now the question mark has been removed and the ‘Match’ instruction can be applied to remove the matching head and goal, After reporting the binding X=koos,the system proceeds with the next goal list on the goal list stack: [pa(+X, rivka)
?=
I
pa ({koos +sachiko}, rivka)1 << ( llXrt,X) I
The idea of performing the unification stepwise by term-term pairs simplifies its implementation and also renders unification of structures and of lists easier to understand. Consider, for example, the unification of

[X, plus(1, X), times(2, X)]

and

[a, plus(1, Y), times(2, b)]

Working top-down, which is from left to right in our notation, X and a are unified by binding X to a; next plus and plus are found equal, and 1 and 1 are unified trivially. Then X and Y are unified, but X is dereferenced first, and yields a, so Y is bound to a; of course Y is dereferenced too, but since it is not bound, the dereferencing yields the variable Y again. After successfully unifying the subterms times and 2, we try to unify X and b; X is dereferenced and yields a, which cannot be unified with b, so the unification fails. This implies that the backtracking mechanism must undo the bindings Y=a and X=a, in that order.
A reasonable set of C data types for the construction of terms is shown in Figure 8.3. Starting at the last definition, we find Term defined as a discriminated union of constant, variable, and structure; we have left out the other term types since they offer little new.
struct variable {
    char *name;
    struct term *term;
};

struct structure {
    char *functor;
    int arity;
    struct term **components;
};

typedef enum {Is_Constant, Is_Variable, Is_Structure} Term_Type;

typedef struct term {
    Term_Type type;
    union {
        char *constant;                 /* Is_Constant */
        struct variable variable;       /* Is_Variable */
        struct structure structure;     /* Is_Structure */
    } term;
} Term;

Figure 8.3 Data types for the construction of terms.
A constant is represented by a pointer to its value, stored as a string. A variable is represented simply by a pointer; we also keep its name. For a bound variable the pointer points to the term to which the variable is bound; an unbound variable is marked by a zero (null) pointer. A structure is implemented as a record with three fields, specifying the functor, the number of components N (its arity), and a pointer to an array of N components. Assembly-code implementations may leave out the pointer and put the array directly into the record, but this is difficult to do in C due to allocation problems.

Unification (Figure 8.4) immediately distinguishes two situations: unification involving a variable, and unification of two non-variable terms. To take care of any bound variables, the routine unify_terms() dereferences both terms; if a term remains a variable after dereferencing, it is unbound. The routine then calls unify_unbound_variable() (Figure 8.6) or unify_non_variables() (Figure 8.7), depending on whether an (unbound) variable is involved. The routine deref() is shown in Figure 8.5; it follows the pointer chain until it finds a non-variable or a variable with a zero pointer.

The routine unify_unbound_variable() (Figure 8.6) starts by identifying a special case, the situation that both unbound variables are the same; in this case unification is trivial and must be treated separately since we would otherwise bind the variable to itself. Next, the term that is the variable is bound to the other term. It is all right if that second term is again a variable; we will consider that case in more detail in Section 8.3.3. The binding is registered by calling trail_binding(), so it can be undone later during backtracking. Note that it is not necessary to record the old value to be restored; unlike in
int unify_terms(Term *goal_arg, Term *head_arg) {
    /* Handle any bound variables: */
    Term *goal = deref(goal_arg);
    Term *head = deref(head_arg);

    if (goal->type == Is_Variable || head->type == Is_Variable) {
        return unify_unbound_variable(goal, head);
    }
    else {
        return unify_non_variables(goal, head);
    }
}

Figure 8.4 C code for the unification of two terms.
Term *deref(Term *t) {
    while (t->type == Is_Variable && t->term.variable.term != 0) {
        t = t->term.variable.term;
    }
    return t;
}

Figure 8.5 The function deref().
general backtracking, the value to be restored is always 'unbound'. Exactly when the restore operation takes place depends on the clause search mechanism, which will be covered in Section 8.5.

The code for the routine unify_non_variables() (Figure 8.7) is straightforward: if the two terms are of different type the unification fails, otherwise the proper routine for the data type is called. Two constants can be unified when a string comparison on their values succeeds, as shown in Figure 8.8. The routine unify_structures() (Figure 8.9) first checks that the two structures have the same functor and arity, and then calls unify_terms() for each pair of terms. If all this succeeds, the unification of the two structures succeeds; otherwise it fails. It will be clear that similar routines can be written for lists and other logical data types. For a way to implement multiple unifications, which may, for example, arise when using sets as terms, see Section 8.3.1.

We observe that the only place where a binding is made is in the routine unify_unbound_variable() (Figure 8.6). We will now consider binding of variables and backtracking over it in more detail.
int unify_unbound_variable(Term *goal, Term *head) {
    /* Handle identical variables: */
    if (goal == head) {
        /* Unification of identical variables is trivial */
    }
    else {
        /* Bind the unbound variable to the other term: */
        if (head->type == Is_Variable) {
            trail_binding(head);
            head->term.variable.term = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = head;
        }
    }
    return 1;
}

Figure 8.6 C code for unification involving at least one unbound variable.
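The routine trail_binding() is not shown in this chapter. A minimal sketch of one possible implementation, assuming a fixed-size global trail array (the names trail, trail_pointer, and undo_bindings_until() are our assumptions), is:

#define TRAIL_SIZE 10000

static Term *trail[TRAIL_SIZE];         /* the bound variables, in binding order */
static int trail_pointer = 0;           /* first free position in trail[] */

/* Record a variable that has just been bound, so the binding
   can be undone later during backtracking. */
void trail_binding(Term *variable) {
    trail[trail_pointer++] = variable;
}

/* Undo all bindings made since the trail had the given height;
   the most recent binding is undone first. */
void undo_bindings_until(int old_trail_pointer) {
    while (trail_pointer > old_trail_pointer) {
        trail[--trail_pointer]->term.variable.term = 0;    /* back to unbound */
    }
}

Note that, as observed above, no old value needs to be saved: the value to be restored is always 'unbound'.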
int unify_non_variables(Term *goal, Term *head) {
    /* Handle terms of different type: */
    if (goal->type != head->type) return 0;

    switch (goal->type) {
    case Is_Constant:           /* both are constants */
        return unify_constants(goal, head);
    case Is_Structure:          /* both are structures */
        return unify_structures(
            &goal->term.structure, &head->term.structure
        );
    }
}

Figure 8.7 C code for the unification of two non-variables.
Unification between two variables is unlike anything that happens to variables in imperative languages: the variables are associated with each other in such a way that both are still unbound, and that when one of them is bound to a constant or a third variable, so is the other. The usual implementation of this association is to bind one of the variables to the
int unify_constants(Term *goal, Term *head) {
    return strcmp(goal->term.constant, head->term.constant) == 0;
}

Figure 8.8 C code for the unification of two constants.
int unify_structures(
    struct structure *s_goal, struct structure *s_head
) {
    int counter;

    if (s_goal->arity != s_head->arity
    ||  strcmp(s_goal->functor, s_head->functor) != 0) return 0;

    for (counter = 0; counter < s_head->arity; counter++) {
        if (!unify_terms(
            s_goal->components[counter], s_head->components[counter]
        )) return 0;
    }
    return 1;
}

Figure 8.9 C code for the unification of two structures.
other, and to dereference any variable before use to find out what it is actually bound to. There is a slight conceptual problem with this, though, since binding an unbound variable B to an unbound variable A results in B still being unbound. Figure 8.10b shows the result of a sequence of four variable-variable bindings: B to A, C to B, D to B, and E to C. Although one would expect this sequence to produce the binding tree shown in Figure 8.10a, this is not what happens. Binding B to A results in a simple binding of B to A, but binding C to B starts by dereferencing B (and C), only to find that B is actually A. So, in the implementation, C is bound to A rather than to B. The same happens to the bindings D to B, and E to C.

As we have seen above, the binding of an unbound variable V to a term T is implemented as a pointer in V to T. Undoing the binding is easy if we know the position of this pointer: store a zero pointer in it. It is important to note that only the most recent binding can be undone easily this way, for the following reason. Since both variables are dereferenced before binding, the decision of which pointer is stored into which variable can depend on any of the previous bindings. In the above example, the binding of C to B resulted actually in C being bound to A because B was already bound to A. If we were now
Figure 8.10 Examples of variable-variable bindings: (a) the binding tree one would expect; (b) the bindings actually produced.
to attempt to undo the binding of B to A, we would have to unbind C from A and bind it to B, but the data structure in Figure 8.10b gives us no possibility to do so. The most recent binding is the only binding that has the property that no other bindings depend on it, so it can be removed without problems. Figure 8.10a contains enough information to undo any binding at any moment, but is much less efficient. Most logic languages, including Prolog, undo only the most recent binding and thus allow us to use the more efficient implementation of Figure 8.10b.

More complicated trees of bindings can also occur. If, for example, C in Figure 8.10b is bound to an unbound variable Z, a pointer to Z is written into A, and A is registered for undoing purposes. This results in the tree of Figure 8.11a. Now, when the binding of C to Z is undone, the pointer in A is zeroed, restoring the situation to that of Figure 8.10b. Note that neither of these actions affected C, the variable that was bound. The tree shown in Figure 8.11b could result from binding W to A; the tree of which W is part could have been created by binding V to W and then W to X. More in particular, a binding of any variable in the tree V, W, X to any variable in the tree A-E, Z ends up as a binding from X to Z. Undoing such a binding just zeroes the pointer in X.
This concludes our discussion of interpreted unification of logic terms; interpreted search was explained already in Section 8.2. We will now look at compiled search and will return to compiled unification in Section 8.5. Although it is possible to generate code for statically recognizable unification operations, compiled code will still have to do full unification in the general case. So even compiled code will have to include the routine unify_terms().
Figure 8.11 Examples of more complicated variable-variable bindings.
It is tempting to generate the C routine

int grandparent(Term *X, Term *Z) {
    Term *Y = new_Term();

    Y->type = Is_Variable;
    return parent(X, Y) && parent(Y, Z);
}
for the inference rule

grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
in much the same way as we generated the C code

int expression(void) {
    return term() && rest_expression();
}

of Figure 2.54 for the grammar rule

expression → term rest_expression
from Figure 2.53. That would be wrong, however, as the simple call grandparent("arne", &X) shows. The resulting call parent("arne", Y) will successfully set Y to james, but the subsequent call parent(james, Z) will fail; so the query grandparent("arne", &X) will also fail, incorrectly. The reason is that, to function properly, the routine should backtrack upon failure to the position where the latest choice was made. This position is inside the call of parent("arne", Y), where another value (sachiko) can be assigned to Y. For the call parent(james, Z) to start in the body of the routine for grandparent(), the call parent("arne", Y) must have finished, however. In C and most other imperative languages, no local information is preserved for routines after they have finished, so it is impossible in these languages to backtrack to the indicated position inside the call parent("arne", Y) after that call has finished. One imperative language in which this is possible is Icon.
The translation of the grammar rule in Figure 2.54 works only by virtue of the fact that the routine term() cannot succeed in more than one way in deterministic top-down parsing. It can match the input in zero or one way, but not in more than one way, so once it succeeds, there is no reason ever to return to it, since it will not succeed a second time. Consequently it contains no latest choice position. It is this property that is formalized into the LL(1) property in Section 2.2.4.1.

There is a second suggestion that is implicit in the above example: clauses correspond to routine definitions, and goals correspond to routine calls. More in particular, all clauses with the same relation name correspond to a single routine, which will be called when a goal of that name is to be unified and substituted. If we include the number of terms of the relation in the routine name, the C identifier identification mechanism will already restrict any unification attempt to relations with the same name and the same number of terms. In order to follow this idea, however, we have to come up with a representation of the stack of goal lists. We will first introduce a technique that allows a routine to 'return' more than one value in succession. We will then see that that technique, quite remarkably, also solves the second problem.
Functions that return a value do so because the caller wants to do something with the returned value V, say some action A(V). Putting it this way leads to the idea that the function could call A(V) directly, rather than returning V. This idea allows the function to return more than one value: rather than writing

int return_two_values(void) {
    return 11 and then return 13;
}
...
A(return_two_values());        /* use the values */

which is impossible in C and almost all other imperative languages (except Icon), we can write

void two_values(void (*Action)(int)) {
    (*Action)(11);
    (*Action)(13);
}
...
two_values(A);                 /* use the values */
which is possible in C and any other imperative language that allows routines as parameters. The routine two_values() is called a list procedure, since it represents a list, [11, 13]. When it is called with the routine A() as a parameter, it performs the actions A(11) followed by A(13). List procedures are part of a nameless programming paradigm in which a composite data item is represented as a one-parameter routine that applies its parameter, an action routine, to each of the components of the data item. An example of a list procedure is shown in Figure 8.12. The routine
number_pair_list() represents the list of integer pairs [(1, 3), (4, 6), (7, 3)]; the type 'list of integer pairs' is already suggested by the type of the routine. Note that the routine specifies the list of numbers only and gives no indication of the action(s) to be performed on them. It is data only.

The routine number_pair_list() can be used to define another routine which finds and prints all pairs in number_pair_list that sum up to a given parameter sum. One way to do so is shown in Figure 8.13. To use the number list routine, an action routine to be applied to each pair must be defined; this is fs_test_sum(), which gets the values of the pair as parameters and does the checking. Neither number_pair_list() nor fs_test_sum() can have additional parameters without damaging the general applicability of number_pair_list() as a list procedure. So the parameter of find_sum(int sum) must be passed by storing it in a global variable, fs_sum. A call find_sum(10) makes its parameter available globally and then lets number_pair_list() call fs_test_sum() for each pair of integers that number_pair_list() represents. Finally, fs_test_sum() tests the required property using the global variable fs_sum. The output is

Solution found: 4, 6
Solution found: 7, 3
void number_pair_list(void (*Action)(int v1, int v2)) {
    (*Action)(1, 3);
    (*Action)(4, 6);
    (*Action)(7, 3);
}

Figure 8.12 An example of a list procedure.
The code in Figure 8.13 definitely lacks elegance; worse, the approach does not scale. Any serious program using this technique will contain many action routines, each belonging to a find_sum-like routine. If we are not careful, their names will clash, and we are already trying to avoid such future name clashes by using the prefix fs_ on the global names. This is not good enough, however, since the global variable fs_sum prevents the routine find_sum() from being used recursively. Suppose in some problem the action routine fs_test_sum() needs to call find_sum(42) to decide whether the pair of integers it is called with constitutes a solution. This cannot be programmed conveniently in the style of Figure 8.13, since a call of find_sum(42) would overwrite the value of fs_sum.

A good approach to program structure problems is often to keep the scope of names as small as possible. In our case this means that both fs_test_sum() and fs_sum should be local to find_sum(), as shown in Figure 8.14. This allows each find_sum-like routine to have its own action routine named test_sum() without the danger of name clashes; moreover, and more importantly, the global variable fs_sum has
int fs_sum;

void fs_test_sum(int v1, int v2) {
    if (v1 + v2 == fs_sum) {
        printf("Solution found: %d, %d\n", v1, v2);
    }
}

void find_sum(int sum) {
    fs_sum = sum;
    number_pair_list(fs_test_sum);
}

Figure 8.13 An application of the list procedure.
disappeared completely, allowing full recursion.

The problem with the code in Figure 8.14 is that it is not ANSI C: it uses a nested routine. Fortunately, one of the most important C implementations, the GNU C compiler, has an extension that allows exactly these nested routines. Also, nested routines are available in a few other imperative languages.
void find_sum(int sum) {
    void test_sum(int v1, int v2) {
        if (v1 + v2 == sum) {
            printf("Solution found: %d, %d\n", v1, v2);
        }
    }

    number_pair_list(test_sum);
}

Figure 8.14 An application of the list procedure, using nested routines.
It should be noted that action routines differ from continuations as explained in Section 6.3.3. Continuations represent the entire rest of the program, do not return to the caller, and consequently are usually called at the end of a routine. Action routines do return, and can be called freely at any place in a routine. They only represent that part of the program that is dependent on the data the action routine is called with - although in some programs that may of course include almost the entire rest of the program.

We now have enough techniques to tackle the compiled implementation of logic languages.
We translate each clause for a given relation into a routine, and then create a single list procedure for that relation, which calls each of the clause routines in succession. Goals of the relation are translated as calls to the corresponding list procedure. Both the list procedure and the clause translations get a special action routine parameter in such a call. This action routine is to be called for each successful unification of the goal and one of the clauses for the relation. In fact, the action routine represents the goal list tail used in the interpreter in Section 8.2. We see that using list procedures solves both the problem of routines returning multiple values and the problem of the duplicated goal lists, as promised at the end of the introduction to this section.

For example, the relation parent/2 in Section 8.1 translates into the list procedure shown in Figure 8.15. This routine calls five clause routines to establish the logical OR of the five clauses. The clause routines are local to and nested inside the routine parent_2(). They can therefore access its parameters as non-local variables. And when they have met with success, they will call the action routine goal_list_tail(). One of these clause routines, the one for parent(arne, sachiko), is shown in Figure 8.16; it will be discussed in the next paragraph.
void parent_2(
    Term *goal_arg1, Term *goal_arg2, Action goal_list_tail
) {
    ...   /* the local routines parent_2_clause_[1-5]() */

    /* OR of all clauses for parent/2 */
    parent_2_clause_1();        /* parent(arne, james). */
    parent_2_clause_2();        /* parent(arne, sachiko). */
    parent_2_clause_3();        /* parent(koos, rivka). */
    parent_2_clause_4();        /* parent(sachiko, rivka). */
    parent_2_clause_5();        /* parent(truitje, koos). */
}

Figure 8.15 Translation of the relation parent/2.
Suppose we have a query ?- parent(arne, X), resulting in the initial goal list parent(arne, X), <<("X", X) as shown in the explanation of the interpreter. To expand the first goal parent(arne, X) in this goal list, we issue the call parent_2(put_constant("arne"), X, goal_list_tail), in which the action routine goal_list_tail() corresponds to the goal list tail <<("X", X). Since parent_2() will call each clause for parent/2 and since each clause will call goal_list_tail() whenever it succeeds, this results effectively in the same actions as the interpreter performs on the stack of goal lists
[parent(arne,X) ?= parent(arne,james)],
[parent(arne,X) ?= parent(arne,sachiko)],
[parent(arne,X) ?= parent(koos,rivka)],
[parent(arne,X) ?= parent(sachiko,rivka)],
[parent(arne,X) ?= parent(truitje,koos)],
        <<("X",X)
We see that the action routine provides a very convenient and effective way to avoid copying the goal list tail.

/* translation of 'parent(arne, sachiko).' */
void parent_2_clause_2(void) {
    /* translation of 'arne, sachiko).' */
    void parent_2_clause_2_arg_1(void) {
        /* translation of 'sachiko).' */
        void parent_2_clause_2_arg_2(void) {
            /* translation of '.' */
            void parent_2_clause_2_body(void) {
                goal_list_tail();
            }

            /* translation of head term 'sachiko)' */
            unify_terms(goal_arg2, put_constant("sachiko"),
                parent_2_clause_2_body);
        }

        /* translation of head term 'arne,' */
        unify_terms(goal_arg1, put_constant("arne"),
            parent_2_clause_2_arg_2);
    }

    /* translation of 'parent(' */
    parent_2_clause_2_arg_1();
}

Figure 8.16 Generated clause routine for parent(arne, sachiko).
The code generated for parent-2-clause-2 ( ) and shown in Figure 8.16 goes through the same motions as the interpreter. It is best read from the bottom upwards, due to the 'declaration before use' rule of C. Parentm2-clausew2 () calls the nested routine parent-2-clause-2-arg-1 ( ) for the first argument. That routine attempts to unify the first argument goal-argl with its own first head argument arne by calling
unify-terms ( ) ; the data structure for the head argument was created by calling put-constant ( ) , In addition to the arguments, parent-:!-clause-:!-arg-l( also passes the routine parent-:!-clause-2-arg-2 ( ) as the action routine to unify-terms ( ) , in the understanding that unify-t erms ( ) will call its action routine when the unification has succeeded. eading upwards, we see that parent-2-clause-2-arg-2 ( ) , nested within parent-2__clause-:!-arg-~ ( ) , tries to unify the second argument goal-arg2 with its own head argument sachiko in the same manner. The only difference is that the action routine that is passed on is now parent-2-clause-2-body ( ) rather than parent-:!-clause-2-arg-3 ( ) , since the closing parenthesis after the head term sachiko signifies that the head of the clause is finished. The code for parent-:!-clause-:!-body ( ) is very simple: since there is no body with goals that have to be satisfied, it just calls goal-list-tail ( ) . We see that there is a close relationship between the Prolog code and the C code: a c o m a as a separator after a head term translates into an action routine name ending in -arg-N, a closing parenthesis into an action routine name ending in -body, and a dot into a call of goal-list-tail ( 1 . The code for the version of unify-terms ( ) that uses an action routine is very similar to that shown in Figures 8.4, 8.6, 8.7, and 8.9. The main difference is that where these routines do a return 1 to show that they succeed, the version in this section calls the action routine goal-list-tail ( ) ,passed to it as its third parameter, and proceeds to find more matches. The only interesting point is that since the searching routines do not return until all searching involving their results has finished, they can undo the variable bindings themselves on the way back and there is no need to record trail-binding ( ) . The code for unify-unbound-vari shows how first the variable is bound by setting variable.term,then the search is continued with this binding, and finally the binding is undone by setting variable.term to zero. Code that could be generated for the query ? - parent (arne, X) is shown in Figure 8.18. Actually, since queries are entered d y n a ~ c a llyif, such code is generated, it will have to be compiled on the fly. It is, however, more likely that the input processing section of the compiled program performs the top level of the query interpretively. Section 8.4.5 suggests how this can be done efficiently. The displaying entry in the goal list c .c: ( rrXrr , ;x) is represented by the nested routine display-X ( ) . It will be passed on through several calls of parent-:!. . . ( ) and unify-terms ( ) ,until at some point all requirements for the query are fulfilled and it is called; the search then continues, until all possibilities have been tested. The output of the compiled query is indeed X = james X = sachiko
As an aside, we note that much of the visual complexity of the code of Figure 8.16 is due to the absence of routine denotations in C. A routine denotation is a notation for the formal parameters and the body of a routine; such a notation, known as a lambda expression in functional languages, acts as a literal or constant and is available in some impera-
void unify_unbound_variable(
    Term *goal_arg, Term *head_arg, Action goal_list_tail
) {
    if (goal_arg == head_arg) {
        /* Unification of identical variables succeeds trivially */
        goal_list_tail();
    }
    else {
        /* Bind the unbound variable to the other term: */
        if (head_arg->type == Is_Variable) {
            head_arg->term.variable.term = goal_arg;
            goal_list_tail();
            head_arg->term.variable.term = 0;
        }
        else {
            goal_arg->term.variable.term = head_arg;
            goal_list_tail();
            goal_arg->term.variable.term = 0;
        }
    }
}

Figure 8.17 C code for full backtracking variable unification.
tive languages as well, for example in Algol 68 and Icon. For lack of such a device in C we are forced to name all intermediate routines, which could just as well remain anonymous. For example, in Algol 68, the code for parent_2_clause_2() reduces to

proc void parent 2 clause 2 = (
    unify terms (goal arg1, put constant ("arne"),
        void: unify terms (goal arg2, put constant ("sachiko"),
            void: goal list tail
        )
    )
);

in which

void: unify terms (goal arg2, put constant ("sachiko"),
    void: goal list tail
)
is a routine denotation, and so is void: goal list tail. These routine denotations are passed as parameters to two different calls of unify terms.

The clause parent(arne, sachiko) has no body, so the code in Figure 8.16 shows how to do head matching only; there are no goal calls. We will therefore now look
void query(void) {
    Term *X = put_variable("X");

    void display_X(void) {
        print_term(X); printf("\n");
    }

    parent_2(put_constant("arne"), X, display_X);
}

Figure 8.18 Possible generated code for the query ?- parent(arne, X).
at the translation of grandparent - see Figure 8.19. This figure shows all the code for the translation of the clauses for grandparent/2. The structure is very similar to that of Figure 8.16. Starting from the bottom, we find a list of calls to all clauses for grandparent/2, of which there is only one: grandparent(X, Z) :- parent(X, Y), parent(Y, Z). The nested routine grandparent_2_clause_1() unifies the two terms X and Z in the usual way and passes the action routine grandparent_2_clause_1_body() to the last unification, rather than goal_list_tail(), since the clause has a body. The translation of the body follows the same pattern, adapted to goals: rather than calls to unify_terms(), calls to the routines representing the goals are dispatched; again the last call passes on the action routine goal_list_tail(). Code for the head variables X and Z and for the free variable Y is generated in the appropriate routines, using the library routine put_variable(). The routine put_variable() allocates space for a variable and initializes that space with an unbound variable of the given name. The routine names put_variable and put_constant used above derive from the corresponding WAM instructions. The close match between Prolog code and translation is even more visible here than in Figure 8.16: each next nested routine represents the translation of a smaller piece of the clause, and the one call in it handles the difference between the two nested routines, as the comments show.

The above paints an elegant and idealized picture of compiled Prolog code; real-world compiled Prolog code is much more hairy, due to optimizations. The code segment shown in this section suggests already several optimizations, for example the in-line substitution of several routines that are called only once, but much stronger optimizations are possible. We will now look at one specific optimization.
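As an aside, the library routines put_variable() and put_constant() themselves are not shown in this chapter; a minimal sketch consistent with the data types of Figure 8.3 (the use of malloc() is our assumption) is:

#include <stdlib.h>

/* Allocate an unbound variable term with the given name. */
Term *put_variable(char *name) {
    Term *t = malloc(sizeof(Term));

    t->type = Is_Variable;
    t->term.variable.name = name;
    t->term.variable.term = 0;          /* unbound */
    return t;
}

/* Allocate a constant term with the given value. */
Term *put_constant(char *value) {
    Term *t = malloc(sizeof(Term));

    t->type = Is_Constant;
    t->term.constant = value;
    return t;
}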
8.4.3 Optimized clause selection

The code for the relation parent/2 shown in Figure 8.15 calls the routines for all five clauses for parent/2, regardless of the values of the arguments goal_arg1 and goal_arg2. If we do a little analysis, however, we can often see that a certain call is
void grandparent_2(
    Term *goal_arg1, Term *goal_arg2, Action goal_list_tail
) {
    /* translation of 'grandparent(X,Z) :- parent(X,Y), parent(Y,Z).' */
    void grandparent_2_clause_1(void) {
        /* translation of 'X,Z) :- parent(X,Y), parent(Y,Z).' */
        void grandparent_2_clause_1_arg_1(void) {
            Term *X = put_variable("X");

            /* translation of 'Z) :- parent(X,Y), parent(Y,Z).' */
            void grandparent_2_clause_1_arg_2(void) {
                Term *Z = put_variable("Z");

                /* translation of body 'parent(X,Y), parent(Y,Z).' */
                void grandparent_2_clause_1_body(void) {
                    Term *Y = put_variable("Y");

                    /* translation of 'parent(Y,Z).' */
                    void grandparent_2_clause_1_goal_2(void) {
                        parent_2(Y, goal_arg2, goal_list_tail);
                    }

                    /* translation of 'parent(X,Y),' */
                    parent_2(goal_arg1, Y, grandparent_2_clause_1_goal_2);
                }

                /* translation of head term 'Z) :-' */
                unify_terms(goal_arg2, Z, grandparent_2_clause_1_body);
            }

            /* translation of head term 'X,' */
            unify_terms(goal_arg1, X, grandparent_2_clause_1_arg_2);
        }

        /* translation of 'grandparent(' */
        grandparent_2_clause_1_arg_1();
    }

    /* OR of all clauses for grandparent/2 */
    grandparent_2_clause_1();
}

Figure 8.19 Clause routine generated for grandparent/2.
doomed to fail, or even that the entire call of parent_2() will never succeed. For example, if one of the parameters is not a constant or an unbound variable, the call will never succeed. Also, if one of the parameters is a constant, we can compare it to the corresponding constant in each clause in turn and if they do not match it is futile to call the routine for that clause. So two optimizations are possible here, one on the type of an argument, and, if that argument is a constant, another on the value of that constant. The WAM supports these optimizations for the first argument only, and we will see now how this is done.

Optimized clause selection proceeds in two steps. First, the type of the first argument is used to jump to the label of the code that handles terms of that type for the given routine; this multiway jump is performed by the WAM instruction switch_on_term. Second, for constants and structures, there are further switch instructions to lead the flow of control to specific pieces of code that are appropriate for the specific first argument.

The use of the switch_on_term instruction can be seen in Figure 8.20. The header and code for the clauses themselves are the same as in Figure 8.15, but the OR list of calls to the routines parent_2_clause_[1-5]() is replaced by a call on switch_on_term. This call dereferences the first argument goal_arg1, retrieves its type and jumps to one of the labels L_Variable, L_Constant, or L_Structure. The corresponding WAM instruction receives these labels as parameters; it has another label parameter for lists, but since we don't do lists, we have omitted it here. Code segments labeled by L_Variable, L_Constant, and L_Structure follow; they are discussed in the following paragraphs. Note that first arguments for which the flow of control is led to L_Variable correspond to unbound variables, due to the dereferencing. The position at the end of the routine is labeled by L_fail; actually a jump to this label means that there are no more possibilities rather than that the routine failed, but the WAM uses the term 'fail' in these cases and so will we.
void parent_2(
    Term *goal_arg1, Term *goal_arg2, Action goal_list_tail
) {
    ...         /* the local routines parent_2_clause_[1-5]() */

    /* switch_on_term(L_Variable, L_Constant, L_Structure) */
    switch (deref(goal_arg1)->type) {
    case Is_Variable: goto L_Variable;
    case Is_Constant: goto L_Constant;
    case Is_Structure: goto L_Structure;
    }

    ...         /* code labeled by L_Variable, L_Constant, L_Structure */

L_fail: ;
}

Figure 8.20 Optimized translation of the relation parent/2.
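The routine deref() used in Figure 8.20 follows variable-to-variable bindings and was defined earlier in this chapter. Purely as a reminder, a minimal sketch is given below; it assumes that a null pointer in the binding field marks an unbound variable, which is an assumption about the representation made here.

    /* Sketch only: follow bindings until an unbound variable or a
       non-variable term is reached. */
    Term *deref(Term *term) {
        while (term->type == Is_Variable && term->term.variable.term != 0) {
            term = term->term.variable.term;    /* follow the binding */
        }
        return term;
    }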
If the first argument is an unbound variable, this variable can certainly be bound to the head in each clause, so the code labeled by L_Variable just calls all clause routines in succession, just as the unoptimized routine parent_2() did; see Figure 8.21.
L_Variable:
    /* OR of all clauses for parent/2 */
    parent_2_clause_1();        /* parent(arne, james). */
    parent_2_clause_2();        /* parent(arne, sachiko). */
    parent_2_clause_3();        /* parent(koos, rivka). */
    parent_2_clause_4();        /* parent(sachiko, rivka). */
    parent_2_clause_5();        /* parent(truitje, koos). */
    goto L_fail;

Figure 8.21 Code generated for the label L_Variable.
The code labeled by L_Constant is shown in Figure 8.22. It uses the WAM instruction switch_on_constant to implement the second optimization and direct the flow of control to the proper code segment for the constant. For the optimization to be profitable, testing the argument before calling must be much faster than calling the routines and checking in the traditional way. To this end all constants in the program are stored in a hash table, and a function entry(const char *constant, int size, char *table[]) is made available that returns the entry for constant in table[]; these entry indexes are constants, available to the program, for example as a set of identifiers ENTRY_C for each constant C in the program. The table size must be a power of two in the WAM, and we conform to that requirement.

The compiler, having seen all clauses for parent/2, knows that all of them have constants for their first arguments and it knows what these constants are. So a switch statement over just these values can be constructed. The index of the constant in the hash table is retrieved and used to jump to the proper code. If the constant is not among these first arguments, there is no possibility that parent_2() succeeds and a jump to L_fail is made. Although there are only 6 constants in the program, the hash table has size 8, to conform to the WAM requirements. Each label of the form L_Constant_C labels a list of calls to clause routines that have C as their first argument; that list consists of one or more calls. There is no need for the labels L_Constant_james or L_Constant_rivka, and they do not occur in the switch statement, since no clause has james or rivka as its first argument. Each list of calls ends in goto L_fail, since at that point all possibilities have been exhausted. Since there are no clauses for parent/2 with a structure as first argument, the label L_Structure directly leads to L_fail (Figure 8.23); this jump to a jump could have been avoided by generating case Is_Structure: goto L_fail; directly in Figure 8.20.
L_Constant:
    /* switch_on_constant(8, constant_table) */
    switch (entry(deref(goal_arg1)->term.constant, 8, constant_table)) {
    case ENTRY_arne: goto L_Constant_arne;
    case ENTRY_koos: goto L_Constant_koos;
    case ENTRY_sachiko: goto L_Constant_sachiko;
    case ENTRY_truitje: goto L_Constant_truitje;
    }
    goto L_fail;

L_Constant_arne:
    parent_2_clause_1();        /* parent(arne, james). */
    parent_2_clause_2();        /* parent(arne, sachiko). */
    goto L_fail;

L_Constant_koos:
    parent_2_clause_3();        /* parent(koos, rivka). */
    goto L_fail;

L_Constant_sachiko:
    parent_2_clause_4();        /* parent(sachiko, rivka). */
    goto L_fail;

L_Constant_truitje:
    parent_2_clause_5();        /* parent(truitje, koos). */
    goto L_fail;

Figure 8.22 Code generated for the labels L_Constant_....
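The hashing function entry() itself is not shown in the text. The following is a minimal sketch of one possible implementation, under the assumptions that the table size is a power of two and that the table was constructed at compile time without overflow; the particular string hash used here is an arbitrary choice for illustration, not the one prescribed by the WAM.

    #include <string.h>

    /* Sketch only: look up a constant in a compile-time constructed
       open-addressed hash table of 'size' entries (size a power of two). */
    int entry(const char *constant, int size, const char *table[]) {
        unsigned h = 0;
        for (const char *p = constant; *p != '\0'; p++) {
            h = h * 31 + (unsigned char)*p;      /* arbitrary string hash */
        }
        int i = (int)(h & (unsigned)(size - 1));
        /* Linear probing; a constant that is not in the table ends up on an
           empty or non-matching slot, which the caller's switch statement
           sends to L_fail. */
        while (table[i] != 0 && strcmp(table[i], constant) != 0) {
            i = (i + 1) & (size - 1);
        }
        return i;
    }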
L_Structure:
    goto L_fail;

Figure 8.23 Code generated for the label L_Structure.

The first-argument switch optimization for clause selection is quite effective, although the effect of course depends on the program code. In our example, it reduces the number of calls to the clause routines parent_2_clause_[1-5]() from 105 to 24, for the query ?- parent(arne, X). The WAM has several other instructions for optimizing clause selection; these are
primarily concerned with avoiding administrative and routine-call overhead. There is a set of three clause routine call instructions, try_me_else, retry_me_else, and trust_me_else, which are used for the first call, the middle calls, and the last call, respectively, of the clause routines in OR lists, both in top-level routines (Figure 8.15) and after L_Variable and L_Constant_... labels. The first instruction incorporates the routine call and entry administration of parent_2(), the last the routine exit administration. There is a corresponding set try, retry, and trust, which are used for call lists after L_Constant and L_Structure labels. These instructions call simplified versions of the clause routines that assume that the first argument already matches completely (L_Constant) or partially (L_Structure). These combined optimizations complicate the WAM considerably, but do increase its efficiency.
Prolog allows the search space to be reduced explicitly by using cuts. The cut operator is an argumentless goal that always succeeds; it is written as an exclamation mark: !. Its property is that if it succeeds in a clause C for a relation R, it commits the relation R to the search space described by the part after the cut in clause C; the rest of the search space of C and the search spaces of the other clauses for R are ignored. In other words, once the search has passed a cut in a clause for a relation R, backtracking over that cut causes further processing of the goal R to be skipped. The cut is surprisingly simple to implement: after having called the action routine for the part of the clause after the cut, jump to the end of the routine for the relation. The technique is shown in Figure 8.24, which contains the translation of the clause

    first_grandparent(X, Z) :- parent(X, Y), !, parent(Y, Z).
In this figure, the relation name first_grandparent was abbreviated to first_gp to avoid overly long lines. The clause tries to say that X is a first-grandparent of Z if Z is a child of the first child of X, assuming that the relation parent/2 mentions the children in chronological order. It is actually an unfortunate way of expressing that idea, since it will not solve queries like first_grandparent(X, rivka) except by accident, and it only serves here as an example of a clause with a cut. The translation of the cut is in the middle of the text, and consists of a call to the action routine first_gp_2_clause_1_goal_3() representing the rest of the clause, followed by a jump to L_cut. This label is positioned at the very end of the routine for first_gp_2, after the label L_fail. The reason for having two separate labels will become evident in the next section. Normal backtracking would backtrack into first_gp_2_clause_1_body() when first_gp_2_clause_1_goal_3() is finished, to find other bindings for goal_arg1 and Y, but the jump to L_cut cuts this attempt short, in accordance with the semantics of the cut mechanism. The label L_cut has been declared at the beginning of its scope, as is required for non-local labels in GNU C.
void first_gp_2(
    Term *goal_arg1, Term *goal_arg2, Action goal_list_tail
) {
    /* label declaration required by GNU C for non-local jump */
    __label__ L_cut;

    /* translation of 'first_gp(X,Z) :- parent(X,Y), !, parent(Y,Z).' */
    void first_gp_2_clause_1(void) {
        /* translation of 'X,Z) :- parent(X,Y), !, parent(Y,Z).' */
        void first_gp_2_clause_1_arg_1(void) {
            Term *X = put_variable("X");
            /* translation of 'Z) :- parent(X,Y), !, parent(Y,Z).' */
            void first_gp_2_clause_1_arg_2(void) {
                Term *Z = put_variable("Z");
                /* translation of body 'parent(X,Y), !, parent(Y,Z).' */
                void first_gp_2_clause_1_body(void) {
                    Term *Y = put_variable("Y");
                    /* translation of '!, parent(Y,Z).' */
                    void first_gp_2_clause_1_goal_2(void) {
                        /* translation of 'parent(Y,Z).' */
                        void first_gp_2_clause_1_goal_3(void) {
                            parent_2(Y, goal_arg2, goal_list_tail);
                        }
                        /* translation of '!,' */
                        first_gp_2_clause_1_goal_3();
                        goto L_cut;
                    }
                    /* translation of 'parent(X,Y),' */
                    parent_2(goal_arg1, Y, first_gp_2_clause_1_goal_2);
                }
                /* translation of head term 'Z) :-' */
                unify_terms(goal_arg2, Z, first_gp_2_clause_1_body);
            }
            /* translation of head term 'X,' */
            unify_terms(goal_arg1, X, first_gp_2_clause_1_arg_2);
        }
        /* translation of 'first_gp(' */
        first_gp_2_clause_1_arg_1();
    }

    /* OR of all clauses for first_gp/2 */
    first_gp_2_clause_1();

L_fail: ;
L_cut: ;
}

Figure 8.24 Clause routine generated for first_grandparent/2.
8.4.5 Implementing the predicates assert and retract

Many Prolog systems feature the possibility to add and remove clauses dynamically, using the meta-logical predicates assert and retract, predicates that do not obey the normal rules of logic. The predicate assert(Clause) dynamically adds the clause Clause to the program; there are no restrictions on the relation involved: it may be a program-defined relation, a relation already occurring statically, a relation known only from assert calls, or a completely new relation. The predicate retract(Clause) dynamically removes the clause Clause from the program. Again, there are no restrictions on the kinds of clauses that may be retracted: in particular it is permitted to retract program-defined clauses. The question is how to incorporate such additions and removals into compiled code, preferably without compromising its efficiency. One possibility is to recompile the routine for a relation R upon adding or removing a clause for R, and update the table of constants. This may not be cost-effective: some Prolog programs perform large numbers of asserts and retracts, each of which would cause a recompilation. We use a different technique here, in which we switch to an interpreter and return to the compiled routines as soon as possible. The asserted clauses are kept in symbolic form, possibly augmented with some mechanism that allows fast access to the asserted clauses for a given relation. This makes the assert and retract operations themselves simple, and the remaining question is how to activate the newly added clauses. We first consider the case that the asserted clause refers to a program-defined relation. We also assume that asserted clauses are to be activated after the program-defined clauses, which corresponds to assertz() in Prolog; a similar technique can be used for clauses that are to be activated before the program-defined ones.

To absorb the asserted clause into the search process, we return to the routine generated for parent/2 (Figure 8.15) and add the code shown in Figure 8.25 to call the interpreter. This code checks a global counter to see if any asserted clauses for parent/2 are present at this moment that should be activated after the program-defined clauses; such global counters must be maintained for all relations by assert and retract, and serve to avoid calling the interpreter unnecessarily. If there is at least one asserted clause for parent/2 to be activated now, the name of the relation and the two arguments are stacked in inverse order on the interpreter stack; the name of the relation is stacked last since it is the first piece of information the interpreter will need. Then the interpreter is called, with the original action routine of parent_2() as its only parameter. The interpreter calls this action routine each time it manages to satisfy the goal just stored on the interpreter stack; each such call implies a return to compiled code. Note that the values "parent/2", goal_arg1, goal_arg2, and goal_list_tail together represent the goal list stack introduced in Section 8.2. The process is depicted in Figure 8.26.
void parent_2(
    Term *goal_arg1, Term *goal_arg2, Action goal_list_tail
) {
    ...         /* the local routines parent_2_clause_[1-5]() */

    /* guarded OR of all clauses for parent/2 */
    if (!retracted_parent_2_clause_1)
        parent_2_clause_1();    /* parent(arne, james). */
    if (!retracted_parent_2_clause_2)
        parent_2_clause_2();    /* parent(arne, sachiko). */
    if (!retracted_parent_2_clause_3)
        parent_2_clause_3();    /* parent(koos, rivka). */
    if (!retracted_parent_2_clause_4)
        parent_2_clause_4();    /* parent(sachiko, rivka). */
    if (!retracted_parent_2_clause_5)
        parent_2_clause_5();    /* parent(truitje, koos). */

L_fail:
    if (number_of_clauses_added_at_end_for_parent_2 > 0) {
        stack_argument(goal_arg2);
        stack_argument(goal_arg1);
        stack_argument(put_constant("parent/2"));
        interpret(goal_list_tail);
    }
}

Figure 8.25 Translation of parent/2 with code for controlling asserted and retracted clauses.
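The global counter and the retracted_... flags used in Figure 8.25 must be maintained by the run-time implementations of assert and retract; that bookkeeping is not shown in the text. The following is a minimal sketch of what it could look like for parent/2, assuming a symbolic Clause representation and a routine store_asserted_clause() in the interpreter; both names are assumptions made here for illustration only.

    /* Run-time data assumed by Figure 8.25 (sketch). */
    int number_of_clauses_added_at_end_for_parent_2 = 0;
    int retracted_parent_2_clause_1 = 0;   /* one flag per compiled clause */

    /* Sketch of what assertz(Clause) could do for a clause of parent/2. */
    void assertz_parent_2(Clause *clause) {
        store_asserted_clause("parent/2", clause);  /* keep symbolic form */
        number_of_clauses_added_at_end_for_parent_2++;
    }

    /* Sketch of retracting the first program-defined clause of parent/2. */
    void retract_parent_2_clause_1(void) {
        retracted_parent_2_clause_1 = 1;    /* parent_2() now skips it */
    }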
We now see the reason for having two separate labels L_fail and L_cut: a jump to L_fail signals exhaustion of the program-defined clauses; a jump to L_cut signals abandoning all further clauses for the relation, including those added by assertz. The code in Figure 8.25 handles the assertion of clauses and the retraction of clauses that have been asserted. The retraction of program-defined clauses is implemented by preceding each call of a clause routine C by a test on a global Boolean retracted_C. Many Prolog implementers decide that this causes so much useless overhead that the implementation does not allow program-defined clauses to be retracted unless they are specifically declared as retractable.

The goal lists in asserted clauses may contain goals with relations that occur only in other asserted clauses, in which case the processing of these goals has to be left to the interpreter, but they may also contain goals with program-defined relations. Since a routine has been compiled for such a goal, it is profitable to call that routine to process the goal, rather than leaving it entirely to the interpreter. This requires the run-time conversion of the name of a routine to a call to that routine. To effect this, we use the same entry() routine used in selecting code for unifying a constant, shown in Figure 8.22, and apply it to the name of the relation and a fixed table of names of the program-defined relations.
[Figure 8.26 is a diagram; it shows the compiled code of Figure 8.25 stacking goal_arg2, goal_arg1, and the constant "parent/2" on the interpreter stack and then calling interpret(goal_list_tail) in the interpreter.]

Figure 8.26 Switch to interpretation of added clauses.
The dispatch routine do_relation() is shown in Figure 8.27; it uses the name of the relation and the fixed table of relation names to select a code sequence for that relation. The code sequence for parent/2 unstacks the goal arguments from the interpreter stack and passes them and an action routine to the compiled routine parent_2(); the other code sequences are similar. If the requested relation is not program-defined, the flow of control is directed to an interpreter routine interpret_asserted_relation(), which searches relations that stem from asserted clauses only. The action routine passed to do_relation() is an internal routine in the interpreter. It processes the goals still left on the interpreter stack and then calls the action routine originally passed to the interpreter in Figure 8.25. If we assume that the goal list stack is maintained explicitly in the interpreter, the relation name to be passed to do_relation() is found on the top of the stack, and do_relation() is called as follows:

    do_relation(unstack_argument()->name, internal_goal_list_tail);

The process is depicted in Figure 8.28.
void do_relation(const char *relation, Action goal_list_tail) {
    switch (entry(relation, N, relation_table)) {
    case ENTRY_...:
        ...
    case ENTRY_parent_2:
        goal_arg1 = unstack_argument();
        goal_arg2 = unstack_argument();
        parent_2(goal_arg1, goal_arg2, goal_list_tail);
        break;
    case ENTRY_grandparent_2:
        ...
        break;
    case ENTRY_...:
        ...
    default:
        interpret_asserted_relation(relation, goal_list_tail);
        break;
    }
}

Figure 8.27 Dispatch routine for goals in asserted clauses.
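The interpreter-stack routines stack_argument() and unstack_argument() used in Figures 8.25 and 8.27 are not shown in the text. A minimal sketch, assuming a fixed-size array of Term pointers as the interpreter stack (the array size is an assumption made here), could look as follows.

    #define GOAL_STACK_SIZE 10000               /* assumed fixed size */

    static Term *goal_stack[GOAL_STACK_SIZE];   /* the interpreter stack */
    static int goal_stack_top = 0;

    void stack_argument(Term *term) {
        goal_stack[goal_stack_top++] = term;
    }

    Term *unstack_argument(void) {
        return goal_stack[--goal_stack_top];
    }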
[Figure 8.28 is a diagram; it shows the interpreter unstacking a goal from its stack and calling the corresponding compiled routine through do_relation().]

Figure 8.28 Switch to execution of compiled clauses.
The calls

    goal_arg1 = unstack_argument();
    goal_arg2 = unstack_argument();

in Figure 8.27, together with the call unstack_argument()->name in the above call of do_relation(), unstack a goal, just as the inverse sequence in Figure 8.25 stacked a goal. Note, however, that this is not the same goal. The code in Figure 8.25 is located in the compiled program and stacks a goal from the compiled program, to be tried with asserted clauses by the interpreter; the code in Figure 8.27 is located in the interpreter and unstacks a goal from an asserted and interpreted clause, to be tried either with a compiled routine if one is available, or with asserted clauses.

The interplay between interpreter and compiled code allows the assert and retract operations to be implemented reasonably efficiently. Maintaining this mechanism is less of an imposition than it would seem to be, since much of it is needed anyway to implement the behavior of the assert and retract operations with respect to backtracking. Implementations similar to the one sketched here have been applied to programs with tens of thousands of Prolog rules.

8.5 Compiled code for unification

So far we have been concerned with compiled clause selection and its optimization. On the unification side, we have assumed the existence of a routine unify_terms(Term *goal_arg, Term *head_arg, Action ...), which finds all possible unifications between goal_arg and head_arg and calls the action routine for each unification found. This emphasizes the similarity between unification and clause selection, and allows one to implement sets as terms. On the other hand, unification as defined in Prolog is simpler: it cannot succeed in more than one way, and if one step in it fails the whole unification fails; no backtracking inside the unification is required. So it is likely that the unification routines and their calling sequences can be made more efficient. Also, we would like to generate tailored code for the unification of specific terms, rather than call the general unification routine.
Since unification in Prolog either succeeds or fails, we can use the implementation of Section 8.3.2, which does exactly that. This requires a little rewriting of the parent/2 and grandparent/2 routines. The basic structure of these routines is the same as shown in Figures 8.15 and 8.19; they still have an action routine as their last parameter, and they still act as OR list procedures. But the clause routines require some modification; the one for parent(arne, sachiko) is shown in Figure 8.29. The call for the head unification has been replaced by a simple set of calls of the Boolean function unify_terms(), which receives no action routine. Each call tests one unification, and if they all succeed we can go on and call the function for the body, parent_2_clause_2_body(), which is again identical to that in Figure 8.16.

Figure 8.29 also shows the use of the trail. The trail is one of the stacks of the WAM; it is used to store pointers to variables that have been bound. Each clause routine R that calls unify_terms() declares a stack marker, and records in it the present stack level by calling set_trail_mark(). Each time unify_terms() or one of its subroutines binds a variable, it stacks its address by calling trail_binding() (Figure 8.6).
/* translation of 'parent(arne, sachiko).' */
void parent_2_clause_2(void) {
    Trail_Mark trail_mark;

    set_trail_mark(&trail_mark);
    /* translation of '(arne, sachiko)' */
    if (unify_terms(goal_arg1, put_constant("arne"))
    &&  unify_terms(goal_arg2, put_constant("sachiko"))
    ) {
        /* translation of '.' */
        void parent_2_clause_2_body(void) {
            goal_list_tail();
        }
        /* translation of '.' */
        parent_2_clause_2_body();
    }
    restore_bindings_until_trail_mark(&trail_mark);
}

Figure 8.29 Clause routine with flat unification for parent(arne, sachiko).
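The trail routines set_trail_mark(), trail_binding(), and restore_bindings_until_trail_mark() are defined elsewhere in this chapter (Figure 8.6). Purely as a reminder of what they do, a minimal sketch is given below; the fixed trail size, the Trail_Mark representation, and the assumption that a null binding field marks an unbound variable are all assumptions made here.

    #define TRAIL_SIZE 10000                /* assumed fixed size */

    typedef int Trail_Mark;                 /* a saved trail stack level */

    static Term *trail[TRAIL_SIZE];         /* addresses of bound variables */
    static int trail_top = 0;

    void set_trail_mark(Trail_Mark *mark) { *mark = trail_top; }

    void trail_binding(Term *variable) {    /* record a fresh binding */
        trail[trail_top++] = variable;
    }

    void restore_bindings_until_trail_mark(Trail_Mark *mark) {
        while (trail_top > *mark) {         /* undo bindings since the mark */
            trail[--trail_top]->term.variable.term = 0;   /* unbind */
        }
    }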
At the end, when all possibilities for this binding have been examined, the routine R unstacks all addresses of variables from the stack until the original level is reached again, by calling restore_bindings_until_trail_mark() on the same stack marker. Note that it is not possible to know statically how many variables have been bound: the calls of unify_terms() may have failed immediately, may have failed just before the end, or may have succeeded; so a dynamic record must be kept.

Now that we have 'flattened' the unification we can take the next step towards compiled unification. Since the heads of the clauses are known from the program, we often find ourselves generating calls of unify_terms() with constant parameters; examples are

    unify_terms(goal_arg1, put_constant("arne"))

from Figure 8.29, and

    unify_terms(goal_arg,
        put_struct_2("times", put_constant("2"), put_variable("X"))
    )

used to unify the head structure times(2, X) and the goal argument. In such cases it seems unnecessary to invoke the heavy interpretive unification mechanism of unify_terms(), and the code looks the more clumsy because it creates complicated data structures just for the purpose of comparing them. One could try to remove that inefficiency by creating all fixed data structures at compile time and passing them to unify_terms(), but still the full interpretive unification mechanism would be employed. Also, this technique would fail for structures with constants and variables inside.
Both inefficiencies can be removed simultaneously by directly incorporating the information about the head arguments in the compiled unification code. This leads to the unification 'instructions' UNIFY_CONSTANT, UNIFY_VARIABLE, UNIFY_STRUCTURE, and several others. As with any interface, there are two sides to these instructions: how they are used and how they are implemented. Their use is demonstrated in Figure 8.30, which features two applications of UNIFY_CONSTANT. These applications are to be seen as macros, and for that reason they are written in capital letters, following C convention. They expand into code that, if the unification succeeds, establishes the proper bindings and falls through, and that jumps to the label L_fail when it does not. Note that the UNIFY_... instructions are code segments rather than routines, in that the flow of control can leave them by falling through and by jumping out. This jump-on-failure fits in with the WAM instructions for clause selection, as seen, for example, in Figure 8.22.
/* optimized translation of 'parent(arne, sachiko).' */
void parent_2_clause_2(void) {
    Trail_Mark trail_mark;

    set_trail_mark(&trail_mark);
    /* optimized translation of '(arne, sachiko)' */
    UNIFY_CONSTANT(goal_arg1, "arne");          /* macro */
    UNIFY_CONSTANT(goal_arg2, "sachiko");       /* macro */

    /* optimized translation of '.' */
    goal_list_tail();

L_fail:
    restore_bindings_until_trail_mark(&trail_mark);
}

Figure 8.30 Clause routine with compiled unification for parent(arne, sachiko).
As to the implementation of these instructions, we can of course design the code that implements them by hand, but it is more instructive, and perhaps also safer, to derive them by partially evaluating the corresponding call to unify_terms(). The code to be substituted for UNIFY_CONSTANT(goal_arg, head_text) is shown in Figure 8.31; g is a temporary variable of type Term *, which loses its meaning at the end of the code segment. The code was derived semi-automatically by partially evaluating the call

    unify_terms(goal_arg, put_constant(head_text))

Since partial evaluators for C are still experimental and not widely available, this particular partial evaluation was performed by hand, as demonstrated in the next section.
g = deref(goal_arg);
if (g->type == Is_Variable) {
    trail_binding(g);
    g->term.variable.term = put_constant(head_text);
}
else {
    if (g->type != Is_Constant) goto L_fail;
    if (strcmp(g->term.constant, head_text) != 0) goto L_fail;
}

Figure 8.31 The unification instruction UNIFY_CONSTANT.
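The text uses UNIFY_CONSTANT as a macro that expands to the code of Figure 8.31, but the macro definition itself is not shown. A minimal sketch of how it could be written as a C preprocessor macro is given below; it assumes that the temporary variable g and the label L_fail are available in the enclosing clause routine, as in the figures.

    /* Sketch only: wraps the code of Figure 8.31 in a preprocessor macro.
       Assumes a Term *g temporary and a label L_fail in the enclosing scope. */
    #define UNIFY_CONSTANT(goal_arg, head_text)                             \
        do {                                                                \
            g = deref(goal_arg);                                            \
            if (g->type == Is_Variable) {                                   \
                trail_binding(g);                                           \
                g->term.variable.term = put_constant(head_text);            \
            } else {                                                        \
                if (g->type != Is_Constant) goto L_fail;                    \
                if (strcmp(g->term.constant, head_text) != 0) goto L_fail;  \
            }                                                               \
        } while (0)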
The code for UNIFY_CONSTANT (Figure 8.31) holds no surprises (fortunately!). On the one hand, it is similar to what a programmer might have written for the task; on the other hand one recognizes fragments from the general unification code from Figures 8.6 and 8.7. A similar process produces the code for UNIFY_VARIABLE (Figure 8.32). It differs somewhat because the head variable may already be bound.
v = deref(head_var);    /* head_var may already be bound */
g = deref(goal_arg);
if (v->type == Is_Variable) {
    if (g != v) {
        trail_binding(v);
        v->term.variable.term = g;
    }
}
else {
    /* no further compilation possible */
    /* call interpreter */
    if (!unify_terms(g, v)) goto L_fail;
}

Figure 8.32 The unification instruction UNIFY_VARIABLE.
Above, the code for the unification instruction UNIFY_CONSTANT was obtained by manual partial evaluation. We will now show how it can be derived with only minor effort. In the case of UNIFY_CONSTANT, the partial evaluation mainly consists of in-lining the subroutines of unify_terms(), replacing head_arg by put_constant(head_text), and then repeatedly simplifying the result and deleting unreachable code. The process required the following transformations, of an almost mechanical nature.

In-lining. Start from unify_terms() (Figure 8.4) and in-line unify_unbound_variables() (Figure 8.7) and unify_unbound_variable(); this is easy to do since they all have the same parameters, so just inserting the bodies suffices. The test for constants is already in-line; we do not in-line unify_structures() since we foresee we will not need it. The result is still correct C; see Figure 8.33.

Substituting the constant as head. Remove the now useless head = deref(head_arg) statement, since head_arg is not a variable. Change the heading to unify_constant(Term *goal, char *head_text) and substitute (put_constant(head_text)) for head throughout the body. See Figure 8.34.

Substitution. We replace (put_constant(head_text))->type by Is_Constant, (put_constant(head_text))->term.constant by head_text, and then the remaining occurrences of (put_constant(head_text))->term.* by ERRONEOUS. The code no longer compiles, due to the presence of ERRONEOUS. See Figure 8.35.

Propagation. We replace Is_Constant == Is_Variable by 0, goal == (put_constant(head_text)) by 0 since goal->type is Is_Variable in that context, and goal->type by Is_Constant in the switch statement, since goal->type is Is_Constant in that context. The code again does not compile, due to the presence of ERRONEOUS. See Figure 8.36.

Simplification. Remove the ineffective operations (|| 0) and the unreachable if (0) parts. The code compiles again, since all occurrences of ERRONEOUS were located in unreachable code, as desired. See Figure 8.37.

Maintenance. Replace return 1 by fall-through and return 0 by goto L_fail, to match the WAM flow of control. Remove superfluous parentheses. This is the final product; see Figure 8.38.

The process of partially evaluating a part of an interpreter to obtain an optimized code instruction can be applied in many situations. More in particular, it produces the code segments for UNIFY_VARIABLE (Figure 8.32) and UNIFY_STRUCTURE (Figure 8.39).
int unify_terms(Term *goal_arg, Term *head_arg) {
    /* Handle any bound variables: */
    Term *goal = deref(goal_arg);
    Term *head = deref(head_arg);

    if (goal->type == Is_Variable || head->type == Is_Variable) {
        /* Handle identical variables: */
        if (goal == head) return 1;

        /* Bind the unbound variable to the other term: */
        if (head->type == Is_Variable) {
            trail_binding(head);
            head->term.variable.term = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = head;
        }
        return 1;       /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != head->type) return 0;
        switch (goal->type) {
        case Is_Constant:       /* both are constants */
            return strcmp(goal->term.constant, head->term.constant) == 0;
        case Is_Structure:      /* both are structures */
            return unify_structures(
                &goal->term.structure, &head->term.structure
            );
        }
    }
}

Figure 8.33 Deriving the instruction UNIFY_CONSTANT, stage 1.
Deriving the code for UNIFY_STRUCTURE along the lines explained above is straightforward most of the way, but the resulting code ends in a loop to unify the components, which does not fit the notion of an 'instruction'. The WAM instruction for unifying structures only unifies the functor and the arity; separate instructions must be generated for unifying the components. We use the same separation here, but some care is required. The code for UNIFY_STRUCTURE can be found in Figure 8.39. Figure 8.40 shows an application of it in the compiled code for the head times(2, X).
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable
    ||  (put_constant(head_text))->type == Is_Variable) {
        /* Handle identical variables: */
        if (goal == (put_constant(head_text))) return 1;

        /* Bind the unbound variable to the other term: */
        if ((put_constant(head_text))->type == Is_Variable) {
            trail_binding((put_constant(head_text)));
            (put_constant(head_text))->term.variable.term = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = (put_constant(head_text));
        }
        return 1;       /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != (put_constant(head_text))->type) return 0;
        switch (goal->type) {
        case Is_Constant:       /* both are constants */
            return strcmp(
                goal->term.constant, (put_constant(head_text))->term.constant
            ) == 0;
        case Is_Structure:      /* both are structures */
            return unify_structures(
                &goal->term.structure,
                &(put_constant(head_text))->term.structure
            );
        }
    }
}

Figure 8.34 Deriving the instruction UNIFY_CONSTANT, stage 2.
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable || Is_Constant == Is_Variable) {
        /* Handle identical variables: */
        if (goal == (put_constant(head_text))) return 1;

        /* Bind the unbound variable to the other term: */
        if (Is_Constant == Is_Variable) {
            trail_binding((put_constant(head_text)));
            ERRONEOUS = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = (put_constant(head_text));
        }
        return 1;       /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != Is_Constant) return 0;
        switch (goal->type) {
        case Is_Constant:       /* both are constants */
            return strcmp(goal->term.constant, head_text) == 0;
        case Is_Structure:      /* both are structures */
            return unify_structures(&goal->term.structure, ERRONEOUS);
        }
    }
}

Figure 8.35 Deriving the instruction UNIFY_CONSTANT, stage 3.
The compiled code first establishes the presence of the structure, then acquires a local variable goal_comp pointing to the component list, and then steps through the list, unifying the components in turn. As usual, if any of the unifications fails, the instruction executes a jump to the label L_fail. We see from Figures 8.31 and 8.39 that when the goal argument is an unbound variable, the corresponding constant or structure must be created and the variable bound to it. This is necessary to allow further unification and/or reporting of the value in answer to a query. There is a problem here, though: whereas put_constant() creates the entire constant,
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable || 0) {
        /* Handle identical variables: */
        if (0) return 1;

        /* Bind the unbound variable to the other term: */
        if (0) {
            trail_binding((put_constant(head_text)));
            ERRONEOUS = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = (put_constant(head_text));
        }
        return 1;       /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != Is_Constant) return 0;
        switch (Is_Constant) {
        case Is_Constant:       /* both are constants */
            return strcmp(goal->term.constant, head_text) == 0;
        case Is_Structure:      /* both are structures */
            return unify_structures(&goal->term.structure, ERRONEOUS);
        }
    }
}

Figure 8.36 Deriving the instruction UNIFY_CONSTANT, stage 4.
put_structure() creates only the record for the structure, fills in the functor and the arity, but leaves the component entries empty. This matches the behavior of the second part of the code for UNIFY_STRUCTURE, which checks only the type, the functor, and the arity, but leaves testing the component entries to the code following it. This code (Figure 8.40) cannot, however, handle uninitialized component entries, as would be supplied by the first part if the goal variable is an unbound variable. We solve this problem by initializing the components with anonymous variables in the call to initialize_components() in Figure 8.39; these variables can then be bound to the components of the structure by the instructions for the entries.
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable) {
        trail_binding(goal);
        goal->term.variable.term = (put_constant(head_text));
        return 1;       /* always succeeds */
    }
    else {
        if (goal->type != Is_Constant) return 0;
        return strcmp(goal->term.constant, head_text) == 0;
    }
}

Figure 8.37 Deriving the instruction UNIFY_CONSTANT, stage 5.
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    goal = deref(goal_arg);
    if (goal->type == Is_Variable) {
        trail_binding(goal);
        goal->term.variable.term = put_constant(head_text);
    }
    else {
        if (goal->type != Is_Constant) goto L_fail;
        if (strcmp(goal->term.constant, head_text) != 0) goto L_fail;
    }
    return 1;
L_fail:
    return 0;
}

Figure 8.38 Deriving the instruction UNIFY_CONSTANT, final result.
Figures 8.41 and 8.42 show the effects of the code of Figure 8.40 when goal_arg is an unbound variable Y, represented as a null pointer. Figure 8.41 shows the result of the instruction UNIFY_STRUCTURE(goal_arg, "times", 2).
g = deref(goal_arg);
if (g->type == Is_Variable) {
    trail_binding(g);
    g->term.variable.term = put_structure(head_functor, head_arity);
    initialize_components(g->term.variable.term);
}
else {
    if (g->type != Is_Structure) goto L_fail;
    if (g->term.structure.arity != head_arity
    ||  strcmp(g->term.structure.functor, head_functor) != 0)
        goto L_fail;
}

Figure 8.39 The unification instruction UNIFY_STRUCTURE.
UNIFY_STRUCTURE(goal_arg, "times", 2);                  /* macro */
{   /* match (2, X) */
    Term **goal_comp = deref(goal_arg)->term.structure.components;
    UNIFY_CONSTANT(goal_comp[0], "2");                  /* macro */
    UNIFY_VARIABLE(goal_comp[1], X);                    /* macro */
}

Figure 8.40 Compiled code for the head times(2, X).
Three allocations have taken place: one for the structure times/2, one for an array of two components, and one for two 'anonymous' variables, which were named _A01 and _A02 here. The instructions UNIFY_CONSTANT(goal_comp[0], "2") and UNIFY_VARIABLE(goal_comp[1], X) are referred to these anonymous variables and bind them to their respective arguments, as shown in Figure 8.42. We see that the same instructions can perform unification with data structures from the goal when these are present, and construct them when they are not. We are now in a position to generate in-line code for the unification of constants, variables, and structures as head arguments, with goal arguments; unification of lists as head arguments is very similar to that of structures. Compiled unification performs effectively the same operations on the data structures as interpreted unification does, with the exception of the introduction of anonymous variables as structure components. It is much faster, since many of the decisions have been taken at compile time. Note also that compiled unification is applied to all arguments, whereas the optimized clause selection (Section 8.4.3) examines the first argument only.
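The routine initialize_components() called in Figure 8.39 is not shown in the text. A minimal sketch is given below; it assumes that put_structure() leaves a components array of length arity with empty entries, and that put_variable() can be used to create the anonymous variables. The _A0n naming scheme is just the convention used in Figure 8.41.

    #include <stdio.h>
    #include <string.h>

    /* Sketch only: fill the empty component entries of a freshly created
       structure with anonymous (unbound) variables. */
    void initialize_components(Term *structure) {
        int arity = structure->term.structure.arity;
        for (int i = 0; i < arity; i++) {
            char name[16];
            sprintf(name, "_A%02d", i + 1);     /* _A01, _A02, ... */
            structure->term.structure.components[i] = put_variable(strdup(name));
        }
    }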
[Figure 8.41 is a diagram: the goal variable Y is bound to a newly allocated structure times/2 whose two component entries point to the anonymous variables _A01 and _A02.]

Figure 8.41 The effect of the instruction UNIFY_STRUCTURE.

[Figure 8.42 is a diagram: the anonymous variables _A01 and _A02 are now bound to the constant 2 and to the variable X, respectively.]

Figure 8.42 The effect of the instructions of Figure 8.40.
The WAM features several additional optimizations on the unification instructions; we will discuss one of them in some detail in the next section and briefly mention some others in the section thereafter.
There is a second solution to the problem of the uninitialized component entries, already hinted at in the wording above: leave the entries uninitialized and modify the unification instructions so they can handle that situation. This solution is especially attractive because the anonymous variables are clearly an artifact of our implementation and continue to add at least one level of indirection to all operations on the structure; avoiding them will result in considerable speed-up, since many Prolog programs manipulate structures and lists extensively. The solution cannot be implemented as it stands, however, and requires two modifications.

The first is simple: the unification instructions on the components have to be notified that the component entries have not been initialized. This is done by introducing a global state variable 'mode' with two values, 'Read' and 'Write'. In 'Read' mode, each instruction works as described above. In 'Write' mode the instruction creates precisely the data structure that it would unify with in 'Read' mode, and initializes the proper component entry with a pointer to it. Unification starts in 'Read' mode, and a UNIFY_STRUCTURE that needs to create the structure switches to 'Write' mode. Since structures can be nested, the mode must be saved at the start of structure unification and restored at the end of it.

The second modification is also simple but more far-reaching. The unification instructions as described above get a pointer to the goal argument to work on, and unification instructions for the components of structures get pointers to the components. As a result, when we do not initialize the component entries, the instructions get uninitialized pointers, and no amount of clever programming can make them do anything sensible with them. To initialize a component entry with a pointer to a data structure, the instruction needs the address of the component entry, rather than its uninitialized value. So the unification instructions need pointers to pointers to data structures; as the saying goes, there is no problem in computer science that cannot be solved by introducing another level of indirection. Fortunately, this is easy to do, as shown in Figure 8.43, where we have just added the address operator & to the goal arguments. The code saves the mode in mode_1 before calling the read/write version of UNIFY_STRUCTURE, since that instruction may set the mode to 'Write', to warn the following instructions. The mode is restored when all structure components have been unified. Nested structures will need more than one temporary mode_N to save nesting modes and more than one temporary goal_comp to address the various component lists.

Now that we have seen how the new instructions are used, we can look into their implementation. The read/write version of UNIFY_CONSTANT is shown in Figure 8.44 and is representative of the general approach. The code for 'Read' mode is the same as that for the original UNIFY_CONSTANT (Figure 8.31), except that there is an extra dereference operation on the pointer to the goal argument. The code for 'Write' mode creates the constant and stores a pointer to it under the goal argument pointer, which in 'Write' mode will be an entry in a component list of a goal structure. The code for UNIFY_STRUCTURE (Figure 8.45) follows this pattern, with the exception that, when the goal argument is an unbound variable, it sets the mode to 'Write' rather than initializing the components. Figures 8.46 and 8.47 show the effects of the code of Figure 8.40 with goal_arg an unbound variable Y, this time using read/write mode.
/* match times(2, X) */
mode_1 = mode;                                          /* save mode */
UNIFY_STRUCTURE(&goal_arg, "times", 2);                 /* macro */
/* saves read/write mode and may set it to Write_Mode */
{   /* match (2, X) */
    register Term **goal_comp =
        deref(goal_arg)->term.structure.components;
    UNIFY_CONSTANT(&goal_comp[0], "2");                 /* macro */
    UNIFY_VARIABLE(&goal_comp[1], X);                   /* macro */
}
mode = mode_1;

Figure 8.43 Compiled code with read/write mode for the head times(2, X).

if (mode == Read_Mode) {
    g = deref(*goal_ptr);
    if (g->type == Is_Variable) {
        trail_binding(g);
        g->term.variable.term = put_constant(head_text);
    }
    else {
        if (g->type != Is_Constant) goto L_fail;
        if (strcmp(g->term.constant, head_text) != 0) goto L_fail;
    }
}
else {  /* mode == Write_Mode */
    *goal_ptr = put_constant(head_text);
}

Figure 8.44 The instruction UNIFY_CONSTANT with read/write mode.
Figure 8.46 shows that the instruction UNIFY_STRUCTURE(goal_arg, "times", 2) has now left the component array entries uninitialized, avoiding the overhead of the construction of the anonymous variables. The instructions UNIFY_CONSTANT(goal_comp[0], "2") and UNIFY_VARIABLE(goal_comp[1], X) recognize this situation and fill in the component array entries with their respective arguments, as shown in Figure 8.47. Compiled unification with read/write mode creates exactly the same data structure as a set of calls to the put_... routines, except that only that part of the data structure is created that is not already present in the goal argument.
if (mode == Read_Mode) {
    g = deref(*goal_ptr);
    if (g->type == Is_Variable) {
        trail_binding(g);
        g->term.variable.term = put_structure(head_functor, head_arity);
        mode = Write_Mode;      /* signal uninitialized goals */
    }
    else {
        if (g->type != Is_Structure) goto L_fail;
        if (g->term.structure.arity != head_arity
        ||  strcmp(g->term.structure.functor, head_functor) != 0)
            goto L_fail;
    }
}
else {  /* mode == Write_Mode */
    *goal_ptr = put_structure(head_functor, head_arity);
}

Figure 8.45 The instruction UNIFY_STRUCTURE with read/write mode.
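The read/write version of UNIFY_VARIABLE is not shown in the text. By analogy with Figures 8.32 and 8.44 it could look as sketched below; this is an inference from those two figures, not the book's code, and in particular the behavior in 'Write' mode is an assumption made here.

    if (mode == Read_Mode) {
        v = deref(head_var);            /* head_var may already be bound */
        g = deref(*goal_ptr);
        if (v->type == Is_Variable) {
            if (g != v) {
                trail_binding(v);
                v->term.variable.term = g;
            }
        }
        else {
            /* no further compilation possible; call interpreter */
            if (!unify_terms(g, v)) goto L_fail;
        }
    }
    else {  /* mode == Write_Mode: fill in the empty component entry */
        *goal_ptr = deref(head_var);
    }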
[Figure 8.46 is a diagram: the goal variable Y is bound to a structure times/2 whose component entries are left uninitialized.]

Figure 8.46 The effect of the instruction UNIFY_STRUCTURE with read/write mode.
The above UNIFY_... instructions are simpler than the actual WAM instructions, which feature several further optimizations. We will briefly discuss them here; for a full treatment see the tutorial by Aït-Kaci (1991). We saw that the unification of structures is somewhat complicated by the fact that they can nest. This necessitates saving the mode as shown in Figure 8.43, and leads to the use of several component pointers at the same time, each of which we would like to allocate in a machine register. The WAM avoids these complications in a drastic way: it forbids nested structures. If the head argument consists of a nested structure, it must be unified in top-down breadth-first fashion.
[Figure 8.47 is a diagram: the component entries of the structure are now filled in with the constant 2 and the variable X.]

Figure 8.47 The effect of the instructions of Figure 8.40 with read/write mode.
The goal and head top-level structures are unified first, skipping any substructures. Next, the second level of structures is unified, skipping any subsubstructures, and so on. Since deeply nested head arguments to clauses are rare, this restriction is not much of a problem in practice. The restriction has several advantages: saving the mode has become superfluous; only one goal_comp pointer is needed, which can easily be placed in a register; and the door is opened to other optimizations, which we will describe now.

Since only the UNIFY_STRUCTURE instruction can set the mode to 'Write', the mode is guaranteed to be 'Read' in any unification not inside a structure. Since simpler code can be generated in 'Read' mode (for example the code of Figure 8.31) than when the mode is unknown (Figure 8.44), the WAM has special instructions GET_... to be used instead of UNIFY_... whenever 'Read' mode is guaranteed statically; so GET_CONSTANT would be used instead of UNIFY_CONSTANT in Figure 8.30. Note that since no structure unification is ever started inside a structure, there is no UNIFY_STRUCTURE in the WAM, just a GET_STRUCTURE. Since there is only one pointer to the component list, goal_comp, it can be implicit in the machine and need not be passed explicitly to the instructions. And since the unification instructions for structure components (the UNIFY_... instructions) are differentiated from those used outside structures (the GET_... instructions), GET_STRUCTURE can set the pointer goal_comp and the UNIFY_... instructions can use it and then increment it, to reach the next component. This also removes the need for the extra indirection described above. As a result, the compiled unification code of Figure 8.43 reduces to

    GET_STRUCTURE(goal, "times", 2);    /* macro */
    UNIFY_CONSTANT("2");                /* macro */
    UNIFY_VARIABLE(X);                  /* macro */
Note that UNIFY_CONSTANT and UNIFY_VARIABLE each have one parameter now, the head argument known from the rule head; the goal argument is implicit. A different optimization concerns the instructions GET_VARIABLE and UNIFY_VARIABLE. The first use of a free variable X is guaranteed to be unbound, so the
instruction reduces to

    v = head_var;
    g = deref(goal);
    trail_binding(v);
    v->term.variable.term = g;

for GET_VARIABLE and to

    v = head_var;
    if (mode == Read_Mode) {
        g = deref(*goal_comp++);
        trail_binding(v);
        v->term.variable.term = g;
    }
    else {  /* mode == Write_Mode */
        *goal_comp++ = v;
    }

for UNIFY_VARIABLE; note the use of the goal component entry pointer goal_comp in the latter, since it is used only during structure unification. The original GET_VARIABLE and UNIFY_VARIABLE instructions are renamed to GET_VALUE and UNIFY_VALUE, although these names are somewhat misleading: the second and further uses of a head variable need not represent a value at all, since the variable may have become bound to an unbound variable.
This concludes our discussion of some optimizations of compiled unification in the WAM, and of compiled unification in general. We have seen that the building blocks for compiled unification can to a large extent be derived semi-automatically by the partial evaluation of calls to the interpreter. This method was already applied to the imperative languages in Section 4.2.3.2, but the gain was limited there. The difference is that here the method is applied manually and to a relatively simple interpreter. Professional compilers for functional and logic languages combine these or similar techniques and use them in combination with optimized clause selection (Section 8.4.3) and tail call elimination (Section 7.7.3), and with such traditional optimization techniques as constant propagation, in-lining, and cloning. The resulting code can come within a factor of two or three with respect to the speed of a similar program in an imperative or object-oriented language.
Summary

- Logic programming is based on (1) the specification of relations between terms, (2) facts using these relations, and (3) rules for inferring new facts from existing facts. Specific inferred facts can be requested by doing queries.
- Facts and rules together are called clauses; they can contain logic variables.
- A logic variable can be (1) unbound; (2) bound to a value, in which case it represents that value; or (3) bound to another variable, in which case it represents whatever that variable represents. A binding of a variable in a clause gives a new clause; the new clause is called an instantiation of the more general clause.
- Clauses can be used to infer new facts, as follows. A clause consists of a head and a list of goals; dependencies between the head and goals are expressed by logic variables. If in an instantiation of a clause all goals are established facts, the head of the clause can be inferred as a fact.
- Answering a query means finding bindings for the variables in it that turn the query into a known or inferred fact.
- To infer a goal fact F, a clause is sought that might infer F, given proper variable bindings. Possible proper bindings are found by unifying F with the head of the clause. The clause is then instantiated using these bindings; the instantiated goal list yields new goals. Some of these may be known facts, others may need to be inferred. If all goals are eventually reduced to known facts or empty, F is inferred.
- The state of the search for clauses and bindings can be recorded in a stack of extended goal lists. An extended goal list is a list of goals and unification requests.
- The inference process considers the top stack entry only. If the top stack entry is empty, we have found an answer to the query and report it, using the instruction 'Succeed'. If the top stack entry contains unification requests, an attempt is made to satisfy these, using the instructions 'Unify' and 'Match'; if this fails the top entry is removed. Otherwise, clauses are located that might infer the first goal in the top entry. The instruction 'Attach clauses' is used to make copies of the top entry and insert unification requests with these clauses in them.
- All Prolog interpreters and compiled Prolog programs work with this inference process or optimizations of it.
- The inference process acts only on the topmost entry and mainly on its first goal, except for variable binding, which acts on the whole topmost entry. The non-first goals in the non-topmost entries are almost exclusively copies of those in the topmost stack entry, and it is an essential optimization to combine them. However, once combined, variable binding in the topmost entry affects them all. To undo this effect when the topmost entry is removed, all bindings are recorded when performed and undone when the topmost entry is removed. This is called backtracking.
- Unification is a top-down process on two terms, one from a head term and one from a goal term. The terms are viewed as trees. For unification to succeed, the two top nodes must be equal and have the same number of children, and the pairwise unification of the children of both terms must succeed.
- An unbound variable is unified with a value or variable by binding it to that value or variable. The binding is recorded for backtracking purposes.
- Structures, lists, and sets are unified by the same rules as terms. Variables in them are unified by the rules for variable unification.
- Most Prolog compilers generate either C code or WAM code. The Warren Abstract Machine code is an assembly-level macro code that allows convenient optimizing Prolog compilation.
- In a compiled Prolog program, each relation is implemented as a routine, and so is each clause.
- The first goal in the routine for a clause C is processed by a call G to the routine compiled for the relation in that first goal. This routine gets an action routine as a parameter that will be called whenever G succeeds in finding a proper set of bindings. The action routine is constructed to process the rest of the goals of C and then call C's action routine. This implements the AND component of inference. The technique requires nesting routine declarations.
- The routine compiled for a relation R successively tries all clauses for R, passing to them the action routine it obtained from its caller. Copying the goal list has been replaced by copying the action routine. This implements the OR component of inference.
- Static and dynamic analysis can show that many of the calls to clauses for a relation cannot succeed, and should be avoided. The WAM has special instructions for this.
- The WAM has an instruction for clause selection based on the nature of the first goal argument: constant, variable, structure, or list; it effects a four-way jump. Next, for constants and structures as goal arguments it has instructions which perform N-way jumps, based on the value of the constant or the functor of the structure. The code positions reached through these jumps are occupied by calls to exactly the clauses that can have first head arguments with the selected properties.
- Exhaustion of the search possibilities is implemented as a jump to a fail label in the WAM. The Prolog 'cut' is implemented by jumping to the fail label when backtracking over it.
- Assert is implemented by conditionally calling the interpreter at the beginning and the end of each routine for a relation. Retract is implemented by conditionally calling the clause routines. If the interpreter finds in an asserted clause a goal for which a compiled routine exists, it switches back to the compiled code.
- Naive compiled code for unification is inefficient because it often compares known constants to known constants and because it often creates data structures just to compare them.
- To avoid these inefficiencies, the WAM has special instructions for unification with head arguments that are constants, variables, structures, and lists. These instructions are partial evaluations of the general unification routine for head arguments, specialized for constants, variables, structures, and lists. They apply to all head arguments, first or otherwise.
- If a WAM unification instruction meets a goal value, it compares the value to its built-in value. If the two match, the instruction continues; otherwise it jumps to the fail label. If a WAM unification instruction meets a goal variable, it creates a copy of its internal value, binds the variable to it, and continues.
- The WAM unification instructions for structures and lists create the top nodes of the structure or list only. The creation of the children is left to further unification instructions, which are notified of this situation by turning from 'Read' mode to 'Write' mode.
- Using the above and many other optimizations, compiled logic programs can come within a factor of two or three with respect to the speed of equivalent programs in imperative and object-oriented languages.
Further reading

Information about the implementation of logic languages can be found in journals like the Journal of Logic Programming and ACM Transactions on Programming Languages and Systems, and in the proceedings of, for example, the International Conference on Logic Programming and the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI. The original explanation of the Warren Abstract Machine is given by Warren (1983). Aït-Kaci (1991) has written a tutorial on the WAM. The definitive survey of sequential Prolog implementation, both historically and technically, was written by Van Roy (1994); it contains an extensive literature list.
Exercises

8.1  Prolog tries the different subgoals in the body of a clause from left to right. In theory, other evaluation orders are also possible, for example from right to left. For each of the two queries

         ?- grandparent(arne, Z).
         ?- grandparent(X, rivka).

     determine whether left-to-right or right-to-left order is more efficient (using the grandparent relation of Section 8.1). What can we learn from this?
8.2  In Section 8.2.1 we showed how the interpreter processes the query ?- gp(arne, X). Process this query again, but now using the optimizations discussed in Section 8.2.2.
8.3  Given the following relations:

         author(grune, parsingTechniques).
         author(jacobs, parsingTechniques).
         author(grune, programmingLanguageEssentials).
         author(bal, programmingDistributedSystems).
         author(bal, programmingLanguageEssentials).
         coauthor(X, Y) :- author(X, Book), author(Y, Book).

     (a) Show how the interpreter described at the beginning of Section 8.2 processes the query:

         ?- coauthor(bal, grune).

     (b) Describe which code will be generated for these relations using the techniques of Section 8.4.2.
     (c) Describe which code will be generated for the relations using the optimized clause selection technique described in Section 8.4.3.
8.4  Project: In Section 8.2 we presented an implementation model in which the goal lists are kept in a queue which acts like a stack, since the 'Attach clauses' instruction manipulates it in a last-in-first-out fashion. Explore the effects of changing the 'Attach clauses' instruction so as to manipulate the queue entries in a first-in-first-out fashion: it removes the top element, but adds the new clauses at the bottom. Which of the optimizations discussed here are still applicable?

8.5  A language designer proposes to allow queries in which relation names can be variables as well. Discuss the consequences for the general interpreted implementation model.

8.6  (a) Show how the interpreter processes the unification of p(X, [2, X, 4]) and p(3, [A, B, 4]), using the techniques described in Section 8.3.
     (b) Describe which code will be generated for this unification using the techniques described in Section 8.5.
8.7  Write a unification routine unify_lists() for lists as described in Section 8.3.1. Why is this an improvement over a direct application of unify_structures()?

8.8  In Section 8.3.3 we explain that the unification sequence B to A, C to B, D to B, E to C leads to the binding tree in Figure 8.10b rather than the one in Figure 8.10a. Is there a binding sequence that would lead to the tree in Figure 8.10a, and if so, what is it?

8.9  Refer to the example at the beginning of Section 8.4.1 on list procedures. The procedure return_two_values() could also return a list with both values it wants to return, and let the caller consume the results one at a time. Why will this not work in general?

8.10 Write a list procedure that accepts two list procedures specifying integer lists and that implements the intersection of both lists.

8.11 Construct code for a clause with two cuts in it:

         first_first_grandparent(X, Z) :- parent(X, Y), !, parent(Y, Z), !.

     which tries to say that X is a first-first-grandparent of Z if Z is the first child of the first child of X.
8.12 Find out from the Prolog manual how assert and retract should behave with respect to backtracking, and verify if the approach from Section 8.4.5 supports this behavior.

8.13 Design an implementation of optimized clause selection and retracted clause handling in an integrated fashion using bit vectors. Hint: see Colomb (1988).

8.14 (a) The meta-logical operator var(X) in Prolog is defined to succeed if X is currently an unbound variable and to fail otherwise. How can this operator be imple-
mented in compiled Prolog?
     (b) The meta-logical arithmetical infix operator is in Prolog requires its right operand to be a structure E that represents an arithmetic expression in some specific format. All variables in E must be bound. If V is currently an unbound variable, the operation succeeds and V is bound to the arithmetic value of E. If V is bound and its value is equal to the arithmetic value of E, the operation succeeds. Otherwise it fails. How can this operator be implemented in compiled Prolog? Assume the existence of a routine Term *evaluate_expression() that evaluates an expression in the specific format to a term of type Constant, or gives an error message if the expression is in error. Also assume single-length integer arithmetic.
an ~ ~ t h m e toperator ic between with the syntax
between E , and E 2
whichevaluates E l and E2 as inExercise8.14,If V is u n b o u n ~the ~ o~eration succeedswith V boundsuccessivelyto all ~ t h m e t i cvaluesbetween E l and E2 inclusive (of which there may be zero!). If V is bound and its value lies between E l and E 2 inclusive, the o~erationsucceeds. Otherwise it fails. How can this operator be imple~entedin omp piled Prolog? Derive a unification instruction UNIF~-LISTfor unifying lists, similar to the one in Figure 8.39. i ~ of t logic o l~a n ~ u ia ~~p~l ~ ~ ~ ~ t a Study t i oRobinson’s n: paper (1965), which ~ntroducesunification as an essential partof theorem p~ovingand logicp r o g r a ~ ~ m i n ~ (requires co~siderable mathe~aticalsophistication),orhis later paper robins so^, 19’71) which concerns theimple~entationof unification, and write a s u of it. ~
Parallel and distributed systems consist of multiple processors that can communicate with each other. Languages for programming such systems support constructs for expressing concurrency and communication. In this chapter, we will study how such languages can be implemented. As we will see, the presence of multiple processors introduces many new problems for a language implementer.

We will first define what we mean by a parallel system and what by a distributed system. In a parallel system, the goal is to solve a given problem as fast as possible, using multiple processors that cooperate to solve a single problem. Parallel processing typically features in applications that would take a huge amount of compute time, often weeks or months, on a single processor. So, the reason for using a parallel system is to obtain performance improvements.

A distributed system also contains multiple processors, but now the processors work on many different tasks from many users. There is normally no single problem involved. A typical configuration is a system with one workstation per user, plus some additional processors providing a file service or a time-shared compute service. The processors in a distributed system are autonomous and are connected by a network. There are many reasons for using a distributed system instead of a single, centralized machine: a distributed system is more cost-effective, potentially more fault-tolerant, and it is easier to extend the system incrementally.

Interestingly, it is possible to use a distributed system as a parallel one. In fact, many research groups have studied how to use a collection of (idle) workstations for running parallel programs. This approach is attractive, because many institutes already have the workstations installed.
An important issue in compiling parallel and distributed programs is the architecture of the target system. During the past decades, many architectures for parallel and distributed systems have been designed. In this chapter, however, we focus on only two widely-used machine models: multiprocessors and multicomputers. The difference between these two architectures is that with a multicomputer, processes on different processors run in different address spaces, whereas with a multiprocessor at least part of the address space is shared (see Figure 9.1). In a multiprocessor, all processors have access to a single, shared memory. The processors communicate by reading and writing variables in the shared memory. Multiprocessors can be constructed, for example, by connecting several processors to a single bus. Other, more scalable, methods also exist. A multicomputer consists of several processors connected by a network. The processors communicate by sending messages over this network to each other. The network can either be especially designed for the multicomputer, as in the CM-5 or SP-2, or it can be an off-the-shelf local area network (LAN), such as Ethernet. A distributed system can be regarded as a multicomputer with an off-the-shelf LAN.
Figure 9.1 Multiprocessors and multicomputers: (a) multiprocessor; (b) multicomputer.
Many models and languages have been proposed for parallel and distributed programming, and a great amount of research has been done on how to implement these languages efficiently. In this chapter, we necessarily cover only a small fraction of all this work. Rather than trying to give a broad overview of compilation techniques for many different paradigms, we will focus on a few important models. We will focus on language constructs for parallel programming, although many constructs, for example message passing, can be used just as well for writing distributed applications. For distributed applications,
additional language constructs exist, such as atomic transactions, but these will not be discussed here.

The parallel programming models we discuss are: shared variables, message passing, objects, Tuple Space, and data-parallel programming. The first two models, shared variables and message passing, are low-level communication mechanisms that reflect the underlying hardware. Objects, Tuple Space, and data parallelism are high-level constructs that allow communication to be expressed at a higher level of abstraction. The models are described in Section 9.1. Their implementation is based on processes and threads, which are treated in Section 9.2. In Sections 9.3 to 9.6 we discuss implementation techniques for shared variables, message passing, objects, and Tuple Space. In Section 9.7, we look at automatic parallelization, which tries to generate data-parallel programs from sequential programs.

We should emphasize that much interesting research is done on other programming paradigms too, in particular on functional and logic languages, and on other parallel programming models, for example collective communication. Also, we will restrict ourselves to compilation techniques for multiprocessors and multicomputers built out of simple microprocessors. Much research has also been done on alternative forms of parallel hardware, for example vector computers, super-scalar machines, and data-flow machines. We will not cover these in this book.

Roadmap
9 Parallel and distributed programs
9.1 Parallel programming models
9.2 Processes and threads
9.3 Shared variables
9.4 Message passing
9.5 Parallel object-oriented languages
9.6 Tuple space
9.7 Automatic parallelization
Some general remarks can be made about the implementation of parallel languages. Even more than with logic and functional languages, much of the implementation comes in a large run-time system, rather than in compiler support. Some of the more popular parallel programming systems (for example PVM and MPI) are even implemented as libraries that are linked with a program written in a traditional language, such as C or FORTRAN. These programming systems need no direct compiler support whatsoever. A problem with this approach is that the programmer is responsible for making correct library calls, since there is no compiler that checks the calls. Incorrect usage of the library occurs, for example, when a process sends a message containing an integer value, while the receiver expects a floating point value. The type error will not be caught, and will result in incorrect and unpredictable program behavior. This approach is comparable to programming in assembly, where the library calls are regarded as machine instructions. Other parallel programming systems are designed as relatively small extensions to existing languages; they are implemented using compilers that do check for errors such as the one described above. Compilers for such languages usually are fairly simple, how-
ever, and most of the language implementation consists of a complex run-time system.

An interesting area where compilers do play an important role concerns languages with implicit parallelism. In most languages, the programmer is required to express parallelism explicitly, and to use statements to coordinate the parallel activities (for example by using message passing). In languages with implicit parallelism, on the other hand, the compiler tries to parallelize the program automatically and to generate communication code. Such compilers are very hard to build and are sometimes even referred to as 'heroic' compilers. We study such compilers in Section 9.7.

One problem in studying parallel run-time systems is that the boundary between the Operating System (OS) and the run-time system is often fuzzy. Many primitives, such as processes and communication, can be implemented either in the OS or in the run-time system. A recent trend is to move functionality from the OS to the run-time system. The advantages are that it saves expensive interactions with the OS and that it gives the compiler writer more control over the implementation of the functionality. An extreme form used in some systems is to map devices, for example the network interface, in user space and to let the run-time system control the device without any involvement from the OS. In our discussions, we will always make clear which functionality we assume the OS to provide.
A parallel programming model provides support for expressing parallelism as well as communication and synchronization between parallel tasks. Below we briefly describe the programming models used in this chapter.
Perhaps the simplest parallel programming model is that of a collection of processes communicating through shared variables. A process is an abstraction of a physical processor. Each process executes program code sequentially. Conceptually, a process may be thought of as containing a virtual processor, which supplies the processing power, and an address space, which stores data; so an address space is an abstraction of physical memory. The concepts of process and address space are used in many other parallel programming models (discussed later) as well. A program can create multiple processes to obtain parallelism. A new process is created with a fork statement, as in:

    fork sort(A, 1, 10);
which forks off a copy of sort() as a new process and passes three parameters to it. With the shared-variable programming model, at least part of the address spaces of multiple processes is shared, so that these processes can access the same variables. Such variables are called shared variables. A given shared variable can be read and written by some or all processes, and can thus serve as a mechanism for communication between these processes.
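The fork statement above is language notation rather than a library call. As a minimal sketch of how it might be realized, assuming POSIX threads as the underlying mechanism and a hypothetical sort_process() wrapper (the indices 0..9 replace the 1..10 of the example, since C arrays start at 0):

    #include <pthread.h>

    struct sort_args { double *a; int low, high; };   /* hypothetical argument record */

    static void *sort_process(void *p) {
        struct sort_args *args = p;
        (void)args;
        /* ... sort args->a[args->low .. args->high] ... */
        return NULL;
    }

    int main(void) {
        static double A[10];
        struct sort_args args = { A, 0, 9 };
        pthread_t child;

        /* 'fork sort(A, 1, 10);' becomes: create a new thread running sort_process */
        pthread_create(&child, NULL, sort_process, &args);

        /* ... the parent continues in parallel with the child ... */
        pthread_join(child, NULL);
        return 0;
    }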
An important problem with this programming model is synchronizing the access to the shared variables. If multiple processes simultaneously try to change the same data structure, the result will be unpredictable and the data structure may be left in an inconsistent state. A typical example is two processes that simultaneously try to increment a shared variable X by executing:

    X: shared integer;

    X := X + 1;
Assume that X initially contains the value 5. Clearly, if both processes increment the variable, the resulting value of X should be 7. What may happen instead, however, is that both processes read the original value (5) and compute the new value (6) based on the original value. Next, both processes write this new value into X. As a result, the variable is assigned the value 6 twice, instead of being incremented twice.

To prevent such undesirable behavior, synchronization primitives are needed that make sure only one process can access a given shared data structure at a certain time. This form of synchronization is called mutual exclusion. A simple primitive for mutual exclusion is the lock variable, which has indivisible operations to acquire (set) and release the lock. If a process tries to acquire a lock that has already been acquired, this process will block until the lock has been released. Lock variables can thus be used to restrict the access to a given shared data structure to a single process only. The example above would be written as follows using lock variables:
    X: shared integer;
    X_lock: lock;

    Acquire_Lock(X_lock);
    X := X + 1;
    Release_Lock(X_lock);
Now, only one of the processes can execute the increment statement at a given time. Unfortunately, this method is rather low-level and error-prone. For example, the programmer should guard all accesses to shared variables by locks. If the increment statement occurs in several places in the program, and one occurrence is not protected by lock statements, the program is still in danger of behaving incorrectly.

A more structured and higher-level solution to the synchronization problem, which we use in this chapter, is the monitor. A monitor is similar to an Abstract Data Type in that it is a type that contains data and the operations to access the data. The data encapsulated by a monitor are shared by multiple processes. The key idea of a monitor is to allow only one operation inside the monitor at any given time. Thus, all monitor operations are executed in a mutually exclusive way.

Besides mutual exclusion, a second form of synchronization is needed, called condition synchronization. With condition synchronization, a process can be blocked until a specific condition occurs, for example until a certain process has produced a result that is needed. With monitors, condition synchronization usually is expressed through condition variables, which are shared variables of type Condition, on which two indivisible operations are defined: wait() and signal(). The operation wait(c) on a condition variable c blocks the invoking process; a signal(c) wakes up
one process blocked in a wait on the same condition variable. The two primitives can only be invoked from within a monitor operation. An important feature of the wait operation is that a process that blocks in a wait operation is temporarily lifted out of the monitor, so another process may enter the monitor.

A simple example of a monitor is given in Figure 9.2. The monitor contains a variable representing a 'bin' and operations to put data in the bin and take data out of it. The put() operation blocks if the bin already contains data and the get() operation blocks if the bin does not yet contain data. When the operations wake up after being blocked, they recheck the status of the bin, because the status may have changed again since the signal operation was issued.
    monitor BinMonitor;
        bin: integer;
        occupied: Boolean := false;
        full, empty: Condition;

        operation put(x: integer);
        begin
            while occupied do        # wait if the bin is already occupied
                wait(empty);
            od;
            bin := x;                # put the item in the bin
            occupied := true;
            signal(full);            # wake up a process blocked in get
        end;

        operation get(x: out integer);
        begin
            while not occupied do    # wait if the bin is empty
                wait(full);
            od;
            x := bin;                # get the item from the bin
            occupied := false;
            signal(empty);           # wake up a process blocked in put
        end;
    end;

Figure 9.2 An example monitor.
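For comparison, here is the same bin monitor hand-coded with POSIX threads primitives. This is only a sketch of one possible realization (a mutex standing in for the monitor lock and condition variables for full and empty); it anticipates the compiler-generated code discussed in Section 9.3, and pthreads is an assumption, not the notation of the figure.

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t full = PTHREAD_COND_INITIALIZER;   /* signaled when bin filled */
    static pthread_cond_t empty = PTHREAD_COND_INITIALIZER;  /* signaled when bin emptied */
    static int bin;
    static bool occupied = false;

    void put(int x) {
        pthread_mutex_lock(&monitor_lock);            /* only one operation inside */
        while (occupied)                              /* wait if the bin is occupied */
            pthread_cond_wait(&empty, &monitor_lock);
        bin = x;                                      /* put the item in the bin */
        occupied = true;
        pthread_cond_signal(&full);                   /* wake up a process blocked in get */
        pthread_mutex_unlock(&monitor_lock);
    }

    int get(void) {
        int x;
        pthread_mutex_lock(&monitor_lock);
        while (!occupied)                             /* wait if the bin is empty */
            pthread_cond_wait(&full, &monitor_lock);
        x = bin;                                      /* get the item from the bin */
        occupied = false;
        pthread_cond_signal(&empty);                  /* wake up a process blocked in put */
        pthread_mutex_unlock(&monitor_lock);
        return x;
    }

Note that pthread_cond_wait() releases the mutex while waiting and reacquires it before returning, which mirrors the lifting of a blocked process out of the monitor described above.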
The shared-variable programming model is primarily intended for programming shared-memory multiprocessors, machines with physical shared memory. The model is less suitable for multicomputers, since such machines do not have a shared memory for storing shared variables. An alternative programming model, message passing, is suitable for both
multiprocessors and multicomputers. With message passing, each process can access only its own local data. Processes can exchange data by sending messages to each other. The basic model uses two primitives, the send statement and the receive statement, as illustrated below:

    process1:   send(process2, message);
    process2:   receive(process1, message);
Here, the first process sends a message to the second process. The call to receive() blocks the second process until the message has arrived. The format of a message depends on the programming system. In low-level systems a message usually is just an array of bytes; in higher-level languages, a message can be a record-like structured value, with fields of different types.

Many variations of this basic send/receive model exist. One issue is how the sender and receiver address each other. In the basic model, the sender and receiver specify the name of each other. A more flexible method is to let the receiver accept messages sent by any process in the program. This method is useful if the receiver does not know in advance who is going to send the next message. Another way of increasing flexibility is to avoid specifying process names at all, by using indirect names. Many languages use port names for this purpose. A message sent to a specific port is delivered at a process that issues a receive statement on the corresponding port name. It is up to the system to match senders and receivers.

Another important issue concerns the question of when exactly the sending process is allowed to continue. With asynchronous message passing the sender continues immediately; with synchronous message passing the sender waits until the receiver has accepted the message. Synchronous message passing is more restrictive, but it has the advantage that when the sender continues it knows that the message has been delivered.

In some languages, the receiver is given control over which messages to accept. For example, the receiver may specify which kinds of messages it wants to accept, or in which order to handle messages if multiple messages are available:

    receive print(size, text) suchthat size < 4096;
        # only accept print messages with a small size
or

    receive print(size, text) by size;
        # order the messages by increasing size
We have so far assumed that messages are received by an explicit receive statement. An alternative method, called implicit receipt, is to create a new thread for each incoming message. A thread is a lightweight process, with its own program counter and stack. It is created in the context of a process and can access the address space of this process: a thread does not have an address space of its own. This thread will execute a message handler, a routine that is defined by the programmer for each type of message. After executing the handler, the thread terminates. With implicit receipt, multiple threads may be active within one process. All these threads can access the
global variables of that process.

Threads can also be used for other purposes. For example, if a process wants to send a request to a remote process, it can create a separate thread to send the message and wait for the result. In the mean time, the original thread (for example the main program) can continue processing. Threads have become an important concept in many parallel programming systems, and will be discussed in Section 9.2.

In this book we assume that the threads of a process execute in a pseudo-parallel way, which means that each thread runs on the (single) processor on which the process runs. Alternatively, multiple processors of a shared-memory multiprocessor could be assigned to the process, in which case the threads can execute physically in parallel. If the synchronization mechanisms described for shared variables (for example locks and monitors) are used for synchronizing access to the global variables of the process, the program will execute correctly in both cases.
Shared variables and message passing are low-level models that directly reflect the shared-memory and distributed-memory machine architectures. Many other parallel programming languages have been designed that are based on more abstract models. Examples include parallel functional, logic, and object-oriented languages. In this chapter, we will discuss the latter category, which contains a large number of languages (Wilson and Lu, 1996).

The key idea of object-oriented programming is to 'encapsulate' the data in objects. The data inside an object can only be accessed through operations defined on the object. Other important concepts are classes and inheritance (Bal and Grune, 1994), but we need not discuss these here. One of the greatest advantages claimed for object-oriented programming is that it results in well-structured programs, making the technique suitable for writing large programs and easing the reuse of software components. These advantages are just as important for parallel programming as for sequential programming, which explains the large interest in parallel object-oriented languages.

Parallelism can be introduced by allowing several objects to execute at the same time, possibly on different processors, by letting a process execute inside the object. Communication between objects is expressed through operation invocations: an object can invoke an operation on another object, possibly located on a remote processor. Operation invocation is similar to message passing, but it is more cleanly integrated in the language semantics. Just as with message passing, there are many alternatives for invoking and servicing an operation. Synchronous invocations wait until the operation has been executed, while asynchronous invocations continue immediately. The receiving object can accept invocations explicitly or implicitly.

Many parallel object-oriented languages allow the process inside an object to consist of multiple threads of control. A popular model is to have one thread for the main process of the object and to create an additional thread on the fly for each operation invocation; this implements implicit receipt. Synchronization of these threads can be expressed using a monitor; many parallel object-oriented languages are based on monitors.
Another approach to obtain a more high-level and abstract programming model is through an appropriate communication data structure: the Tuple Space. Tuple Space was designed as part of the Linda system. Linda is a small set of simple primitives that can be added to an existing sequential language, resulting in a new parallel language. This idea has been applied to several base languages, resulting in parallel languages like C/Linda, FORTRAN/Linda, and Lisp/Linda.

The Tuple Space is a conceptual shared memory that is addressed 'associatively', as follows. Tuple Space can be regarded as a box of tuples - records - that can be accessed by all processes in the program, regardless of the processor on which they run. In this sense, Tuple Space is a shared memory; the point is, however, that Tuple Space can also be implemented reasonably efficiently on a distributed-memory system, as we will see. Three operations are defined on Tuple Space:
- out adds a tuple to the Tuple Space;
- read reads a matching tuple in Tuple Space;
- in reads a matching tuple in Tuple Space and simultaneously removes the tuple from Tuple Space.

For example, using C/Linda, the call

    out("item", 4, 5.48);
generates a tuple with three fields (a string, an integer, and a floating point number) and deposits it in Tuple Space. The read() and in() operations search for a tuple in Tuple Space. For each field of the tuple, they can specify either
- an actual parameter, which is an expression passed by value, or
- a formal parameter, which is a variable preceded by a '?' symbol and which is passed by reference.

The terms 'actual' and 'formal' are used with this meaning in the Linda literature; they should not be confused with the traditional use of 'actual and formal parameters' in programming languages. For example, the call

    float f;
    in("item", 4, ? &f);
specifies two actual parameters (a string and an integer) and one formal parameter (of type float). This call tries to find a tuple T with three fields, such that:
- the actual parameters in the call have the same types and values as the corresponding fields in tuple T;
- the formal parameters in the call have the same types as the corresponding fields in tuple T.
This process is called tuple matching. If a tuple T is found that matches, any formal parameters in the call get the values of the corresponding tuple fields of T, and the tuple T is removed from Tuple Space; all this is performed as an indivisible operation (also called an atomic action). If the matching tuple was generated by the call
    out("item", 4, 5.48);
the variable f will get the value 5.48. If more than one tuple exists in the Tuple Space that matches, one is selected arbitrarily. If no matching tuple exists, the call to in() or read() blocks: the calling process is suspended until another process adds a matching tuple. The Tuple Space primitives are indivisible in the sense that if two processes simultaneously try to in the same tuple, only one of them will succeed; the other one will block.

Linda does not contain any primitives to modify a tuple while it is in Tuple Space. Instead, the tuple must be taken out of Tuple Space, modified locally, and then put back. The advantage of this approach is that it automatically provides mutual exclusion synchronization on the Tuple Space. As an example, suppose we want to increment the second field of the tuple shown above. We can then use the following code:

    int n;
    float f;
    in("item", ? &n, ? &f);
    out("item", n + 1, f);
The first statement gets the tuple and removes it from Tuple Space. The second statement increments the second field and puts back the new tuple. If two or more processes try to execute this code at the same time, only one process will succeed in inning the tuple. The others will block until the tuple is back. This achieves the right synchronization behavior. The above example shows how simple shared data structures can be built in Tuple Space. More complicated data structures can be built using a similar approach.

The Linda model resembles the shared variable model, but an important difference is that Tuple Space is addressed associatively. A tuple does not have an address, as a word in memory has. The read() and in() primitives specify a description for a tuple, and the system has to find a tuple that matches this description.
Another important parallel programming model is data parallelism. With data parallelism, all processors execute the same algorithm (and code), but operate on different parts of a data set, which usually is an array. In contrast, in a parallel program using processes, different processors can execute different algorithms. The latter kind of parallelism is generally known as task parallelism. Data parallelism is more restrictive and less flexible than task parallelism, but it is easier to use, since only a single algorithm has to be defined for all processors. A usual approach is to let the programmer define what can be executed in parallel, and let the compiler distribute the computations and the data among the different processors. Some parallel languages do this by providing built-in data-parallel operations. Fortran 90, for example, supports operations on matrices that can be executed in a data-parallel way. The simplest explicit language construct for data-parallel programming is the parallel loop. For example, matrix multiplication can be expressed as follows:
    parfor i := 1 to N do
        parfor j := 1 to N do
            C[i,j] := 0;
            for k := 1 to N do
                C[i,j] := C[i,j] + A[i,k] * B[k,j];
            od;
        od;
    od;
This example specifies that the steps of the two outer loops can be executed in parallel, leaving it to the compiler to actually distribute the computations over the different processors. An important advantage of data parallelism is that, due to their simple structure, data-parallel programs are easier to analyze by a compiler. We will discuss this in more detail in Section 9.7, when we look at automatic parallelization. Figure 9.3 summarizes the five parallel programming models.
Figure 9.3 Summary of the five parallel programming models: shared variables, message passing, parallel objects, Tuple Space, and data-parallel.
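Returning to the data-parallel example: a compiler for such a language must map the parfor iterations onto the available processors. The following minimal sketch shows one way the generated code could divide the outer loop among P worker threads; the row-wise block distribution and the use of POSIX threads are assumptions for illustration, not the book's scheme, and on a multicomputer the same distribution would additionally need message passing for non-local data.

    #include <pthread.h>

    #define N 400
    #define P 4

    static double A[N][N], B[N][N], C[N][N];   /* A and B assumed initialized elsewhere */

    struct range { int lo, hi; };              /* rows [lo, hi) assigned to one worker */

    static void *worker(void *arg) {
        struct range *r = arg;
        for (int i = r->lo; i < r->hi; i++)
            for (int j = 0; j < N; j++) {
                C[i][j] = 0.0;
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
            }
        return NULL;
    }

    int main(void) {
        pthread_t t[P];
        struct range r[P];
        for (int p = 0; p < P; p++) {          /* block distribution of the parfor i loop */
            r[p].lo = p * N / P;
            r[p].hi = (p + 1) * N / P;
            pthread_create(&t[p], NULL, worker, &r[p]);
        }
        for (int p = 0; p < P; p++)
            pthread_join(t[p], NULL);
        return 0;
    }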
We now turn to the implementation of the five parallel programming models described above. All these models use processes, and many also use threads in addition to processes. Therefore, in this section, we first discuss the implementation of processes and threads.

The first important thing to note is that modern operating systems already provide a process abstraction. This should come as no surprise, since a process is essentially an abstraction of a physical processor, and it is the task of the OS to provide abstractions of hardware entities. Unfortunately, using OS processes for implementing language-level processes often has a high overhead. In particular, a context switch from one OS process to another is often expensive, since it requires intervention from the OS kernel. Also, creating OS processes is usually expensive, because such processes are heavyweight; they carry a large amount of state information, such as open file descriptors and accounting information.

An alternative way to implement language-level processes is to use a single OS process per physical processor, and implement all language-level processes on that processor as threads. This approach is used in many parallel languages, since it often is far more efficient than using multiple OS processes. If the language itself also provides a threads concept, language-level threads can then be implemented in the same way as language-level processes. Below, we discuss how to implement threads.

Many - but not all - operating systems provide threads in addition to processes. A thread carries much less state information than a process, so it is more lightweight. If threads are managed by the OS, all thread operations (creation, context switching, and so on) have to go through the OS, which is still expensive. An alternative solution is to implement threads in user space. In general, this is easy to do, and in fact many user-level thread packages exist. The only complicated operation is a thread context switch, which requires saving the current program counter, stack pointer, and registers, and replacing them with the previously saved values from another thread. Since this operation cannot be expressed in most high-level languages, a few lines of assembly code are usually needed for implementing thread context switches.

A fundamental problem with user-level threads concerns blocking calls to the OS kernel. If one thread invokes a blocking operation (for example a receive statement), another thread in the same process should be allowed to continue. If the OS does not know about the existence of user-level threads, however, the OS will block the entire process, thus preventing other threads from running. Fortunately, several solutions to this problem have been invented (Anderson et al., 1991). The basic idea is to let the OS know about the existence of threads, but to avoid involving the OS in each and every thread operation. This idea, called scheduler activations, is supported in several modern operating systems, for example Solaris. We will assume in the remainder of this chapter that the run-time system uses user-level threads.

Each user-level thread is represented by a thread control block, which has room to save the status information (see Figure 9.4). Depending on the threads package, the control block may have additional fields. In the example, the Stack limits field indicates the beginning and end of the stack that is assigned to the thread. The Status field indicates
whether the thread is 'running' (it is the current thread), 'runnable' (it is not the current thread, but it can execute), or 'blocked' (it cannot execute because it is suspended).
Figure 9.4 Structure of a thread control block (fields include the saved program counter, the stack limits, and the status).
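As an illustration, such a control block might be declared as follows in C. The exact fields and their sizes depend on the threads package and the processor, so the names and the 32-register assumption below are illustrative only.

    enum thread_status { RUNNING, RUNNABLE, BLOCKED };

    struct thread_control_block {
        unsigned long registers[32];        /* saved general-purpose registers */
        unsigned long program_counter;      /* saved program counter */
        unsigned long stack_pointer;        /* saved stack pointer */
        char *stack_low, *stack_high;       /* stack limits of this thread */
        enum thread_status status;          /* running, runnable, or blocked */
        struct thread_control_block *next;  /* link field for scheduler and wait lists */
    };

The next field is an example of the pointer fields mentioned below, used to link control blocks into lists of runnable or blocked threads.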
Sometimes a threads package contains a separate scheduler thread that determines which runnable thread should be allowed to execute next. The scheduler uses the status fields to determine which threads can execute. If multiple threads are runnable, two scheduling regimes are possible: either the current thread executes until it (voluntarily) blocks, or the scheduler preempts the running thread after a certain period of time and schedules another thread. The scheduler also takes care of doing the actual context switch to the new thread. A disadvantage of using a special scheduler thread is that the context switches to and from the scheduler thread add run-time overhead. Many thread packages therefore avoid using an explicit scheduler thread. Instead, a thread can call a library routine that selects the next thread to run and does a context switch directly to the selected thread.

Thread control blocks sometimes need to be manipulated as list items. A monitor, for example, can maintain a list of threads that want to enter the monitor. As another example, the scheduler thread may use a list of runnable threads, so it does not need to inspect the Status field of every thread when it wants to select another runnable thread. For this reason, control blocks usually contain one or more pointer fields that are used to link them into lists.

This concludes our introduction to the five parallel programming models and our discussion of processes and threads. In the following sections we turn to the implementation of the five models, where we will primarily look at the implementation of communication primitives.
On a shared-memory multiprocessor, a shared variable can be implemented simply by storing it in the shared memory. The compiler has to know which part of the memory address space is shared, so it can assign shared variables to locations in shared memory. On many multiprocessors, all of the memory is shared, so this task is trivial. On a distributed-
memory multicomputer, shared variables are of course much more difficult to implement. Techniques have been developed to simulate a virtual shared memory on top of a distributed-memory system (Li and Hudak, 1989; Keleher et al., 1994). Just as virtual memory moves pages between main memory and the disk, virtual shared memory moves pages between the memories of different processors. This technique is outside the scope of this book, however.
The main difficulty with the shared-variable model is how to implement the synchronization primitives. Let us first look at lock variables. A lock variable has two operations, to acquire and release the lock, that look deceptively simple. The hard problem, however, is implementing these operations indivisibly. If several threads try to execute an operation on the same lock variable, only one of them should be allowed to execute at any one time. This problem has been studied extensively in the operating systems literature. One solution for making the primitives indivisible is to implement them in the OS, but this has a high overhead. Fortunately, most processors have hardware instructions that allow an efficient user-level implementation of locks. A frequently used instruction is test-and-set. This instruction checks if a given variable is equal to zero, and if so sets the variable to one; either way, the original value of the variable is returned. The hardware guarantees the instruction to be indivisible.

With such an instruction, it becomes possible to implement indivisible operations on a lock. A lock variable holds a zero if it is free and a one if it is taken. To acquire the lock, a test-and-set instruction on this variable is executed. If the result is zero, the lock was free, and the variable is now set to one. If multiple threads try to acquire the lock simultaneously, only one of the test-and-set instructions will return zero, since the instruction is indivisible; the others will see that the variable was already set to one. Thus, there can be no confusion about which thread obtained the lock.

If a thread fails to acquire a lock, it should block until the lock becomes free; this happens when it is set to zero again by some other thread. A simple way to implement this is to repeatedly try acquiring the lock. This approach is called busy waiting, or spinning. It wastes CPU time, however, which in general should be avoided. A better solution is to maintain a list of blocked threads with each lock variable, as shown in Figure 9.5. With this approach, when a thread needs to block on a lock, its control block is added to this list, its status is set to 'not runnable', and another (runnable) thread is selected by the scheduler (see Figure 9.5b). Whenever a thread releases a lock, it can make the first thread on the list 'runnable' again, so the scheduler will eventually select it for execution again (see Figure 9.5c). Note that releasing a lock now becomes more complicated than just resetting the lock variable to zero; for further details see Exercise 9.4.
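A minimal sketch of the busy-waiting variant in C is shown below; the GCC/Clang __atomic_test_and_set built-in is assumed as a stand-in for the hardware test-and-set instruction. The blocking variant described above would instead enqueue the calling thread's control block on the lock's list of blocked threads and let the scheduler pick another thread.

    typedef volatile char lock_t;          /* 0 = free, non-zero = taken */

    void acquire_lock(lock_t *lock) {
        /* retry until the old value was 0, i.e. until we were the one to set it */
        while (__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            ;                              /* busy waiting (spinning) */
    }

    void release_lock(lock_t *lock) {
        __atomic_clear(lock, __ATOMIC_RELEASE);   /* reset the lock variable to zero */
    }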
Let us now look at the second synchronization primitive described in Section 9.1.1: the monitor. A monitor is a higher-level mechanism than a lock. It is usually implemented with some support from the compiler. In particular, the compiler helps in making all moni-
Figure 9.5 A lock variable containing a list of blocked threads.
tor operations mutually exclusive. First of all, it automatically generates a lock variable for each monitor, called the monitor lock. Also, the compiler generates code at the start of every monitor operation to acquire the monitor lock; likewise, it generates code at the end of the operation to release the lock. These statements assure that only one thread can be in the monitor at a time; if multiple threads try to execute an operation on the same monitor, all but one will be blocked on the monitor lock.

The compiler also generates code to implement the wait and signal operations. For this purpose, a separate list is used for each condition variable, containing the control blocks of threads that are blocked in a wait operation on that condition variable. The wait operation adds the control block of the current thread to this list, marks the current thread as 'not runnable', releases the monitor lock, and finally allows the scheduler to resume another thread. A signal operation makes one of the threads in the list runnable and removes its control block from this list. Depending on the precise semantics of the monitor, a signal operation may or may not do a context switch to the signaled thread. If no context switch is done, the signaled thread will eventually be selected by the scheduler.
When that moment arrives, the thread will first obtain the monitor lock again, and then continue executing its operation.
on a distributed-memory discusshowtoimplementmessagepassin nicationprotocolsprovided implementationdependsonwhich of the CO by the o~eratingsystem is being used. Many langu es are implemented on top of a reliablecommunicationprotocol,such as TGPDP. F efficiency reasons,somelanguage imple~entersprefer a lighter-weight, unreliable protocol, such as UDP, or even use their own c o ~ u n i c a t i o nprotocols. An unreliable protocol usually is more efficient than a reliable one; this i ~ p l i e that s it obtains a lower latency and higher throughput. The disadvane is of course that the languageru~-timesystem then needs to implementits own protocol to make the langua~e-l~vel operations reliable. Since this protocol can be tailored to what the run-time sys rall p e ~ o ~ a n mcaey be better than that on top of a reliable OS protocol In the extreme case, some languages are implemented by letting the work device in user space, and runnin~the entire network protocol as part of the run~time system. This approach only uses the OS during program initialization (to map the network device),butavoidstheOS for communication, so it bas a lowoverhead ( et al., 1 9 9 ~ ) .In g ,themorefunctionality is implementedinuserspace,the mo ~exibilitythe r u n ~ t i ~system e has, This often results in p e r f o ~ a n c egains, but at the cost of a more complicated ~ n - t i m esystem, For our discussion, however, we assume that the e s y s t e is ~ built on topof a reliable c o ~ u n i c a t i o nprotocol providedby the OS, ven with this simplificatio~,there still are many issues the run-time system has to The issues we discuss are: locating the receiver, marshaling, type chec~ing~ selection. For the purpose of our discussion, assume we have two processes, Receiver,that are ~ n n i n gon. different processors and that want to comeach other as follows: Process Sender : A, B: array[1..100] of float; initialize A and B; send multiply(A, B) to Receiver; Process Receiver: A, B: array[1..100] of float; receive multiply(A,B) ; multiply A andB, print result;
Process Sender thus sends a message multiply(A, B) with two arrays of 100 floating point numbers to process Receiver; process Receiver waits for such a message to arrive and then services it.
The first question to answer is how the sender manages to get the message delivered at the right destination process. At the language level, identifiers such as Sender and Receiver are used to identify processes, but the OS communication protocol clearly does not know anything at all about such identifiers. Typically, OS protocols only know about processor names. Thus, the first task of a run-time system is to maintain an administration of where each language process runs. In languages that implement language-level processes as threads, an often-used approach is to have one communication daemon thread per processor. A message intended for process Receiver is sent to the daemon of the processor on which Receiver runs, using the OS communication protocol; the daemon then hands the message to the thread that implements the destination process (as described later).
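A sketch of the administration the run-time system might keep for this purpose is shown below; the table layout, field names, and sizes are assumptions for illustration only.

    #include <string.h>

    #define MAX_PROCESSES 128

    struct process_location {
        char name[32];      /* language-level process name, e.g. "Receiver" */
        int processor;      /* processor whose daemon should receive the message */
        int thread_id;      /* thread implementing the process on that processor */
    };

    static struct process_location table[MAX_PROCESSES];
    static int num_processes;

    /* returns the processor hosting the named process, or -1 if unknown */
    int locate_process(const char *name, int *thread_id) {
        for (int i = 0; i < num_processes; i++)
            if (strcmp(table[i].name, name) == 0) {
                *thread_id = table[i].thread_id;
                return table[i].processor;
            }
        return -1;
    }

The sending side uses the processor number to address the right communication daemon; the daemon on that processor uses the thread identifier to hand the message to the destination process.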
The next issue is how to transfer the message over the network. In the example above, the message is a structured value containing two arrays of floating point numbers. The OS communication protocol, however, does not know about data types like arrays or floating point numbers. A typical protocol merely provides a primitive to send and receive flat buffers, which are untyped arrays of bytes. Therefore, procedures are needed that convert a language-level message to a flat buffer at the sending side and convert such a buffer back at the receiving side. Such procedures are known as marshaling routines; the conversion process is also called serialization.

An example of these conversions is shown in Figure 9.6. Here, a record has to be marshaled that contains the string "John" and the integer 26. The message buffer containing the marshaled data starts with a length field in a fixed format (4 bytes), denoting the number of bytes to follow (9). Next, the string is stored in the buffer, in null-terminated ASCII form. Finally the four bytes of the integer value are stored in the buffer, with the least significant byte at the rightmost (highest) address, which also happens to be the format used by the sending processor. At the receiving side, the string and integer value are extracted from the buffer. Note that, in the example, the receiving processor uses a different data representation for integers than is used for the buffer, and stores the least significant byte at the leftmost address. An important advantage of marshaling is that such differences in machine representations can be masked. All that is needed is that the sender and receiver agree upon the format of the marshaled data in the buffer. As an optimization, it is also possible to always store the data in the format of the sender, and include a description of this format, for example specifying the byte order in the buffer. The receiver can then check whether the format in the buffer is different from its own format. This scheme avoids doing conversions when the sender and receiver have the same data format.

In a library-based programming system, the user often has to write the marshaling routines by hand. With parallel languages, the routines are usually generated automatically by the compiler from the type definitions of the values sent and received. The marshaling routine for a message containing simple data structures, for example arrays, is easy to generate. The routine simply allocates a buffer of the right size and copies the arrays into the buffer. For complicated, pointer-based data structures (such as lists, trees, and graphs),
Figure 9.6 Marshaling: the record (name = "John", age = 26) is converted into a flat buffer (a 4-byte length field, the null-terminated string, and the 4-byte integer) and converted back at the receiving side.
however, automatic generation is not always possible, because the compiler cannot know what constitutes the entire data structure or how to deal with cycles in data structures. Many languages therefore restrict the data types that can be used in messages to scalars, records, and arrays, and disallow data types that contain pointers.

An interesting approach to marshaling is taken by Java RMI, the Java Remote Method Invocation. Any object can be marshaled automatically, provided its class implements the Serializable interface. The Java library contains generic routines that can marshal and unmarshal any serializable object. The routines are written in Java and check dynamically what the types of the object parameters are, using an object-oriented feature called reflection. This approach is highly flexible, but unfortunately the run-time inspection also has a tremendous overhead. It has been shown (Maassen et al., 1999) that specialized compiler-generated routines can be orders of magnitude faster.
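As an illustration, a compiler-generated marshaling routine for the record of Figure 9.6 might look as follows in C. The buffer layout (4-byte length, null-terminated string, 4-byte integer with the most significant byte first) is the agreed-upon format of the figure; the struct definition and routine name are assumptions.

    #include <stdint.h>
    #include <string.h>

    struct person { char name[16]; int32_t age; };

    /* writes the marshaled form of *p into buf; returns the number of bytes used */
    size_t marshal_person(const struct person *p, unsigned char *buf) {
        size_t name_len = strlen(p->name) + 1;        /* include the terminating '\0' */
        uint32_t payload = (uint32_t)(name_len + 4);  /* bytes following the length field */

        /* 4-byte length field, most significant byte first */
        buf[0] = payload >> 24; buf[1] = payload >> 16;
        buf[2] = payload >> 8;  buf[3] = payload;

        memcpy(buf + 4, p->name, name_len);           /* null-terminated ASCII string */

        uint32_t age = (uint32_t)p->age;              /* 4-byte integer, fixed byte order */
        unsigned char *q = buf + 4 + name_len;
        q[0] = age >> 24; q[1] = age >> 16; q[2] = age >> 8; q[3] = age;

        return 4 + payload;
    }

For the record of Figure 9.6 this produces the 13-byte buffer 00 00 00 09, "John\0", 00 00 00 1A; the unmarshaling routine on the receiving side reverses these steps, converting the integer into its own byte order.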
Another important advantage of using a parallel language rather than a library is that the compiler can type-check messages, in the same way it type-checks routine calls. In the example above, the compiler can check that multiply messages always contain two arrays of 100 floating point numbers each. Without such a compiler check, the sender could also send, say, one array of 400 integer numbers. The discrepancy would go unnoticed, but would clearly result in erroneous results, because the receiver will interpret the incoming stream of bytes as floating point numbers.
After marshaling the message, the next step is to actually transfer it over the network. As stated before, we assume that the operating system provides a reliable communication protocol to achieve this, so we do not discuss the details of such a protocol here.

Once a message has been delivered at the destination processor, typically to a local communication daemon thread, the next step is to hand the message to the right receiver. Depending on the language, this may be a complicated task, because programmers often are given control over the order in which incoming messages are serviced. In the simplest case (used in our example above), the receiver only indicates the type of message it wants to receive (such as a multiply). In this case, the run-time system can store incoming messages in separate lists, based on the message type. Each receive statement then blocks until the list for the specified type is non-empty. Some languages, however, give the user more flexibility about which messages to accept, and in which order. In SR, for example, the receiver can specify a predicate on the message:

    receive msg(A, N) suchthat N < 50;
In this case, the run-time system has to traverse the list of incoming messages of type msg and check if there is any message whose second field is smaller than 50. Whenever a new message arrives, it also has to be checked in this way. Features like these make the language more expressive, but they clearly also complicate the language implementation.
ilar imple~entationtechniquesas imper~tivelanguages. Aswe have seen, man^ object-oriented lan
based on ideaslikethreadsandmonitors,whoseimplementationwehavealready elow, we discuss three additional implementation techniques that have been developed more specifically for object-oriented languages: how to locate objects, how to migrate objects, and how to replicate objects.
The first issue we discuss concerns namin and locating of objects. A parallel or distributed program uses many objects, located on several processors which can access each other. Typically, an object in such a program is defined by an o An OID is similar to a pointer. Unlike noma1 pointers in C or Pascal, however, an OlD is valid network~wide. An object on one processor may contain a reference to an object on another processor. There is no syntactic distinction between invoking an operation on a local object -an object on the same processor as the invoker -or on a remote object. In other words, concu~ent object-o~ented languages provide a global, progra~-wide,name space for all objects. A keyquestion is howtoimplementthisglobalnamespace for objectsona distributed-memory machine. A traditional pointer, as used for example in C or Pascal, is represented as a memory address; such a pointer, however, only has meaning for a single address space. An OID, on the other hand, identifies an object on any processor, independent of its location. Typically, some bookkeeping is used on each processor to map an OID onto a physical processor and a memory address within that processor. For example, an OID can be represented as two words:one identifying the processor on which the object resides and one containing its address within that processor (assumingeach processor runs a single process). As an example, suppose an object x: contains a reference to another object Y. The reference of X to Y is then stored as a &word OID, as shown in Figure 9.7’. Suppose x: invokes an operationon Y, as in: Y ->Determinewvalue( llDiamondll );
The run-time system then first determines on which processor Y is stored, by lookingat the first word in the OID. Next, it sends a message to this processor containing the operation, the address, and thep ~ a ~ e t e r s . Unfortunately,treatingeveryobjectinthisway is veryexpensive.Forremote objects, the overhead of determining the location of the object is small compared to the time needed to issue a remote operation, For local objects, however, the overhead can be substantial. One solution is to require the programer to distinguish between local objects -objects that cannot be accessed from a remote processor -and shared or global objects. In Java, for example, remote objects have to i~plementthe Remote interface. Another solution, chosen in the Emerald system, is to let the compiler figure out which objects are used only locally; the others are assumed to be shared.
Figure 9.7 Using object identifiers for remote references.
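As a sketch, the two-word OID and the dispatch on it might look as follows in C; my_processor, invoke_locally(), and send_invocation() are assumed run-time routines for illustration, not part of any particular system.

    #include <stdint.h>

    struct oid {
        uint32_t processor;       /* processor on which the object resides */
        uint32_t local_address;   /* address of the object within that processor */
    };

    extern uint32_t my_processor;                        /* assumed run-time global */
    extern void invoke_locally(uint32_t addr, int op, void *args);
    extern void send_invocation(uint32_t proc, uint32_t addr, int op, void *args);

    void invoke(struct oid target, int op, void *args) {
        if (target.processor == my_processor)
            invoke_locally(target.local_address, op, args);    /* cheap local call */
        else
            send_invocation(target.processor, target.local_address, op, args);
    }

The test against my_processor is exactly the per-invocation overhead discussed above, which motivates distinguishing local objects from shared ones.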
Some systems allow objects to be moved dynamically from one processor to another. This technique, called object migration, has several advantages. For example, suppose an object X is going to invoke a sequence of operations on a remote object Y. Without object migration, each invocation would result in communication. An alternative scheme is to first migrate object Y to the processor on which X runs, then invoke all operations there, and finally, if desired, migrate object Y back to its original location. If Y does not contain much data, this scheme may result in less communication overhead. Object migration can also be used to improve the load balance of a program. The Emerald language provides several notations for expressing object migration and for requesting the system to keep certain groups of objects together on the same processor.

To implement object migration, several challenging problems have to be solved. The data of the object have to be transferred to its new location, using the marshaling and unmarshaling techniques described earlier. For objects containing one or more threads of control, these threads also have to be moved, which is complicated. Finally, if objects are allowed to migrate, it becomes far more complicated to locate an object, given its OID. Our scheme described above assumed that an OID contains the location of the object, encoded as a processor number and a local address. If objects can migrate, this scheme no longer works.

One solution to this problem is to keep track of all references to each object. If an object X migrates from processor A to processor B, all references to X are then changed on all processors. Unfortunately, maintaining all this information in a coherent way is very hard. An alternative scheme that does not need this information works as follows (see Figure 9.8). Instead of updating all references, only the administration on processor A is updated. The run-time system on processor A stores a forwarding address (processor B) for object X (see Figure 9.8b). Whenever another processor, say C, tries to invoke an operation on X,
it will send a message to the run-time system on processor A, because processor C is not yet aware of the new location of X. The run-time system of processor A will notice that X has moved and use the forwarding address for X to determine its current location (processor B); the message is then forwarded to this processor. The run-time system of processor B will execute the operation, return the result to processor C, and also inform the run-time system of processor C of the new location of X. All subsequent invocations from processor C on object X will go straight to processor B, as shown in Figure 9.8c. In theory, an object can move multiple times, so many forwarding addresses may have to be followed to find a given object. For this reason, the technique is known as pointer chasing.
Figure 9.8 Pointer chasing.
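A sketch of the forwarding-address check performed by the run-time system of the processor an object has left is shown below; the object table entry and the helper routines are assumptions for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    struct object_entry {
        bool migrated;            /* has the object moved away? */
        uint32_t forward_proc;    /* if so: the processor it was forwarded to */
        uint32_t forward_addr;    /* ... and its address there */
        void *local_data;         /* otherwise: the object itself */
    };

    extern void execute_operation(void *obj, int op, void *args, uint32_t reply_to);
    extern void forward_invocation(uint32_t proc, uint32_t addr, int op, void *args,
                                   uint32_t reply_to);

    /* called when an invocation for a (supposedly) local object arrives */
    void handle_invocation(struct object_entry *e, int op, void *args, uint32_t sender) {
        if (!e->migrated) {
            execute_operation(e->local_data, op, args, sender);
        } else {
            /* chase the pointer: pass the request on to the object's new home;
               that run-time system also informs the sender of the new location */
            forward_invocation(e->forward_proc, e->forward_addr, op, args, sender);
        }
    }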
Besides object migration, some languages also support object replication. With this technique, copies of an object are stored in the local memories of several processors. The advantage of object replication is that operations that read the state of the object but do not modify it can be executed using the local copy, without doing any communication. Thus, if an object is mainly read and seldom modified, object replication can substantially reduce the communication overhead (Bal et al., 1998). Some languages also use replication to make programs more fault-tolerant.

A key issue is what to do when an object is modified. Clearly, if the operation is applied only on the local copy of the object, all other copies in the system will become out-of-date. Two solutions to this problem are possible. One is to invalidate all other copies, by asking each processor to delete its copy. The problem with this is that a processor whose copy has been invalidated needs to get a new copy of the object when it again needs to invoke an operation on the object. If the object is large, this will have a high com-
munication overhead. The alternative is to update all copies, for example by sending the operation and its parameters and applying the operation to all copies of the object.

The most fundamental problem with object replication in general is how to guarantee consistency of all copies. We illustrate this using the update protocol. Assume that processes on two processors, M1 and M2, simultaneously want to do an operation on an object X that contains an integer with initial value 0. For example, M1 wants to set the integer to 10 and M2 wants to set it to 20, both using an operation assign():

    M1: X->assign(10);
    M2: X->assign(20);
If M1 and M2 both send their operation to all processors, there is a risk that these messages arrive in a different order at different processors. Some processors may receive the message from M1 first and then the message from M2, so the final value of the integer will be 20. Other processors may receive the messages the other way round, resulting in a final value of 10. As a result, the copies become inconsistent.

Several solutions have been proposed for this problem. The approach taken in the Orca system is to use totally-ordered group communication, which guarantees that all messages arrive at all processors in the same order (Bal et al., 1998). With this primitive, either all processors receive the message from M1 first, or all processors receive the message from M2 first. Such a communication primitive can be implemented either by the OS or by a user-space protocol.
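One way to obtain such a total order is to let a central sequencer number every update and to apply updates strictly in sequence-number order on every replica. The sketch below illustrates this scheme in C; it is only one possible implementation of totally-ordered broadcast, and the helper routines, the sequencer, and the bound on outstanding updates are assumptions.

    #include <stdint.h>

    extern uint32_t ask_sequencer_for_number(void);   /* assumed request to the sequencer */
    extern void broadcast_update(uint32_t seqno, int op, int arg);
    extern void apply_update(void *replica, int op, int arg);

    #define MAX_PENDING 64                /* assumes at most this many outstanding updates */
    struct pending_update { int valid, op, arg; };
    static struct pending_update pending[MAX_PENDING];
    static uint32_t next_expected = 1;    /* next sequence number this replica must apply */

    /* sending side: tag the update with a global sequence number, then broadcast it */
    void update_replicated_object(int op, int arg) {
        uint32_t seqno = ask_sequencer_for_number();
        broadcast_update(seqno, op, arg);
    }

    /* receiving side: buffer updates and apply them in sequence-number order */
    void receive_update(void *replica, uint32_t seqno, int op, int arg) {
        pending[seqno % MAX_PENDING].valid = 1;
        pending[seqno % MAX_PENDING].op = op;
        pending[seqno % MAX_PENDING].arg = arg;
        while (pending[next_expected % MAX_PENDING].valid) {
            struct pending_update *u = &pending[next_expected % MAX_PENDING];
            apply_update(replica, u->op, u->arg);
            u->valid = 0;
            next_expected++;
        }
    }

Since every replica applies the assign() operations in the same sequence-number order, all copies of X end up with the same final value.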
The next programming model we discuss is the Linda Tuple Space. Tuple Space provides a particularly interesting challenge for language implementers. Recall that the Tuple Space is a kind of associative shared memory. We will discuss how the Tuple Space can be implemented on a distributed-memory system.

A trivial way of implementing Tuple Space would be to store the entire Tuple Space as a list on a single processor (the 'Tuple Space server'), and search this entire list whenever a tuple is read. Obviously, this approach will be highly inefficient. First, it requires a linear search of the entire Tuple Space. Second, every Tuple Space operation will communicate with the Tuple Space server, making this processor a communication bottleneck. Thus, an efficient Tuple Space implementation has to address two issues:
- how to avoid the overhead of associative addressing;
- how to implement the shared memory model on a system that does not provide shared memory.
Linda implementations use a combination of compile-time and run-time techniques. The Linda preprocessor translates Linda programs into the target language, optimizing the usage of Tuple Space operations, as discussed below. The Linda kernel is a run-time system that is linked with the compiled target language code. The preprocessor and the kernel cooperate to address the two issues discussed above.
We first discuss how to avoid the overhead of associative addressing of the Tuple Space model. This problem has to be addressed for any implementation of Linda, whether it runs on a multiprocessor or a multicomputer. To simplify our discussion, we assume here that the Tuple Space is implemented using the shared memory of a multiprocessor. A distributed implementation is discussed in the next subsection.

The Linda preprocessor analyzes all calls to Tuple Space primitives in the program text. It partitions all these calls into disjoint sets, called partitions. The partitioning is done in such a way that Tuple Space calls in different partitions cannot affect each other. For this purpose, the compiler looks at the number of parameters of the calls, the types of the parameters, and the values of the parameters if they are compile-time constants.
int n, year;
float f;
...
/* code initializing n and f */

(1) in("Olympic year", 1928);
(2) in("Classic movie", "Way out West", 1937);
(3) in("Classic movie", "Sons of the desert", 1933);
(4) in("Classic movie", "Sons of the desert", ? &year);
(5) in("Popular number", 65536);
(6) in("Popular number", 3.14159);
(7) in("Popular number", n);
(8) in("Popular number", f);

Figure 9.9 A demo program in C/Linda.
For example, assume a C/Linda program with the declarations and statements of Figure 9.9. It is clear that the first two calls cannot affect each other, because they have a different number of parameters: the first call ins a tuple with two fields and the second call ins a tuple with three fields, so they access different tuples. The second and third calls both have three parameters, but the compiler can determine statically that the last two parameters are different, so these calls cannot affect each other either. Likewise, the fifth and sixth calls differ in their second parameter. The third and fourth calls clearly do affect each other, since the tuple

("Classic movie", "Sons of the desert", 1933)

can be inned by both calls. The last two calls are more interesting, because they have actual parameters whose values are not known at compile time. Note that the parameters are variables, but since they are not preceded by a ? they are actual parameters that are evaluated at run time. Still, the seventh and eighth calls cannot affect each other, because the types of their second parameter are different: the seventh call features an integer parameter and the eighth call a floating point parameter. It is not clear at compile time whether the fifth and seventh calls will affect each other, since that depends on the run-time value of the variable n. Likewise, the sixth and eighth calls may or may not affect each other, depending on the value of f. In such cases, the compiler is conservative and puts both calls in the same partition.
The compiler performs the analysis that we informally described above and uses it to generate the five partitions shown in Figure 9.10 for the demo C/Linda program of Figure 9.9. In this example we have used in calls everywhere, but the analysis is the same if out or read primitives (or some combination) are used.
Partition 1:  (1) in("Olympic year", 1928);

Partition 2:  (2) in("Classic movie", "Way out West", 1937);

Partition 3:  (3) in("Classic movie", "Sons of the desert", 1933);
              (4) in("Classic movie", "Sons of the desert", ? &year);

Partition 4:  (5) in("Popular number", 65536);
              (7) in("Popular number", n);

Partition 5:  (6) in("Popular number", 3.14159);
              (8) in("Popular number", f);

Figure 9.10 The partitions for the demo C/Linda program of Figure 9.9.
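As an illustration of how such partitions might be computed, the following Java sketch groups calls by a compatibility test on the number of fields, the type of each field, and the field values that are compile-time constants. The class and method names (Field, mayMatch) are our own, not part of any actual Linda preprocessor; a real preprocessor would apply this test to the program's abstract syntax tree and take the transitive closure of the 'may match' relation to form the partitions.

import java.util.List;

// Sketch of the compile-time compatibility test (assumed representation of call arguments).
class Field {
    final String type;      // "string", "int", "float", ...
    final Object constant;  // compile-time constant value, or null for run-time values/formals
    Field(String type, Object constant) { this.type = type; this.constant = constant; }
}

public class PartitionDemo {
    // Two calls can only match if they have the same arity and the same field types;
    // differing compile-time constants in the same position also separate them.
    static boolean mayMatch(List<Field> a, List<Field> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).type.equals(b.get(i).type)) return false;
            Object ca = a.get(i).constant, cb = b.get(i).constant;
            if (ca != null && cb != null && !ca.equals(cb)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Field> call7 = List.of(new Field("string", "Popular number"),
                                    new Field("int", null));       // in("Popular number", n)
        List<Field> call8 = List.of(new Field("string", "Popular number"),
                                    new Field("float", null));     // in("Popular number", f)
        List<Field> call5 = List.of(new Field("string", "Popular number"),
                                    new Field("int", 65536));      // in("Popular number", 65536)
        System.out.println(mayMatch(call7, call8));  // false: int versus float
        System.out.println(mayMatch(call5, call7));  // true: n may be 65536 at run time
    }
}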
An immediate advantage of this analysis is that the entire Tuple Space can now be split up into different sections, one for each partition. Since calls in two different partitions cannot affect each other, a call need only search the section of the Tuple Space that is reserved for its partition. So, instead of an exhaustive search of the entire Tuple Space, a call need only search a small part of the Tuple Space.
The result of the Tuple Space analysis can be used for more advanced optimizations as well. In particular, the Linda preprocessor analyzes all calls within a partition and then determines a storage representation that is optimized for that partition. We illustrate this idea with several examples. Assume a given partition contains the three calls in the following fragment:
string title, SomeTitle;
int year, SomeYear;

out("Classic movie", "Sons of the desert", 1933);
out("Classic movie", SomeTitle, SomeYear);
in("Classic movie", ? &title, ? &year);
So, no other calls in the program produce or consume tuples of this form. The tuples of this form will be stored in a separate section of Tuple Space. Since all tuples in this section of Tuple Space will always have the constant "Classic movie" as their first field, there is no need to do run-time matching on this field. In other words, the compiler can determine statically that the first field of the out operations and the in operation will always match. Hence, this field need not be stored in the tuple at run time, which saves memory and the costs of a run-time string comparison. Also, the second and third parameters of the in call are formal parameters of the right type, so they will always match the actual parameters. In conclusion, there is no need to do any run-time matching at all for the in call. If tuples of this form exist, resulting from one or more out calls, the run-time system can just take such a tuple and return it.
A simple way to implement this is to store the tuples of this partition in a queue. The out call adds a tuple to this queue and the in call waits until the queue is not empty and then returns the first element. So, the following data structure is used for this partition:

Q: queue of record
       string title;
       int year;
A call like out("Classic movie", title, year) is translated into:

enqueue(Q, [title, year]);

and a call like in("Classic movie", ? &title, ? &year) is translated into:

[title, year] := dequeue(Q);
where dequeue blocks while the queue is empty. The result is that during run time the Tuple Space need not be searched at all: it is sufficient to do an operation on a queue data structure. The overhead of the associative memory model has thus been eliminated in this case.
As another example, assume the programmer changes the in statement into

in("Classic movie", ? &title, 1940);
to search for classic movies produced in 1940. This small change has a major impact on the representation scheme. Clearly, this call still belongs to the partition described above, since it has the right number and types of elements. So, we now have a partition with these calls:

out("Classic movie", "Sons of the desert", 1933);
out("Classic movie", SomeTitle, SomeYear);
in("Classic movie", ? &title, 1940);
The new in call cannot be translated into a simple dequeue statement, because it requires that the third field contains the value 1940. One solution would be to search the entire queue until a tuple is found whose last field contains 1940. Unfortunately, if the queue contains many elements, as for example in a complete list of all classic movies, this search may take a long time. The Linda preprocessor therefore switches to an alternative representation scheme for partitions like these, based on a hash table. Each tuple that is added by the out operation is inserted into a hash table, using the last field of the tuple as a hash key. The in operation likewise uses the last field as a hash value, so a linear search is avoided. If both in calls are used in the same program, things become more complicated. Assume the program contains the following two statements:

in("Classic movie", ? &title, ? &year);
in("Classic movie", ? &title, 1940);
The problem is that the first call does not have an actual parameter (a value) that can be used for hashing. Instead, the first call must search the hash table. The hash table thus must have an interface that allows this operation.
To summarize, the Linda preprocessor partitions all Tuple Space calls in a program into disjoint sets and then selects a representation scheme for each set that is tailored to that set. The result is that a Tuple Space call seldom needs to search the complete Tuple Space.
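The following Java sketch shows what the two tailored representations just described amount to. The names (Movie, QueuePartition, HashPartition) are invented for this illustration, and the sketch ignores blocking and concurrent access, which a real Linda kernel would have to handle.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Two per-partition storage schemes, much simplified.
class Movie {
    final String title; final int year;
    Movie(String title, int year) { this.title = title; this.year = year; }
}

class QueuePartition {                       // all in calls use only formal parameters
    private final Deque<Movie> q = new ArrayDeque<>();
    void out(Movie m) { q.addLast(m); }      // out("Classic movie", title, year)
    Movie in() { return q.pollFirst(); }     // in("Classic movie", ? &title, ? &year)
}

class HashPartition {                        // some in calls supply an actual last field
    private final Map<Integer, Deque<Movie>> byYear = new HashMap<>();
    void out(Movie m) {
        byYear.computeIfAbsent(m.year, y -> new ArrayDeque<>()).addLast(m);
    }
    Movie in(int year) {                     // in("Classic movie", ? &title, 1940)
        Deque<Movie> bucket = byYear.get(year);
        return (bucket == null) ? null : bucket.pollFirst();
    }
    Movie inAny() {                          // in("Classic movie", ? &title, ? &year): must scan
        for (Deque<Movie> bucket : byYear.values())
            if (!bucket.isEmpty()) return bucket.pollFirst();
        return null;
    }
}

public class TupleStoreDemo {
    public static void main(String[] args) {
        HashPartition p = new HashPartition();
        p.out(new Movie("Sons of the desert", 1933));
        p.out(new Movie("The Music Box", 1932));
        System.out.println(p.in(1933).title);   // found via the hashed year field
    }
}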
On a shared-memory multiprocessor, the entire Tuple Space can simply be stored in this shared memory. On a distributed-memory machine, however, other representations have to be found. Several approaches exist to represent Tuple Space on such a machine. One approach is based on hashing. As described above, each partition can have a field that has a hash value associated with it. We have described how this hash field can be used to avoid searching the Tuple Space. In a distributed implementation, the hash field can also be used to determine on which processor to store the tuple. To continue our earlier example, a statement like

out("Classic movie", "The Music Box", 1932);
can use the last field (1932) as a hash value. Assume we have a hash function h() that returns a value between 1 and P, where P is the number of processors in the system. The above statement then sends a message to processor h(1932) and stores the tuple

("Classic movie", "The Music Box", 1932)
on that processor. (As described earlier, the first field can actually be optimized away, but we ignore this optimization for our current discussion.) In this way, all tuples in the Tuple Space get randomly distributed among all processors, thus preventing one processor from becoming a communication bottleneck. When a statement like

in("Classic movie", ? &title, 1940);
is executed, a message is sent to processor h(1940) to ask it for a tuple that matches the in call. A problem occurs if the program contains in calls that do not have an actual parameter for the last field, as in

in("Classic movie", ? &title, ? &year);
In this case, the Linda preprocessor can determine that the last field is not suitable as a hash value. The second field has the same problem. Instead, the preprocessor can use the first field as hash value. Unfortunately, this means that all tuples that have "Classic movie" as their first field will be stored on the same processor. If there are many such tuples, this processor may become a bottleneck.
The hash-based solution thus tries to distribute all tuples randomly among all processors. The advantage is that it is less likely that any one processor becomes a bottleneck. The disadvantage is that almost all Tuple Space calls result in communication. Also, if a processor P1 wants to send data to another processor P2 (using a tuple), chances are that the data go through another (random) processor P3, because the tuple containing the data is hashed onto processor P3. In this case, two messages (instead of one) are required to get the data from the source to the destination.
A completely different approach than hashing is to use a uniform distribution, which comes in several variants. One option is to broadcast all out calls and store the new tuple at all processors. The entire Tuple Space thus becomes replicated on all processors. The advantage is that read calls can be executed locally, using the local copy of Tuple Space, so they do not require any communication at all. In calls can first check the local Tuple Space to see if a matching tuple is available, but they also need communication to inform the other processors that they have to delete the tuple.
The second form of uniform distribution is the inverse of the above scheme: all out calls are executed without doing communication and the new tuple is generated in the local Tuple Space. The entire Tuple Space thus is partitioned among all processors. Whenever a tuple needs to be found (using read or in operations), the run-time system first checks whether a matching tuple is available in the local partition. If not, it broadcasts a request to all processors. Each processor then checks if a matching tuple is available in its partition. If multiple processors have a matching tuple, further communication is needed to select one of the tuples.
Yet another uniform distribution scheme organizes all processors in a two-dimensional grid. If a processor P executes an out call, the call is sent to all processors in the same row as P. A read or in call is sent to all processors in the same column as the sender. Since a row and a column always intersect at exactly one point, a read or in call will always find the tuple (at the processor at the intersection point) if it exists.
To summarize, many different representations of the Tuple Space on a distributed system are possible. The best scheme in general depends on the application. For example, if an application frequently reads a given tuple (for example the global bound in a branch-and-bound algorithm), the best performance is obtained if the tuple is replicated. Thus, the first uniform distribution scheme, which replicates the Tuple Space, is most efficient.
If tuples are read infrequently, however, this scheme is less efficient. The best scheme also depends on the underlying hardware. For example, if the hardware supports multicast (sending the same message to multiple processors), as an Ethernet network does, replicating the Tuple Space becomes more attractive.
A general problem with this variety of implementation techniques is that the performance model is not clear to the programmer. For example, on some implementations read operations may be cheap while in and out operations are expensive. On other implementations, out operations are cheap and read and in operations are expensive. In other words, the performance of a Linda program will be hard to predict without knowing more about the underlying hardware and Linda implementation. This makes it hard to write programs that are not only portable but also efficient on a wide range of systems.
In conclusion, Linda provides an abstract, high-level programming model. Implementing Linda efficiently is a challenging task, and many novel techniques have been developed to avoid the overhead of associative addressing and to implement the Tuple Space on a distributed-memory system. The high level of abstraction also makes it more difficult for the programmer to understand the performance behavior of the Linda primitives. Nonetheless, the Linda model has had quite some impact on parallel processing research since its introduction in 1985, and is still being studied and used. For example, JavaSpaces (Freeman et al., 1999) is based on a model somewhat similar to Tuple Space.
The techniques described so far in this chapter are intended for implementing languages with explicit constructs for expressing parallelism. An interesting question is whether we need such languages at all. Parallel programming is more difficult than sequential programming, so for the programmer it would be much easier if the language implementation would take care of the parallelization. The holy grail in parallel processing is to design a compiler that takes as input any sequential program written in a given language and produces as output an efficient parallel program. In this way, programmers could write their programs in sequential languages and would not be bothered with expressing parallelism, communication, or synchronization. In addition, existing sequential programs could be run on a parallel computer without any effort from the user.
The key issue is of course whether such an automatically parallelizing compiler can be implemented. No such compiler exists yet, and it is hard to tell if one ever will. For many applications, clever ideas are required to parallelize them efficiently. Many papers and Ph.D. theses have been devoted to describing such ideas for specific applications. It remains to be seen whether a compiler will ever be able to parallelize all these applications automatically. On the other hand, substantial progress has been made with automatic parallelization. Rather than trying to predict the future, we will describe the technical problems of automatic parallelization and illustrate some of the techniques that have been developed so far.
Most of the work on automatic parallelization is targeted for specific classes of parallel architectures. We initially assume that the target machine is a shared-memory multiprocessor. In Section 9.7.4 we shall look at distributed-memory multicomputers. Much research has been done on compilers for vector computers and Very Long Instruction Word (VLIW) machines, but we do not discuss such work here. Also, we assume that the source program is written in a Pascal-like, imperative language. In the real world, most parallelizing compilers are designed for Fortran.
The first question to ask is what kind of parallelism a compiler should try to exploit. In theory, it could look all over the input program and try to find sections of code that can be executed concurrently. Unfortunately, this often results in very fine-grained parallelism that is not suitable for most parallel processors. For example, the two statements

A := B * C;
X := Y + 2;
can be executed in parallel, but it is unlikely that the overhead of sending the statements to different processors will be outweighed by the performance gain of using parallelism. Some machines (for example VLIWs) are designed to exploit such fine-grained parallelism efficiently, so for such architectures this approach makes sense. The gain to be expected is modest, however.
To exploit larger-scale parallelism and to increase the grain size, a good idea is to look only at loops in the program, since this is where programs usually spend most of their time. This observation is especially true for numerical programs. Most parallelizing compilers therefore exploit loop parallelism. As an example, consider the following statements:

print("start computing");
for i := 1 to 1000 do
    A[i] := sqrt(i);
od;
print(A);
This code computes 1000 square roots. All iterations can be executed in parallel, since the calls to sqrt are independent of each other and the results are written to different elements of array A. Moreover, since the calls are expensive, it is worthwhile to execute them in parallel. Assuming the compiler knows about the sqrt function (for example because it is built into the language) and knows that it has no side effects, it can easily determine that the calls can and should be executed in parallel.
A compiler could convert the above code to a parallel program as follows. Initially, one processor (for example CPU 0) starts executing the program. After executing the first print statement, it arrives at the for-loop. It then splits up the 1000 iterations among all processors. If there are, say, 10 processors, CPU 0 executes iterations 1 to 100, CPU 1 executes iterations 101 to 200, and so on. Each processor writes part of the array A, which is stored in shared memory. After all processors are done, they synchronize and wait until the last one is finished. Finally, CPU 0 prints the value of A[].
This execution model is used by most parallelizing compilers. Except for the sequential code (which is executed by one CPU), all processors execute the same code, but on different parts of the data. The model therefore is known as 'Single Program Multiple Data' ('SPMD') parallelism (Karp, 1987). We do not discuss the details of the work distribution
and synchronization here, but assume the compiler generates code for it, using traditional communication mechanisms (for example shared variables or message passing) where needed. If different iterations do different amounts of computation, the work distribution can become quite complicated, and advanced scheduling techniques may be needed to give each processor an equal amount of work.
In general, programs may contain a large number of loops, so a relevant question is which loops to parallelize. If there are several consecutive loops (each with independent iterations) in the program, they can all be parallelized, one after each other. For example, the following code:

for i := 1 to 1000 do
    A[i] := sqrt(i);
od;
print(A);
for i := 1 to 5000 do
    B[i] := arctan(i);
od;
print(B);
just applies the technique described earlier to both loops in turn. If the loops are 'nested' inside each other, however, we have a different situation. For example, consider the following code, which adds the matrices A[] and B[]:

for i := 1 to 500 do
    for j := 1 to 1000 do
        C[i, j] := A[i, j] + B[i, j];
    od;
od;
There are (at least) two ways to parallelize this code. One way is to repeatedly distribute the iterations of the inner loop over all processors. With 10 processors, each processor will then execute 100 additions and then synchronize; this process is repeated 500 times (for all values of i). An alternative approach is to distribute the outer loop. All 10 processors will then execute 50 000 (= 50 x 1000) additions before they synchronize. Clearly, the second approach has a lower overhead, because it requires fewer work-distributions and synchronization steps. Parallelizing compilers therefore try to parallelize 'outer loops'. Put in other words, they try to exploit large-grained parallelism whenever possible. An alternative to this compiler analysis is to allow the programmer to indicate which loops should be executed in parallel. This is what data-parallel languages do, using the parfor construct discussed in Section 9.1.5.
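A toy rendering in Java of the SPMD outer-loop distribution just described is shown below; each thread plays the role of one processor and handles a contiguous block of rows. The sizes and the static block partitioning are our own choices for illustration, not part of any particular compiler's output.

// Toy SPMD version of the matrix addition: each 'processor' handles a block of rows.
public class MatrixAddSPMD {
    static final int N = 500, M = 1000, P = 10;
    static final double[][] A = new double[N][M], B = new double[N][M], C = new double[N][M];

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[P];
        for (int cpu = 0; cpu < P; cpu++) {
            final int lo = cpu * (N / P);            // first row of this processor's block
            final int hi = lo + (N / P);             // one past the last row
            workers[cpu] = new Thread(() -> {
                for (int i = lo; i < hi; i++)        // outer loop distributed over threads
                    for (int j = 0; j < M; j++)      // inner loop executed sequentially
                        C[i][j] = A[i][j] + B[i][j];
            });
            workers[cpu].start();
        }
        for (Thread t : workers) t.join();           // barrier: wait until all are done
        System.out.println(C[0][0]);
    }
}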
In the examples given so far, all iterations of the loops are independent of each other. Many loops in actual programs, however, do not have this property, and this is where the problems for parallelizing compilers begin. To illustrate, consider a slightly different version of our earlier example:
for i := 2 to 1000 do
    A[i] := sqrt(A[i-1]);
od;
The iterations of the loop now can no longer be executed in parallel. For example, during iteration 2 (i = 3), the value of A[2] is required. The value of A[2] must be computed before iteration 2 can start. Since the value is computed during iteration 1, iteration 1 must be executed before iteration 2. This is called a loop-carried dependence. The argument applies to all iterations, so the loop iterations must be executed in their original order. In other words, the loop is sequential and cannot be parallelized.
Parallelizing compilers are faced with two challenges. First, they should determine as precisely as possible whether a given loop may or may not be executed in parallel. When in doubt, they must be 'conservative' and execute the loop sequentially; otherwise the parallel program might produce different results than the original sequential program. Second, whenever a compiler discovers a loop that cannot be parallelized, it should try to transform it into a loop that can be parallelized. We discuss such transformations in the next subsection. Below, we look at the first issue.
In general, there can be several different reasons why a given loop cannot be parallelized. We illustrate the most important cases with the following code fragment:

for i := 1 to 1000 do
    (1)    A[i] := ...;
    (2)    ... := A[i-1];
    (3)    ... := B[i];
    (4)    B[i-1] := ...;
    (5)    C[i] := ...;
    (6)    C[i-1] := ...;
od;
Statement 1 assigns a variable (A[i]) which will be used in statement 2 during the next iteration of the loop; this is called a flow dependence. Statement 3 uses a variable whose value will be changed in statement 4 during the next iteration; this is called an anti dependence. Statement 5 assigns a variable that will be reassigned in statement 6 during the next iteration; this is called an output dependence. In each case, the dependence prevents the loop from being parallelized.
The examples above used very simple subscript expressions. We used either i or i-1 as index, where i is the loop variable. In practice, programs use far more complicated subscript expressions, which makes the dependence analysis harder. Still, an accurate dependence analysis may make the difference between being able to parallelize a loop and having to run it sequentially. As an example, consider the following code fragment:

for i := 1 to 250 do
    A[4*i] := ...;
    ... := A[4*i - 1];
od;
At first sight, one might say that the code looks similar to statements 1 and 2 of our previous example, so there is a flow dependence. More careful inspection of the code reveals that this is not the case. The first statement will assign the array entries 4, 8, 12, and so on, while the second statement will read the entries 3, 7, 11, and so on. Thus, there is no dependence and the loop can be parallelized. On the other hand, the loop

for i := 1 to 200 do
    A[5*i] := ...;
    ... := A[4*i - 1];
od;

does have a dependence. For example, iteration 3 (i = 3) will assign to A[15] while iteration 4 will read this element. Several techniques, often based on algebraic manipulation, have been developed for testing whether a given loop contains any dependencies.
Several other problems exist that make loop analysis and parallelization even harder. If a loop contains a routine call, this routine will have to be analyzed to determine its effects on any variables that it can access. So, interprocedural analysis is required here. Another hard problem concerns pointer variables. The code below, for example, contains an indirect assignment through a pointer variable p:

for i := 1 to 250 do
    *p := ...;
    ... := A[4*i];
od;
If p points to an element of A[], this assignment may cause a loop-carried dependence. Unfortunately, it is often difficult to determine during compile time whether p points into A[], let alone to determine to which specific element p points. Often, this depends on run-time conditions, and cannot be determined during compile time. Still, several techniques have been developed that try to estimate the possible targets of pointer variables and use this information to parallelize conservatively loops containing pointer dereferences (Hummel et al., 1994). An alternative to this compiler analysis is to let the programmer indicate which loops can safely be run in parallel. Data-parallel languages often use a construct like the parfor statement for this purpose. This construct thus may indicate that it is both desirable and legal to run a given loop in parallel. Of course, this also takes away an important advantage of automatic parallelization.
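One of the algebraic techniques alluded to above is the classical GCD test: an access A[a*i + c1] in one iteration and an access A[b*i + c2] in another iteration can only touch the same element if gcd(a, b) divides (c2 - c1). The Java sketch below is our own simplification, which ignores loop bounds and dependence directions, but it already distinguishes the two example loops.

// Simplified GCD test for subscripts of the form a*i + c (loop bounds ignored).
public class GcdTest {
    static long gcd(long x, long y) { return y == 0 ? Math.abs(x) : gcd(y, x % y); }

    // May a write to A[a*i + c1] and a read of A[b*i + c2] refer to the same element
    // in (possibly different) iterations?  Only if gcd(a, b) divides (c2 - c1).
    static boolean mayDepend(long a, long c1, long b, long c2) {
        return (c2 - c1) % gcd(a, b) == 0;
    }

    public static void main(String[] args) {
        // A[4*i] := ...;  ... := A[4*i - 1];   gcd(4,4)=4 does not divide -1: independent
        System.out.println(mayDepend(4, 0, 4, -1));   // false
        // A[5*i] := ...;  ... := A[4*i - 1];   gcd(5,4)=1 divides -1: possible dependence
        System.out.println(mayDepend(5, 0, 4, -1));   // true
    }
}

Real compilers combine such tests with bounds information and more precise methods (for example the Banerjee and Omega tests) to reduce the number of false dependences.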
The for-loops written by the programmer often have a form that makes compiler optimization and parallelization difficult. Optimizing compilers for parallel machines therefore exploit techniques to transform loops into other forms, which are either more efficient or easier to parallelize. Dozens of such loop restructuring transformations have been designed and each compiler typically uses a bag of such 'tricks'. Some transformations are not related to parallelism but try to increase sequential performance, for example by improving the caching behavior of the program; they strive to increase the hit ratio for the CPU cache. Rather than trying to give a complete overview of loop transformations, we will illustrate the general idea by describing two transformations that are designed specifically for parallel machines. We start with a simple example, shown below:

for i := 1 to 250 do
    A[4*i] := sqrt(i);
    T := A[4*i - 1];
    ...
od;
This code is similar to one of our earlier examples, but an important difference is that it contains an assignment to a scalar variable T. The problem is that this assignment creates an output dependence, since every iteration of the loop will do an assignment to that same T. A compiler can easily transform this loop to an equivalent loop that does not have this dependence. The idea is to let every iteration assign to a different (temporary) variable. After the loop has been executed, the value of the last (250th) iteration is assigned to T. We can implement this idea using an extra array, called Tmp[]:

for i := 1 to 250 do
    A[4*i] := sqrt(i);
    Tmp[i] := A[4*i - 1];
    ...
od;
T := Tmp[250];
This transformation is called scalar expansion. It eliminates the output dependence and thus allows the transformed loop to be parallelized. On a distributed-memory machine, a slightly different approach is to give every processor its own scalar variable, stored in its local memory. The Tmp[] array then essentially is distributed over the local memories. This transformation is known as privatization.
Another important transformation is loop interchange, which works on nested loops, such as the loops shown below:

for i := 2 to 500 do
    for j := 1 to 1000 do
        A[i, j] := A[i-1, j] + B[i, j];
    od;
od;
This code is similar to the matrix addition code shown earlier, but the left-hand side of the assignment statement now is an element of A[,]. This assignment statement causes a loop-carried dependence. For example, iteration 2 (i = 3) of the outer loop uses the elements A[2, 1] to A[2, 1000], which are computed during iteration 1 (i = 2). We could parallelize the inner loop, but, as we have discussed before, this generates fine-grained parallelism, which is less efficient. We can solve this problem by transforming the code as follows:

for j := 1 to 1000 do
    for i := 2 to 500 do
        A[i, j] := A[i-1, j] + B[i, j];
    od;
od;
The inner and outer loops have been exchanged. We can now parallelize the new outer loop, creating 1000 tasks that each execute the inner loop sequentially. A parallelizing compiler must of course make sure that the original and transformed programs are equivalent. The compiler is only allowed to do transformations, such as scalar expansion and loop interchange, that do not change the result of the programs. We do not discuss the analysis that is required for this.
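To make the effect of scalar expansion concrete, the fragment below is a hand-written Java rendering of the example above (it is an illustration of the transformation, not compiler output); the first loop carries the output dependence on T, the second gives each iteration its own temporary and is therefore parallelizable.

// Hand-written illustration of scalar expansion.
public class ScalarExpansion {
    public static void main(String[] args) {
        double[] A = new double[1001];
        double T;

        // Original loop: every iteration assigns the same scalar T (output dependence).
        for (int i = 1; i <= 250; i++) {
            A[4 * i] = Math.sqrt(i);
            T = A[4 * i - 1];
        }

        // After scalar expansion: each iteration writes its own element of Tmp,
        // so the iterations no longer conflict and could be executed in parallel.
        double[] Tmp = new double[251];
        for (int i = 1; i <= 250; i++) {
            A[4 * i] = Math.sqrt(i);
            Tmp[i] = A[4 * i - 1];
        }
        T = Tmp[250];
        System.out.println(T);
    }
}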
On a distributed-memory multicomputer, automatic parallelization becomes even more difficult, because the compiler also has to decide where to store the data. To illustrate this, suppose the matrix addition code

for i := 1 to 500 do
    for j := 1 to 1000 do
        C[i, j] := A[i, j] + B[i, j];
    od;
od;

has to be executed on such a machine. On a shared-memory machine, the three arrays can be stored in the shared memory, where they can be accessed by all processors. On a multicomputer, this no longer is the case. One solution would be to store the arrays in the memory of one processor, say CPU 0. All other processors can then access these data by sending messages to CPU 0, asking it to read or write certain elements of the arrays. CPU 0 would thus implement a (very slow) shared memory. This approach would generate a huge amount of communication, which would outweigh the gains of parallelism and thus be unacceptable.
A more acceptable solution, avoiding a large part of the communication overhead, can be obtained by distributing the arrays over the different processors, in such a way that each processor contains those elements that it accesses most frequently. In the above example, this distribution actually is quite simple. Assuming that we parallelize the outer loop over 10 processors, each processor accesses 50 rows of the three arrays. For example, CPU 0 accesses rows 1 to 50, CPU 1 accesses rows 51 to 100, and so on, as shown in Figure 9.11. By distributing the arrays row-wise we can reduce the communication.
Unfortunately, this example is overly simplistic. In general, programs also contain statements that access array elements in less regular ways. To illustrate this, suppose the matrix addition code is followed by the statement
Figure 9.11 An array of 500 rows distributed row-wise over 10 CPUs: rows 1 to 50 are stored on CPU 0, rows 51 to 100 on CPU 1, and so on, up to rows 451 to 500 on the last CPU.
A[1, 1] := A[125, 1] + A[175, 1];
With the row-wise distribution described above, the two operands are stored on CPUs 2 and 3, respectively. If CPU 0, which contains A[1, 1], executes the statement, it thus has to communicate with these two processors to get the current values of the two elements. A slightly more complicated example is the following code:
for i := 2 to 499 do
    for j := 1 to 1000 do
        A[i, j] := B[i-1, j] + B[i+1, j];
    od;
od;
Most of the assignments can be executed without doing communication, using the distribution described above. For example, processor 0 can assign B[1, 1] + B[3, 1] to A[2, 1], since it contains all these elements. For some assignments, however, data from a remote processor are needed. For example, the assignment to A[50, 1] requires B[51, 1], which is stored on CPU 1. In general, each processor requires the last row of its left neighbor and the first row of its right neighbor (except for the leftmost and rightmost processor).
The task of the compiler is to generate the communication statements needed to implement these data transfers. In simple cases like the one above, the compiler can compute precisely which elements each processor needs, so it can insert send and receive statements in the code to transfer these elements. Figure 9.12 shows in a pseudo-notation what the code generated by the compiler might look like. Each processor is assigned a number of rows of the matrices, ranging from lower_bound to upper_bound. A processor allocates space for these parts of the array, and some extra space to contain two additional rows of B; the initialization of the arrays is not shown. At the start of the execution, each processor checks to see if it should send its first and last rows to its neighbors; we use the notation B[i, *] to denote a row of B. Next, each processor checks if it should receive rows from its neighbors. The send commands can be synchronous or asynchronous, the receive commands are synchronous. Finally, each processor performs its share of the work.
INPUT:
    ID:          number of this processor (ranging from 0 to P-1)
    lower_bound: number of first row assigned to this processor
    upper_bound: number of last row assigned to this processor

real A[lower_bound .. upper_bound, 1000],
     B[lower_bound-1 .. upper_bound+1, 1000];
Figure 9.12 Generated (pseudo-)code.
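The sketch below shows, in Java, what the generated neighbor exchange of Figure 9.12 amounts to for one processor. The Comm interface stands in for whatever message-passing layer the compiler targets; its send and receive methods, and the class names Worker and Comm, are placeholders of our own, not part of an actual compiler's run-time system.

// Per-processor code for the B[i-1,j] + B[i+1,j] loop (assumed communication API).
interface Comm {
    void send(int toCpu, double[] row);
    double[] receive(int fromCpu);       // blocking receive
}

class Worker {
    final int id, p;                     // this processor's number, total number of processors
    final double[][] A, B;               // local parts; B has one extra (ghost) row above and below
    final Comm comm;

    Worker(int id, int p, int lower, int upper, Comm comm) {
        this.id = id; this.p = p; this.comm = comm;
        this.A = new double[upper - lower + 1][1000];
        this.B = new double[upper - lower + 3][1000];   // + 2 ghost rows
    }

    void run() {
        // Exchange boundary rows with the neighbors (skipped at the edges).
        if (id > 0)     comm.send(id - 1, B[1]);                  // my first real row
        if (id < p - 1) comm.send(id + 1, B[B.length - 2]);       // my last real row
        if (id > 0)     B[0] = comm.receive(id - 1);              // neighbor's last row
        if (id < p - 1) B[B.length - 1] = comm.receive(id + 1);   // neighbor's first row

        // Local share of the computation; indices shifted by one because of the ghost rows.
        for (int i = 1; i < B.length - 1; i++)
            for (int j = 0; j < 1000; j++)
                A[i - 1][j] = B[i - 1][j] + B[i + 1][j];
    }
}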
In more complicated cases, the data to be transferred can only be computed at run time. As an example, for the statement

A[1, 1] := B[X, 1];
it depends on the run-time value of X from which processor the element must be fetched. In this case, the compiler generates code to determine this processor (at run time) and to send a request message to it; the run-time system on this processor will then reply with the requested value. Note that this exchange needs two messages over the network, a request and a reply, so it is less efficient than the earlier example.
Finally, there are many important optimizations a compiler can apply to reduce the communication overhead. For example, message combining or message aggregation combines multiple small messages into a single large message, which is more efficient on most systems. Another optimization is to exploit special hardware primitives, such as broadcasting. We do not discuss such optimizations here. Suffice it to say that the task of generating efficient message passing code makes the already very challenging task of automatic parallelization even more difficult.
The data distribution has a huge impact on the performance of the parallel program, since it determines to a large extent which data transfers will take place. The optimal distribution depends on how the arrays are used in the most time-consuming parts of the program. Since it is very difficult for a compiler to determine the optimal distribution,
languages like High Performance Fortran allow the programmer to help the compiler. HPF defines a collection of compiler directives that programmers can use to specify distributions of arrays. For example, the following HPF code fragment

!HPF$ PROCESSORS pr(4)
      real A(100, 100)
!HPF$ DISTRIBUTE A(BLOCK, *) ONTO pr
declares a two-dimensional array A that is distributed row-wise over four processors. Many other types of distributions, for example cyclic or column-wise, can also be expressed in HPF. The compiler can use this information as hints about how to distribute the data structures. The speed-ups achieved by supplying hints depend completely on the nature of the problem and the quality of the hints; they can easily make the difference between an infeasible computation and a readily doable one.
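A BLOCK distribution like the one in the directive above can be characterized by a simple owner function. The sketch below, in Java and in our own formulation rather than any HPF system's code, shows the kind of computation a compiler or run-time system could use to decide which processor owns a given row and which rows a given processor stores.

// Owner computation behind a (BLOCK, *) distribution of A(100, 100) over 4 processors.
public class BlockDistribution {
    static final int ROWS = 100, P = 4;
    static final int BLOCK = (ROWS + P - 1) / P;      // rows per processor, rounded up

    static int ownerOfRow(int i) {                    // which processor stores row i (0-based)
        return i / BLOCK;
    }
    static int firstRowOf(int cpu) { return cpu * BLOCK; }
    static int lastRowOf(int cpu)  { return Math.min(ROWS, (cpu + 1) * BLOCK) - 1; }

    public static void main(String[] args) {
        System.out.println(ownerOfRow(0));                          // 0
        System.out.println(ownerOfRow(99));                         // 3
        System.out.println(firstRowOf(2) + ".." + lastRowOf(2));    // 50..74
    }
}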
This concludes our discussion of compilation and processing techniques for parallel and distributed programs. We have seen that such programs may express the parallelism explicitly or may rely on automatic parallelization by some language processor. Five models for explicit parallelism have been introduced and their implementations discussed; the Linda Tuple Space can profit more from compiler support than the other four. Automatic parallelization still is a challenging problem, especially for distributed-memory machines. Some languages therefore use a combination of compiler analysis and directives from the user.
- The goal of parallel systems is to solve a given problem as fast as possible, using multiple processors. A distributed system contains multiple autonomous processors that are connected by a network and that work on many different tasks from many users.
- A process is an abstraction of a physical processor. A process contains an address space that stores data.
- Many models and languages exist for programming parallel and distributed systems. The programming models discussed in this chapter are shared variables, message passing, shared objects, Tuple Space, and data-parallelism.
- Shared variables are variables that can be accessed by multiple processes. Mutual exclusion synchronization mechanisms (for example locks) and condition synchronization mechanisms (for example monitors) are used to synchronize access to the shared variables.
- Message passing uses two primitives, send and receive, to transfer data from one process to another process. Languages differ in the way processes are addressed, and the way messages are sent (synchronous or asynchronous) and received (explicitly or implicitly).
- Parallel object-oriented languages encapsulate shared data in objects.
- The Linda model provides a globally shared data structure called the Tuple Space.
- In data-parallel programs, all processors execute the same algorithm, but operate on different parts of the data set. Automatically parallelizing compilers usually try to generate data-parallel programs.
- Most parallel and distributed languages are implemented mainly using a run-time system, rather than extensive compiler support.
- Threads are much cheaper than operating system processes. On a shared-memory multiprocessor they must synchronize efficiently, and without using busy waiting. Locks can be implemented on top of an indivisible instruction, such as test-and-set. Busy waiting is avoided by maintaining a list of blocked threads.
- Message passing involves several implementation issues. Many different communication protocols can be used (for example TCP/IP, and custom user-level protocols). Locating a receiver requires keeping an administration of where each process runs. Marshaling routines are used to convert between data structures and a linear stream of bytes.
- An advantage of implementing a parallel language over a library is that the implementation has more control over the order in which messages are served. Implicit message receipt requires a more complicated run-time system, which maintains a list of incoming messages that is inspected whenever the application wants to receive a message.
- In parallel object-oriented languages, one issue is how to locate objects; if objects can migrate, pointer chasing techniques can be used.
- Languages with object replication must use a protocol for keeping the replicas of an object consistent. When an object is modified, such a protocol can either invalidate all other copies of the object or it can update all copies, for example using totally-ordered group communication.
- Linda compilers reduce the overhead of associative addressing by selecting the right storage representation for each tuple.
- On a distributed-memory multicomputer, Tuple Space can be represented using a hash-based distribution or a uniform distribution scheme (replicating or partitioning the Tuple Space). The most efficient scheme depends on the application and the underlying hardware.
- Automatic parallelization still is a challenging problem. Parallelizing compilers try to find loop iterations that are independent from each other. To eliminate loop-carried dependencies, they can apply loop restructuring transformations.
- For machines without shared memory, the compiler also has to distribute the data structures over the different processors and generate communication code for the cases where processors need remote data.
- Message combining is an important optimization to reduce the communication overhead. High Performance Fortran (HPF) also uses information (hints) from the programmer.
Good introductions to parallel processing are given by Almasi and Gottlieb (1994) and Wilson (1995). Good books on distributed systems include Tanenbaum (1995) and Coulouris, Dollimore and Kindberg (1994). Well-known survey papers about parallel and distributed programming languages are by Bal, Steiner and Tanenbaum (1989) and Skillicorn and Talia (1998). A survey of distributed programming models is given by Andrews (1991). The book by Wilson and Lu (1996) focuses on object-oriented parallel languages based on C++.
An extensive overview of distributed shared memory systems is included in Tanenbaum's book (1995). More concise survey papers are by Stumm and Zhou (1990) and Nitzberg and Lo (1991). Object migration and replication are discussed extensively in papers by Jul et al. (1988) and Bal et al. (1998), respectively. Linda is described in several research papers (Ahuja et al., 1986; Carriero and Gelernter, 1989). A technical report describes implementation issues of Linda (Bjornson et al., 1988).
The literature contains overviews of compiler transformations for automatic parallelization (Wolfe, 1996; Bacon et al., 1994). Example parallelizing compilers include Polaris (Blume et al., 1996) and SUIF (Hall et al., 1996). High Performance Fortran is described by Loveman (1993).
9.1 Message passing can be implemented on both shared-memory and distributed-memory machines, whereas shared variables are difficult to implement on machines without shared memory. Does this imply that shared variables and monitors are superfluous?

9.2 The Linda Tuple Space can be used to simulate both message passing and shared variables. Describe how both sending/receiving messages and reading/writing shared variables can be simulated using Linda.

9.3 Explain why the costs of a thread switch depend very much on the CPU architecture.

9.4 As discussed in Section 9.3, busy waiting can be avoided by using an administration (a list of control blocks) for blocked threads. This administration, however, also has to be protected by locks, to make the operations on it indivisible. So, to implement a lock without busy waiting, we need another lock. Will this result in an endless implementation loop?

9.5 Some run-time systems acquire locks using a combination of busy waiting and thread switching, to avoid wasting CPU cycles while still reducing the overhead of thread switches. Can you think of a useful combination?

9.6 (a) Explain what happens if two processes simultaneously try to execute a monitor operation that reads and returns data protected by the monitor (but without modifying it).
(b) Design an optimized implementation of monitors that will allow multiple readers, while preserving the original monitor semantics as perceived by the programmer.

9.7 Many message passing languages that use ports as explained in Section 9.1 allow only a single process at a time to receive from any given port. Why would this restriction be imposed?

9.8 Write marshaling and unmarshaling routines for the following procedure:

procedure f(x: float; y: integer; z: array[1..100] of integer);
9.9 Some message passing languages (for example Ada, SR) support a 'select statement', with which a receiver can wait for different types of messages from different senders, depending on which one arrives first. Very few languages support the opposite: a statement with which a sender can wait until one of the receivers is willing to accept its message (is waiting in a receive statement). Can you think of a reason for this?
9.10 Section 9.4.4 explained that message handlers that may block should be executed by a newly created thread rather than by a daemon thread. Someone suggests the following idea: let the daemon thread simply start executing the message handler. Only if the handler actually needs to block, create a new thread dynamically, copy the current stack of the daemon to the newly created thread, and let the new thread continue the execution of the handler. Why is this hard to implement?

9.11 In Section 9.5.2 we discussed the possibility of migrating an object from one processor to another processor. Suppose an object X invokes an operation on an object Y that is located on a different processor. Further suppose X contains a thread of control, but Y does not. Two different options for object migration now exist: X could be moved to the processor of Y, or Y could be moved to the processor of X. What are the advantages and disadvantages of both options?

9.12 Someone suggests a new update protocol for implementing replicated objects that does not need totally-ordered group communication. The scheme works as follows. For each replicated object, one processor (containing a replica) is designated as the primary copy. If a processor wants to do a write operation on a replicated object, it sends the operation to the primary copy of the object. The primary copy broadcasts the operation to all other processors that contain a copy. Why is this scheme incorrect?

9.13 The operation Maximum is defined as follows. Each processor performs the following procedure call:

result = Maximum(value);

where value may be different for each processor. Each processor waits (blocks) until every processor has started this call. Next, every processor gets as result the maximum of all the value parameters. Figure 9.13 shows how the operation can be implemented in C/Linda, for a fixed number of processors P:
int Maximum(int value) {
    int max;

    if (mycpu == 1) {         /* the first CPU creates the tuple */
        out("max", value, 1);
    } else {                  /* other CPUs wait their turn */
        in("max", ? &max, mycpu);
        if (value > max) max = value;
        out("max", max, mycpu + 1);
    }
    /* wait until all CPUs are done */
    read("max", ? &max, P);
    return max;
}

Figure 9.13 C/Linda code for the operation Maximum.
(a) Analyze the performance of the operation for the three different distributed implementations of Tuple Space described in Section 9.6.2.
(b) Design an optimized operation, using handwritten message passing. Compare the optimized implementation with the Linda version.
9.14 Some implementations of Tuple Space support an extra primitive fetch_all(t) with which all tuples that match t can be obtained in one indivisible operation. Explain why this primitive is difficult to implement.

9.15 Does the following code contain a loop dependency? If so, what kind of dependency is it?

for i := 2 to 9 do
    A[7*i] := i*i;
    A[9*i - 2] := i*i*i;
    A[11*(i-1)+7] := i*i*i*i;
od;
9.16 Loop interchange as described in Section 9.7.3 may also have performance disadvantages. Consider the following example:

for i := 1 to 1000 do
    for j := 1 to 1000 do
        A[i, j] := A[i, j] + 1;
    od;
od;

The sequential performance of this code may decrease substantially if the two loops are interchanged. Explain.
9.17 History of parallel and distributed language implementation: Study Karp's paper (1987), which points out the difficulties of parallel programming and introduces the concept of SPMD parallelism, and summarize its main points.
A compiler can be viewed as a set of algorithms for tasks that apply actions to data that represent syntactic constructs. Examples are parsing (the text of) a function call, optimizing (the AST of) an expression, generating code for (the AST of) a for-loop, etc. Figure A.1 shows some of these tasks in a table. In Section 1.2 we have shown how this table can be sliced up into rows, each row corresponding to a module or pass. Figures 1.12 through 1.19 showed four modules: lexical analysis, parsing, code generation, and interpretation. This arrangement emphasizes the actions performed on the data, and is representative of the imperative programming paradigm (but see Section 1.4.1). It is also possible to slice this table along the columns. This results in modules for syntactic data objects, with the actions attached to them, and leads to an object-oriented approach to compiler construction.
Sliced along the columns, the table suggests classes for declarations, expressions, function calls, statements, for-loops, etc., each with methods 'Parse', 'Handle context', 'Interpret', 'Generate code', and a few others. This has the advantage that all information about a syntactic construct is collected in one place in the compiler, in the code for that syntactic construct. For example, each instance of the class Function_Call knows how to create itself by parsing, how to handle its context, how to optimize itself, how to generate code for itself, etc., in good object-oriented fashion. More in particular, calling the method Generate_Code() of the top object of the AST will set the machinery in motion and generate code for the entire tree. A disadvantage is of
Figure A.1 Table of tasks in a compiler with some tasks filled in. The columns are labeled with syntactic constructs (assignments, declarations, for-loops, simple expressions) and the rows with tasks (lexical analysis, parsing, context handling, optimization, code generation); filled-in cells include 'lexical analysis of declarations', 'parsing of simple expressions', 'context handling of declarations', 'optimization of for-loops', and 'code generation for assignments'.
course, as with all strong structuring, that information needed for sophisticated decisions may not be easily available, because it happens to reside in distant and poorly accessible places. But then again, imperative compiler construction is not easy either.
For the best match between syntax features and object-oriented features, we have to streamline our grammar of Figure 1.9, as follows. The object-oriented paradigm has two fundamental ways to create new classes:
- class extension, in which a new class, the subclass, is formed by extending an existing class, the superclass; the new class can be seen as a specialization of its superclass.
- composition, in which a new class is formed by combining one or more existing classes; an instance of the new class then contains instances of the composing classes, or references to them.
Class extension corresponds to a grammar rule of the form

A → A1 | A2 | ... | An

with the restriction that A1 through An are either all non-terminals or all terminals. From an object-oriented point of view, this rule expresses the idea that objects of classes A1 through An are special kinds of As. Composition corresponds to a grammar rule of the form

A → A1 A2 ... An
with the restriction that there are no other rules for A. From an object-oriented point of view, this rule expresses the idea that objects of class A are composed of objects of classes A1 through An. The object-oriented view does not impose the proper order A1 ... An of the components, however; this order must be insured separately by the code of the parsing routines.
To find a natural assignment of the non-terminals in a grammar to extended and composed classes, we have to restrict our grammar to rules of the above forms; such a grammar is said to be in AND/OR form. It is easy to bring a grammar into AND/OR form by introducing some new non-terminals. A possible AND/OR form of our grammar for fully parenthesized expressions is shown in Figure A.2. The non-terminals expression and operator are OR non-terminals, digit and parenthesized_expression are AND non-terminals. The rule for expression says that digit and parenthesized_expression are both special kinds of expressions, which fits our intuition. The classes Digit and Parenthesized_Expression will therefore be extensions of the class Expression. Since each expression is either a digit or a parenthesized expression, there will be no objects with dynamic class Expression, only Digit and Parenthesized_Expression. The syntax tree itself will consist of nodes of static types Expression and Operator only.

expression → digit | parenthesized_expression
digit → DIGIT
parenthesized_expression → '(' expression operator expression ')'
operator → '+' | '*'

Figure A.2 The grammar for simple fully parenthesized expressions in AND/OR form.
Figures A.3 through A.8 show the demo compiler in Java. The class Expression in Figure A.5 has been declared abstract, and features the abstract methods Print(), Interpret(), and Generate_Code(); this makes these methods obligatory for the classes Digit (Figure A.6) and Parenthesized_Expression (Figure A.7). Indeed, both Digit and Parenthesized_Expression contain non-abstract implementations of the methods. The same applies to the abstract class Operator. It is now easy to verify that if the top node is the object Program of class Expression, the method call Program.Print() will print the expression, Program.Interpret() will evaluate the expression, and Program.Generate_Code() will generate code for the expression. Making the syntax tree create itself by parsing turns out to be more difficult, however.
class Demo_Compiler {
    public static void main(String[] args) {
        // Get the intermediate code
        Lex.Get_Token();                       // start the lexical analyzer
        Expression icode = Expression.Parse_Or();
        if (Lex.Token_Class != Lex.EOF) {
            System.err.println("Garbage at end of program ignored");
        }
        if (icode == null) {
            System.err.println("No intermediate code produced");
            return;
        }

        // Print it
        System.out.print("Expression: ");
        icode.Print();
        System.out.println();

        // Interpret it
        System.out.print("Interpreted: ");
        System.out.println(icode.Interpret());

        // Generate code for it
        System.out.println("Code:");
        icode.Generate_Code();
        System.out.println("PRINT");
    }
}

Figure A.3 Class Demo_Compiler, driver for the object-oriented demo compiler.
It would be nice to rig the constructors that are called when an object is created so they would do the parsing; then the simple declaration Expression Program = new Expression(); would create a new object Program of class Expression, initialized with the parse tree. Unfortunately, this seems impossible for technical reasons (see Exercise A.1). The next best thing is to have self-parsing methods for all nodes; this is the approach shown in Figures A.5 through A.8.
The parsing templates for AND and OR non-terminals differ. The template for an AND non-terminal, shown in Figure A.9, is similar to that shown for the recursive descent parser in Section 1.2.5, except that there is only one alternative to parse. The template for an OR non-terminal, shown in Figure A.10, is less obvious. The differences between the AND and OR templates are considerable. The Parse_And() method is applied to an already allocated object of an extended class corresponding to an AND non-terminal. Its task is to see if the required components can
import java.io.IOException;

class Lex {
    public static int Token_Class;
    public static final int EOF = 256;
    public static final int DIGIT = 257;
    public static char Token_Char_Value;

    public static void Get_Token() {
        char ch;

        // Get non-whitespace character
        do {
            int ch_or_EOF;
            try {ch_or_EOF = System.in.read();}
            catch (IOException ex) {ch_or_EOF = -1;}
            if (ch_or_EOF < 0) {Token_Class = EOF; return;}
            ch = (char)ch_or_EOF;               // cast is safe
        } while (Character.isWhitespace(ch));

        // Classify it
        if (Character.isDigit(ch)) {
            Token_Class = DIGIT;
            Token_Char_Value = ch;
        }
        else {
            Token_Class = ch;
        }
    }
}
Figure A.4 Class Lex, lexical analyzer for the object-oriented demo compiler.
be found in the input and, if so, to initialize the object with them. To do so, it makes the same assumption as the recursive descent parser in Section 1.2.5, which is that if the first member of the alternative is found, the others have to be present too.
The Parse_Or method is a static method; it is applied to an abstract class corresponding to an OR non-terminal, before any object has been allocated. Its task is to see if any of the specializations of the OR non-terminal can be found in the input and, if so, to return the corresponding object. To do so, it tries the possible subclasses in textual order, as follows. To try a subclass Ai, it tentatively declares an object ai of that subclass and calls its method ai.Parse_And(), to see if a syntactic construct corresponding to Ai is present in the input. If it is, the call ai.Parse_And() initializes the fields in the subclass object and succeeds. The object ai is then returned by the method Parse_Or(); this action changes the dynamic class of the object to Ai, but its static class remains A. If Parse_And() cannot find a syntactic construct Ai in the input, the object ai is abandoned to the garbage collector, and the next subclass is tried. If all attempts to find a proper subclass fail, there is nothing in the input at this point that
abstract class Expression {
    public static Expression Parse_Or() {
        // Try to parse a parenthesized expression
        Parenthesized_Expression pe = new Parenthesized_Expression();
        if (pe.Parse_And()) return pe;

        // Try to parse a digit
        Digit d = new Digit();
        if (d.Parse_And()) return d;

        return null;
    }

    abstract public void Print();
    abstract public int Interpret();
    abstract public void Generate_Code();
}
Figure A.5 Class Expression for the object-oriented demo compiler.
class Digit extends Expression { private char digit= '\Or; public boolean Parse-And() { if (Lex.Token-Class == Lex.DIGIT) { digit = Lex.Token-Char-Value; Lex. Get-Token () ; return true;
1
1
return false;
public void Print() {System.out.print(digit);} public
int
Interpret ( ) {return
digit- ' 0' ;}
public void Generate-Code0 { System,out.print('!PUSH + digit
1
1
+ ft\n'!) ;
Figure A.6 Class Digit for the object-oriented demo compiler.
corresponds to anA,and the routine returns the null pointer. boolean and Anotherdifference is that Parse-And () methodsreturna Parse-Or ( ) methods return a value that statically belongs to an OR class but dynami-
class Parenthesized-Expression extends Expression { private Expression left; private Operator oper; private Expression right; public boolean Parse-AndO { if (Lex,Token-Class == ' ( ' 1 { Lex.Get-Token ( 1 ; if ((left = Expression,Parse-Or()) == null) { S y s t e m . e r r . p r i n t l n ( " M i s s i n g left expression"); return false;
1
if
(
(oper = 0perator.Parse-Or ( ) ) == null) { Syst~m.err.println(ltMissing operator"); return false;
1
if ((right = Expression,Parse-Or()) == null) { System.err .println( lfMissingright expression"); return false;
1
if (Lex,Token-Class ! = ' 1 ' ) { System,err.println("Missing ) added");
1
else { Lex.Get-Token ( 1 ;} return true;
1
return false;
} public void Print 0 { System.out.print(' ( ' ) ; left .Print ( ) ; oper.Print 0 ; right.Print ( 1 ; System.out.print(')');
1 public int Interpret ( 1 { return oper.Interpret(left.Interpret0, right,Interpret());
1 public void Generate-Code ( ) { left .Generate-Code 0 ; right.Generate-Code ( 1 ; oper .Generate-Code ( ) ;
1
1 Figure A.7 Class Parenthesized-Expres sion for the object-oriented demo compiler.
abstract class Operator {
    public static Operator Parse_Or() {
        // Try to parse a '+':
        Actual_Operator co_plus = new Actual_Operator('+');
        if (co_plus.Parse_And()) return co_plus;

        // Try to parse a '*':
        Actual_Operator co_times = new Actual_Operator('*');
        if (co_times.Parse_And()) return co_times;

        return null;
    }
    abstract public void Print();
    abstract public int Interpret(int e_left, int e_right);
    abstract public void Generate_Code();
}

class Actual_Operator extends Operator {
    private char oper;
    public Actual_Operator(char op) {oper = op;}

    public boolean Parse_And() {
        if (Lex.Token_Class == oper) {Lex.Get_Token(); return true;}
        return false;
    }
    public void Print() {System.out.print(oper);}
    public int Interpret(int e_left, int e_right) {
        switch (oper) {
        case '+': return e_left + e_right;
        case '*': return e_left * e_right;
        }
        return 0;
    }
    public void Generate_Code() {
        switch (oper) {
        case '+': System.out.print("ADD\n"); break;
        case '*': System.out.print("MULT\n"); break;
        }
    }
}

Figure A.8 Class Operator for the object-oriented demo compiler.
class A extends ... {
    private A1 a1;        // fields for the components
    private A2 a2;
    ...
    private An an;

    public boolean Parse_And() {
        // Try to parse the alternative A1 A2 ... An:
        if ((a1 = A1.Parse_Or()) != null) {
            if ((a2 = A2.Parse_Or()) == null) error("Missing A2");
            ...
            if ((an = An.Parse_Or()) == null) error("Missing An");
            return true;
        }
        return false;
    }
}

Figure A.9 The template for an AND non-terminal.
abstract class A {
    public static A Parse_Or() {
        // Try to parse an A1:
        A1 a1 = new A1();
        if (a1.Parse_And()) return a1;
        ...
        // Try to parse an An:
        An an = new An();
        if (an.Parse_And()) return an;

        return null;
    }
}

Figure A.10 The template for an OR non-terminal.
dynamically to an AND class which is an extension of that OR class. This is reasonable, since the Parse_And() method only has to determine whether a syntax structure that is known in detail is present in the input, whereas the Parse_Or() method has to determine which specialization applies. One result of the different templates shown above is that the nodes in the AST are always instances of OR classes statically but of AND classes dynamically.
The class Demo_Compiler (Figure A.3), which contains the method main(), and the static class Lex (Figure A.4), which provides the stream of tokens, require no further explanation.
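As an illustration only, a driver along the lines of Figure A.3 could look as follows; this is a sketch rather than a reproduction of the actual figure, and it assumes that Lex.Get_Token() must be called once to obtain the first token before parsing starts:

class Demo_Compiler {                        // sketch, not the actual Figure A.3
    public static void main(String[] args) {
        Lex.Get_Token();                     // assumed: prime the token stream
        Expression icode = Expression.Parse_Or();
        if (icode == null) {
            System.err.println("No expression found"); return;
        }
        icode.Print(); System.out.println();
        System.out.println("Value = " + icode.Interpret());
        icode.Generate_Code();
    }
}

Note that main() needs no knowledge of the subclasses of Expression: it parses, prints, interprets and generates code entirely through the abstract interface, which is exactly the improvement in structure discussed below.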
The technique shown here is only one way to do object-oriented compiler construction. It has been chosen because it employs the characteristic features of object-oriented programming, class extension and method overriding with dynamic binding, to improve compiler structure. But it also leaves some problems unsolved. For one thing, it is not at all obvious how a more advanced parsing technique, which requires a global view of the grammar, could be incorporated within the above framework, in which all knowledge about a syntactic construct S is localized explicitly in the class definition for S. For another, it is not (yet) clear how class extension and method overriding can help in machine code generation.
It is difficult to assess the weight of these problems, since almost all our experience in compiler construction is in the imperative paradigm, and all tools and techniques are geared to it. But the approach demonstrated in this section shows that at least occasionally an improvement in structure can be obtained.
Holmes (1995) reports on the construction of a Pascal compiler using object-oriented techniques systematically wherever possible. The AND/OR form of grammars for object-oriented parsing was suggested by Koskimies (1991).
A.1 In Section A.3 we suggest that the natural way to do object-oriented parsing would be to let the object constructors do the parsing.
(a) Find out why this cannot be done in Java, C++, and your favorite object-oriented language. (If it can be done in any object-oriented language, we would like to hear about it!)
(b) What feature(s) should an object-oriented language have to allow parsing by object constructors?
Answers to exercises

Additional answers to exercises are available through John Wiley's Web page.
1.1
Advantages: Assuming the language is still being designed, writing a major piece of software in it is an excellent shake-down for the language design. Compiling the compiler may be a good way to debug the compiler (but there is a problem here: how defensible is debugging by using not well debugged tools?). Any improvement to the compiler benefits the compiler writers themselves, which gives them an incentive to improve the compiler more.
Disadvantages: Bootstrapping problems: there is no compiler to compile the first version with. Any changes to the language may necessitate many modifications to the compiler, as both the implementation and the source language change. The compiler may inadvertently be tuned to constructions used specifically in the compiler.
1.9
In data structures outside the while statement, as with any while statement.
1.11 See Figure Answers.1.
1.13 First a subset is created by taking away some features; the language is then extended by adding new features. An example would be a C compiler which does not implement floating point numbers but does have built-in infinite-length integer arithmetic. The sarcasm comes from the fact that everything is an extended subset of everything else, which makes the term meaningless.
1.19 The grammatical production process stops when the sentential form consists of terminals only; to test this situation, we have to be able to tell terminals and non-terminals apart. Actually, this is not entirely true: we can scan the grammar, declare all symbols in left-hand sides as non-terminals and all other symbols as terminals. So context condition (1) actually provides redundancy that can be checked.
1.20 Suppose there were two different smallest sets of information items, S1 and S2. Then S1 and S2 must have the same size (or one would not be the smallest) and each must contain at least one item the other does not contain (or they would not be different). Call one such differing item in S1 X. Since both sets started with
SET the flag There is a character 'a' buffered TO False;

PROCEDURE Accept filtered character Ch from previous module:
    IF There is a character 'a' buffered = True:
        // See if this is a second 'a':
        IF Ch = 'a':
            // We have 'aa':
            SET There is a character 'a' buffered TO False;
            Output character 'b' to next module;
        ELSE Ch /= 'a':
            SET There is a character 'a' buffered TO False;
            Output character 'a' to next module;
            Output character Ch to next module;
    ELSE IF Ch = 'a':
        SET There is a character 'a' buffered TO True;
    ELSE There is no character 'a' buffered AND Ch /= 'a':
        Output character Ch to next module;

PROCEDURE Flush:
    IF There is a character 'a' buffered:
        SET There is a character 'a' buffered TO False;
        Output character 'a' to next module;
    Flush next module;

Figure Answers.1 The filter aa -> b as a post-main module.
the same initial items, X cannot be an initial item but must have been added by some application of an inference rule. This rule clearly did not apply in S2, so there must be at least one other item Y that is present in S1 and absent from S2. By induction, all items in S1 must differ from all items in S2, but this is impossible since both started with the same initial items.
2.7
They both mean the same as a*. They are not fundamentally erroneous but may draw a warning from a processor, since they are probably not what the programmer intended. Ambiguity is not a concept in lexical analysis, so they are not ambiguous.
2.11 Let him/her implement it and then feed an object file or a jpeg picture as source file to the compiler; admire the crash. Or, more charitably, explain this intention to the inventor.
2.17 Each round adds at least one dotted item to the Closure set and there is only a finite number of dotted items.
2.23 Close cooperation between lexical and syntax analyzer is required. As a kludge, preliminary skipping of the dynamic expression based on counting nested parentheses could be considered. Error recovery is a nightmare.
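The following sketch illustrates the parenthesis-counting kludge of answer 2.23; to keep it self-contained, the token stream is represented as an array of token classes rather than as calls to a lexical analyzer, and, as the answer warns, parentheses hidden in strings or comments would defeat it:

class SkipDynamic {
    // Returns the index of the first token after the parenthesized expression
    // that starts at tokens[start] (which is assumed to be '(').
    static int skipDynamicExpression(char[] tokens, int start) {
        int depth = 0, i = start;
        do {
            if (tokens[i] == '(') depth++;
            else if (tokens[i] == ')') depth--;
            i++;
        } while (depth > 0 && i < tokens.length);
        return i;
    }

    public static void main(String[] args) {
        char[] tokens = {'(', 'x', '(', 'y', ')', ')', ';'};
        System.out.println(skipDynamicExpression(tokens, 0));   // prints 6
    }
}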
2.26 It isn't as simple as that. It depends on the amount of interaction of the macro processing with the lexical analysis of larger units, for example strings and comments. In C the scheme is hare-brained since it would require the macro processor to do almost full lexical analysis, to avoid substituting inside strings and comments. But in a language in which macro names have an easily recognizable form (for example in PL/I, in which macro names start with a %), there is no such interference, and a better structuring of the compiler is obtained by a separate phase. But the loss in speed and the large memory requirements remain. Also, with full macro processing preceding compilation, it is very difficult to reconstruct the source text as the compiler user sees it.
2.29 Figure Answers.2 shows an LALR(1) suffix grammar for the grammar of Figure 2.84.
%token IDENTIFIER
%token EOF

%%

input_suffix : expression_suffix EOF | EOF ;
expression : term | expression '+' term ;
expression_suffix : term_suffix | expression_suffix '+' term | '+' term ;
term : IDENTIFIER | '(' expression ')' ;
term_suffix : expression ')' | expression_suffix ')' | ')' ;

Figure Answers.2 An LALR(1) suffix grammar for the grammar of Figure 2.84.
2.33 See Hanson (1985).
2.37 (a) LL(1) and ε-free. (b) Predictive is still more efficient.
2.41 The recursion stack consists of a list of activation records, each of which defines an active routine; only the top one is running. Each activation record contains a continuation address (often called return address) telling where the routine should continue when it becomes the top node. The code from the continuation address to the end of the routine consists of zero or more routine calls. These calls represent what is being predicted, and the corresponding grammar symbols are part of the prediction stack. Thus each activation record represents part of the prediction stack; the total prediction stack is the concatenation of all these parts, in the order of the activation records. Additional exercise: draw a picture that illustrates the above explanation in a clear way.
2.44 The acceptable set of a non-terminal N is the union of FIRST(N) and the acceptable set of the shortest alternative of N. So, the acceptable sets of all non-terminals can be precomputed using a closure algorithm. Now, if the prediction stack is available directly (as an array or a linked list), we can traverse the stack and compute the union of the acceptable sets of the symbols in it. In LLgen, however, the prediction stack is just the C stack and is not available for traversal. LLgen keeps an integer array indexed by grammar symbols, counting how many times a given symbol is present on the stack. This information is easily maintained and suffices to compute the acceptable set.
2.48 (a) When the ACTION table calls for a 'reduce using rule N→α', the item set corresponding to the state on the top of the stack contains the item N→α•. The dot can only be at the end of α when it has just passed over the last member of α, which must therefore be just below the top state on the stack. This reasoning applies successively to all other members of α, which must therefore also be on the stack.
(b) The item set preceding α on the stack must contain the item N→•α, or no α would be recognized and no item N→α• would eventually be found. The item N→•α must have originated from some item P→β•Nγ. The presence of this item guarantees that a transition on N is possible, leading to a state that includes P→βN•γ.
2.49 A value 'shift' in an ACTION table entry does not conflict with another 'shift' value in that same entry, but a 'shift' and a 'reduce' do. So do a 'reduce' and another 'reduce', since they are actually two different 'reduces': 'reduce to N' and 'reduce to M'.
2.51 The tree has the form '(x' repeated n−1 times, followed by '(x)', followed by ')' repeated n−1 times, and the last x is the first handle, in any bottom-up parser. So all the n−1 '(x' parts must be stacked.
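The counting-array idea of answer 2.44 can be sketched as follows; the class, the array sizes and the names are invented for the illustration and are not LLgen's actual identifiers:

class AcceptableSets {
    static final int N_SYMBOLS = 8;        // number of grammar symbols (example value)
    static final int N_TOKENS  = 4;        // number of terminal symbols (example value)

    // onStack[s] = how many times grammar symbol s occurs on the prediction stack
    static int[] onStack = new int[N_SYMBOLS];

    // acceptableOf[s][t]: token t is in the precomputed acceptable set of symbol s
    static boolean[][] acceptableOf = new boolean[N_SYMBOLS][N_TOKENS];

    static void predict(int symbol)   { onStack[symbol]++; }   // symbol predicted
    static void unpredict(int symbol) { onStack[symbol]--; }   // symbol matched or expanded

    // The acceptable set of the entire prediction stack is the union of the
    // acceptable sets of all symbols that occur on it at least once.
    static boolean[] acceptableSet() {
        boolean[] acc = new boolean[N_TOKENS];
        for (int s = 0; s < N_SYMBOLS; s++)
            if (onStack[s] > 0)
                for (int t = 0; t < N_TOKENS; t++)
                    acc[t] |= acceptableOf[s][t];
        return acc;
    }
}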
2.55 After rule 2, add: 'If t and u are the same operator: if the operator is left-associative, reduce, otherwise shift.'
2.58 In a pure bottom-up parser no such pointers exist: trees are constructed before their parents, and the only pointer to a tree is the one on the stack that is used to discard the tree; the stack entry that contains it is removed by the recovery process. If other pointers have been created outside the parsing mechanism, these must be found and zeroed.
3.2
For a non-terminal N, some of its production rules could set some attributes and other rules could set other attributes. Then the attributes in a tree with a node for N in it could be evaluable for one production (tree) of that N, and not for another. This destroys the composability of context-free grammars, which says that anywhere an N is specified, any production of N is acceptable.
3.5
See Figures Answers.3 and Answers.4.
Figure Answers.3 Dependency graphs for S and A.
3.9
The fact that no intervening visits to other children are needed shows that the production rule already has all the information for the second visit. This may, however, not be the case in all production rules that have this type of child, so other production rules may require two non-consecutive visits.
3.15 See Figure Answers.5, and note that the code is a simplification over that from Figure 3.37.
3.18 We need two variables, the actual number needed here and a high-water mark. Simple symbolic interpretation suffices.
3.19 The successor of the then-part is the merge node at the end of the if-statement rather than its else-part, and it is correct that we enter that node with an empty list during symbolic interpretation, since we will never reach the end of the if-statement from the end of the then-part when the program is run, due to the intervening jump. Full symbolic interpretation works on the threaded AST rather than on the linear program text.
3.21 It violates requirement 4 in Section 3.2.2.1: the actions to be taken on constants do not subsume those taken on variables. Quite to the contrary, any constant can be handled by code generated for variables but not the other way around.
Figure Answers.4 The IS-SI graph set of A and the merged IS-SI graph of A.
#include "parser.h"   /* for types AST_node and Expression */
#include "thread.h"   /* for self check */

/* PRIVATE */
static AST_node *Thread_expression(Expression *expr, AST_node *succ) {
    switch (expr->type) {
    case 'D':
        expr->successor = succ;
        return expr;
        break;
    case 'P':
        expr->successor = succ;
        return Thread_expression(expr->left,
            Thread_expression(expr->right, expr));
        break;
    }
}

/* PUBLIC */
AST_node *Thread_start;

void Thread_AST(AST_node *icode) {
    Thread_start = Thread_expression(icode, 0);
}

Figure Answers.5 Alternative threading code for the demo compiler from Section 1.2.
x becomes initialized. Con: It sounds unreasonable and counterintuitive to get a variable initialized by assigning the value of an uninitialized variable to it. Pro: The error in the program is probably the lack of initialization of y; the further usage of x is independent of this error. Since a warning is already given on the assignment, no further warnings on subsequent - probably correct - uses of x seem appropriate.
Consider any routine with a flow-of-control graph that is a linear list from routine entry to routine exit. Whatever the contents of the KILL and GEN sets, the IN and OUT sets will be computed in one scan through the list, and there is no way to transport information about the routine exit back to the last use of a variable.
A self-extracting archive works exactly like the 'compiler' of Section 4.2.1: there, the executable file contains both the AST and the interpreter. A self-extracting archive contains both the contents of the archive and the extraction code. Often, the archive is compressed, and the extraction code also contains decompression code.
The program:

int Program[] = {'D',7,'D',1,'D',5,'P','+','P','*','!',0};
The interpreter:

int main(void) {
    int PC = 0;
    int Instruction;

    while ((Instruction = Program[PC++]) != 0) {
        switch (Instruction) {
        case 'D': Expression_D(Program[PC++]); break;
        case 'P': Expression_P(Program[PC++]); break;
        case '!': Print(); break;
        }
    }
    return 0;
}
See Figure Answers.6; the 'then' gets 0.7, the 'else' 0.3; loop skipping gets 0.1, loop entering 0.9; the cases get 0.4, 0.4, 0.2. Traffic at routine entry is arbitrarily set to 1. The first column gives the 17 equations; all can be solved by simple substitution, except those for e, f, and g, which need elimination. The results are given in the second column. Note that we predict that for every time the routine is called, the loop body will be executed 6.3 times. Also note that the traffic out of the routine is again 1; what goes in must come out.
These dependencies also express the requirement that all assignments to a variable are executed in sequential, left-to-right order.
See Figures Answers.7 and Answers.8.
(a) See Figure Answers.9. (b) The input p of the second *p++ is dependent on the output p of the first *p++ and so its dependencies differ from those of the input p of the first *p++.
S and N cannot be the same node, since that would make the dependency graph contain a cycle, because S refers to N.
A ladder sequence starts at each graph root, except when that root has an incoming dependency. Not all roots can have incoming dependencies, or the dependency graph would be cyclic.
Suppose the token set {a, bcd, ab, c, d} and the input abcd. Immediately returning the a yields 2 tokens, whereas 3 can be achieved, obviously.
Assume the entire input is in memory. Record in each item its starting position and the number of tokens recognized so far. At each reduce item that says that N tokens have been recognized, add the initial item set with token counter N+1 and the present location as starting point. Having arrived at the end of
Equation         Value
a = 1            1.0
b = 0.7 a        0.7
c = 0.3 a        0.3
d = b            0.7
e = 0.1 (d+f)    0.7
f = g            6.3
g = 0.9 (d+f)    6.3
h = c            0.3
i = 0.4 h        0.12
j = 0.4 h        0.12
k = 0.2 h        0.06
l = i            0.12
m = j            0.12
n = k            0.06
o = e            0.7
p = l+m+n        0.30
q = o+p          1.0

Figure Answers.6 Traffic equations and their solution for Figure 4.97.
Figure Answers.7 The dependency graph before common subexpression elimination.
the input, find the reduce item with the largest token counter and isolate the token it identifies. Work backwards, identifying tokens.
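In a simpler setting the same idea reduces to a small dynamic-programming computation; the sketch below is only an illustration, not the item-set algorithm of the answer itself, and computes the maximum number of tokens for the example above:

import java.util.Arrays;

class MaxTokens {
    public static void main(String[] args) {
        String[] tokens = {"a", "bcd", "ab", "c", "d"};
        String input = "abcd";
        int n = input.length();
        int[] best = new int[n + 1];       // best[i] = max tokens covering input[0..i)
        Arrays.fill(best, -1);
        best[0] = 0;
        for (int i = 0; i < n; i++) {
            if (best[i] < 0) continue;     // position i cannot be reached
            for (String t : tokens) {
                if (input.startsWith(t, i) && best[i] + 1 > best[i + t.length()]) {
                    best[i + t.length()] = best[i] + 1;
                }
            }
        }
        System.out.println("Maximum number of tokens: " + best[n]);   // prints 3
    }
}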
4.28 For the first instruction in the sequence we have 20×2=40 combinations, using {R1}, {R1,R2}. For the second instruction we have 20×2×3=120 combinations, using {R1,R2}, {R1,R2,R3}; for the further instructions we have 20×3×3=180 combinations each. In total 4800×180^(N−2) combinations. Estimating a weekend at 2.5 days, each of about 80000 seconds, we have about 2×10^5 seconds, or 2×10^11 tests. So we want the largest N for which 4800×180^(N−2) is smaller than 2×10^11. Now, 4800×180^2 = 1.5×10^8 and 4800×180^3 = 2.7×10^10, so N = 5.
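The estimate can be checked mechanically; the small sketch below simply multiplies in factors of 180 until the budget of 2×10^11 tests from the answer is exceeded:

class SuperoptBudget {
    public static void main(String[] args) {
        double budget = 2e11;              // tests available in a weekend (from the answer)
        double combos = 4800;              // 40 * 120 combinations for N = 2
        int n = 2;
        while (combos * 180 <= budget) {   // each further instruction multiplies by 180
            combos *= 180; n++;
        }
        System.out.println("Largest feasible N = " + n);   // prints 5
    }
}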
Figure Answers.8 The dependency graph after common subexpression elimination.
Figure Answers.9 The dependency graph of the expression *p++
4.34 Advantages of PC-relative addressing modes and instructions are:
- they require no relocation, thus reducing the work of the linker;
- they allow position-independent code, code that can be loaded anywhere in memory, without any modifications;
- they may allow shorter instructions: an offset may fit in a byte whereas a full address usually does not. Even if it does, the assembler still has to reserve space for a full address, because the linker may modify it.
5.2
The garbage collector will free chunks only when they are unreachable. If they are unreachable they cannot be freed by the user, since the user does not have a pointer to them any more. So it is safe to call the garbage collector from Malloc().
5.6 In the allocation of the arrays into which the code and data segments will be compiled; perhaps in the allocation of the external symbol table.
5.7
In general, the garbage collection algorithms inspect pointer values, which will be (simultaneously) changed by the application. Some garbage collection algorithms (for example two-space copying and compaction) copy data, which is dangerous if the application can access the data in the meantime. Some algorithms assume that the garbage collection only becomes active at specific points in the program (see Section 5.2.2), which is difficult to guarantee with a concurrent garbage collection.
If the assignment p := p is not optimized away, if p points to a chunk P with reference count 1, if P contains a pointer to another chunk Q, and if the reference count of Q is also 1, then first decreasing the reference count of P causes P to be freed, which in turn causes Q to be freed. Subsequently increasing the reference count of P will not raise the reference count again, and the pointer to Q in P will be left dangling. Also, on some systems freeing P might cause compaction to take place, after which the chunk P would be gone entirely and incrementing its reference count would overwrite an arbitrary memory location.
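The usual remedy is to increment the reference count of the new value before decrementing that of the old one, so that p := p never drops a count to zero in between. A minimal sketch of this order of operations (the Chunk class and its single pointer field are invented for the illustration):

class Chunk {
    int refCount;
    Chunk pointerField;          // e.g. the pointer from P to Q in the answer
}

class RefCount {
    // Assign 'source' to a pointer variable that currently holds 'target' and
    // return the new value; incrementing first makes self-assignment harmless.
    static Chunk assign(Chunk target, Chunk source) {
        if (source != null) source.refCount++;                 // BEFORE the decrement
        if (target != null && --target.refCount == 0) free(target);
        return source;
    }

    static void free(Chunk c) {
        // release what c points to, then c itself
        if (c.pointerField != null && --c.pointerField.refCount == 0) free(c.pointerField);
        /* ... return the memory of c to the allocator ... */
    }
}

With this order, p = assign(p, p) first raises the count of P to 2 and then lowers it back to 1, so neither P nor Q is freed prematurely.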
5.12 See Siklóssy (1972). The 'overlapping lists' in the paper are dags.
6.2 In most languages we need to know whether an identifier is a keyword or the name of a macro, long before its name space is known. If we want to postpone the identification to the time that the proper name space is known, we will need other mechanisms to solve the keyword and macro name questions.
6. We have rvalue ? V : V → rvalue. In principle, rvalue ? lvalue : lvalue could yield an lvalue, but ANSI C defines it to yield an rvalue. In GNU C an lvalue results, but a warning is given under the -pedantic flag.
6.7 Can't be. The last scope rule forbids the creation of such values.
6.13 At run time, a class may be represented by a class descriptor which contains, among others, the method table of the class. Such a class descriptor could also contain a pointer to the class descriptor of its parent class. An object then contains a reference to its class descriptor instead of a reference to the method table. Then, the implementation of the instanceof operator becomes easy; see Figure Answers.10.
FUNCTION Instance of (Obj, Class) RETURNING a boolean;
    SET Object Class TO Obj.Class;
    WHILE Object Class /= No class:
        IF Object Class = Class:
            RETURN true;
        ELSE Object Class /= Class:
            SET Object Class TO Object Class.Parent;
    RETURN false;

Figure Answers.10 Implementation of the instanceof operator.
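In Java-like code the parent-chain walk of Figure Answers.10 might look as follows; the ClassDescriptor type and its fields are illustrative only, and a real implementation would keep the descriptor pointer in every object header:

class ClassDescriptor {
    ClassDescriptor parent;      // null for the root class
    Object[] methodTable;        // the dispatch table of this class
}

class RunTime {
    // 'obj instanceof cd', given the class descriptor of obj:
    static boolean instanceOf(ClassDescriptor objClass, ClassDescriptor cd) {
        for (ClassDescriptor c = objClass; c != null; c = c.parent) {
            if (c == cd) return true;
        }
        return false;
    }
}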
6.14 The code for these calls is:

(*(e->dispatch_table[0]))(e);
(*(e->dispatch_table[2]))((class D *)((char *)e + sizeof (class C)));
(*(e->dispatch_table[3]))((class D *)((char *)e + sizeof (class C)));

Note that although m4 is redefined in class E, it still requires a pointer to an object of class D.
6.16 For example, when the caller calls several routines consecutively, the 'caller saves' scheme allows saving and restoring only once, whereas the 'callee saves' scheme has no option but to do it for every call. Also, in the 'callee saves' scheme the callee has to save and restore all registers that might be needed by any caller, whereas the 'caller saves' scheme allows for saving and restoring only those registers that are needed for this particular caller, at this particular call site.
This may cause overflow, because the controlled variable may then be incremented beyond the upper bound (and thus possibly beyond the maximum representable value).
6.30 Pushing the last parameter first makes sure that the first parameter ends up on the top of the stack, regardless of the number of parameters. Unstacking the parameters will therefore yield them in textual order.
Two new operations must be supported: instantiation of a generic routine which is a parameter, and passing a generic routine as a parameter to an instantiation. The consequences from a compiler construction point of view are modest. When implementing instantiation through expansion, the first instantiation to be dealt with resides within a non-generic unit. Therefore, a generic routine parameter is available as an AST, which can be copied and processed just as the AST of an ordinary generic unit.
One issue that must be dealt with, whether the language has generic routine parameters or not, is cycle detection: when one generic unit contains an instantiation of another generic unit, the result is a chain of instantiations, which must be terminated by the instantiation of a generic unit which does not contain further instantiations. If this is not the case, the implementation-through-expansion scheme will fail. Without generic routine parameters, detecting this situation is easy: when a generic unit occurs twice on the instantiation chain, there is a cycle. With generic routine parameters, when an identical instantiation (same generic unit and same instantiation parameters) occurs twice on the instantiation chain, there is a cycle.
When implementing instantiation through dope vectors, a generic routine has a code address, and can be passed as a parameter, just like an ordinary routine.
(a) One should not write functions that fail for some arguments (but sometimes one has to). (b) One should not call a function with arguments for which it will fail (but that is sometimes hard to know).
The following program uses the value of the expression fac 20 2^10 times:

tree 0 val = val
tree n val = let t = tree (n-1) val in t*t
main = print (tree 10 (fac 20))
Although the ++ (append) operator is associative, the amount of work differs for a ++ (b ++ c) and (a ++ b) ++ c. This is a consequence of the append operator, which essentially creates a copy of its first argument to 'replace' the null pointer with a pointer to the second list. In (a ++ b) ++ c the list a is copied twice, whereas only one copy is required for a ++ (b ++ c). Transforming mappend into an accumulating-argument function would cause the sublists to be copied many times (depending on their position).
There is no reduction in heap space, since all lazy expressions allocated in the heap consist of a (built-in) function with two arguments, which is the break-even point. The benefit is in the graph reducer, which does not need to unwind application spines, but can call the suspended function immediately.
The new paradigm implements breadth-first search rather than depth-first search. This usually takes (much) more time, but will find a solution if one exists, unlike depth-first search, which may work itself into an infinite branch of the search tree. The discussed optimizations are all applicable to some degree. The number of values returned is in general unbounded, so an infinite list may have to be returned.
The asserts and retracts should not be undone by backtracking. Since the data structures in which the asserted clauses are kept (for example, the counters, and number_of_clauses_added_at_end_for_parent_2 in Figure 8.25) are global variables, no backtracking will occur, as required.
9.1 No; shared variables are easier to use than message passing. Also, on a shared-memory machine, shared-variable programs often get better performance than message passing programs, because message passing programs do more copying.
9.4 No; the lock that is used to protect the administration can use spinning, because the operations on the list are very simple and will not block for a long time.
9.6
(a) The first operation that succeeds in acquiring the monitor lock will continue. The second operation, however, will block on the monitor lock, and thus cannot continue until the first operation has released the lock. Thus, although the two operations could in principle be executed simultaneously, the implementation of a monitor runs them sequentially.
(b) 1. Use symbolic interpretation to find those operations that are read-only. 2. Use a multi-state lock to protect the monitor: nobody inside, some readers inside, some readers inside and one or more writers waiting, and one writer inside.
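One concrete form of such a multi-state lock is a readers-writer lock. The sketch below uses Java's ReentrantReadWriteLock on an invented one-field monitor, purely as an illustration of part (b): read-only operations (found, per the answer, by symbolic interpretation) take the read lock and may run simultaneously; updating operations take the write lock.

import java.util.concurrent.locks.ReentrantReadWriteLock;

class Counter {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int value;

    int get() {                          // read-only operation: shared access
        rw.readLock().lock();
        try { return value; } finally { rw.readLock().unlock(); }
    }

    void increment() {                   // updating operation: exclusive access
        rw.writeLock().lock();
        try { value++; } finally { rw.writeLock().unlock(); }
    }
}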
9.10 It is difficult to copy the stack of a thread, because there may exist variables that point into the stack. In C, for example, it is possible to take the address of a local variable (stored on the stack) and store this value in a global variable. If the stack is copied to a new region in memory, the global variable will still point to the location in the original stack. So, migrating a stack from one memory region to another is hard to implement transparently.
9.11 Since X contains a thread, it is harder to migrate, since threads always have a state that has to be migrated (for example, a stack); moreover, thread migration has the same implementation difficulty as stack migration, as described in the answer to Exercise 9.10. Migrating object Y is easier and less expensive, since it does not contain a thread. On the other hand, if multiple active objects (like X) located on different processors do operations on object Y, object Y will repeatedly be moving between these processors. In this case, it may be more efficient to move the active objects to the processor of Y.
A.1 (a) In Java, C++, and probably in most other object-oriented languages, the 'constructor' is not a constructor at all but rather an initializer. The real constructor is the new, and when it has finished constructing the object (building the house), the constructor is called to initialize it (move in the furniture). In Expression Program = new Expression();, the new allocates space for an Expression, which will not be large enough to hold any of its extended subtypes, and the constructor cannot stretch this space afterwards. Also, it is the new that returns the object (or a pointer to it), not the constructor.
(b) Its constructor should be a real constructor, in which the programmer can allocate the space, possibly for a subtype, initialize its fields, and return the constructed object.
Aho, A. V., Ganapathi, M., and Tjiang, S. W, K. (Oct. 1989). Code generation using tree pattern matching and dynamic p r o g r a ~ i nACM ~ . Trans. Program~ingLanguages and Systems, 11(4), 491-516 Aho,A. V. andJohnson, S. C.(March1976).Optimalcodegenerationforexpressiontrees. J .ACM, 23(3), 488-501 Aho, A. V., Johnson, S. C.,andUllman, J. D. (Jan. 1977).Codegeneration for expressionswithcommon subexpr~ssions.J. ACM, 24(1), 146-160 Aho, A. V.,Sethi, R., and Ullman, J. D. (1986). C o ~ p ~ l e rPrinciples, s: Tec~niquesand Tools. Addison-Wesley Aho, A. V.and Ullman, J,D. (1973). The Theory of Parsing, Translation and Co~piling,Vol. I:Parsing, Vol. 11: C ~ ~ ~ p i Z iPrentice ng. Hall Ahuja, S., Casriero, N., and Gelernter, D. (Aug. 1986). Linda and friends. IEEE Com~uter,19(8), 26-34 n Machine -A Tutorial Reconstruction. MIT Press Lit-Kaci, H. (1991). ~ ~ r r'SeAbstract ~y C~~zputing. 2nd Edn. B e n j a ~ ~ C u m ~ n g s Alrnasi, G.S. and Gottlieb, A. (1994). ~ i g h ParaZZel Anderson, J. P. (March 1964).A note on some compiling algorithms.Conzm. ACM, 7(3), 149-150 Anderson, T. E., Bershad, B. N., Lazowska, E. D., and Levy, H. M. (Oct. 1991). Scheduler activations: Effective kernel supportfor the user-level manage~entof parallelism. In 13th ACM Symposium on Operating Systems Prinei~les(Edward D. Lazowska,ed.), pp. 95- 109.ACM in distributedprogrants. ACM C o ~ z ~ u t ~ n g Andrews,G. R. (March1991).Paradigmsforprocessinteraction S ~ ~ ~23(YI ),S49-90 , Anonymous (1840). Des ~ r e ~ h e r rvon n ~ ~ n c h h a u wun~erbare se~ Reisen urzd Abent~euerzu Wasser und zu Lande. Dieterichsche Buchhandlung, Gottingen Appel, A, W. (1987). Garbage collection can be faster than stack allocation. I ~ ~ o r ~ aProcessing ~ion Letters, 25, 275-279 Appel, A. W. (1992). Co~pilingwith Continuations. Cambridge University Press r ~o ~ ~ iImp~eme~ta~ion ~er in C ~ M ~ J a vCa a. ~ b ~ d University ge Press Appel, A. W. (1997). M o ~ e C algorithm for register Appel, A. W. and Supowit, K. J. (June1987). Gener~izationsof the S e t h i - U l l ~ ~ allocation. Software -Practice and Experience, 17(6), 417-421 Assmann, W, (Oct. 1992). Anothersolution of scoping problems in symbol tables. In C o ~ ~ i l Construct~on, er 4th ZnternutionaZ Conference, CC'92 (U. Kastens and P. Pfahler, ed.), pp. 66-72. Springer-Verlag Austin, T.M,, Breach, S. E., and Sohi, G.S. (June 1994). Efficient detectionof all pointer andm ay access errors. ACM ~ ~ Notices, I 29(6), 290-3 ~ 12P ~ Bacon, D. F., G r ~ a S.~ L., , andSharp, 0. 3. (Dec. 1994).Compiler ans sf or mat ions for high-pe~ormance computing. ACM C o ~ p u t ~ nSurveys, g 26(4), 345-420 Bal, H. E., Bhoedjang, R., Hofman, R., Jacobs, C., Langendoen, K., Ruhl, T., and Kaashoek, M, F. (Feb. 1998). Performance evaluationof the Orca shared-object system.ACM Trans, C o ~ p u t e Systems, r 16(l), 1-40
Bal, H. E. and Grune, D. (1 994). P r o g r a ~ ~ Language ~ng Esserzt~als.Addison-Wesley Bal, H. E., Kaashoek, M. I?., and Tanenbaum, A. S. (March 1992). Orca: A language for parallel programming of distributed systems. IEEE Trans. S o ~ ~ aEngineering, re 18(3), 190-205 Bal, H. E., Steiner, J. G., and Tanenbaum, A. S. (Sept. 1989). Programming languages for distributed c o ~ p u ~ n g systems. ACM Corn~utingSurveys, 21(3), 261-322 Barach, D. R., Taemer, D. H., and Wells, R. E. (May 1982). A technique for finding storage allocation errors in C-Ianguage programs. ACM SiGPUN Notices, 17(5), 1 6-23 Barrett, D. A. and Zorn, B. G. (June 1995). Garbage collection using a dynamic threatening boundary. A C ~ S I G P U ~Notice‘s, 30(6), 301-3 I4 Baskett, F. (April 1978). The best simple code ene era ti on technique for while, for, and do loops. ACM SIGPUN Notices, 13(4), 3 1-32 Bell, J. R. (June 1973). Threaded code. C o ~ ~ACM, ~ n 16(6), . 370-372 ges Ben-Ari, M. (July 1984). Algorithms for on-the-fly garbage collection. ACM Trans. P r o g r ~ ~ L~ ~~~~n ng ~gaand Systems, 6(3), 333-344 Bernstein, R. L. (Oct. 1985). Producing good code for the case statement. S o ~ w a r e- Practice and Exp~rience, 1 ~ ( 1 0 )1021-1024 , Bertsch, E. and Nederhof, M.-J. (Jan. 1999). On failure of the pruning technique in ‘Error repair in shift-reduce parsers’. ACM Trans. P r o g r ~ ~ r nknguages in~ and Systems, 21(l), 1-10 Bhoedjang, R. A. F., Ruhl, T., and Bal, H. E. (Nov. 1998). User-level network interface protocols. IEEE C o ~ p u t e r31( , 1 I), 53-60 Bird, R. J. (1998). Introd~ctionto ~unctionalP r o g r a ~ ~ i nusing g aske ell. 2nd Edn. Prentice-Hall Europe Birtwistle, G. M., Dahl, 0.-J., ~ y h r h a u gB., , and Nygaard, K. (1975). S I M ~ Ubegin. Petrocellj/Charter Bjornson, R., Carriero, N., Gelernter, D., and Leichter, J. (Jan. 1988). Linda, the Portable Parallel. Tec~nical Report RR-520, Yale University n , Grout, J., Hoeflinger, J., Lawrence, T., Lee, J., Padua, D., Paek, Y., Blume, W., Doallo, R., ~ i g e n ~ a n R., Pottenger, B., Rauchwerger, L., and Tu, P. (Dec. 1996). Parallel p r o g r ~ m i n gwith Polaris. IEEE C o ~ p u t e r , 29( 12)’ 78-83 Boehm, €I.-J. (June 1993). Space-efficient conservative garbage collection. ACM SIGPUN ~ o t i c e s ,28(6), 197-206 Boehm, H.-J. and Weiser, M. (Sept. 1988). Garbage collection in an uncooperative environnient. S o f ~ a r ePractice and Experience, 18(9), 807-820 Boyd, M. R. and Whalley, D. €3. (June 1993). Isolation and analysis of optimization errors. ACM S I G P ~ N Notices, 28(6), 26-25 Boyland, J. and ~ a ~ ~G.~(Oct. n a1997). , Parasitic methods - Imple~entationof m u l t i - ~ e ~ o for d s Java. ACM SIGP~N Notices, 32( 1O), 66-76 Briggs, P., Cooper, K. D., Kennedy, K., and Torczon, L. (July 1989). Coloring heuristics for register allocation. ACM SiGPUN Notices, 2 Brooker, R. A., ~ a c c a l l u m I. , R., Morris, D., and Rohl, J. S. (1963). The compiler compiler. Annua~Review A u t [ ~ ~ a tPro,~r~rnrning, ic 3,229-322 Bruno, J. and Sethi, R. (July 1976). Code generation for a one-register machine. J. ACM, 23(3), 502-510 : Interpretation and Co~pilation.Pitman Burn, G. L. ( 1991). Lazy ~ ~ n c t i u n~arli g u a g e sAbstract Carriero, N. and Gelernter, D. (Sept. 1989). How to write parallel programs: A guide to the perplexed. ACM C o ~ ~ u tSurveys, j n ~ 21(3), 323-357 Chaitin, G. J., Auslander, M. A., Chandra, A. K., Cocke, J., Hopkins, M. E., and Markstein, I?. W. (1981). 6( l), 45-57 Register allocation via coloring. 
C o ~ ~ ~Language~s, t e r Chapman, N. P. (1 987). LR Parsing: Theory and Practice. C a ~ b r i d g eUniversity Press n . 13(11), 677-678 Cheney, C. J. (Nov. 1970). A non-recursive list compacting algorithm. C o ~ ~ uACM, Cocke, J. and Kennedy, K. (Nov. 1977). An algorithm for reduction of operator strength. C ~ ~ r n ~ACM, u n . 20(11j, 850-856 Cohen, J. (1981). Garbage collection of linked data structures. ACM C u ~ ~ u tSurveys, i ~ g 13(3), 341-367 Collins, G. E. (Dec. 1960). A method for overlapping and erasure of lists. Conz~un.ACM, 3(12), 655-65’7 Colomb, R. M. (March 1988). Assert, retract and external processes in Prolog. S o ~ ~ a r-e Practice and E ~ ~ e r i ~ r i18(3), c e , 205-220 Conway, M. E. (July 1963). Design of a separable t~~nsition-diagram compiler. Curnrnun. ACM, 6(7), 396-408
Cooper, K. D., Hall, M. W., and Torczon, L. (March 1992). Unexpected side effects of inline substitution: A case study. ACM Letters on ~ r o g r u ~ ~~nguage‘s ing and Syste?ns,1( l), 22-32 Coulouris, G., Dollimore, J., and Kindberg, T, (1994). Distr~~uted Systems - Co~ceptsand Design. 2nd Edn. Addison-Wesley Davidson, J. W. and Fraser, C. W. (June 1984a). Automatic generation of peephole optimizations.ACM S I G P ~ N Notices, 39(6), 1 11-116 Davidson, J. W. and Fraser, C. W. (Oct. 198410). Code selection throug~object code optimization. ACM Trans. r a ~ ~ i ~n ngg u a g eand s Systei~s,6(4), 505-526 Davidson, J. W. and Whalley, D. €3. (Jan. 1989). Quick compilers using peephole opti~izations.S ~ ~ ~ -u r e Practice and E ~ ~ e r ~ e19( n c1), ~ 79-97 , g - The quiche-eating approach. In Debray, S. K. (1994). Im~lementing logic p r o ~ a ~ i nsystems I ~ n p l ~ ~ n e n ~ aoft ~hognisc P r o g r a ~ m ~ nSy.ste~s g (E. Tick and G. Succi, ed.), pp. 65-88. Muwer Academic e g r ~ m ~Cs ( . ~ ~ ~ACM, u n .14(7), 453-460 DeRemer, F. L. (July 1971). S i ~ p l LR(k) DeRerner, F. L. (1974). Lexical analysis. In C o ~ p i l e rConstruction, An A d v a ~ c eCourse ~ (F.L. Bauer and J. Eickel, ed.), pp. 109-120. Springer-Verlag Dewar, R. (June 1975). Indirect threaded code. C o m ~ u nACM, . 18(6), 330-331 , L. (Nov. 1978). On-the-fly garbage collection: an exercise in cooperation. C[~mnzun. (March 1998). A syste~aticstudy of functional language impleinentations. T O P U S , zle, U. (Oct. 1995). ~ i ~ ~ n i ~row i t displacement ig dispatch tables. ACM SIGPLAN Notices, 30(10), 141-155 DuJardin, E., Amiel, E., and Simon, E. (Jan. 1998). Fast algorithms for compressed multime~oddispatch tables. ~C~ ~ r a ~Psr.o g r u ~ ~ ~ n g u u g e and s Sy~stem~s, 20(t), 116-165 Dybvig, R. K. ( 1 996). The S c ~ e P~ reo g r u ~ ~ i n~ gn g u ~ gANSI e : Scheme. 2nd Edn. Prentice Hall n . 13(2), 94-102 Earley, J. (Feb. 1970). An efficient context-freeparsing ~gorithm.C o ~ ~ uACM, Engel€~iet,J. (1984). Attribute grarnm~s:att~buteevaluation methods. In ~ e t ~ and o ~Tools s $or Co~~iZer C ~ n . s t r ~ c ~(B. i o nLorho, ed.), pp. 102-138. Cambridge ~niversityPress elfriet, J. and De Jong, W. (1990). Attribute storage optimization by stacks. Acta I n ~ ~ r ~ u t i27,567-581 ca, s . S I G P Not~ces, ~ 19(6), 70-80 (June 1 984). Sub-protocol-evaluatorsfor attribute g r ~ ~ a rACM and Yellin, D. (1986). A c o ~ p ~ i s oofn storage opti~zationsin automatica~lygenerated attribute tors. Acta Informatics, 23, 393-427 Feijs, L. M. C. and Van Ommering, R, C. (Aug. 1997). Abstract derivation of transitive closure ~ g o r i ~ ~ r n ~ . I ~ f o ~ a t i oProces.~ing iz Letters, 63(3), 159-164 Fortes GLilvez, J. (Oct. 1992). Generating LR(1) parsers of small size. In Com~iZerConstr~ction,4th I~ternational Confe~ence,CC’92 (U. Kastens and P.Pfahler, ed.), pp. 16-29. Springer-Verlag Fraser, C. W. and Hanson, D. R. (1995). A ~ e ~ a r g e t a ~Cl eC o ~ ~ i ~-e rDesign and l ~ ~ ~ l e ~ e n t a t ~ o n . Be~jamin/Cu~mings, Redwood City, Ca. Freeman, E., Hupfer, S., and Arnold, K. (1999). JuvaS~aces- Princi~les,Patterns, and Practice. AddisonWesley Freibu~g~ouse, R. A. (Nov. 1974). Register allocation via usage counts. C o ~ ~ uACM, n . 17(1l), 638-642 Geurts, L., Meertens, L., and Pemberton, S. (1989). The ABC Prograi~mer’s~ u n ~ ~Prentice o o ~ Hall . garbage collection for strong~y typed ~rogramminglanguages. ACM ). Eli~inatingbranches using a super-op~imizerand the GNU C compiler. A ~ M S l G P ~ NNotices, 27,341-352 . 
Hall Criswold, R. E. and Griswold, M. T. (1 983). The Icon P r ~ g r a ~~~n ig ~u ug g ePrentice Griswold, R. E, and Griswold, M. T. (1986). The l ~ p l e ~ e ~ t aofthe t i o ~Icon ~ r o g r u m ~ ~ n gg u u g e Princeton . U~iivers~ty Press Grune, D. and Jacobs, C. J. H. (1990). Pur.sin~~ec~ni4ue.s: a Practical Guide. Ellis Honvood Hall, M. W,, Anderson, J. M., Amarasinghe, S. P., Murphy, B, R., Liao, S.-W., Bugnion, E., and L m , M. S . (Dec. 1 996). ~ a ~ i m i z i n~ultiprocess~r g performance with the SUIF compiler. IEEE C o ~ ~ u t e29(1 r , a), 84-89 Hanson, D. R. (Dec. 1985). Compact recursive~descentparsing of expressions. S o ~ a r e- Practice and
3
H ~ m a n nA. , C. (1977). A Concurrent-Pascal compiler for ~ i n i c o ~ p u t e rSpringer-Verlag s. Fast detection ofmemory leaks and access errors. In inter ’92 Hastings, R. and Joyce, B. (1992). Purify USENIX Conference (Eric Allman,ed.), pp. 125-136. USENIX Association Hemerik, C. and Katoen, J.P. (Jan. 1990). Bottom-up tree acceptors. Science of C o ~ p u t e r P r o g r a ~ ~ i13, ng, 51-72 Hennessy, J. L. and Mendelsohn, N.(Sept. 1982). Compilationof the Pascal case statement. S o ~ a r e-Practice and Experience, 12(9), 879-882 Holmes, J.(1995). O ~ j e c t - O r i e ~ t e ~ C o ~Con~truction. piler ~entice-HallI ~ t e ~ a t i o n ~ a Languages, and Co~putatiun. Hopcroft, J. E. andUllman, J. D.(1979). Zntroduction to A u t o ~ a ~Theory, Addison-Wesley A frameworkfor data dependencetesting in the Hummel, J., Hendren, L. J.,andNicolau,A.(Aug.1994). presence of pointers. In 1994 Znte~ationalConference on Parallel Processing (Vol. 11) (KC. T i , ed.), pp. 216-224. CRC Press Hwu, W.-M. and Chang, P.P. (July 1989). Inline function expansion for compilingC programs. ACM S I G P ~ N
-
Jazayeri, M. and Pozefsky, D. (Oct. 1981). Space efficient storage ~anagementfor attribute g r a ~ a r s ACM . Trans. P r o g r a ~ ~ i n g ~ n g u and a g eSsy s t e ~ s3(4), , 388-404 Johnson, W. L.,Porter, J. S., Ackley, S. I., and Ross, D. T. (Dec. 1968). Automatic generationof efficient lexical processors usingfinite state techniques. C o ~ ~ uACM, n . 11(12), 805-813 Johnsson, T. (June 1984). Efficient compilation of lazy evaluation.ACM S I G P ~ NNotices, 19(6), 58-69 g In 3rd ~unctional Johnsson,T.(Sept.1987).Attribute g r ~ m a r sas a function^ p r o ~ a m ~ nparadigm. P r o g r a ~ ~ i n g ~ n g u a gand e s Computer Architect~reConference (G, Kahn,ed.),pp.154-173.SpringerVerlag Jones, N. D., Gomard, C, K., and Sestoft, P. (April 1993). Partial Evaluation and P r o g r a ~Generation. Prentice Hall Jones, R. and Lins, R. (1996). Garbage Collection -Algorith~s for Auto~atic ~ y n a ~ i c M e Manage~ent. ~ory John Wiley Jourdan, M. (June 1984). Strongly non-circular attributeg r a ~ a r and s their recursiveevaluation.ACM S Z G ~ ~ Notices, 19(6), 81-93 Jul, E., Levy, H., Hutchinson, N., and Black, A. (Feb. 1988). ~ i n e ~ ~ r mobility ~ n e d in the Emerald system.ACM 109-133 Trans. Computer S y s t e ~ s6(l), , for the case statement’. Kannan, S. andProebsting,T.A.(Feb.1994).Correctionto‘Producinggoodcode S o ~ a r e-Practice and E~perience,24(2), 233 K g ,A. H.(May 1987). P r o g r ~ i n gfor parallelism.ZEEE Co~puter,20(5), 43-57 Acta Znfo~atica,24,633-652 Kastens, U. (1987). Lifetime analysis for attributes. Kastens, U.,Hutt, B., and Zimme~ann,E. (1982). GAG: A Practical Co~pilerGenerator. Sprin~er-Verlag Katayama, T. (July1984).Translation of attribute g r ~ a r into s procedures. ACM Trans. P r o g r a ~ ~ ~ n g Languages and Systems, 6(3), 345-369 shared Keleher, P.,Cox, A.L., Dwwkadas, S., andZwaenepoel, W. (Jan. 1994). Tread~ar~s: ~istributed memory on standard workstations and operating systems. In inter 94 ~ S E N I XConference (Jeffrey Mogul, ed.), pp. l 15- 132. USENIX Association Kennedy, K.(1981). A survey of data flow analysis techniques.In P r o g r a ~Flow Analys~s(Steven S. ~uchnick and NeilD. Jones, ed.), pp.5-54. Prentice Hall C o ~ ~ uACM, n . 19(7), 385-394 King, J. (July1976). Symbolic execution and program testing. Klint, P. (Sept. 1981).Interpretation techniques.~ o f ~ a r Practice e and Experience, 11(9), 963-973 Z n ~ o Control, ~. 8,607-639 Knuth, D. E. (1965).On the translationof languages from left to right. Knuth, D.E. (1968). Semantics of context-freelanguages. Math. Syst. Theory, 2(2), 127-145 Knuth, D. E.(l 971). Semanticsof context-freelanguages -correction. Math. Syst. Theory, 5( l), 95-96 g 1:Fundamental Algorit~ms.2nd Edn. AddisonKnuth, D. E. (1973). The Art of C o ~ p u t e r P r o g r a ~ ~ i nVol Wesley Knuth, D. E. and Stevenson, F. R. (1973). Optimal measurement points for program frequency counts. BZT, 13, 3 13-322 Koskimies, K.(June 1991). Object-orien~ationin attribute g r ~ m a r s In . Attribute G r a ~ ~ a rApplicatiuns s, and S y s t e ~ s(H. Alblas& B. Melichar, ed.), pp. 297-329. Springer-Verlag
-
Kowaltowski, T. (1981). Parameter passing mechanisms and run-time data structures. S o f ~ a r e-Practiceand Experience, 11(7), 757-765 Kristensen, A. (1998). Template resolution in ~ L ~ T M Con~puter L , N e ~ o r k and , ~ ISDN Systems, 30(1-7), 239-249 Kurnar, V., Grama, A., Gupta, A., and Karypis, G. (1994). IntroductiontoParallelComputing -Designand Analysis o~Algorithms.B e n j ~ i ~ C u ~ n g s Kursawe, P. (1987). How to invent a Prolog machine. New Generation Computing, 5,97-l14 LaLonde, W. R., Lee, E. S., and Horning, J. J. (1971). An LALR(k) parser generator.In P I P Congress 71 (C.V. Freiman, ed.), pp. 153-157. North-Holland Landin, P. 5. (April 1964).The mechanical evaluationof expressions. Computer J., 6(4), 308-320 Lemkin, P. F. (Oct. 1988). PSAIL: a portable SAIL to C compiler -description and tutorial. ACM SIGPLAN Notices, 23(10), 149-171 Levine, J. R.(1999). Linkers and Loaders. Morgan-Kaufmann Levine, J.R., Mason,T., and Brown,D. (1992). Lex and Yacc. 2nd Edn. O’Reilly Lewis, H. R. and Papadimitriou, C.H. (1998). Elements of the Theory of Computation.Prentice Hall Lewis 11, P. M. and Steams, R. E. (1968). Syntax-directed~ansduction.J. ACM, 15(3),465-488 Li, K.and Hudak, P. (Nov. 1989). Memory coherencein shared virtual memory systems. ACM Trans. Computer Systems, 7(4), 321-359 Linz, P. (1997). An Introduction to Formal Lunguages and~utomata.2nd Edn. Jones and Bartlett Louden, K.C. (1997). Compiler Construction -Principles and Practice. PWS PubSishing Loveman, D. B. (Feb. 1993). High performance Fortran.IEEE Parallel and Dis~ributedTec~noZogy,l ( l), 25-42 Maassen, J.,Van Nieuwpoort, R., Veldema,R., Bal, H, E., and Plaat, A, (Aug. 1999).An efficient i~plementation Notices, 34(8), 173-182 of Java’s remote method invocation.ACM S I G P ~ N M ~ e l I i ,A. and M o n t a n ~ U. , (April 1982). An efficientunification ~ g o ~ ACM ~ m Trans. . Progr~mming ~ n g u a g eand s Systems, 4(2), 258-282 Massa~in,H. (Oct. 1987). S u p e ~ o p ti ~ zer -A look at the smallest program. ACM S I G P Notices, ~ 22(10), 122-1 26 ~ c C ~ h J.y (April , 1960).Recursivefunctions of symbolicexpressions and theircomputation by machine. Commun. ACM, 3(4), 184-195 McKenzie, B. J., Harries, R., and Bell, T. C. (Feb. 1990). Selecting a hashing algorithm.S o ~ a r e Practice and ~ x p e r i ~ n c20(2), e, 209-224 McKenzie, B. J., Yeatman, C., and De Vere, L. (July 1995). Error repair in shift-reduce parsers. ACM Trans. Programming ~ n g u a g eand s Systems, 17(4), 672-689 Milner, R., Tofte, M., Harper, R., and MacQueen, D. (1997). The ~ e ~ n i t i of o nStand~rdML. revised Edn. MIT Press Morel, E. (1984). Data flow analysis and global optimization. In eth hods and Tools for Compiler Con.struction (B. Lorho, ed.), pp. 289-315. Cambridge University Press Morgan, R.(1998). ~ u ~ ~ an d Optimizing ~ n g CompiEer. Digital ~ess/Buttenvorth-Heinemann Muchnick, S. (1997). Advanced Compiler Design andI~plementation.Morgan Kaufmann Muchnick, S. S.and Jones, N.D. (1981). Program FZow Analysis. Prentice Hall Naur, P. (1965). Checking of operand types inAL,GOL compilers. BIT, 5, 151-163 Nguyen, T.-T. and Raschner, E. W. (May 1982). Indirect threaded code used to emulate a virtual machine. AC34 S I G Notices, ~ ~17(5), ~ 80-89 Nitzberg, B. and Lo, V. (Aug. 1991). Distributed shared memory -A survey of issues and algorithms. ZEEE Computer, 24(8), 52-60 Noonan, R.E. (l 985). An algorithm for generating abstract syntax trees. Computer ~ n g u a g e s10(3/4), , 225-236 Informat~onProcessing Nuutiia, E. (Nov. 1994). 
An efficient transitive closure algorithm for cyclic digraphs. Letters, 52(4), 207-213 Op den Akker,R. and Sluiman, E.(June 1991). Storage allocationfor attribute evaluators using stacks and queues. Applications andSystems (H. Alblas & B. Melichar, ed.), pp. 140-150. SpringerIn Attribu~eGr~~inmars, Verlag Pagan, F. G. (June 1988). Converting interpreters into compilers. S o ~ a r e-Practice and ~xperience,18(6), 509-527 Pagan, F. G. (1991). Partial Computation and the Construction of Lunguage Processors. Prentice Hall
Pager, D. (1977). The lane-tracing algorithm for constructing LR(k) parsers and ways of enhancing its efficiency. Inform. Sci., 12, 19-42 Paige, R. and Koenig, S. (July 1982). Finite differencing of computable expressions. ACM Trans. P r o g r a ~ ~ i ~ g
LR: A predicated-LL(k) parser generator. So~ware-Practice and iler Construction. Computer Science Press Paulson, L. C. (1996). MLfor the ~ o r k ~ Progra~mer. ng 2nd Edn. Cambridge University Press ntsonanerror-recoveryscheme by Hastmann. Sqftware
- Practice
and
Pettersson,M.(Oct.1992). A termpattern-matchcompilerinspired by finite automatatheory. In C o ~ p i l e r Conference, CC92 (U.Kastens and P. Pfahler, ed.), pp. 258-270. SpringerConstruction, 4th Internati~~nal Verlag of ~unctionalP r o g r Q ~ ~ i ~ n gn g u a g e sPrentice . Hall Peyton Jones,S. L. (1987). The Imple~entati[)n PeytonJones, S. andHughes, J. (Feb.1999).Haskell98: A non-strict,purelyfunctionallanguage. Technicul Report ~ttp://www. haskell.org/onlinereport,Internet Peyton Jones, S. L. and Lester,D. R. (1992). Z~plementing~unctional~ n g u u g e sPrentice , Hall Plajsted, D. A. (1984). The occur-check problemin Prolog. New GenerfftionCo~puting,2,309-322 Poonen, G. (Aug. 1977). Error recovery for LR(k) parsers. In Znfor~ationProcessing 77 (Bruce Gilchrist, ed.), pp. 529-533. North Holland Proebsting, T. A. (May 1995). BURS automata generation. ACM Trans. P r o g r a ~ ~~i nn g~u u g e ~and s S~stems, 17(3), 461-486 Purdom Jr., P.(1970). A transitive closure algorithm.BIT, 10,76-94 Computing -Theory and Practice. McGraw-Hill Quinn, M. J. (1994). Parul~e~ c a 185-192 , Raiha, K.-J. and Saarinen,M. (1982). Testing attributeg r a ~ a r for s circularity,Acta I n ~ ~ r ~ a t i17, ~ ~ 32(5), R~alingaln,G. and S~nivasan,H. (1997). A member lookup algorithm forC++. ACM S Z G P Notices, 18-30 Reps, T.(March 1998). ~aximal-munchtokenization in linear time. ACM Trans. P r o g r a ~ ~ i ~ nn gguage~ and s S ~ s t e ~ .20(2), s , 259-273 RdvCsz, G. E. (1985). Introduction to ~ o r LanguQges. ~ a ~ McGraw-Hill Richter, H. (July 198s). Noncorrecting syntaxerror recovery. ACM Trans. P r o ~ r a ~ ~ i n g ~ n gand u a gSystems, es 7(3), 478-489 Ritter, T. andWalker, G. (Sept. 1980). Varieties of threaded code for language il~plementation.BYTE, 5(9), 206-227 Robinson, J. A (Jan. 1965). A machine-oriented logic basedon the resolution principle.J. ACM, 12(1), 23-41 Mach. Intell., 6,63-72 Robinson, J.A.(1971). Computational logic: The unification computation. Rohrich, J. (Feb. 1980). Methods for the automatic construction of error correcting parsers. Acta Z ~ f o ~ a t i c a , 13(2), l1 5-139 Sale, A. (Sept. 1981). The i~plementationof case statements in Pascal. S o f ~ a r e Practice and Experience, 11(9), 929-942 Saloman, D. (1992).A ~ s s e ~ ~ land e r sLoaders. Ellis Horwood C o ~ ~ uACM, n . 3(2), 76-83 Samelson,K. and Bauer, F.L. (Feb. 1960). Sequential formula translation. Sankaran, N. (Sept. 1994). A bibliography on garbage collection and related topics. ACM SIGPLAN ~otice,s, 29(9), 149-158 Sassa, M. and Goto, E. (1976). A hashing method for fast set operations. I ~ j ~ r ~ a t Processing ion Letters, 5(2), 3 1-34 Schnorr, C. P. (May 1978). An algorithm for transitive closure with linear expected time. SIAM J. Comput., 7(2), 127-133 Schorr, H. and Waite, W. NI. (Aug. 1967). An efficient ~achine-independentprocedure for garbage collection in various list structures.C o ~ n ~ uACM, n . 10(8), 501-506 Sedgewick, R. (1988).A l ~ ~ r i t h mAddison-Wesley s. J. ACM, 17(4), Sethi, R. and Ullman, J.D. (Oct. 1970).The generationof optimal code for arithmetic expressions. 7 1S-728 Sheridan, P. B. (Feb. 1959). The arithmetictranslator-co~pilerof the IBM FORTRAN automatic coding system. C a ~ ~ ACM, u ~2(2), ~ 9-21 .
-